HelpWizard Pages Documents HTML/en

HTML Documents

Reading/Parsing

Typically, a document is parsed into a so-called DOM (Document Object Model) tree, which is then processed (searched or manipulated). Actions for parsing are available in the standard library.

In elementary Smalltalk code, a DOM tree can be obtained via:

HTML::HTMLParser parse:aStringOrStream

or

HTML::HTMLParser parseFile:aFilename

The resulting document node (DOM) can then be processed:

aDocNode head
aDocNode body

XPath-like access (as in XPath, / addresses direct children and // addresses descendants at any depth):

aDocNode / tagName
aDocNode // tagName

Attribute access:

aDocNode @ attribName

For example, to get all anchors:

|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

To extract only the anchors whose URL satisfies a check (here, anchors whose HREF attribute starts with a given prefix):

|doc anchors anchorsMatching|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
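Building on the example above, the following is a minimal sketch of how the link targets themselves might be collected and printed. It uses only the operators shown on this page plus standard collection messages. Two assumptions are made and are not confirmed by this page: that @ answers nil when an element lacks the attribute, and that the result of // understands collect: and reject: (it understands select: above). The variable name hrefs is made up for illustration.

```smalltalk
|doc anchors hrefs|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

"collect the HREF attribute value of every anchor"
hrefs := anchors collect:[:a | a @ 'HREF'].

"assumption: @ answers nil for a missing attribute; drop those entries"
hrefs := hrefs reject:[:each | each isNil].

"print one link target per line on the Transcript"
hrefs do:[:each | Transcript showCR:each].
```

If @ behaves differently for missing attributes, the reject: step needs to be adapted; an inspector on a single anchor will show what is actually returned.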

Use the class browser to see all methods provided by the HTML elements and the parser. Open an inspector on the results to explore what can be done with an element.
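As a quick way to try this out, the following sketch parses a literal string instead of a file and opens an inspector on the body node. The markup is made up for illustration, and the sketch assumes the parser accepts a small document of this form.

```smalltalk
|doc|

"parse a small, made-up document from a string"
doc := HTML::HTMLParser parse:'<html><head><title>Demo</title></head><body><a href="misc/a.html">A</a></body></html>'.

"open an inspector on the body node to explore its structure and methods"
doc body inspect.
```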
