HTML Documents
Reading/Parsing
Typically, a document is parsed into a so-called DOM (Document Object Model) tree, which is then processed (searched or manipulated). Actions for parsing are found in the standard library.
Elementary Smalltalk code can get a DOM tree via:
HTML::HTMLParser parse:aStringOrStream
or
HTML::HTMLParser parseFile:aFilename
The resulting document node (DOM) can then be processed:
aDocNode head
aDocNode body
XPath-like access:
aDocNode / tagName
aDocNode // tagName
Attribute access:
aDocNode @ attribName
For example, to get all anchors:
|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
To extract only the anchors whose URL satisfies a check function (here, those whose URL starts with a given prefix):
|doc anchors anchorsMatching|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc'].
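The same calls can be combined to list the link targets themselves. The following is only a minimal sketch under a few assumptions that are not stated above: that @ answers nil when an element lacks the attribute, that tag and attribute names are matched regardless of case (as the 'a' and 'HREF' usage suggests), and that a Smalltalk/X-style Transcript showCR: is available for output. The HTML string is made up for illustration.

```smalltalk
|doc anchors hrefs|

"Parse from a string instead of a file; the markup is a hypothetical sample."
doc := HTML::HTMLParser parse:'<html><body><a href="misc/a.html">A</a><a name="top">B</a></body></html>'.

"All anchor elements, as in the examples above."
anchors := doc // 'a'.

"Assumption: @ answers nil for a missing attribute, so drop those entries."
hrefs := (anchors collect:[:a | a @ 'HREF']) select:[:h | h notNil].

"Print each remaining link target on its own line."
hrefs do:[:h | Transcript showCR:h].
```

If @ behaves differently for missing attributes in your version, check the element class in the class browser and adapt the filter accordingly.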
Use the class browser to see all methods provided by HTML elements and the parser.
Use an inspector on the results to explore what can be done with an element.
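As a small sketch of that workflow, sending the standard inspect message to a result opens an inspector on it. This assumes the code is evaluated in a workspace and that a file named 'myFile.html' (the placeholder name used above) exists.

```smalltalk
|doc|

"Parse the document, then open an inspector on the collection of anchors."
doc := HTML::HTMLParser parseFile:'myFile.html'.
(doc // 'a') inspect.
```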