HelpWizard Pages Documents HTML/en
Revision as of 29 August 2022, 11:16 by Cg
HTML Documents
Reading/Parsing
Typically, a document is parsed into a so-called DOM (Document Object Model) tree, which is then searched or manipulated. Actions for parsing are found in the standard library.
Elementary Smalltalk code can get a DOM tree via:
HTML::HTMLParser parse:aStringOrStream
or
HTML::HTMLParser parseFile:aFilename
The resulting document node (DOM) can then be processed:
aDocNode head
aDocNode body
XPath-like access:
aDocNode / tagName
aDocNode // tagName
For example, to get all anchors:
|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
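The anchors found this way can then be processed one by one. The following is only a minimal sketch: it assumes that the result of // is a collection that understands do:, which this page does not state explicitly. It uses printString because every object understands that message, so no element-specific accessor has to be guessed.

```smalltalk
|doc anchors|

"Parse the file and collect all anchor elements, as in the example above"
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

"Assumption: anchors is a collection that understands do:"
anchors do:[:eachAnchor |
    "printString is understood by every object"
    Transcript showCR:eachAnchor printString
].
```

If the result turns out not to be a collection, an inspector on anchors shows what it actually is.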
Use the class browser to see all methods provided by the HTML elements and the parser. Open an inspector on a result to explore what can be done with an element.
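As a small sketch of the inspector approach, the following can be evaluated in a workspace. The HTML string is made-up sample input, and the class name follows the spelling used in the anchor example on this page. inspect is the standard message for opening an inspector on any object.

```smalltalk
|doc|

"Parse a small, made-up HTML string instead of a file"
doc := HTML::HTMLParser parse:'<html><body><a href="index.html">Home</a></body></html>'.

"Open an inspector on the anchors found in the document"
(doc // 'a') inspect.
```

From the inspector, the class of each element can be opened in the class browser to see its methods.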