HelpWizard Pages Documents HTML/en
Current version as of 29 August 2022, 11:21
HTML Documents
Reading/Parsing
Typically, a document is parsed into a so-called DOM (Document Object Model) tree, which is then processed (searched or manipulated). Actions for parsing are found in the standard library.
In elementary Smalltalk code, a DOM tree can be obtained via:
HTML::HTMLParser parse:aStringOrStream
or
HTML::HTMLParser parseFile:aFilename
The resulting document node (DOM) can then be processed:
aDocNode head
aDocNode body
XPath-like access:
aDocNode / tagName
aDocNode // tagName
attribute access:
aDocNode @ attribName
For example, to get all anchors:
|doc anchors|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
To extract only those anchors whose URL satisfies a check (in this case, anchors whose HREF starts with a given prefix):
|doc anchors anchorsMatching|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
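As a further illustration, the same operators can be combined to list the link targets of all anchors. This is only a sketch: the file name is a placeholder, and it assumes that an anchor without an HREF attribute answers nil for the attribute access, which is not confirmed here. collect:, reject: and Transcript showCR: are standard Smalltalk facilities; everything else is taken from the examples above.

```smalltalk
"Sketch only: print the HREF of every anchor in a document.
 'myFile.html' is a placeholder file name."
|doc anchors targets|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

"fetch the HREF attribute of each anchor"
targets := anchors collect:[:a | a @ 'HREF'].

"assumption: anchors lacking an HREF answer nil; drop those"
targets := targets reject:[:t | t isNil].

targets do:[:t | Transcript showCR:t].
```

If the nil assumption does not hold in your version, an inspector on a single anchor shows what the attribute access actually returns.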
Use the class browser to see all methods provided by HTML elements and the parser.
Open an inspector on the results to explore what can be done with an element.