HelpWizard Pages Documents HTML/en

From expecco Wiki (Version 2.x)

HTML Documents


Reading/Parsing

Typically, a document is parsed into a so-called DOM (Document Object Model) tree, which is then searched or manipulated. Actions for parsing are found in the standard library.

In elementary Smalltalk code, a DOM tree can be obtained from a string or stream via:

HTML::HTMLParser parse:aStringOrStream

or from a file via:

HTML::HTMLParser parseFile:aFilename

The resulting document node (DOM) can then be processed:

aDocNode head
aDocNode body
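
As a minimal sketch of these two steps, the following parses markup from a string literal and fetches the head and body nodes. The markup and the variable names are made up for illustration; only parse:, head and body are taken from this page, and inspect is the generic Smalltalk way to open an inspector on any object.

```smalltalk
|doc headNode bodyNode|

"parse HTML from a string (the markup is an arbitrary example)"
doc := HTML::HTMLParser parse:'<html><head><title>Demo</title></head><body><p>Hello</p></body></html>'.

"fetch the two top-level parts of the document"
headNode := doc head.
bodyNode := doc body.

"open inspectors to explore what each node offers"
headNode inspect.
bodyNode inspect.
```

Evaluating this in a workspace should open two inspectors, which is a convenient starting point for exploring the element protocol.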

XPath-like access:

aDocNode / tagName
aDocNode // tagName

Attribute access:

aDocNode @ attribName

For example, to get all anchors:

|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

To extract only the anchors whose URL satisfies a check function (in this case, anchors whose HREF attribute starts with a given prefix):

|doc anchors anchorsMatching|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
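
A page may also contain anchors without an HREF attribute (for example, named anchors). The following variant is a sketch that filters those out first and then prints the matching URLs. It assumes that @ answers nil for a missing attribute, which this page does not state and which should be checked in the class browser. The variable names withHref, matching and urls are illustrative only.

```smalltalk
|doc anchors withHref matching urls|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

"keep only anchors that carry an HREF attribute
 (assumption: @ answers nil for a missing attribute)"
withHref := anchors select:[:a | (a @ 'HREF') notNil].

"apply the same prefix check to the remaining anchors"
matching := withHref select:[:a | (a @ 'HREF') startsWith:'misc'].

"collect the URL strings themselves and print them on the Transcript"
urls := matching collect:[:a | a @ 'HREF'].
urls do:[:each | Transcript showCR:each].
```

Splitting the filter into two select: steps keeps the nil check separate from the actual matching rule, so the rule can be exchanged without touching the guard.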


Use the class browser to see all methods provided by the HTML element classes and the parser. Open an inspector on a result to explore what can be done with an element.



Copyright © 2014-2024 eXept Software AG