HelpWizard Pages Documents HTML/en: Difference between revisions

Latest revision as of 29 August 2022, 11:21

HTML Documents

Reading/Parsing

Typically, an HTML document is parsed into a so-called DOM (Document Object Model) tree, which is then processed (searched or manipulated). Actions for parsing are found in the standard library.

Elementary Smalltalk code can get a DOM tree via:

HTML::HTMLParser parse:aStringOrStream

or

HTML::HTMLParser parseFile:aFilename

The resulting document node (DOM) can then be processed:

aDocNode head
aDocNode body

XPath-like access:

aDocNode / tagName
aDocNode // tagName

Attribute access:

aDocNode @ attribName

For example, to get all anchors:

|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

To extract only those anchors whose URL passes a check (in this case, anchors whose HREF starts with a given prefix):

|doc anchors anchorsMatching|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
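
The same operations can be combined to collect the URLs themselves rather than the anchor elements. The following is only a minimal sketch: it uses the parse:, // and @ messages shown above plus the standard collection messages collect: and select:. The inline HTML text and the variable names are made up for illustration. That a missing attribute is answered as nil, and that the key 'HREF' also matches a lowercase href in the source text, are assumptions which should be checked with an inspector.

```smalltalk
|doc anchors hrefs miscHrefs|

"parse an inline HTML string instead of a file"
doc := HTML::HTMLParser parse:'<html><body>
    <a href="misc/a.html">A</a>
    <a href="other/b.html">B</a>
    <a name="top">no href</a>
</body></html>'.

"all anchor elements, as in the examples above"
anchors := doc // 'a'.

"map each anchor to its HREF value (assumed to be nil if the attribute is missing)"
hrefs := anchors collect:[:a | a @ 'HREF'].

"keep only the non-nil values which start with the prefix"
miscHrefs := hrefs select:[:url | url notNil and:[url startsWith:'misc']].
```

Guarding with notNil before sending startsWith: keeps the filter from failing on anchors which have no HREF attribute, should the attribute lookup indeed answer nil in that case.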

Use the class browser to see all methods provided by HTML elements and the parser. Open an inspector on the results to explore what can be done with an element.

@@ Line 1: / Line 1: @@
-<strong>Documents</strong>
+<strong>HTML Documents</strong>
 [[Datei:arrowleft.png|link=HelpWizard_Pages_Start Documents/en|back]]
@@ Line 5: / Line 5: @@
 <br>
+== Reading/Parsing==
-Expecco can read and write a number of common document formats.
+Typically, documents are parsed into a so called DOM, which is then processed (searched or manipulated).
-Some is found in the standard library, others are provided by additional (plugin) libraries.
+Actions to parse are found in the standard library.
-Notice, that in addition to action blocks from libraries, a lot of functionality is provided by the underlying framework (i.e. class libraries), which can be called from new elementary action blocks.
+Elementary Smalltalk code can get a DOM tree via:
+ HTML::HtmlParser parse:''aStringOrStream''
+or
+ HTML::HtmlParser parseFile:''aFilename''
+The resulting document node (DOM) can then be processed:
+ ''aDocNode'' head
+ ''aDocNode'' body
+xPath like access:
+ ''aDocNode'' / ''tagName''
+ ''aDocNode'' // ''tagName''
+attribute access:
-* [[HelpWizard Pages Documents HTML/en | HTML]]&nbsp;(web pages)
+ ''aDocNode'' @ ''attribName''
-* [[HelpWizard Pages Documents XML/en| XML]]&nbsp;
-* [[HelpWizard Pages Documents JSON/en| JSON]]&nbsp;
+For example, to get all anchors:
-* [[HelpWizard Pages Documents PDF/en| PDF]]&nbsp;
+ |doc anchors|
-* [[HelpWizard Pages Documents CSV/en| CSV]]&nbsp;(comma separated values / Excel)
+ doc := HTML::HTMLParser parseFile:'myFile.html'.
-* [[HelpWizard Pages Documents Word/ODF/en| Word/ODF]]&nbsp;(open document format)
+ anchors := doc // 'a'.
-* [[HelpWizard Pages Documents ZIP/en| ZIP]]&nbsp;(ZIP archives)
+To extract only anchors to a URL which matches a check function (in this case, which refer to a URL which starts with a prefix):
+ |doc anchors anchorsMatching|
+ doc := HTML::HTMLParser parseFile:'myFile.html'.
+ anchors := doc // 'a'.
+ anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
-Provided by plugins:
-* [[HelpWizard Pages Documents Edifact/en| Edifact]]&nbsp;(B2B Documents)
-* [[HelpWizard Pages Documents Swift/en| Swift]]&nbsp;(Swift Messages)
+Use the class browser to see all methods provided by HTML elements and the parser.
+Use an inspector on the results, for what can be done with an element.
 [[Category: HelpWizard]]
