HelpWizard Pages Documents HTML/en: Difference between revisions

From expecco Wiki (Version 2.x)

Revision as of 29 August 2022, 11:16

HTML Documents


Reading/Parsing

Documents are typically parsed into a so-called DOM (document object model) tree, which is then searched or manipulated. Action blocks for parsing are found in the standard library.

Elementary Smalltalk code can get a DOM tree via:

HTML::HTMLParser parse:aStringOrStream

or

HTML::HTMLParser parseFile:aFilename

The resulting document node (DOM) can then be processed:

aDocNode head
aDocNode body
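
As a minimal sketch of the calls above, the following parses an in-memory string rather than a file and prints the head and body nodes to the Transcript. The sample HTML string is made up for illustration, the class name is spelled as in the anchor example on this page, and printing via printString is only an assumption about a convenient way to look at the nodes; adjust as needed for your installation.

```smalltalk
|doc|

"Parse a (hypothetical) HTML string; parse: also accepts a stream."
doc := HTML::HTMLParser parse:'<html><head><title>Demo</title></head><body><p>Hello</p></body></html>'.

"head and body answer the corresponding sub-nodes of the document node."
Transcript showCR:(doc head printString).
Transcript showCR:(doc body printString).
```

Opening an inspector on doc instead of printing is usually the quicker way to see what the nodes contain.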

XPath-like access (as in XPath, / is expected to select direct child elements, and // descendants at any depth):

aDocNode / tagName
aDocNode // tagName

For example, to get all anchors:

|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
"collect all <a> elements anywhere in the document"
anchors := doc // 'a'.
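
The result can then be processed further. The sketch below assumes that the value answered by // behaves like an ordinary Smalltalk collection (responding to size and do:), which this page does not state explicitly; 'myFile.html' is the same placeholder filename as above.

```smalltalk
|doc anchors|

doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

"Assumption: anchors is a collection of element nodes."
Transcript showCR:(anchors size printString , ' anchors found').
anchors do:[:eachAnchor |
    "printString is understood by every object; use an inspector for details."
    Transcript showCR:eachAnchor printString
].
```

If the assumption does not hold in your version, inspect the value of anchors to see which protocol it actually offers.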

Use the class browser to see all methods provided by the HTML element classes and the parser. To explore what can be done with a particular element, open an inspector on the result.



Copyright © 2014-2024 eXept Software AG