HelpWizard Pages Documents HTML/en: Difference between revisions
Author of both revisions: Cg
Revision as of 29 August 2022, 11:16
HTML Documents
Reading/Parsing
Typically, a document is parsed into a so-called DOM (document object model), which is then processed (searched or manipulated). Actions for parsing HTML are found in the standard library.
Elementary Smalltalk code can get a DOM tree via:
HTML::HtmlParser parse:aStringOrStream
or
HTML::HtmlParser parseFile:aFilename
The resulting document node (DOM) can then be processed:
aDocNode head
aDocNode body
XPath-like access (in XPath, / addresses direct children and // addresses descendants at any depth):
aDocNode / tagName
aDocNode // tagName
For example, to get all anchors:
|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
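The same kind of lookup can be sketched for HTML held in a string rather than a file. This is a minimal sketch, not a definitive implementation: it uses only the parse: and // messages named above, the sample markup and the variable names html and links are made up for illustration, and the parser class name is spelled as in the parse: line above (check the exact spelling in the class browser). What the result prints as depends on the library, so no output is shown.

```smalltalk
|html doc links|

"Hypothetical sample markup; any HTML string or stream should do."
html := '<html><head><title>Demo</title></head><body><a href="a.html">A</a><p><a href="b.html">B</a></p></body></html>'.

"Parse the string into a document node (DOM)."
doc := HTML::HtmlParser parse:html.

"Collect the anchor elements, as in the file-based example."
links := doc // 'a'.

"Print the result to the Transcript; inspect it instead for more detail."
Transcript showCR:links printString.
```

Replacing the last line with an inspector on links shows which messages the returned elements understand.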
Use the class browser to see all methods provided by the HTML elements and the parser. Use an inspector on the results to explore what can be done with an element.