HelpWizard Pages Documents HTML/en

From expecco Wiki (Version 2.x)
Latest revision as of 29 August 2022, 11:21

HTML Documents


Reading/Parsing

Typically, documents are parsed into a so-called DOM (Document Object Model), which is then searched or manipulated. Actions for parsing are found in the standard library.

Elementary Smalltalk code can obtain a DOM tree via:

HTML::HTMLParser parse:aStringOrStream

or

HTML::HTMLParser parseFile:aFilename

The resulting document node (DOM) can then be processed:

aDocNode head
aDocNode body

XPath-like access:

aDocNode / tagName
aDocNode // tagName

Attribute access:

aDocNode @ attribName
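
The parsing and access operators can be combined in one short sketch. It uses only the messages named on this page (parse:, // and @); the markup string is made up for illustration, and whether attribute names must be written in upper case (as in 'HREF') is an assumption taken from the examples on this page rather than something verified here.

```smalltalk
|doc anchors|

"parse: accepts a string or a stream; this markup is hypothetical"
doc := HTML::HTMLParser parse:'<html><body><p><a href="misc/a.html">A</a></p></body></html>'.

"// searches the whole subtree, so the anchor nested inside the paragraph is included"
anchors := doc // 'a'.

"show the HREF attribute of every anchor; printString also copes with nil"
anchors do:[:a |
    Transcript showCR:(a @ 'HREF') printString
].
```

Evaluating this in a workspace and inspecting anchors is a quick way to see which kind of collection // answers.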

For example, to get all anchors:

|doc anchors|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.

To extract only those anchors whose URL satisfies a check function (in this case, anchors whose URL starts with a given prefix):

|doc anchors anchorsMatching|
doc := HTML::HTMLParser parseFile:'myFile.html'.
anchors := doc // 'a'.
anchorsMatching := anchors select:[:a | (a @ 'HREF') startsWith:'misc']
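
Real pages often contain anchors without an HREF attribute (for example, named anchors), and sending startsWith: to a missing value would fail. The following variant is a minimal sketch that guards against this. It assumes that @ answers nil when the attribute is absent, which this page does not confirm; the markup string is likewise hypothetical.

```smalltalk
|doc anchors anchorsMatching|

"hypothetical markup: one matching link, one non-matching link, one anchor without HREF"
doc := HTML::HTMLParser parse:'<html><body>
<a href="misc/notes.html">notes</a>
<a href="other/index.html">index</a>
<a name="top">top</a>
</body></html>'.

anchors := doc // 'a'.

"ASSUMPTION: @ answers nil for a missing attribute, so test before sending startsWith:"
anchorsMatching := anchors select:[:a |
    |href|
    href := a @ 'HREF'.
    href notNil and:[href startsWith:'misc']
].

Transcript showCR:anchorsMatching size printString.
```

If @ turns out to raise an error instead of answering nil, the guard would have to be replaced by whatever attribute-presence query the element class offers; the class browser shows the available methods.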


Use the class browser to see all methods provided by HTML elements and the parser. Use an inspector on the results to explore what can be done with an element.



Copyright © 2014-2024 eXept Software AG