XML Parser and DOM Tree Library
Current version as of 29 January 2025, 13:57
Contents
- 1 Introduction
- 2 Library Reference
- 2.1 Parsing
- 2.2 Printing
- 2.3 Element Extraction
- 2.4 XPath Access
- 2.5 XML Inspector
- 2.6 Examples
- 2.6.1 Example 1
- 2.6.2 Example 2
- 2.6.2.1 Step 1: Look at the File (how to manually detect ZIP file type)
- 2.6.2.2 Step 2 (optional): Inspect the Files in the expecco FileBrowser
- 2.6.2.3 Step 3 (optional): Inspect the XML Contents of an Extracted File.
- 2.6.2.4 Step 4: Automating the ZIP Archive Processing
- 2.6.2.5 Step 5: Filter
- 2.6.2.6 Step 6: Processing the files (each individually)
- 2.6.2.7 Step 7: Extracting individual values from the XML Document
- 2.6.2.8 Steps In Between: Add Little Tests
- 2.6.2.9 Useful tools:
Introduction
This library contains action blocks to read and manipulate XML documents.
Library Reference
Please import the library and take a look at the documentation and Test/Demo examples.
Parsing
- XMLDocument [ From File ]
Reads XML from a file and provides a DOM tree representing that document.
- XMLDocument [ From Stream ]
Reads XML from a stream and provides a DOM tree representing that document.
- XMLDocument [ From String ]
Reads XML from a string and provides a DOM tree representing that document.
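For readers who think in code, the three parsing variants can be sketched outside of expecco. The following is only an illustration using Python's standard `xml.dom.minidom` module, not the implementation behind the action blocks; the helper names and the sample document are assumptions made up for this sketch.

```python
import io
import os
import tempfile
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
XML_TEXT = "<parts><part id='1'>bolt</part></parts>"


def dom_from_file(path):
    """Parse the XML file at 'path' and return a DOM document."""
    return minidom.parse(path)


def dom_from_stream(stream):
    """Parse XML from an open (binary) stream and return a DOM document."""
    return minidom.parse(stream)


def dom_from_string(text):
    """Parse XML given as a string and return a DOM document."""
    return minidom.parseString(text)


# Usage: all three variants yield an equivalent DOM tree.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parts.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(XML_TEXT)
    from_file = dom_from_file(path)

from_stream = dom_from_stream(io.BytesIO(XML_TEXT.encode("utf-8")))
from_string = dom_from_string(XML_TEXT)

root_tags = [d.documentElement.tagName for d in (from_file, from_stream, from_string)]
```

In each case the result is a document object whose `documentElement` is the root of the tree, which corresponds to what the actions above deliver on their output pin.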
Printing
- Convert [ XMLDocument-to-String ]
Generates a printed string representation from a DOM tree.
Element Extraction
- XML [ Get Root Element ]
Retrieves the root element of a DOM document.
- XML [ Get Sub Elements ]
Retrieves the direct DOM child elements of a given DOM element.
- XML [ Get Sub Elements Recursive ]
Retrieves all DOM child elements (recursively) of a given DOM element.
- XML [ Enumerate Sub Elements ]
Enumerates the direct child elements of a given element.
- XML [ Enumerate Sub Elements Recursive ]
Enumerates all child elements (recursively) of a given element.
- XML [ Find Sub Elements ]
Finds the immediate subelements of a given element by tag or attribute.
- XML [ Find Sub Elements Recursive ]
Enumerates all subelements of a given element by tag or attribute.
- XML [ Get CData Collection ]
Retrieves all of an element's CDATA.
- XML [ Get CData Collection ]
Retrieves an element's single CDATA.
- XML [ Get Attribute Keys ]
Retrieves an element's attribute names.
- XML [ Get Attribute Value ]
Retrieves an element's single attribute value.
- XML [ Compare Attribute Value ]
Compares an element's attribute. Returns a boolean.
- XML [ Check for Attribute Value ]
Compares an element's attribute. Two-way output.
- XML [ Get Tag ]
Retrieves an element's tag.
- XML [ Set Tag ]
Changes an element's tag.
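To make the vocabulary above (root element, direct versus recursive sub elements, attributes, tag, character data) more concrete, here is a small sketch in Python using the standard `xml.etree.ElementTree` module. It is not how the expecco actions are implemented; the sample document and all variable names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
root = ET.fromstring(
    "<PART name='p1'>"
    "<SUBPART><COMPONENT unit='mm'>12</COMPONENT></SUBPART>"
    "</PART>"
)

# Direct child elements of the root (compare "Get Sub Elements").
direct_children = [child.tag for child in root]

# All child elements, recursively (compare "Get Sub Elements Recursive").
all_descendants = [e.tag for e in root.iter() if e is not root]

# Find a sub element by tag (compare "Find Sub Elements").
component = root.find("SUBPART/COMPONENT")

# Attribute names and a single attribute value.
attribute_keys = list(component.attrib.keys())
unit = component.get("unit")

# The element's character data.
text = component.text

# Reading and changing the tag (compare "Get Tag" / "Set Tag").
old_tag = component.tag
component.tag = "ITEM"
```

The distinction between the direct and the recursive variants is visible in the first two results: only `SUBPART` is a direct child of the root, whereas the recursive walk also reaches `COMPONENT`.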
XPath Access
For XPath syntax, please refer to the XPath cheat sheet.
- XMLDocument [ XPath Get Element Set ]
Retrieve a set of matching elements, given an XPath match expression. These can also be conveniently created in the attachment editor's XML/DOM viewer.
- XMLDocument [ XPath Get Element ]
Retrieve a single matching element, given an XPath match expression (see also the attachment editor's XML/DOM viewer menu).
- XMLDocument [ XPath Get Element CDATA ]
Retrieve a single matching element's CDATA, given an XPath match expression.
- XMLDocument [ XPath Get Element nonEmpty CDATA ]
Retrieve a single matching element's non-empty CDATA, given an XPath match expression.
- XMLDocument [ XPath Set Element ]
Changes a single matching element, given an XPath match expression.
- XMLDocument [ XPath Set Element CDATA ]
Changes a single matching element's CDATA, given an XPath match expression.
XML Inspector
- XMLDocument [ Inspect DOM ]
Opens a graphical inspector on a DOM tree.
- XMLDocument [ Inspect CML String ]
Opens a graphical inspector on a parsed XML string.
Examples
Example 1
Fetch an XML document (using HTTP) and extract some value from it. In this example, a document containing ISO standard currency codes is retrieved, and the data of one currency is extracted from it.
Both an example using action blocks from the standard library and an example using JavaScript elementary code are presented (the latter for interested readers; you are not required to do any programming to solve this task).
Picture follows soon...
Example 2
A common task, e.g. in facility or factory management, is to extract data from an inventory list which is given as an XML document, for example to extract part numbers, measurement values or test descriptions.
In the following example, such a document is given as a zip archive which contains many individual XML documents, from which fields need to be extracted.
Step 1: Look at the File (how to manually detect ZIP file type)
Many files are zip archives, even if the file has no ".zip" extension. To manually check, perform the following:
- Open an expecco FileBrowser ("Extras" → "Tools" → "File Browser...") and select the file.
- Right-click (in the file name list) and select the "Properties" menu item.
- If it is a ZIP file, there will be a line like: "MIME: ... contents:application/zip".
- As an alternative, look at the file's contents at the bottom; if they start with "PK", the file is also likely to be a zip file.
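The "PK" check from the list above can also be done programmatically. The following Python sketch is an illustration only and is not part of expecco; the function name and the sample archive are assumptions. It compares the first two bytes against the "PK" signature and, as a more thorough alternative, uses `zipfile.is_zipfile` from the standard library.

```python
import io
import zipfile


def starts_with_pk(data: bytes) -> bool:
    """Quick heuristic: zip archives normally begin with the bytes 'PK'."""
    return data[:2] == b"PK"


# Build a small in-memory archive so the sketch runs without any input file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("foo/a.xml", "<a/>")

quick_check = starts_with_pk(buffer.getvalue())

# More thorough check: looks for the zip end-of-central-directory record.
thorough_check = zipfile.is_zipfile(buffer)

# Plain XML text is not a zip archive.
not_a_zip = starts_with_pk(b"<?xml version='1.0'?><a/>")
```

The two-byte test is only a hint, just like the manual inspection: a file that merely happens to start with "PK" would pass it, which is why the second check is the safer one.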
Step 2 (optional): Inspect the Files in the expecco FileBrowser
Add an archiver tab to the FileBrowser ("File" → "Add Archiver Page"), and select the file.
The zip-archive's contents should now be listed at the bottom. Double click on an entry to see its contents.
Alternatively, you can unzip the archive via one of the standard unzip tools (for example: 7zip).
Step 3 (optional): Inspect the XML Contents of an Extracted File.
The FileBrowser is also able to present the XML contents after parsing it into a DOM tree. It will also generate unique xpath access strings when you select an element. This will later be useful, when the whole process is automated.
Step 4: Automating the ZIP Archive Processing
First, you have to get the filenames from the archive. Use the "[ZIP File] List Contents" action from the standard library. This action has two output pins: one where all file names appear as one (possibly big) collection, another where each filename appears in sequence (i.e. this second pin delivers multiple values, one for each file component). This second pin is perfect for streaming: you can use this to feed a processing action, which deals with every file. For now, we assume that you will first have to use a filter on the names, to process only a subset of the files.
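The difference between the two output pins (one collection versus one value per file) is similar to the difference between a list and a generator. As a rough analogy only, and not the implementation of the expecco action, here is a Python sketch using the standard `zipfile` module; both function names and the sample archive are assumptions made for this example.

```python
import io
import zipfile


def list_contents(archive_file):
    """Return all entry names at once, as one (possibly big) collection."""
    with zipfile.ZipFile(archive_file) as archive:
        return archive.namelist()


def iter_contents(archive_file):
    """Deliver the entry names one at a time, suitable for streaming."""
    with zipfile.ZipFile(archive_file) as archive:
        for info in archive.infolist():
            yield info.filename


# Build a small in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/foo/a.xml", "<a/>")
    archive.writestr("data/bar/b.txt", "hello")

as_collection = list_contents(buffer)

# Each name could be handed to a processing step as soon as it arrives.
streamed = [name for name in iter_contents(buffer)]
```

A consumer of the streaming variant can start working on the first name before the last one has been read, which is the property that makes the second pin convenient for feeding a per-file processing action.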
Step 5: Filter
Feed the output into a "Filter [ Matching ]" action. Its input gets each file name in the archive file, its output only provides matching names. The pattern is a GLOB pattern (i.e. what you would also use as a command line argument to a shell or batch command).
For example, a pattern like "*/foo/*.xml" would give you all files which are contained in a subfolder named "foo" and which have the suffix ".xml".
Then finally, feed the filter's output (the names inside the archive) to a "Zip File [Extract File]" step. Its output will then be the real filenames of the files extracted into your temp folder (that is what "$(TmpDirectory)" stands for). Notice that the temp folder is removed after your expecco session. If you need to preserve those files, extract them to another folder (for example, into your home directory or the "Documents" folder).
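The filter-then-extract chain can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the expecco actions: `extract_matching` is a hypothetical helper, the archive contents are invented, and `fnmatch` from the standard library stands in for the GLOB matching (its exact matching rules may differ from those of the "Filter [ Matching ]" action).

```python
import fnmatch
import io
import os
import tempfile
import zipfile


def extract_matching(archive_file, pattern, target_dir):
    """Extract all entries whose name matches the GLOB pattern.

    Returns the paths of the extracted files inside target_dir.
    """
    extracted = []
    with zipfile.ZipFile(archive_file) as archive:
        for name in archive.namelist():
            # fnmatchcase is case-sensitive on every platform.
            if fnmatch.fnmatchcase(name, pattern):
                extracted.append(archive.extract(name, path=target_dir))
    return extracted


# Build a small in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/foo/a.xml", "<a/>")
    archive.writestr("data/foo/c.txt", "not xml")
    archive.writestr("data/bar/b.xml", "<b/>")

# Like the temp folder mentioned above, this directory is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    paths = extract_matching(buffer, "*/foo/*.xml", tmp)
    extracted_names = [os.path.basename(p) for p in paths]
    all_exist = all(os.path.isfile(p) for p in paths)
```

Only "data/foo/a.xml" matches the pattern: "c.txt" has the wrong suffix and "b.xml" is not in a folder named "foo". To keep the files, pass a permanent directory instead of the temporary one.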
Step 6: Processing the files (each individually)
This is best done in a separate new action, both for easier testability and for better reusability: the filter and processing actions can later be reused individually. Thus, the filtered output is passed to a new output pin and the extract action is renamed to "Extract Matching from Zip Archive".
For the processing, we use the "XMLDocument [from File]" action, which is found in the XML Library. This needs to be imported first, as those actions are not in the standard library.
Make a new compound, with an input pin of type "FilenameOrString", and place & connect the "XMLDocument [from File]" inside, connecting corresponding pins. This provides the DOM tree: a hierarchical object tree as represented by the XML.
Hint: To add the XML action, press "CTRL-n" in the diagram editor, type "xml" or even "xml;file" into the filter, then the action is shown near the top of the list. Select it and press OK, or double click on it in the list.
Step 7: Extracting individual values from the XML Document
Of course, this step is now very task specific, and your mileage will vary here.
For the demo, we assume that the XML document contains a parts list, consisting of PART/SUBPART/COMPONENT, with multiple PART instances, each containing multiple SUBPARTs and each of them again multiple COMPONENTS.
And that we have to extract a value found inside the COMPONENT.
For that, use an "XMLDocument [ XPath Get Element]" action and an XPath like "//COMPONENT/VALUE".
Do whatever is needed with those values. Notice that the action provides a collection of values, in case multiple objects match the XPath. Place the second part into an action named "XML Processing". For maximum reusability, you may even want to split the document part from the part which deals with individual elements, and feed the XPath access strings via input pins.
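As an illustration of what such an XPath selects, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It is not the expecco action itself: the sample parts list and the function name are assumptions, and ElementTree only supports a subset of XPath, which is why the expression is written relative to the root as ".//COMPONENT/VALUE".

```python
import xml.etree.ElementTree as ET

# Hypothetical parts list following the PART/SUBPART/COMPONENT layout.
PARTS_XML = """<PARTS>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>4.7</VALUE></COMPONENT>
      <COMPONENT><VALUE>10</VALUE></COMPONENT>
    </SUBPART>
  </PART>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>22</VALUE></COMPONENT>
    </SUBPART>
  </PART>
</PARTS>"""


def component_values(xml_text, xpath=".//COMPONENT/VALUE"):
    """Return the text of every element matching the XPath, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(xpath)]


values = component_values(PARTS_XML)
```

Passing the XPath as a parameter mirrors the suggestion to feed the access strings via input pins: the same function can then be reused for other elements of the document.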
Steps In Between: Add Little Tests
Please always make use of the "Test/Demo" pages to immediately verify that your diagram works as expected. So here is the test/demo of "Process XML File" and a run (on purpose with a bug initially; do you see what's wrong?).
Useful tools:
Once you have the XML document or an element from it at hand (i.e. a DOM element), use the expecco inspector (click on a pin value after a run). The inspector will show an additional tab named "DOM", in which you can try XPath expressions to see which elements match. XPath is quite powerful, and you may want to read the XPath cheat sheet to make best use of it.
Happy extracting.