XML Parser and DOM Tree Library/en
Revision as of 21 June 2023, 10:25 (edited by Cg)
Contents
- 1 Introduction
- 2 Library Reference
- 2.1 Parsing
- 2.2 Printing
- 2.3 Element Extraction
- 2.4 XPath Access
- 2.5 XML Inspector
- 2.6 Examples
- 2.6.1 Example 1
- 2.6.2 Example 2
- 2.6.2.1 Step 1: Look at the File (how to manually detect ZIP file type)
- 2.6.2.2 Step 2 (optional): inspect the files in the expecco FileBrowser
- 2.6.2.3 Step 3 (optional): inspect the XML contents of an extracted file.
- 2.6.2.4 Step 4: Automating the ZIP archive processing
- 2.6.2.5 Step 5: Filter
- 2.6.2.6 Step 6: Processing the files (each individually)
- 2.6.2.7 Step 7: Extracting individual values from the XML Document
- 2.6.2.8 Useful tools:
Introduction
This library contains action blocks to read and manipulate XML documents.
Library Reference
Please import the library and take a look at the documentation and Test/Demo examples.
Parsing
- XMLDocument [ From File ]
reads XML from a file and provides a DOM tree representing that document
- XMLDocument [ From Stream ]
reads XML from a stream and provides a DOM tree representing that document
- XMLDocument [ From Stream ]
reads XML from a string and provides a DOM tree representing that document
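For readers who think in code, the three parsing variants can be pictured with a minimal Python sketch. It uses Python's standard `xml.etree.ElementTree` module, not the expecco action blocks; the sample document and all variable names are assumptions made for illustration.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML_TEXT = "<parts><part id='1'>bolt</part></parts>"

# From a string: fromstring() returns the root element directly.
root_from_string = ET.fromstring(XML_TEXT)

# From a stream: parse() accepts any file-like object and returns an ElementTree.
tree_from_stream = ET.parse(io.BytesIO(XML_TEXT.encode("utf-8")))

# From a file: write a temporary file, then parse it by path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "parts.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(XML_TEXT)
    tree_from_file = ET.parse(path)

print(root_from_string.tag)
print(tree_from_stream.getroot().tag)
print(tree_from_file.getroot().tag)
```

All three variants end up with the same tree; only the source of the XML text differs.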
Printing
- Convert [ XMLDocument-to-String ]
Generates a printed string representation from a DOM tree.
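As a rough analogy outside expecco, converting a DOM tree back to text looks like this in Python's standard `xml.etree.ElementTree`. The element names are hypothetical, and the action block itself may format its output differently.

```python
import xml.etree.ElementTree as ET

# Build a tiny tree by hand (hypothetical element names).
root = ET.Element("part", {"id": "1"})
name = ET.SubElement(root, "name")
name.text = "bolt"

# encoding="unicode" yields a str instead of bytes.
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```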
Element Extraction
- XML [ Get Root Element ]
retrieves the root element of a DOM document
- XML [ Get Sub Elements ]
retrieves the direct DOM child elements of a given DOM element
- XML [ Get Sub Elements Recursive ]
retrieves all DOM child elements (recursively) of a given DOM element
- XML [ Enumerate Sub Elements ]
enumerates the direct child elements of a given element
- XML [ Enumerate Sub Elements Recursive ]
enumerates all child elements (recursively) of a given element
- XML [ Find Sub Elements ]
finds the immediate subelements of a given element by tag or attribute
- XML [ Find Sub Elements Recursive ]
finds all subelements (recursively) of a given element by tag or attribute
- XML [ Get CData Collection ]
retrieves all of an element's CDATA
- XML [ Get CData Collection ]
retrieves an element's single CDATA
- XML [ Get Attribute Keys ]
retrieves an element's attribute names
- XML [ Get Attribute Value ]
retrieves an element's single attribute value
- XML [ Compare Attribute Value ]
compares an element's attribute value and returns a boolean
- XML [ Check for Attribute Value ]
compares an element's attribute value; two-way output
- XML [ Get Tag ]
retrieves an element's tag
- XML [ Set Tag ]
changes an element's tag
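The operations listed above have close counterparts in most DOM-style libraries. The following sketch shows them with Python's standard `xml.etree.ElementTree`; it is only an analogy, not the expecco API, and the sample document is an assumption. Note that ElementTree exposes character data as an element's `text` rather than as separate CDATA nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<PART name='frame'>"
    "<SUBPART name='left'><COMPONENT>a</COMPONENT></SUBPART>"
    "<SUBPART name='right'><COMPONENT>b</COMPONENT></SUBPART>"
    "</PART>"
)

# Direct child elements.
direct_children = list(root)

# All descendants; iter() yields the element itself first, so skip it.
all_descendants = list(root.iter())[1:]

# Find immediate subelements by tag, and find recursively by tag.
immediate_subparts = root.findall("SUBPART")
all_components = list(root.iter("COMPONENT"))

# Character data of the matched elements.
component_texts = [component.text for component in all_components]

# Attribute names and a single attribute value.
attribute_keys = sorted(root.attrib.keys())
name_value = root.get("name")
name_matches = name_value == "frame"

# Read the tag, then change it.
original_tag = root.tag
root.tag = "ASSEMBLY"
```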
XPath Access
- XMLDocument [ XPath Get Element Set ]
retrieves a set of matching elements, given an XPath match expression
- XMLDocument [ XPath Get Element ]
retrieves a single matching element, given an XPath match expression
- XMLDocument [ XPath Get Element CDATA ]
retrieves a single matching element's CDATA, given an XPath match expression
- XMLDocument [ XPath Get Element nonEmpty CDATA ]
retrieves a single matching element's non-empty CDATA, given an XPath match expression
- XMLDocument [ XPath Set Element ]
changes a single matching element, given an XPath match expression
- XMLDocument [ XPath Set Element CDATA ]
changes a single matching element's CDATA, given an XPath match expression
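To illustrate the get/set pattern of XPath access, here is a small Python sketch using the limited XPath subset of the standard `xml.etree.ElementTree` module. This is an analogy under assumptions: the document layout and the `id` attribute are invented, and expecco's XPath support may accept expressions that ElementTree does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<PARTS>"
    "<PART><COMPONENT id='c1'><VALUE>10</VALUE></COMPONENT></PART>"
    "<PART><COMPONENT id='c2'><VALUE>20</VALUE></COMPONENT></PART>"
    "</PARTS>"
)

# Element set: every VALUE below any COMPONENT.
element_set = root.findall(".//COMPONENT/VALUE")

# Single element, selected by an attribute predicate.
single = root.find(".//COMPONENT[@id='c2']")

# Text content of a single matching element.
single_text = root.findtext(".//COMPONENT[@id='c2']/VALUE")

# Change the text content of a single matching element.
root.find(".//COMPONENT[@id='c1']/VALUE").text = "15"

values_after_update = [element.text for element in element_set]
```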
XML Inspector
- XMLDocument [ Inspect DOM ]
opens a graphical inspector on a DOM tree
- XMLDocument [ Inspect CML String ]
opens a graphical inspector on a parsed XML string
Examples
Example 1
Fetch an XML document (using HTTP) and extract a value from it. In this example, a document containing ISO standard currency codes is retrieved, and the data for one currency is extracted.
Two variants are presented: one using action blocks from the standard library and one using JavaScript elementary code (the latter for interested readers; you are not required to do any programming to solve this task).
Picture follows soon...
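Until the picture is available, the idea can be sketched in Python. This is not the expecco implementation: the function names, the element names (`currency`, `name`, `number`) and the `code` attribute are assumptions chosen for illustration, and the real currency document may be laid out differently. The lookup is demonstrated on an inline sample so that it runs without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def load_xml_from_url(url: str) -> ET.Element:
    """Fetch an XML document via HTTP(S) and return its root element."""
    with urllib.request.urlopen(url) as response:
        return ET.parse(response).getroot()


def find_currency(root: ET.Element, code: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the data of the currency with the given code, or None."""
    for entry in root.iter("currency"):  # hypothetical element name
        if entry.get("code") == code:
            return {
                "code": code,
                "name": entry.findtext("name"),
                "number": entry.findtext("number"),
            }
    return None


# Inline sample with an assumed layout.
SAMPLE = ET.fromstring(
    "<currencies>"
    "<currency code='EUR'><name>Euro</name><number>978</number></currency>"
    "<currency code='USD'><name>US Dollar</name><number>840</number></currency>"
    "</currencies>"
)

euro = find_currency(SAMPLE, "EUR")
print(euro)
```

With a real document, `load_xml_from_url` would supply the root element in place of `SAMPLE`, and the element names in `find_currency` would have to be adapted to that document.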
Example 2
A common task, e.g. in facility or factory management, is to extract data from an inventory list that is given as an XML document, for example part numbers, measurement values or test descriptions.
In the following example, such a document is given as a zip archive containing many individual XML documents, from which fields need to be extracted.
Step 1: Look at the File (how to manually detect ZIP file type)
Many files are zip archives, even if the file has no ".zip" extension. To check manually, do the following:
- Open an expecco FileBrowser ("Extras" - "Tools" - "File Browser...") and select the file.
- Right-click in the file name list and select the "Properties" menu item.
- If it is a ZIP file, there will be a line like: "MIME: ... contents:application/zip"
- As an alternative, look at the file's contents at the bottom; if they start with "PK", the file is also likely to be a zip file.
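The "PK" check from the last bullet can also be done programmatically. The sketch below is plain Python, independent of expecco; `looks_like_zip` is a hypothetical helper name. It builds a small archive in memory and checks it both by its leading bytes and with the standard library's more thorough `zipfile.is_zipfile`.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Quick heuristic: non-empty zip archives start with the bytes 'PK'."""
    return data[:2] == b"PK"


# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.xml", "<a/>")
zip_bytes = buffer.getvalue()

starts_with_pk = looks_like_zip(zip_bytes)

# is_zipfile() inspects the archive structure rather than only the first bytes.
confirmed = zipfile.is_zipfile(io.BytesIO(zip_bytes))

not_a_zip = looks_like_zip(b"<?xml version='1.0'?><a/>")
```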
Step 2 (optional): inspect the files in the expecco FileBrowser
Add an archiver tab to the FileBrowser ("File" - "Add Archiver Page"), and select the file. The zip archive's contents should now be listed at the bottom. Double-click an entry to see its contents.
Alternatively, you can unzip the archive with one of the standard unzip tools (for example: 7zip).
Step 3 (optional): inspect the XML contents of an extracted file.
The FileBrowser is also able to present the XML contents after parsing them into a DOM tree. It also generates a unique XPath access string when you select an element. This will be useful later, when the whole process is automated.
Step 4: Automating the ZIP archive processing
First, you have to get the filenames from the archive. Use the "ZIP List contents" action from the standard library. This action has two output pins: one where all file names appear as one (possibly big) collection, and another where each filename appears in sequence (i.e. this second pin delivers multiple values, one for each file component). The second pin is perfect for streaming: you can use it to feed a processing action which deals with every file.
For now, we assume that you first have to filter the names, in order to process only a subset of the files.
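The difference between the two output pins resembles the difference between returning a complete list and yielding values one at a time. A minimal Python sketch of that distinction, using the standard `zipfile` module; the function names and the archive contents are assumptions for illustration, not expecco's action:

```python
import io
import zipfile
from typing import Iterator, List


def build_demo_archive() -> bytes:
    """Create a small in-memory archive with hypothetical file names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("inv/foo/a.xml", "<PART/>")
        archive.writestr("inv/bar/b.xml", "<PART/>")
    return buffer.getvalue()


def list_all_names(archive_bytes: bytes) -> List[str]:
    """Like the first pin: all names as one collection."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


def stream_names(archive_bytes: bytes) -> Iterator[str]:
    """Like the second pin: one name at a time."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            yield info.filename


demo = build_demo_archive()
all_names = list_all_names(demo)
for name in stream_names(demo):
    print("processing", name)
```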
Step 5: Filter
Feed the whole collection into a "Select by Match Pattern" action. Its input gets the collection containing all archive file names; its output provides a new collection containing the subset whose names match the pattern. The pattern is a GLOB pattern (i.e. what you would also use as a command line argument to a shell or batch command). For example, a pattern like "*/foo/*.xml" gives you all files with the suffix ".xml" that are contained in a subfolder named "foo". Notice that the filter gives you another collection. If you want to process the files later in sequence, add a "Collection Enumerate" action as the last step of the processing chain, and connect its output to the output of the compound action. Let's call this compound "Archive Extractor Action".
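To get a feeling for how such a GLOB pattern selects names, here is a small Python sketch using the standard `fnmatch` module. The file names are invented, and `fnmatch` is only a stand-in: its "*" also matches across "/" separators, which may or may not be how the expecco action behaves.

```python
from fnmatch import fnmatchcase
from typing import List


def select_by_pattern(names: List[str], pattern: str) -> List[str]:
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical archive contents.
archive_names = [
    "inv/foo/a.xml",
    "inv/foo/readme.txt",
    "inv/bar/b.xml",
    "foo/c.xml",
]

selected = select_by_pattern(archive_names, "*/foo/*.xml")
print(selected)
```

Here "foo/c.xml" is not selected, because the pattern requires at least a "/" in front of "foo".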
Step 6: Processing the files (each individually)
This is best done in a new action, both for easier testability and for better reusability: the filter and processing actions can then be reused individually.
For the processing, we use the "XML Document from File" action, which is found in the "XML Library". This library needs to be imported first, as those actions are not in the standard library.
Make a new compound with an input pin of type "FilenameOrString", place the "XML Document from File" action inside, and connect the corresponding pins. This provides the DOM tree: a hierarchical object tree as represented by the XML.
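Keeping the per-file processing in its own unit corresponds to writing a small, separately testable function. A Python sketch of that idea, assuming the files are read directly from an in-memory archive; `parse_member` and the sample contents are hypothetical, not part of expecco:

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def parse_member(archive: zipfile.ZipFile, name: str) -> ET.Element:
    """Parse one archive member and return the root of its DOM tree."""
    with archive.open(name) as stream:
        return ET.parse(stream).getroot()


# Build a small demo archive with hypothetical contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("inv/foo/a.xml", "<PART name='frame'/>")
    archive.writestr("inv/foo/b.xml", "<PART name='wheel'/>")

# Process every file individually.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    roots = [parse_member(archive, name) for name in archive.namelist()]

part_names = [root.get("name") for root in roots]
```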
Step 7: Extracting individual values from the XML Document
Of course, this step is very task specific, and your mileage will vary here. For the demo, we assume that the XML document contains a parts list structured as PART/SUBPART/COMPONENT, with multiple PART instances, each containing multiple SUBPARTs, and each of those again containing multiple COMPONENTs. We also assume that we have to extract a value found inside the COMPONENT. For that, use an "XML Extract Element by XPATH" action and an XPath like "//COMPONENT/VALUE". Do whatever is needed with those values. Notice that the action provides a collection of VALUEs, in case multiple objects match the XPath. Place the second part into an action named "XML Processing". For maximum reusability, you may even want to split the part which deals with the document from the part which deals with individual elements, and feed the XPath access strings via input pins.
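The assumed PART/SUBPART/COMPONENT layout and the extraction can be sketched in Python as follows. The sample document and the `extract_values` helper are assumptions for illustration. With Python's `xml.etree.ElementTree`, a search starting at an element has to be written as ".//COMPONENT/VALUE", because absolute paths such as "//COMPONENT/VALUE" are not accepted on elements.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical parts list following the PART/SUBPART/COMPONENT layout.
DOCUMENT = """
<PARTS>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>4.7</VALUE></COMPONENT>
      <COMPONENT><VALUE>10</VALUE></COMPONENT>
    </SUBPART>
  </PART>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>22</VALUE></COMPONENT>
    </SUBPART>
  </PART>
</PARTS>
"""


def extract_values(root: ET.Element, xpath: str) -> List[str]:
    """Return the text of every element matching the XPath expression."""
    return [(element.text or "").strip() for element in root.findall(xpath)]


root = ET.fromstring(DOCUMENT)
values = extract_values(root, ".//COMPONENT/VALUE")
print(values)
```

Passing the XPath in as a parameter mirrors the suggestion to feed the access strings via input pins.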
Useful tools:
Once you have the XML document or an element from it, use the expecco inspector (click on a pin value after a run). The inspector will show an additional tab named "DOM", in which you can try XPath expressions to see which elements match. XPath is quite powerful, and you may want to read the XPath cheat sheet to make the best use of it.
Happy extracting.