XML Parser and DOM Tree Library/en

From expecco Wiki (Version 2.x)

Introduction

This library contains action blocks to read and manipulate XML documents.

Library Reference

Please import the library and take a look at the documentation and Test/Demo examples.

Parsing

Printing

Element Extraction

XPath Access

XML Inspector

Examples

Example 1

Fetch an XML document over HTTP and extract a value from it. In this example, a document containing the ISO standard currency codes is retrieved, and the data for one currency is extracted from it.

Two solutions are presented: one using action blocks from the standard library and one using JavaScript elementary code (the latter for interested readers; you are not required to do any programming to solve this task).
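
Outside of expecco, the same idea can be sketched in a few lines of Python. This is only an illustration, not the expecco implementation: the element names (`CcyNtry`, `Ccy`, `CcyNm`, `CtryNm`) and the sample document are assumptions made for the sketch, and the URL of the real document has to be supplied by the caller.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample; the layout of the real document may differ.
SAMPLE = """<ISO_4217><CcyTbl>
  <CcyNtry><CtryNm>GERMANY</CtryNm><CcyNm>Euro</CcyNm><Ccy>EUR</Ccy></CcyNtry>
  <CcyNtry><CtryNm>JAPAN</CtryNm><CcyNm>Yen</CcyNm><Ccy>JPY</Ccy></CcyNtry>
</CcyTbl></ISO_4217>"""


def fetch_xml(url, timeout=10):
    """Download a document over HTTP and return the parsed root element."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


def find_currency(root, code):
    """Return {tag: text} of the first entry whose <Ccy> equals code, else None."""
    for entry in root.iter("CcyNtry"):
        if entry.findtext("Ccy") == code:
            return {child.tag: child.text for child in entry}
    return None
```

For a quick offline try, `find_currency(ET.fromstring(SAMPLE), "JPY")` can be used instead of fetching a document first.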

Picture follows soon...

Example 2

A common task, e.g. in facility or factory management, is to extract data from an inventory list that is given as an XML document, for example part numbers, measurement values or test descriptions.

In the following example, the inventory is given as a zip archive which contains many individual XML documents, from which fields need to be extracted.

Step 1: Look at the File (how to manually detect ZIP file type)
Zip Archive in FileBrowser

Many files are zip archives, even if the file has no ".zip" extension. To manually check, perform the following:

  • open an expecco FileBrowser ("Extras" - "Tools" - "File Browser...") and select the file.
  • right click (in the file name list) and select the "Properties" menu item.
  • if it is a ZIP file, there will be a line like: "MIME: ... contents:application/zip".

As an alternative, look at the file's contents at the bottom; if they start with "PK", the file is also likely to be a zip archive.
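
The "PK" check can also be done programmatically. The following Python sketch is not part of expecco; the function names are made up for illustration. It shows both the quick signature test and the stricter check offered by Python's standard `zipfile` module.

```python
import zipfile


def looks_like_zip(path):
    """Quick check: does the file start with the two bytes 'PK'?"""
    with open(path, "rb") as f:
        return f.read(2) == b"PK"


def is_zip_archive(path):
    """Stricter check: let the standard library look for a valid ZIP structure."""
    return zipfile.is_zipfile(path)
```

Both checks ignore the file name, so they also work for archives without a ".zip" extension.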

Step 2 (optional): Inspect the Files in the expecco FileBrowser
Zip Archive in FileBrowser

Add an archiver tab to the FileBrowser ("File" - "Add Archiver Page"), and select the file. The zip archive's contents should now be listed at the bottom. Double click on an entry to see its contents. Alternatively, you can unzip the archive with one of the standard unzip tools (for example: 7zip).

Step 3 (optional): Inspect the XML Contents of an Extracted File
Zip Archive in FileBrowser

The FileBrowser is also able to present the XML contents after parsing them into a DOM tree. It will also generate a unique XPath access string when you select an element. This will be useful later, when the whole process is automated.

Step 4: Automating the ZIP Archive Processing
Schema of Extract Action (to begin with)

First, you have to get the file names from the archive. Use the "[ZIP File] List Contents" action from the standard library. This action has two output pins: one where all file names appear as one (possibly big) collection, and another where each file name appears in sequence (i.e. this second pin delivers multiple values, one for each file in the archive). The second pin is perfect for streaming: you can use it to feed a processing action which deals with each file individually. For now, we assume that you first have to filter the names, in order to process only a subset of the files.
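
The difference between the two output pins corresponds roughly to the difference between returning a list and yielding items one by one. A minimal Python sketch using the standard `zipfile` module (the function names are hypothetical and only mirror the idea, not the expecco action itself):

```python
import zipfile


def list_all(archive_path):
    """All member names at once, as one (possibly big) list."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


def iter_names(archive_path):
    """Member names one at a time, suitable for feeding a per-file step."""
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            yield info.filename
```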

Step 5: Filter

Feed the output into a "Filter [ Matching ]" action. Its input receives each file name in the archive; its output provides only the matching names. The pattern is a GLOB pattern (i.e. what you would also use as a command line argument to a shell or batch command).
For example, a pattern like "*/foo/*.xml" gives you all files with the suffix ".xml" which are contained in a subfolder named "foo".
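
To get a feeling for GLOB matching, the filter can be imitated with Python's standard `fnmatch` module. This is only a sketch with a made-up function name; note that in `fnmatch` a `*` also matches the "/" separator, so deeper nesting matches as well, which may or may not be how the expecco action behaves.

```python
from fnmatch import fnmatchcase


def filter_matching(names, pattern):
    """Yield only the names that match the GLOB pattern (case sensitive)."""
    for name in names:
        if fnmatchcase(name, pattern):
            yield name
```

With the pattern "*/foo/*.xml", a name such as "parts/foo/a.xml" passes the filter, whereas "parts/foo/b.txt" and "parts/bar/c.xml" are dropped.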

Finally, feed the filter's output (the names inside the archive) to a "Zip File [Extract File]" step. Its output will then be the names of the real files, which are extracted into your temp folder (that is what "$(TmpDirectory)" stands for). Notice that the temp folder is removed when your expecco session ends. If you need to preserve those files, extract them to another folder (for example into your home directory or the "Documents" folder).
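
The mapping from archive names to real file names can be illustrated in Python as follows. This is a sketch with hypothetical names, not the expecco action; the target folder is passed in by the caller.

```python
import zipfile
from pathlib import Path


def extract_members(archive_path, member_names, target_dir):
    """Extract the named members and return the paths of the extracted files."""
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in member_names:
            # extract() returns the path of the file it has just written
            extracted.append(Path(archive.extract(name, path=target_dir)))
    return extracted
```

Calling it inside `with tempfile.TemporaryDirectory() as tmp:` resembles the temp folder behaviour described above: the extracted files disappear when the block is left, so anything that must survive has to go to a different folder.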

Step 6: Processing the files (each individually)
Implementation of Extract Action (with output pin)

This is best done in a separate new action, both for easier testability and for better reusability, so that the filter and processing actions can later be reused individually. Thus, the filtered output is passed to a new output pin, and the extract action is renamed to "Extract Matching from Zip Archive".

For the processing, we use the "XMLDocument [from File]" action, which is found in the XML Library. This library needs to be imported first, as its actions are not in the standard library.
Make a new compound with an input pin of type "FilenameOrString", place the "XMLDocument [from File]" action inside, and connect the corresponding pins. This provides the DOM tree: a hierarchical object tree as represented by the XML.
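
What "a hierarchical object tree" means can be seen in a small Python sketch based on the standard `xml.etree.ElementTree` module. It is an analogy only: the object returned here is a Python element tree, not expecco's DOM object, and the function names are made up.

```python
import xml.etree.ElementTree as ET


def load_document(filename):
    """Parse an XML file and return the root element of its tree."""
    return ET.parse(filename).getroot()


def print_tree(element, depth=0):
    """Print the tag names, indented by nesting depth, to show the hierarchy."""
    print("  " * depth + element.tag)
    for child in element:
        print_tree(child, depth + 1)
```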

Hint: To add the XML action, press "CTRL-n" in the diagram editor and type "xml" or even "xml;file" into the filter; the action is then shown near the top of the list. Select it and press OK, or double click on it in the list.

Step 7: Extracting individual values from the XML Document

Of course, this step is very task specific, and your mileage will vary here.

For the demo, we assume that the XML document contains a parts list structured as PART/SUBPART/COMPONENT, with multiple PART instances, each containing multiple SUBPARTs, and each of those in turn containing multiple COMPONENTs, and that we have to extract a value found inside the COMPONENT.

For that, use an "XMLDocument [ XPath Get Element]" action and an XPath like "//COMPONENT/VALUE".

Element Extraction

Do whatever is needed with those values. Notice that the action provides a collection of VALUEs, in case multiple objects match the XPath. Place the second part into an action named "XML Processing". For maximum reusability, you may even want to split the part which deals with the document from the part which deals with individual elements, and feed the XPath access strings via input pins.
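
The following Python sketch shows why the result is a collection: the XPath matches every VALUE below every COMPONENT. The sample document, its `PARTS` root element and the numbers in it are invented for illustration. Python's `ElementTree` only supports a subset of XPath and needs the relative form ".//COMPONENT/VALUE" where the expecco action uses "//COMPONENT/VALUE".

```python
import xml.etree.ElementTree as ET

# Hypothetical parts list following the PART/SUBPART/COMPONENT layout.
SAMPLE = """
<PARTS>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>4.7</VALUE></COMPONENT>
      <COMPONENT><VALUE>10</VALUE></COMPONENT>
    </SUBPART>
  </PART>
  <PART>
    <SUBPART>
      <COMPONENT><VALUE>0.1</VALUE></COMPONENT>
    </SUBPART>
  </PART>
</PARTS>
"""


def extract_values(root, xpath=".//COMPONENT/VALUE"):
    """Return the text of every element matching the XPath, as a list."""
    return [element.text for element in root.findall(xpath)]
```

Passing the XPath as a parameter corresponds to feeding the access string via an input pin.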

Steps In Between: Add Little Tests

Please always make use of the "Test/Demo" pages to immediately verify that your diagram works as expected. Here is the test/demo of "Process XML File" and a run (on purpose with a bug initially; do you see what's wrong?).

Test/Demo (with Bug)
Test Run - obviously with a bad input
XPath Fixed
Test Fixed
Test Run OK
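
The same habit of adding little tests carries over to code. As a hedged illustration (not a reproduction of the test shown in the screenshots), this Python sketch checks an XPath against a tiny, invented document; one kind of mistake such a check catches is a wrongly spelled or wrongly cased element name, which silently yields an empty result.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-component document used as test input.
DOCUMENT = "<PART><SUBPART><COMPONENT><VALUE>42</VALUE></COMPONENT></SUBPART></PART>"


def values_at(xml_text, xpath):
    """Return the text of all elements matching the XPath."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(xpath)]


def check(xpath, expected):
    """Return True when the XPath yields exactly the expected values."""
    return values_at(DOCUMENT, xpath) == expected
```

Here `check(".//COMPONENT/VALUE", ["42"])` succeeds, while `check(".//Component/Value", ["42"])` fails because element names are case sensitive.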

Useful tools:

Once you have the XML document or an element from it at hand (i.e. a DOM element), use the expecco inspector (click on a pin value after a run). The inspector will show an additional tab named "DOM", in which you can try XPath expressions to see which elements match. XPath is quite powerful, and you may want to read the XPath cheat sheet to make best use of it.

Happy extracting.



Copyright © 2014-2024 eXept Software AG