Writing extracted data to XML

Overview

Once you've extracted data from a page, you'll often want to write it out to an XML file. screen-scraper provides an XmlWriter class that makes this a snap.

The XmlWriter class and the methods described here are only available in the enterprise edition of screen-scraper.

To use the XmlWriter class you'll generally follow these steps:

  1. Create an instance of XmlWriter in a script, storing it in a session variable.
  2. Extract data.
  3. In a script, get a reference to the XmlWriter object stored in step one, then call addElement or addElements to write out XML nodes.
  4. Repeat steps 2 and 3 as many times as you'd like.
  5. In a script, get a reference to the XmlWriter object, then call the close method on it.

The trickiest part is understanding which of the various addElement and addElements methods to call.

Examples

If you're scripting in Interpreted Java, the script in step 1 might look something like this:

// Create an instance of the XmlWriter class.
// Note the forward slash (as opposed to a backslash) after
// the "C:". This is a more Java-friendly way of handling the
// directory delimiter.
xmlWriter = new com.screenscraper.xml.XmlWriter( "C:/my_xml_file.xml", "root_element", "This is the root element" );

// Save the XmlWriter object in a session variable.
session.setVariable( "XML_WRITER", xmlWriter );

In subsequent scripts, you can get a reference to that same XmlWriter object like this:

xmlWriter = session.getVariable( "XML_WRITER" );

You could then add elements to the XML file. The following three examples demonstrate the various ways to go about that. Each script is self-contained in that it creates, adds to, then closes the XmlWriter object. Bear in mind that this process could be spread across multiple scripts, as described above.

Example 1

// Import the class we'll need.
import com.screenscraper.xml.XmlWriter;

// Instantiate a writer with a root node named "simple-root".
XmlWriter xmlWriter = new XmlWriter("./simple.xml", "simple-root");

// Create four identical tags with different inner text.
for (int i = 0; i < 4; i++) {
 // Appends to root element.  No attributes.
 // The space in "one child" becomes an underscore in the output.
 xmlWriter.addElement( "one child", Integer.toString(i) );
}

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<simple-root>
   <one_child>0</one_child>
   <one_child>1</one_child>
   <one_child>2</one_child>
   <one_child>3</one_child>
</simple-root>
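
The output above shows the space in "one child" replaced with an underscore. As a rough illustration of that kind of normalization, the sketch below uses a hypothetical helper named toTagName. It is not part of screen-scraper, and XmlWriter's actual rules may differ; only the space-to-underscore behavior is taken from the examples on this page, and collapsing a run of whitespace into one underscore is an assumption.

```java
public class TagNameSketch {
    // Hypothetical helper (not part of XmlWriter): trims the name and replaces
    // each run of whitespace with a single underscore. Collapsing runs is an
    // assumption; the examples only show single spaces being replaced.
    public static String toTagName(String raw) {
        return raw.trim().replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(toTagName("one child"));          // one_child
        System.out.println(toTagName("data set container")); // data_set_container
    }
}
```

If you'd rather not rely on automatic renaming, passing a name without spaces in the first place keeps the tag in the file identical to the string in your script.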

Example 2

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.xml.XmlWriter;

// First set up the various attributes.
Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// These are the children we'll be adding.
Hashtable children = new Hashtable();
children.put("child1", "1");
children.put("child2", "2");
children.put("child3", "3");
children.put("child4", "4");
children.put("child5", "5");

// Instantiate a writer with a root node named "difficult-root".
XmlWriter xmlWriter = new XmlWriter("./difficult.xml", "difficult-root");

// Adds an element with text and attributes.  Appends to root element.
firstElement = xmlWriter.addElement("first child", "first child text", attributes);

// Nests a second element, with its own children, inside the first element.
secondElement = xmlWriter.addElements(firstElement, "second child", "second child text", children);

// Add more elements to root.  This time add text, attributes, and children.
thirdElement = xmlWriter.addElements("third child", "third child text", attributes, children);

// Illegal Example: Cannot add elements to the second Element
// since it was closed when thirdElement was added to the root.
// fourth = xmlWriter.addElement(secondElement, "wrong");

// Adds the hashtable entries as attributes.  Appends to root element.
fifth = xmlWriter.addElement("another", "test", attributes );

// Adds the hashtable entries as child elements.  Appends to the fifth element.
sixth = xmlWriter.addElements(fifth, "other", "test2", children );

// Adds attributes and children.  Appends to the sixth element.
seventh = xmlWriter.addElements(sixth, "complex", "example", attributes, children);

// Adds attributes and children.  Appends to root element.
eighth = xmlWriter.addElements("eight", "ocho", attributes, children );

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<difficult-root>
   <first_child attrib3="3" attrib2="2" attrib1="1">
      first child text
      <second_child>
         second child text
         <child5>5</child5>
         <child4>4</child4>
         <child3>3</child3>
         <child2>2</child2>
         <child1>1</child1>
      </second_child>
   </first_child>
   <third_child attrib3="3" attrib2="2" attrib1="1">
      third child text
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
   </third_child>
   <another attrib3="3" attrib2="2" attrib1="1">
      test
      <other>
         test2
         <child5>5</child5>
         <child4>4</child4>
         <child3>3</child3>
         <child2>2</child2>
         <child1>1</child1>
         <complex attrib3="3" attrib2="2" attrib1="1">
            example
            <child5>5</child5>
            <child4>4</child4>
            <child3>3</child3>
            <child2>2</child2>
            <child1>1</child1>
         </complex>
      </other>
   </another>
   <eight attrib3="3" attrib2="2" attrib1="1">
      ocho
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
   </eight>
</difficult-root>
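
Notice that the attributes and children in the output above do not appear in the order they were put into the hashtables. That is consistent with java.util.Hashtable, which makes no guarantee about iteration order. The sketch below is plain Java and does not use XmlWriter; it only contrasts Hashtable with LinkedHashMap, which does keep insertion order. Whether XmlWriter accepts map types other than Hashtable is not stated here, so LinkedHashMap appears purely as a point of comparison. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderSketch {
    // Fills any map with child1..child5, inserted in that order.
    public static void fill(Map<String, String> map) {
        for (int i = 1; i <= 5; i++) {
            map.put("child" + i, Integer.toString(i));
        }
    }

    // Returns the keys in the order the map iterates over them.
    public static List<String> keysInIterationOrder(Map<String, String> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> hashtable = new Hashtable<>();
        Map<String, String> linked = new LinkedHashMap<>();
        fill(hashtable);
        fill(linked);

        // Hashtable: order is unspecified and may differ from insertion order.
        System.out.println(keysInIterationOrder(hashtable));
        // LinkedHashMap: insertion order, child1 through child5.
        System.out.println(keysInIterationOrder(linked));
    }
}
```

The practical takeaway is not to depend on the order of attributes or sibling children when they come from a Hashtable; if order matters to whatever consumes the file, adding elements one at a time with addElement is the safer route.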

Example 3

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.xml.XmlWriter;

Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// Create a new file (complex.xml) with a root element of
// 'complex-root', text 'complex text', and the attributes defined above.
XmlWriter xmlWriter = new XmlWriter("./complex.xml", "complex-root", "complex text", attributes);

DataSet dataSet = new DataSet();

DataRecord dataRecord = null;

// Create 5 datarecords with different data.
for (int i = 0; i < 5; i++) {
 dataRecord = new DataRecord();

 for (int j = 0; j < 5; j++) {
  dataRecord.put("tag" + Integer.toString(j), Integer.toString(i * j));
 }

 dataSet.addDataRecord(dataRecord);
}

// Writes the data set to XML.  Each data record is wrapped in the tag
// named by 'data set container'.  Notice that the tag is automatically
// reformatted to data_set_container, since XML tag names cannot have spaces.
xmlWriter.addElements("data set container", dataSet);

// Must be called after all writing is done.  Will close the file and any
// open tags in the XML.
xmlWriter.close();

This script would produce the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<complex-root attrib3="3" attrib2="2" attrib1="1">
   complex text
   <data_set_container>
      <tag4>0</tag4>
      <tag3>0</tag3>
      <tag2>0</tag2>
      <tag1>0</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>4</tag4>
      <tag3>3</tag3>
      <tag2>2</tag2>
      <tag1>1</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>8</tag4>
      <tag3>6</tag3>
      <tag2>4</tag2>
      <tag1>2</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>12</tag4>
      <tag3>9</tag3>
      <tag2>6</tag2>
      <tag1>3</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>16</tag4>
      <tag3>12</tag3>
      <tag2>8</tag2>
      <tag1>4</tag1>
      <tag0>0</tag0>
   </data_set_container>
</complex-root>
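
Once a file like this has been written, it can be handy to read it back and confirm that it contains what you expect. The following is a minimal sketch that uses only the JDK's built-in DOM parser, not screen-scraper; the class name ReadBackSketch and the method textOf are made up for illustration, and the XML is an abbreviated inline copy of the output above rather than the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ReadBackSketch {
    // Returns the trimmed text of every element with the given tag name,
    // in document order. Parsing problems are rethrown as unchecked exceptions.
    public static List<String> textOf(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated copy of the output above, inlined for illustration.
        String xml =
            "<complex-root attrib1=\"1\">"
          + "<data_set_container><tag4>4</tag4><tag1>1</tag1></data_set_container>"
          + "<data_set_container><tag4>8</tag4><tag1>2</tag1></data_set_container>"
          + "</complex-root>";
        System.out.println(textOf(xml, "tag4")); // [4, 8]
    }
}
```

To check a real file instead of an inline string, DocumentBuilder also has a parse method that takes a java.io.File. Run the check only after close has been called, since the file is not complete until then.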