Output

Overview

There are many ways to output scraped data from screen-scraper. Below are sample scripts of some common ways.

Prepare For Output--Fix Phone

The following script contains a method that you may instead wish to call from within your "Write to CSV" script. The purpose of the script is to put phone numbers into a standard format (123-456-7890 x 1234) prior to output. Note: Be careful when using this script to work with non-U.S. phone numbers, since other countries may have more or fewer digits.

 String fixPhone(String phone){

    if (phone!=null && phone!=void){
        session.log("+++Dealing with phone formated: " + phone);
        // Replace non-digits with nothing
        // Note: "\\D" is a regular expression that means "not a digit"
        phone = phone.replaceAll("\\D", "");

        // If there is a leading 1, remove it
        if (phone.startsWith("1")){
            session.log("+++Starts with a one, so removing.");
            phone = phone.substring(1,phone.length());
        }

         // Reformat the phone to the format: "123-456-7890"
        if (phone.length()>=10){
            area = phone.substring(0,3);
            prefix = phone.substring(3,6);
            number = phone.substring(6,10);

            newPhone = "(" + area + ") " + prefix + "-" + number;
        }
        else{
            session.log("---Error: phone number hasn't enough digits");
        }

        // Deal with phone extensions
        if (phone.length()>10){
            newPhone += " x";
            newPhone += phone.substring(10,phone.length());
        }
    }
    return "\"" + ((newPhone==null || newPhone==void)? "" : newPhone ) + "\"";
}

Prepare For Output--Parse Full Name (including suffixes)

The following script proves useful in most cases when there is a need to separate a full name into first name, middle name, surname, and suffixes (if applicable). The suffixes include JR, SR, I, II, III, 3rd, IV, V, VI, VII. The script is also set up to work with names in the "LASTNAME, FIRSTNAME SUFFIX" format.

// Determines whether or not the string represents
// a suffix.
boolean isSuffix( String value )
{
    session.log( "Determining whether or not this is a suffix: " + value );
    value = value.toUpperCase();
    returnVal = (
                              value.indexOf( "JR" )!=-1
                              ||
                              value.indexOf( "SR" )!=-1
                              ||
                              value.equals( "I" )
                              ||
                              value.equals( "II" )
                              ||
                              value.equals( "III" )
                              ||
                              value.equals( "3RD" )
                              ||
                              value.equals( "IV" )
                              ||
                              value.equals( "V" )
                              ||
                              value.equals( "VI" )
                              ||
                              value.equals( "VII" )
                          );
    session.log( "Suffix test returning: " + returnVal );
    return returnVal;
}

fixName( String name )
{
    name = name.replaceAll(",", "").trim();
    return name;
}

name = dataRecord.get("NAME");
name = name.replaceAll(" ", " ");
name = name.replaceAll("\\.", "");
name = name.replaceAll(";", " ");
name = name.replaceAll("[ ]{2,}", " ").trim();
lastName = "";
firstName = "";
middleName = "";
suffix = "";

session.log( "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@" );
session.log( "NAME: " + name );
session.log( "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@" );

// Split on the space.
nameParts = name.split( " " );

// Send the parts to the log.
for( int i = 0; i < nameParts.length; i++ )
{
    session.log( "Name part #" + i + ": " + nameParts[i] );
}

// If the first part has a comma we assume it's the last name.
if( nameParts.length > 1 && nameParts[0].indexOf( "," )!=-1 )
{
    session.log( "The first name part has a comma." );

    // If we have two elements we assume the order is
    // last name, first name.
    if( nameParts.length==2 )
    {
        lastName = fixName(nameParts[0]);
        firstName = fixName(nameParts[1]);
    }

    // If we have three elements we assume the order is
    // either last name, first name, middle name or
    // last name, first name suffix.
    else if( nameParts.length==3 )
    {
        if( !isSuffix( nameParts[2] ) )
        {
            lastName = fixName(nameParts[0]);
            firstName = fixName(nameParts[1]);
            middleName = fixName(nameParts[2]);
        }
        else
        {
            lastName = fixName(nameParts[0]);
            firstName = fixName(nameParts[1]);
            suffix = fixName(nameParts[2]);
        }
    }

    else if( nameParts.length==4 )
    {
        // It will either be last name, first name middle name middle name or
        // last name, first name middle name suffix.
        if( !isSuffix( nameParts[3] ) )
        {
            lastName = fixName(nameParts[0]);
            firstName = fixName(nameParts[1]);
            middleName = fixName(nameParts[2]) + " " + fixName(nameParts[3]);
        }
        else
        {
            lastName = fixName(nameParts[0]);
            firstName = fixName(nameParts[1]);
            middleName = fixName(nameParts[2]);
            suffix = fixName(nameParts[3]);
        }
    }
 }
// If we have four parts and no comma it's either First Name Middle Name Middle Name Last Name or
// First Name Middle Name Last Name Suffix.
else if( nameParts.length==4 )
{
    session.log( "The name has four elements." );

    if( !isSuffix( nameParts[3] ) )
    {
        firstName = fixName(nameParts[0]);
        middleName = fixName(nameParts[1]) + " " + fixName(nameParts[2]);
        lastName = fixName(nameParts[3]);
    }
    else
    {
        firstName = fixName(nameParts[0]);
        middleName = fixName(nameParts[1]);
        lastName = fixName(nameParts[2]);
        suffix = fixName(nameParts[3]);
    }
}
// If we have three parts and no comma it's either First Name Middle Name Last Name or
// First Name Last Name Suffix.
else if( nameParts.length==3 )
{
    session.log( "The name has three elements." );

    if( !isSuffix( nameParts[2] ) )
    {
        firstName = fixName(nameParts[0]);
        middleName = fixName(nameParts[1]);
        lastName = fixName(nameParts[2]);
    }
    else
    {
        firstName = fixName(nameParts[0]);
        lastName = fixName(nameParts[1]);
        suffix = fixName(nameParts[2]);
    }
}
// If the first part doesn't have a comma we assume the first
// name is given first.
else
{
    // If we have only two parts we assume first name then last name.
    if( nameParts.length==2 )
    {
        firstName = fixName(nameParts[0]);
        middleName = "";
        lastName = fixName(nameParts[1]);
    }
}

session.log( "####################################################" );
session.log( "FIRST NAME: " + firstName );
session.log( "MIDDLE NAME: " + middleName );
session.log( "LAST NAME: " + lastName );
session.log( "SUFFIX: " + suffix );
session.log( "####################################################" );

dataRecord.put( "FNAME", firstName );
dataRecord.put( "MNAME", middleName );
dataRecord.put( "LNAME", lastName );
dataRecord.put( "SUFFIX", suffix );

Prepare For Output--Parse Zipcode

The following code is used to split zip codes from a pattern match. The code below takes a zip code and assigns the first five digits to the variable "ZIP". If the zip code is in the longer format (12345-6789), as opposed to the shorter format (12345), then the second part of the zip code, which comes after the "-" character, is assigned to the "ZIP4" variable (so named for the 4 digits following the "-" character). This script would be useful in cases where zip codes must be standardized.

try{
    // Local reference to variables
    String zip = dataRecord.get("ZIP");

    if(zip != null){
        // Split the zip code on the "-" character (for zip codes in the 12345-6789 format)
        String[]  zipParts = zip.split("-");

         // Put parts in dataRecord
        dataRecord.put("ZIP", zipParts[0]);

         // If we were able to split the zip into two pieces (for zip codes in the 12345-6789 format),
        // then we store the last four digits in the variable "ZIP4"
        if (zipParts.length == 2){
            dataRecord.put("ZIP4", zipParts[1]);
        }
    }
}
catch(Exception e){
    session.log("Error running Fix Zip Codes and Nulls");
}

Prepare For Output--Strip non-numbers

This is a simple script used from removing all non-numerical characters from numbers. This is particularly useful when attempting to normalize data before insertion into a database.

String [] variables = {"BUILDING_YEARBUILT", "BUILDING_YEARRENOVATED", "BUILDING_TOTAL_SF", "BUILDING_STORIES", "BUILDING_ELEVATORS", "LISTING_MAX_SF", "LISTING_MIN_SF"};

i = 0;

// Iterate through each variable in the array above
while (i < variables.length){

    //Get the variables to be fixed
    value = session.getVariable(variables[i]);

    //Log the UNFIXED values
    session.log("UNFIXED: " + variables[i] + " = " + value);

    if(value != null){
        //Remove non-numerical elements from number
        value = value.replaceAll("\\D","");

        // Set variables with new values
        dataRecord.put(variables[i], value);
        session.setVariable(variables[i], value);

        //Log the FIXED values
        session.log("FIXED " + variables[i] + " = " + session.getVariable(variables[i]));
        }
    i++;
}

Write to CSV

Probably the easiest way to write to a comma-seperated value (CSV) document is to use screen-scrapers included CsvWriter. If for some reason you can't/don't wish to use the CsvWriter the following code will also accomplish the task. CSV files are very useful for viewing in spreadsheets or inserting values into a database.

Also, you'll notice that the session variables are cleared out at the end of the script. This would be done when you don't want a session variable to persist into the next dataRecord. For more about scope and dataRecords please go here.

import java.util.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;

// Date/time string to add to filename or column
String getDateTime()
{
        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd_hhmmss");
        Date date = new Date();
        return dateFormat.format(date);
}

// Fix format issues, and return wrapped in quotes
String fixString(String value)
{
        if (value != null)
        {
                value = sutil.convertHTMLEntities(value);
                value = value.replaceAll("\"", "\'");
                value = value.replaceAll("\\s{2,}", " ");
                value = value.trim();
        }
        return "\"" + (value==null ? "" : value) + "\"";
}

String fixPhone(String phone)
{
        if (phone!=null)
        {
                phone = phone.replaceAll("\\D", "");
                // If there is a leading 1, remove it
                if (phone.startsWith("1"))
                {
                        session.log("+++Starts with a one, so removing.");
                        phone = phone.substring(1,phone.length());
                }
                       
                if (phone.length()>=10)
                {
                        area = phone.substring(0,3);
                        prefix = phone.substring(3,6);
                        number = phone.substring(6,10);
                        newPhone = "(" + area + ") " + prefix + "-" + number;                  
                }
               
                // Deal with extensions
                if (phone.length()>10)
                {
                        newPhone += " x";
                        newPhone += phone.substring(10,phone.length());
                }
        }
        return ((newPhone==null || newPhone==void) ? "" : newPhone) ;
}

// Set name of file to write to
// outputFile = "output/" + session.getName() + "_" + getDateTime() + ".csv";
outputFile = "output/" + session.getName() + ".csv";

// Set columns to write
// Will look for tokens of same name using usual naming convention
String[] names = {
        "Dealer",
        "Address1",
        "Address2",
        "City",
        "State",
        "Post code",
        "Country",
        "Phone",
        "Fax"
};

try
{
        File file = new File(outputFile);
        fileExists = file.exists();
       
        // Open up the file to be appended to
        out = new FileWriter(outputFile, true);
        session.log("Writing data to a file");
        if (!fileExists)
        {
                // Write headers
                for (i=0; i<names.length; i++)
                {
                        out.write(names[i]);
                        if (i<names.length-1)
                                out.write(",");
                }
                out.write("\n");
        }
               
        // Write columns
        for (i=0; i<names.length; i++)
        {
                var = names[i];
                var = var.toUpperCase();
                var = var.replaceAll("\\s", "_");
                out.write(fixString(dataRecord.get(var)));
                if (i<names.length-1)
                        out.write(",");
        }
        out.write( "\n" );

        // Close up the file
        out.close();
       
        // Add to controller
        session.addToNumRecordsScraped(1);
}

catch( Exception e )
{
        session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}

Writing extracted data to XML

Overview

Oftentimes once you've extracted data from a page you'll want to write it out to an XML file. screen-scraper contains a special XmlWriter class that makes this a snap.

This script uses objects and methods that are only available in the enterprise edition of screen-scraper.

To use the XmlWriter class you'll generally follow these steps:

  1. Create an instance of XmlWriter in a script, storing it in a session variable.
  2. Extract data.
  3. In a script, get a reference to the XmlWriter object stored in step one, then call addElement or addElements to write out XML nodes.
  4. Repeat steps 2 and 3 as many times as you'd like.
  5. In a script, get a reference to the XmlWriter class, then call the close method on it.

The trickiest part is understanding which of the various addElement and addElements methods to call.

Examples

If you're scripting in Interpreted Java, the script in step 1 might look something like this:

// Create an instance of the XmlWriter class.
// Note the forward slash (as opposed to a back slash after
// the "C:". This is a more Java-friendly way of handling the
// directory delimiter.
xmlWriter = new com.screenscraper.xml.XmlWriter( "C:/my_xml_file.xml", "root_element", "This is the root element" );

// Save the XmlWriter object in a session variable.
session.setVariable( "XML_WRITER", xmlWriter );

In subsequent scripts, you can get a reference to that same XmlWriter object like this:

xmlWriter = session.getVariable( "XML_WRITER" );

You could then add elements and such to the XML file. The following three examples demonstrate the various ways to go about that. Each of the scripts are self-contained in that they create, add to, then close the XmlWriter object. Bear in mind that this process could be spread across multiple scripts, as described above.

Example 1

// Import the class we'll need.
import com.screenscraper.xml.XmlWriter;

// Instantiate a writer with a root node named "simple-root".
XmlWriter xmlWriter = new XmlWriter("./simple.xml", "simple-root");

// Create four identical tags with different inner text.
for (int i = 0; i < 4; i++) {
 // Appends to root element.  No attributes.
 xmlWriter.addElement( "one child", Integer.toString(i) );
}

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<simple-root>
   <one_child>0</one_child>
   <one_child>1</one_child>
   <one_child>2</one_child>
   <one_child>3</one_child>
</simple-root>

Example 2

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.xml.XmlWriter;

// First set up the various attributes.
Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// These are the children we'll be adding.
Hashtable children = new Hashtable();
children.put("child1", "1");
children.put("child2", "2");
children.put("child3", "3");
children.put("child4", "4");
children.put("child5", "5");

// Instantiate a writer with a root node named "difficult-root".
XmlWriter xmlWriter = new XmlWriter("./difficult.xml", "difficult-root");

firstElement = xmlWriter.addElement("first child", "first child text", attributes);

// Add more info to the first element.
secondElement = xmlWriter.addElements(firstElement, "second child", "second child text", children);

// Add more elements to root.  This time add text, attributes, and children.
thirdElement = xmlWriter.addElements("third child", "third child text", attributes, children);

// Illegal Example: Cannot add elements to the second Element
// since it was closed when thirdElement was added to the root.
// fourth = xmlWriter.addElement(secondElement, "wrong");

// Adds hashtable to attributes.  Appends to root element.
fifth = xmlWriter.addElement("another", "test", attributes );

// Adds hashtable to children elements, appends to the fifth element.
sixth = xmlWriter.addElements(fifth, "other", "test2", children );

// Adds attributes and children.  Appends to the sixth element.
seventh = xmlWriter.addElements(sixth, "complex", "example", attributes, children);

// Adds hashtable to attributes with children.  Appends to root element.
eighth = xmlWriter.addElements("eight", "ocho", attributes, children );

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<difficult-root>
   <first_child attrib3="3" attrib2="2" attrib1="1">
      first child text
      <second_child>
         second child text
         <child5>5</child5>
         <child4>4</child4>
         <child3>3</child3>
         <child2>2</child2>
         <child1>1</child1>
      </second_child>
   </first_child>
   <third_child attrib3="3" attrib2="2" attrib1="1">
      third child text
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
   </third_child>
   <another attrib3="3" attrib2="2" attrib1="1">
      test
      <other>
         test2
         <child5>5</child5>
         <child4>4</child4>
         <child3>3</child3>
         <child2>2</child2>
         <child1>1</child1>
         <complex attrib3="3" attrib2="2" attrib1="1">
            example
            <child5>5</child5>
            <child4>4</child4>
            <child3>3</child3>
            <child2>2</child2>
            <child1>1</child1>
         </complex>
      </other>
   </another>
   <eight attrib3="3" attrib2="2" attrib1="1">
      ocho
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
   </eight>
</difficult-root>

Example 3

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.xml.XmlWriter;

Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// Create a new file (complex.xml) with a root element
 // of 'complex-root' and text 'complex text'.
XmlWriter xmlWriter = new XmlWriter("./complex.xml", "complex-root", "complex text", attributes);

DataSet dataSet = new DataSet();

DataRecord dataRecord = null;

// Create 5 datarecords with different data.
for (int i = 0; i < 5; i++){
 dataRecord = new DataRecord();

 for (int j = 0; j < 5; j++) {
 dataRecord.put("tag" + Integer.toString(j), Integer.toString(i * j));
 }

 dataSet.addDataRecord(dataRecord);
}

// Writes the data set to xml.  The datarecords are surrounded by the tag
// defined by 'data set container'.  Notice that the tag automatically
 // reformats to: data_set_container, since xml tag names cannot have spaces.
xmlWriter.addElements("data set container", dataSet);

// Must be called after all writing is done.  Will close the file and any
// open tags in the xml.<br />
xmlWriter.close();

This script would produce the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<complex-root attrib3="3" attrib2="2" attrib1="1">
   complex text
   <data_set_container>
      <tag4>0</tag4>
      <tag3>0</tag3>
      <tag2>0</tag2>
      <tag1>0</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>4</tag4>
      <tag3>3</tag3>
      <tag2>2</tag2>
      <tag1>1</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>8</tag4>
      <tag3>6</tag3>
      <tag2>4</tag2>
      <tag1>2</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>12</tag4>
      <tag3>9</tag3>
      <tag2>6</tag2>
      <tag1>3</tag1>
      <tag0>0</tag0>
   </data_set_container>
   <data_set_container>
      <tag4>16</tag4>
      <tag3>12</tag3>
      <tag2>8</tag2>
      <tag1>4</tag1>
      <tag0>0</tag0>
   </data_set_container>
</complex-root>

Working with MySQL databases

Consider using the SqlDataManager as an alternative way to interact with your JDBC-compliant databases.

This example is designed to give you an idea of how to interact with MySQL, a JDBC-compliant database, from within screen-scraper.

You will need to have MySQL already installed and the service running.

To start, download the JDBC Driver for MySQL connector Jar file and place it in the lib/ext folder where screen-scraper is installed.

Next, create a script wherein you set the different values used to connect to your database. It is recommended that you call this script from your scraping session before scraping session begins.

//The values below may be different for you
// depending on your set up.
session.setVariable("MYSQL_SERVER_URL","localhost");
session.setVariable("MYSQL_SERVER_PORT","3306");
session.setVariable("MYSQL_DATABASE","mydb");
session.setVariable("MYSQL_SERVER_USER","username");
session.setVariable("MYSQL_SERVER_PASSWORD","password");

Create another script to set up your connection and perform queries on your database. Note, it is necessary to include the connection to your database within the same script as your queries.

You will be calling this script after you have extracted data. Typically this will either be after a scrapeable file runs or after an extractor pattern's matches are applied.

//Import the entire java.sql package
import java.sql.*;

//Set up a connection and a drivermanager.
Class.forName("com.mysql.jdbc.Driver").newInstance();
      Connection conn;
      conn = DriverManager.getConnection("jdbc:mysql://" + session.getVariable("MYSQL_SERVER_URL") + ":"+session.getVariable("MYSQL_SERVER_PORT") + "/" +  session.getVariable("MYSQL_DATABASE"), session.getVariable("MYSQL_SERVER_USER"), session.getVariable("MYSQL_SERVER_PASSWORD"));

//Set extracted variables to local variables.
//Depending on when your script is executed
// you may have variables in session scope
// and others as dataRecords.
value1 = session.getVariable("value1");
value2 = session.getVariable("value2");
value3 = dataRecord.get("value3");
value4 = dataRecord.get("value4");

//Create statements and run queries
// on your database.
Statement stmt = null;
stmt = conn.createStatement();

      mysqlstring="INSERT IGNORE INTO TABLE_NAME (column1, column2, column3, column4) VALUES('"+value1+"','"+ value2 + "','"+value3+"','" + value4 +"')";
      stmt.executeUpdate(mysqlstring);

//Be sure to close up your
// statements and connection.
stmt.close();
conn.close();

Writing extracted data to a database

Overview

Oftentimes once you've extracted data from a page you'll want to write it to a database. Screen-scraper contains a special SqlDataManager class that makes this easy.

This script uses objects and methods that are only available in the professional and enterprise editions of screen-scraper.

To use the SqlDataManager class you'll generally follow these steps:

  1. To start, download the appropriate JDBC Driver connector Jar file for your particular database and place it in the lib/ext folder where screen-scraper is installed.
  2. Create an instance of SqlDataManager in a script.
  3. Build the database schema and any foreign key relations.
  4. Store the SqlDataManager in a session variable.
  5. Extract data.
  6. In a script, get a reference to the SqlDataManager object stored in step 3, then call addData to build the rows in your database tables.
  7. Once all data for a row has been added to the SqlDataManager object, call the commit method.
  8. After committing all data related to a record - which can include multiple rows across multiple tables, call the flush method to write the record to the database.
  9. Repeat steps 4 - 7 as many times as you'd like.
  10. In a script, get a reference to the SqlDataManager object, then call the close method on it.

The trickiest part is understanding when to call the commit method when writing to related tables.

Examples

If you're scripting in Interpreted Java and using a MySQL database, the script for steps 1-3 might look something like this:

// Create an instance of the SqlDataManger class.
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

String hostpath = "localhost"; //Location of the database
String database = "database_name"; //The name of the database
String username = "your_username"; //Put your database username here
String password = "your_password"; //Put the password here
String port = "3306";
String dbparams = "autoReconnect=true&useCompression=true";

SqlDataManager dm;

try
{
  // Connect to database using a BasicDataSource
  BasicDataSource ds = new BasicDataSource();
  ds.setDriverClassName( "com.mysql.jdbc.Driver" );
  ds.setUsername( username );
  ds.setPassword( password );
  ds.setUrl( "jdbc:mysql://" + hostpath + ":" + port + "/" + database + "?" + dbparams );
  ds.setMaxActive( 100 );
 
  // Create Data Manager
  dm = new SqlDataManager( ds, session );
  dm.setLoggingLevel( org.apache.log4j.Level.DEBUG );
 
  // Call this to have the data manager read information about the database, such as what tables exist,
  // what fields they have, and how they relate to other tables in the database.
  dm.buildSchemas();
 
  // Setup the foreign key relationships, if needed
  // If the database had this relationship built-in (InnoDB engine only), we wouldn't have to add this here.
  // buildSchemas() would have added it when it read the database structure.
  dm.addForeignKey( "child_table", "child_field", "parent_table", "parent_field" );
 
  // Set the data manager to overwrite values in the table but not write NULL over values that are already there.
  dm.setGlobalUpdateEnabled( true );
  dm.setGlobalMergeEnabled( true );
 
  // Save the SqlDataManager object in a session variable
  session.setVariable( "_DBMANAGER", dm );
}
catch (Exception e)
{
  session.logError( "Database initialization error" );
  session.stopScraping();
}

Note that if you are using a database other the MySQL, the only change to this script will be the String passed to the setUrl method of the BasicDataSource.

In subsequent scripts, you can get a reference to that same SqlDataManager object like this:

dm = session.getVariable( "_DBMANAGER" );

You could then add data to the data manager. The following examples demonstrate various ways to go about that. Each of the scripts assume you already created an SqlDataManager object in a previous script and saved it to the session variable _DBMANAGER.

Saving to a single table using a data record

// Import the classes we'll need.
import com.screenscraper.datamanager.sql.SqlDataManager;

// Get the data manager from the session variable it is stored in
SqlDataManager dm = session.getVariable( "_DBMANAGER" );

// Add the current data record to the table.
// All values whose key in the data record match (case in-sensitive) a column in the table will be saved.
dm.addData( "people", dataRecord );

// Once all data has been added to the table, it should be committed
dm.commit( "people" );

// Flush the data so it is written to the database
dm.flush();

If the data record saved above had key-value pairs:
NAME = John Doe
AGE = 37
WEIGHT = 160
and the table 'people' had columns 'name', 'age', and 'gender', the script above would produce the following row in the people table.

+----------+-----+--------+
| name     | age | gender |
+----------+-----+--------+
| John Doe | 37  | NULL   |
+----------+-----+--------+

Saving to a single table manually

// Import the classes we'll need.
import com.screenscraper.datamanager.sql.SqlDataManager;

// Get the data manager from the session variable it is stored in
SqlDataManager dm = session.getVariable( "_DBMANAGER" );

// Add data to the table.
dm.addData( "people", "name", "John Doe" );
dm.addData( "people", "age", "37" );
dm.addData( "people", "gender", session.getVariable("GENDER") );

// Once all data has been added to the table, it should be committed
dm.commit( "people" );

// Add another row of data to the table.
// As long as the previous data has been committed, this data will be added to a new row instead of overwriting previous data
dm.addData( "people", "name", "Sally Doe" );
dm.addData( "people", "gender", "female" );

// Flush the data so it is written to the database
dm.flush();

If the session variable GENDER had the value male and the table structure was the same as in the example above, this script would produce the following rows in the people table.

+-----------+------+--------+
| name      | age  | gender |
+-----------+------+--------+
| John Doe  | 37   | male   |
+-----------+------+--------+
| Sally Doe | NULL | female |
+-----------+------+--------+

Note that you can mix the two methods shown above. Data can be added from multiple data records and/or manually for the same row.

Saving to multiple tables that are related.

This example assumes that you have a table in the database named people with fields 'id' (primary key/autoincrement), 'name', and 'address', and another table named phones with fields 'person_id', 'phone_number'.

Also, there is a foreign key relation between person_id in phones and id in people. This can be setup either in the database or when setting up the datamanger and calling the addForeignKey method.

In order to make it easier to see inserted values, all calls to addData in this example will enter data manually. In many cases, however, adding a data record is much easier.

Also, remember that data does not have to be added and committed all at once. Usually tables with a parent/child relation will have one script called after each pattern match of an extractor pattern that adds and commits a row of child data, and then a separate script called elsewhere to add and commit the parent data.

// Import the classes we'll need.
import com.screenscraper.datamanger.sql.SqlDataManager;

// Get the data manager from the session variable it is stored in
SqlDataManager dm = session.getVariable( "_DBMANAGER" );

// Add multiple phone numbers that will relate to John Doe
// Maybe he has a cell phone, home phone, and work phone
// After adding each row of data (simply a phone number in this case),
// commit the data so we can begin adding a new row.
dm.addData( "phones", "phone_number", "(123) 456-7890" );
dm.commit( "phones" );
dm.addData( "phones", "phone_number", "(800) 555-7777" );
dm.commit( "phones" );
dm.addData( "phones", "phone_number", "(333) 987-6543" );
dm.commit( "phones" );

// Now add the parent table's data and commit it
dm.addData( "people", "name", "John Doe" );
dm.addData( "people", "address", "123 Someplace Drv, Cityville, WY 12345" );
dm.commit( "people" );

// Add multiple phone numbers that will relate to Sally Doe
dm.addData( "phones", "phone_number", "(321) 654-0987" );
dm.commit( "phones" );
dm.addData( "phones", "phone_number", "(333) 987-6543" );
dm.commit( "phones" );

// Now add the parent table's data and commit it
dm.addData( "people", "name", "Sally Doe" );
dm.addData( "people", "address", "123 Someplace Drv, Cityville, WY 12345" );
dm.commit( "people" );

// Flush the data so it is written to the database
dm.flush();

Note the order in which tables were committed. All data in child tables must be committed before the data in the parent table.

This script would produce the following rows in the database:

+---------------------------------------------------------+
|                         people                          |
+----+-----------+----------------------------------------+
| id | name      | address                                |
+----+-----------+----------------------------------------+
| 1  | John Doe  | 123 Someplace Drv, Cityville, WY 12345 |
+----+-----------+----------------------------------------+
| 2  | Sally Doe | 123 Someplace Drv, Cityville, WY 12345 |
+----+-----------+----------------------------------------+

+----------------------------+
|           phones           |
+-----------+----------------+
| person_id | phone_number   |
+-----------+----------------+
| 1         | (123) 456-7890 |
+-----------+----------------+
| 1         | (800) 555-7777 |
+-----------+----------------+
| 1         | (333) 987-6543 |
+-----------+----------------+
| 2         | (321) 654-0987 |
+-----------+----------------+
| 2         | (333) 987-6543 |
+-----------+----------------+

The SqlDataManager takes care of filling in the data for the related fields. We never had to add the data for the person_id column in the phones table. Since id in people is an autoincrement field, we didn't have to add data for that field either.

Close the data manager

Once all data has been written to the database, close the data manager like this:

// Get the stored data manager object
dm = session.getVariable( "_DBMANAGER" );

// Flushing the data here is optional, but if any data hasn't been written yet
// it will not be written when close() is called, and will be lost.
dm.flush();

// Close the datamanager
dm.close();

Automatically link many to many relations (Advanced)

The SqlDataManager can be set to automatically link data connected in a many-to-many relation. To enable this feature, use the following code:

dm.setAutoManyToMany( true );

When this setting is enabled, the data manager will attempt to relate data across multiple tables when possible. For example, if there is a people table, an address table, and a person_has_address table used to relate the other two tables, you would only need to insert data into the people and addresses tables. The data manager would then link the person_has_address table in since it has foreign keys relating it to both people and addresses. See the example below.

import com.screenscraper.datamanager.sql.SqlDataManager;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
Also use a duplicate filter (see example below) to check for duplicate addresses
*/

// The setAutoManyToMany method must be called before any data is added to the data manager for the first time.
dm.setAutoManyToMany( true );

// Everything beyond this point might appear in a script other than the initialization script
dm.addData( "people", "name", "John" );
dm.addData( "addresses", "address", "123 Street" );
dm.commit( "addresses" );
dm.addData( "addresses", "address", "456 Drive" );
dm.commit( "addresses" );
dm.commit( "people" );

dm.addData( "people", "name", "Sally" );
dm.addData( "addresses", "address", "123 Street" );
dm.commit( "addresses" );
dm.commit( "people" );

This would produce the following result:

+-------------------+
|       people      |
+-----------+-------+
| person_id | name  |
+-----------+-------+
| 1         | John  |
+-----------+-------+
| 2         | Sally |
+-----------+-------+

+-------------------------+
|        addresses        |
+------------+------------+
| address_id | address    |
+------------+------------+
| 1          | 123 Street |
+------------+------------+
| 2          | 456 Drive  |
+------------+------------+

+------------------------+
|   person_has_address   |
+-----------+------------+
| person_id | address_id |
+-----------+------------+
| 1         | 1          |
+-----------+------------+
| 1         | 2          |
+-----------+------------+
| 2         | 1          |
+-----------+------------+

Filtering Duplicate Entries (Advanced)

When extracting data that will contain many duplicate entries, it can be useful to filter values so that duplicate entries are not written to the database multiple times. The data manager can use a duplicate filter to check data being added to the database against data that is added, and either update or ignore duplicates. This is accomplished with an SqlDuplicateFilter object. To create a duplicate filter, call the SqlDuplicateFilter.register method, set the parent table it checks for duplicates on, and then add the constraints that indicate a duplicate. See the code below for an example of how to filter duplicates on a person table.

import com.screenscraper.datamanager.sql.SqlDuplicateFilter;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/


//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );

//register an SqlDuplicateFilter with the DataManager for the drivers license number
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );

//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );

Duplicate filters are checked in the order they are added, so consider perfomance when creating duplicate filters. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add indexes into your database on those columns that you are selecting by or else performance will rapidly degrade as your database gets large.

Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter will be ANDed together, while seperate filters will be ORed.