API

Overview

When writing scripts within screen-scraper, there are a number of objects and methods available to you. The Using Scripts page provides an overview of working with scripts, while this page provides details on the specific objects and methods you'll use when scripting within screen-scraper.

The API documentation emphasizes Interpreted Java because Java is the language in which screen-scraper itself is written. That should not deter you from using whatever language you prefer; all the methods are available in whichever language you choose.

The examples given here assume you're using Interpreted Java as the scripting language, but there should be very little difference in syntax if you decide to use another language. For example, if you're scripting in VBScript, you would simply omit the semi-colon at the end of each line, and for methods that don't return a value you would precede them with the VBScript keyword Call (either that, or omit the parentheses around the method parameters).

screen-scraper Object APIs

screen-scraper's internal API has been divided into three groups for convenience.

Scraping Engine: Request, parse, manipulate, and store data according to user-defined processes.
- dataRecord
- dataSet
- log
- session
- scrapeableFile
- RunnableScrapingSession (com.screenscraper.scraper)
- sutil
Proxy Server: Manipulate browser-server interactions to filter, track, or otherwise control the experience of the user.
Utilities: Helpful screen-scraper objects for processing and storing data.
- CsvWriter (com.screenscraper.csv)
- DataManagerFactory (com.screenscraper.datamanager)
- ProxyServerPool (com.screenscraper.util)
- RetryPolicy (com.screenscraper.util.retry)
- SqlDataManager (com.screenscraper.datamanager)
- XmlWriter (com.screenscraper.xml)

The two main groups are the scraping engine and the proxy server. The objects available in each of these groups are exclusive to running screen-scraper in that mode. The one exception is RunnableScrapingSession, which has been grouped with the scraping engine simply because it is unlikely to be needed with the proxy server.

The utilities are available to scripts run in either the scraping engine or the proxy server, so they have been separated from both. They are classes we have written to simplify common tasks performed on retrieved data.

Java Libraries/Classes of Note

Java libraries also provide many classes that we did not create or modify but that are especially worth noting. You have access to these regardless of the language you use for scripting in screen-scraper.

CSVReader (au.com.bytecode.opencsv)
Apache Lang Library (org.apache.commons.lang3)

Other screen-scraper APIs

There are a few other APIs to be aware of. They are particular to dealing with screen-scraper in certain ways or certain versions. Make sure that you understand the implications of using these APIs before you start playing with them.

REST Interface: Issue commands to screen-scraper through GET requests to the server.
Anonymization REST Interface: Configure and run anonymous scrapes through the REST Interface.
Alpha Version: Methods and objects that have been introduced to screen-scraper since the last stable release.

Scraping Engine API

Overview

The scraping engine is the backbone of screen-scraper and provides four built-in objects. These objects are: session, scrapeableFile, dataSet, and dataRecord. We have also included the RunnableScrapingSession class as it best pertains to the engine.

For details on which objects are available to scripts in the context of a scrape see the variable scope section of the documentation.

Objects

dataRecord: This gives access to the most recently extracted data record. This will most likely only be used in scripts that get accessed after each time an extractor pattern is applied. This object simply extends Hashtable, and documentation on the Hashtable's methods can be found in Java's documentation.

The dataRecord object is populated using the names of tokens from extractor patterns.
dataSet: The dataSet object holds all data records extracted by an extractor pattern after it has been applied as many times as possible to the HTML retrieved by a scrapeable file. A data set is analogous to a result or record set that would be returned from a database query. A data set contains any number of data records, which are analogous to rows in a database.
log: Methods used for logging information.
RunnableScrapingSession (com.screenscraper.scraper.RunnableScrapingSession): This is a class that can be instantiated within a script in order to run a scraping session. The "Maximum number of concurrent running scraping sessions" setting in the settings dialog box controls how many scraping sessions can run simultaneously.
scrapeableFile: This refers to the scrapeable file that is currently being requested and analyzed.
session: This variable refers to the currently running scraping session.
sutil: General methods for checking and manipulating data.

dataRecord

Overview

This object gives access to the most recently extracted data record. This will most likely only be used in scripts that get accessed after each time an extractor pattern is applied. This object simply extends Hashtable (documentation on its methods can be found in Java's documentation).

The dataRecord is populated using the token names in the extractor patterns. You'll find a few of the most commonly used methods below. DataRecord objects can also be created from scratch, and subsequently added to DataSet objects using the addDataRecord method.
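Because a DataRecord extends Hashtable, the standard Hashtable methods can be used to walk over every field it holds. The sketch below uses a plain java.util.Hashtable as a stand-in for a DataRecord (an assumption, so the example runs outside screen-scraper); the token names and the helper sortedFields are hypothetical. Hashtable makes no promise about iteration order, so the sketch copies the fields into a TreeMap when a predictable order is wanted.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a plain Hashtable stands in for a screen-scraper DataRecord,
// which the documentation describes as a Hashtable subclass.
class DataRecordFields {

    // Copies the fields into a TreeMap so they can be listed in key order;
    // Hashtable itself does not guarantee any iteration order.
    static TreeMap<String, Object> sortedFields(Hashtable<String, Object> record) {
        return new TreeMap<>(record);
    }

    public static void main(String[] args) {
        Hashtable<String, Object> record = new Hashtable<>();
        // Hypothetical token names, as if filled in by an extractor pattern.
        record.put("CITY", "Los Angeles");
        record.put("STATE", "CA");
        record.put("ZIP", "90001");

        for (Map.Entry<String, Object> field : sortedFields(record).entrySet()) {
            System.out.println(field.getKey() + " = " + field.getValue());
        }
    }
}
```

In a screen-scraper script the same loop could run over dataRecord.entrySet() directly, since those methods come from Hashtable.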

See example usage: Iterate over DataSets & DataRecords.

DataRecord

DataRecord DataRecord ( )

Description

Create a new DataRecord object.

Parameters

This method does not receive any parameters.

Return Values

Returns DataRecord object.

Change Log

Version	Description
4.5	Available for all editions.

Class Location

com.screenscraper.common.DataRecord

Examples

Create New DataRecord

// Create a new DataRecord object.
myDataRecord = new DataRecord();

// Populate it with a few fields.
myDataRecord.put( "CITY", "Los Angeles" );
myDataRecord.put( "ZIP", "90001" );
myDataRecord.put( "STATE", "CA" );

// Add it to an existing dataSet object.
dataSet.addDataRecord( myDataRecord );

See additional example usage: Iterate over DataSets & DataRecords.

get

Object dataRecord.get ( Object key )

Description

Get the value of a DataRecord field.

Parameters

key The name of the associative key, usually a string but, if you have manually added fields, it can be an integer, etc.

Return Values

Returns the value associated with the specified key. Usually it will be a string but, if you have manually added fields, it can be an integer, boolean, long, or other object.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Retrieve DataRecord Information

// Gets the value of the "CITY" field
// and outputs it to the log.

city = dataRecord.get( "CITY" );
session.log( "City: " + city );

put

Object dataRecord.put ( Object key, Object value )

Description

Add a new field to the DataRecord or update the value of an existing field.

Parameters

key The name of the associative key, usually a string but, if you have manually added fields, it can be an integer, etc.
value The new value to be associated with the key.

Return Values

Returns the value previously associated with the specified key. If the key did not exist then it will return null.
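Since DataRecord extends Hashtable, put follows Hashtable's contract, so the return value tells you whether a field was overwritten. The sketch below uses java.util.Hashtable as a stand-in for a DataRecord (an assumption, so it runs without screen-scraper); the helper names are hypothetical. It also reflects one further Hashtable rule: null values are rejected with a NullPointerException, so, assuming DataRecord does not override put, an extracted value that may be null should be checked before it is stored.

```java
import java.util.Hashtable;

// Sketch only: java.util.Hashtable stands in for a DataRecord.
class PutReturnValue {

    // Stores the value and reports whether an existing field was replaced.
    // Hashtable.put returns the previous value, or null if the key was new.
    static boolean putAndCheckOverwrite(Hashtable<String, Object> record, String key, Object value) {
        Object previous = record.put(key, value);
        return previous != null;
    }

    // Hashtable.put throws NullPointerException for a null value, so skip nulls.
    // (Assumes DataRecord does not override put.)
    static void putIfNotNull(Hashtable<String, Object> record, String key, Object value) {
        if (value != null) {
            record.put(key, value);
        }
    }

    public static void main(String[] args) {
        Hashtable<String, Object> record = new Hashtable<>();
        System.out.println(putAndCheckOverwrite(record, "CITY", "Los Angeles")); // false
        System.out.println(putAndCheckOverwrite(record, "CITY", "Phoenix"));     // true
        putIfNotNull(record, "ZIP", null);                                        // ignored
        System.out.println(record);                                               // {CITY=Phoenix}
    }
}
```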

Change Log

Version	Description
4.5	Available for all editions.

Examples

Add/Change DataRecord Field

// Adds a field called "CITY" with
// the value "Los Angeles".

dataRecord.put( "CITY", "Los Angeles" );

See additional example usage: Iterate over DataSets & DataRecords.

remove

Object dataRecord.remove ( Object key )

Description

Remove a field from the DataRecord.

Parameters

key The name of the associative key, usually a string but, if you have manually added fields, it can be an integer, etc.

Return Values

Returns the value previously associated with the specified key. If the key did not exist then it will return null.
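Because remove hands back the value it removed, a field can be moved to a new name in one step. The sketch below shows this with java.util.Hashtable standing in for a DataRecord (an assumption, so it runs outside screen-scraper); renameField is a hypothetical helper, not part of the screen-scraper API.

```java
import java.util.Hashtable;

// Sketch only: java.util.Hashtable stands in for a DataRecord.
class RenameField {

    // Moves a field to a new key. Hashtable.remove returns the removed value,
    // or null if the key was absent, in which case nothing is stored.
    static boolean renameField(Hashtable<String, Object> record, String oldKey, String newKey) {
        Object value = record.remove(oldKey);
        if (value == null) {
            return false;
        }
        record.put(newKey, value);
        return true;
    }

    public static void main(String[] args) {
        Hashtable<String, Object> record = new Hashtable<>();
        record.put("CITY", "Los Angeles");
        System.out.println(renameField(record, "CITY", "CITY_NAME")); // true
        System.out.println(record);                                    // {CITY_NAME=Los Angeles}
        System.out.println(renameField(record, "ZIP", "POSTAL_CODE")); // false
    }
}
```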

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove DataRecord Field

// Removes the "CITY" field from the dataRecord.
dataRecord.remove( "CITY" );

dataSet

Overview

The dataSet object holds all data records extracted by an extractor pattern after it has been applied as many times as possible to the HTML retrieved by a scrapeable file. A data set is analogous to a result or record set that would be returned from a database query. A data set contains any number of data records, which are analogous to rows in a database.

The dataSet object provides methods to aid in getting at the information that has been gathered.

See example usage: Iterate over DataSets & DataRecords.

DataSet

DataSet DataSet ( )
DataSet DataSet ( ArrayList dataRecords )

Description

Manually create a DataSet.

Parameters

dataRecords (optional) Java ArrayList of DataRecord elements.

Return Values

Returns DataSet object.

Change Log

Version	Description
4.5	Available for all editions.

Class Location

com.screenscraper.common.DataSet

Examples

Manually Create DataSet

// Create DataSet
myDataSet = new DataSet();

// Create DataRecord
myDataRecord = new DataRecord();
myDataRecord.put( "STATE", "AZ");

// Add DataRecord to DataSet
myDataSet.addDataRecord( myDataRecord );

Create DataSet from Array List

// Create Array List
ArrayList dataRecords = new ArrayList();

// Create DataRecord
myDataRecord = new DataRecord();
myDataRecord.put( "STATE", "AZ");

// Add DataRecord to the "dataRecords" ArrayList
dataRecords.add( myDataRecord );

// Create DataSet From ArrayList.
myDataSet = new DataSet( dataRecords );

See additional example usage: Iterate over DataSets & DataRecords.

addDataRecord

void dataSet.addDataRecord ( DataRecord dataRecord )

Description

Add a DataRecord to a DataSet.

Parameters

dataRecord A DataRecord object.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Add Data Record to DataSet

// Create DataSet
myDataSet = new DataSet();

// Create DataRecord
myDataRecord = new DataRecord();
myDataRecord.put( "STATE", "AZ");

// Add DataRecord to DataSet
myDataSet.addDataRecord( myDataRecord );

clearDataRecords

void dataSet.clearDataRecords ( )

Description

Remove all DataRecord objects from the DataSet.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove DataRecords from DataSet

// Removes all DataRecord objects from the dataSet object.
dataSet.clearDataRecords();

See additional example usage: Iterate over DataSets & DataRecords.

deleteDataRecord

void dataSet.deleteDataRecord ( int dataRecordNumber )

Description

Remove a DataRecord from the DataSet.

Parameters

dataRecordNumber Index of the DataRecord in the DataSet, as an integer. Indexes are zero-based, so the first DataRecord is at index 0.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove One DataRecord from DataSet

// Deletes the third data record in the set. Remember that data sets
// are zero-based.

dataSet.deleteDataRecord( 2 );

findValue

Object dataSet.findValue ( String valueToFind, String columnToMatch, String columnToReturn )

Description

Retrieve a field's value in a data set based on another field.

Parameters

valueToFind Value being looked for, as a string.
columnToMatch Column/token name where the value is being searched for, as a string.
columnToReturn Column/token name whose value should be returned, as a string.

Return Values

Returns the value in the returned column, usually a string (unless records have been manually added). If no match is found, null is returned.
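To make the lookup concrete, the sketch below models a data set as a List of Map records (an assumption; screen-scraper's DataSet class is not used) and follows the behavior described above: scan the records, find one whose columnToMatch equals valueToFind, and return its columnToReturn, or null when nothing matches. The documentation does not say which record wins when several match or whether the comparison is case-sensitive; this sketch assumes the first exact, case-sensitive match wins. It is a conceptual model, not screen-scraper's implementation, and the helper person is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of dataSet.findValue, not screen-scraper's implementation.
class FindValueSketch {

    static Object findValue(List<Map<String, Object>> records,
                            String valueToFind, String columnToMatch, String columnToReturn) {
        for (Map<String, Object> record : records) {
            Object candidate = record.get(columnToMatch);
            if (valueToFind.equals(candidate)) {
                return record.get(columnToReturn); // first match wins (assumption)
            }
        }
        return null; // no record matched
    }

    // Hypothetical helper that builds one record.
    static Map<String, Object> person(String first, String last) {
        Map<String, Object> record = new HashMap<>();
        record.put("FIRST_NAME", first);
        record.put("LAST_NAME", last);
        return record;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(person("John", "Doe"));
        records.add(person("Jill", "Smith"));
        System.out.println(findValue(records, "John", "FIRST_NAME", "LAST_NAME")); // Doe
        System.out.println(findValue(records, "Jack", "FIRST_NAME", "LAST_NAME")); // null
    }
}
```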

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get Value of Token based on Another Token

// Create new DataSet
DataSet myDataSet = new DataSet();

// Create DataRecords
DataRecord john = new DataRecord();
john.put("FIRST_NAME", "John");
john.put("LAST_NAME", "Doe");

DataRecord jill = new DataRecord();
jill.put("FIRST_NAME", "Jill");
jill.put("LAST_NAME", "Smith");

// Add dataRecords to dataSet
myDataSet.addDataRecord(john);
myDataSet.addDataRecord(jill);

// Search dataSet for "John" in the "FIRST_NAME"
// field. Return the value of the "LAST_NAME" in
// the same record
String result = (String) myDataSet.findValue("John", "FIRST_NAME", "LAST_NAME");

// Write result to log
session.log(result); // Logs "Doe"

get

Object dataSet.get ( int dataRecordNumber, String identifier )

Description

Get a single piece of data held by a DataRecord in the DataSet.

Parameters

dataRecordNumber Index of the DataRecord in the DataSet, as an integer. Indexes are zero-based, so the first DataRecord is at index 0.
identifier The name of the element to retrieve from the DataRecord, as a string.

Return Values

Returns the value associated with the DataRecord identifier. It will be a string unless you have added values to the DataRecord whose values are not strings.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get Token Value From DataRecord

// Gets the value "CITY_CODE" from the first data record in the
// data set.

firstCityCode = dataSet.get( 0, "CITY_CODE" );

getAllDataRecords

ArrayList dataSet.getAllDataRecords ( )

Description

Get all DataRecords in the DataSet.

Parameters

This method does not receive any parameters.

Return Values

Returns an ArrayList of DataRecord objects.

Change Log

Version	Description
4.5	Available for all editions.

This method is provided as a convenience; the recommended way to iterate over the data records in a data set is to use getNumDataRecords and getDataRecord.

Examples

Loop Through DataRecords

// Stores all of the data records in the variable allData.
allData = dataSet.getAllDataRecords();

// Loop through each of the data records.
for( i = 0; i < allData.size(); i++ )
{
// Store the current data record in the variable myDataRecord.
myDataRecord = allData.get( i );

// Output the "PRODUCT_NAME" value from the data record to the log.
session.log( "Product name: " + myDataRecord.get( "PRODUCT_NAME" ) );
}

getCharacterSet

String dataSet.getCharacterSet ( )

Description

Get the character set being applied to the scraped data.

Parameters

This method does not receive any parameters.

Return Values

Returns the character set applied to the scraped data, as a string. If a character set has not been specified, it defaults to the character set specified in the settings dialog box.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get Character Set

// Get the character set of the dataSet
charSetValue = dataSet.getCharacterSet();

getDataRecord

DataRecord dataSet.getDataRecord ( int dataRecordNumber )

Description

Get one DataRecord in the DataSet.

Parameters

dataRecordNumber Index of the DataRecord in the DataSet, as an integer. Indexes are zero-based, so the first DataRecord is at index 0.

Return Values

Returns a DataRecord (Hashtable object). If there is not a DataRecord at the specified index an error will be thrown.
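Because an out-of-range index raises an error, it can be worth checking the index against the number of records first. The sketch below shows that guard on a plain ArrayList of Hashtables standing in for the data set (an assumption; with a real DataSet you would compare against dataSet.getNumDataRecords() rather than list.size()). The helper recordOrNull is hypothetical.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Sketch only: an ArrayList of Hashtables stands in for a DataSet.
class SafeRecordAccess {

    // Returns the record at a zero-based index, or null instead of throwing
    // when the index is outside 0 .. size-1.
    static Hashtable<String, Object> recordOrNull(List<Hashtable<String, Object>> records, int index) {
        if (index < 0 || index >= records.size()) {
            return null;
        }
        return records.get(index);
    }

    public static void main(String[] args) {
        List<Hashtable<String, Object>> records = new ArrayList<>();
        Hashtable<String, Object> first = new Hashtable<>();
        first.put("PRODUCT_NAME", "Widget");
        records.add(first);

        System.out.println(recordOrNull(records, 0).get("PRODUCT_NAME")); // Widget
        System.out.println(recordOrNull(records, 1));                     // null
    }
}
```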

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get DataRecords in a Loop

// Loop through each of the data records.
for( i = 0; i < dataSet.getNumDataRecords(); i++ )
{
// Store the current data record in the variable myDataRecord.
myDataRecord = dataSet.getDataRecord( i );

// Output the "PRODUCT_NAME" value from the data record to the log.
session.log( "Product name: " + myDataRecord.get( "PRODUCT_NAME" ) );
}

getFirstValueForKey

Object dataSet.getFirstValueForKey ( String key )

Description

Get the first non-null value, in a data set, for a given token.

Parameters

key Name of the column whose value is returned, as a string.

Return Values

Returns the first non-null value in the column, usually a string (unless records have been manually added). If none is found, null is returned.
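The rule described above (walk the records in order and stop at the first non-null value for the key) can be sketched as follows. Records are modeled as a List of Maps (an assumption, so it runs outside screen-scraper); this is a conceptual model of the documented behavior, not screen-scraper's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of dataSet.getFirstValueForKey.
class FirstValueSketch {

    static Object getFirstValueForKey(List<Map<String, Object>> records, String key) {
        for (Map<String, Object> record : records) {
            Object value = record.get(key);
            if (value != null) {
                return value; // first non-null value wins
            }
        }
        return null; // no record has a value for this key
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = new ArrayList<>();
        records.add(new HashMap<>()); // a record without CITY_CODE
        Map<String, Object> second = new HashMap<>();
        second.put("CITY_CODE", "LAX");
        records.add(second);

        System.out.println(getFirstValueForKey(records, "CITY_CODE")); // LAX
        System.out.println(getFirstValueForKey(records, "ZIP"));       // null
    }
}
```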

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get First Non-null Token Value

// Gets the first non-null value of "CITY_CODE" in the
// data set.

fieldValue = dataSet.getFirstValueForKey("CITY_CODE");

getNumDataRecords

int dataSet.getNumDataRecords ( )

Description

Get the number of DataRecords in the DataSet.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of DataRecords in the DataSet, as an integer.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get the Number of DataRecords in the DataSet

join

void dataSet.join ( DataSet dataSet )

Description

Merge data records from two data sets.

Parameters

dataSet Data set whose records are to be merged.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Merge DataRecords from DataSets

// Create dataSet
DataSet dataSet = new DataSet();

// Load dataSet with information
for (i = 0; i < 3; ++i)
{
DataRecord record = new DataRecord();
record.put("DATA_SET_ONE", i);
dataSet.addDataRecord(record);
}

// Create another dataSet
DataSet anotherDataSet = new DataSet();

// Load dataSet with information
for (i = 0; i < 2; ++i)
{
DataRecord record = new DataRecord();
record.put("DATA_SET_TWO", i);
anotherDataSet.addDataRecord(record);
}

// Join DataSets
dataSet.join(anotherDataSet);

// Write merged DataSet to Log (in dataRecords)
for (i = 0; i < dataSet.getNumDataRecords(); ++i)
{
DataRecord record = dataSet.getDataRecord(i);
session.log("DataRecord " + i + ": " + record.toString());
}

// Log Output:
// DataRecord 0: {DATA_SET_TWO=0, DATA_SET_ONE=0}
// DataRecord 1: {DATA_SET_TWO=1, DATA_SET_ONE=1}
// DataRecord 2: {DATA_SET_ONE=2}
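Judging from the log output above, join appears to merge records by position: record i of the joined set is folded into record i of this set, and any extra records in this set are left as they are. The sketch below models that on Lists of Maps. It is an assumption, not the documented implementation: the example does not show what happens when the joined set is longer (this sketch appends the extra records) or when both records share a key (this sketch lets the joined set's value win, as Map.putAll does). The helper numbered is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of dataSet.join inferred from the example output.
class JoinSketch {

    // Merges `other` into `target` record by record, by index.
    static void join(List<Map<String, Object>> target, List<Map<String, Object>> other) {
        for (int i = 0; i < other.size(); i++) {
            if (i < target.size()) {
                target.get(i).putAll(other.get(i));      // same index: merge fields
            } else {
                target.add(new HashMap<>(other.get(i))); // extra records appended (assumption)
            }
        }
    }

    // Hypothetical helper: `count` records, each holding key -> its index.
    static List<Map<String, Object>> numbered(String key, int count) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Object> record = new HashMap<>();
            record.put(key, i);
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> dataSet = numbered("DATA_SET_ONE", 3);
        join(dataSet, numbered("DATA_SET_TWO", 2));
        for (int i = 0; i < dataSet.size(); i++) {
            System.out.println("DataRecord " + i + ": " + dataSet.get(i));
        }
    }
}
```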

setCharacterSet

void dataSet.setCharacterSet ( String characterSet )

Description

Set the character set to be used for rendering dataSet values.

Parameters

characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.
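Because the argument must be a character set name Java recognizes, a script can check the name with the standard java.nio.charset.Charset class before calling setCharacterSet. A minimal sketch; the helper isUsableCharset is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class CharsetCheck {

    // True if the running JVM supports the named character set.
    // Charset.isSupported throws IllegalCharsetNameException for names that
    // are not even syntactically legal, so that case is treated as "no".
    static boolean isUsableCharset(String name) {
        if (name == null) {
            return false; // isSupported would throw IllegalArgumentException
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsableCharset("UTF-8"));         // true
        System.out.println(isUsableCharset("ISO-8859-1"));    // true
        System.out.println(isUsableCharset("NOT A CHARSET")); // false
    }
}
```

UTF-8 and ISO-8859-1 are among the character sets every Java platform is required to support, so they are safe choices when in doubt.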

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

This will only change the character set on the current data set. If you want it to be changed for all data sets, you would need to change it in the settings dialog box or screen-scraper.properties file.

Examples

Set Character Set

// Set the character set of the dataSet
dataSet.setCharacterSet("UTF-8");

size

int dataSet.size ( )

Description

Get the number of DataRecords in the DataSet.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of DataRecords in the DataSet, as an integer.

Change Log

Version	Description
6.0.3a	Available for all editions.

Examples

Get the Number of DataRecords in the DataSet

// Loop through each of the data records.
for( i = 0; i < dataSet.size(); i++ )
{
// Store the current data record in the variable myDataRecord.
myDataRecord = dataSet.getDataRecord( i );

// Output the "PRODUCT_NAME" value from the data record to the log.
log.info( "Product name: " + myDataRecord.get( "PRODUCT_NAME" ) );
}

writeToFile

void dataSet.writeToFile ( String fileName ) (professional and enterprise editions only)

Description

Write the string and integer contents of the DataSet to a file. Fields are tab-delimited and each record is written on its own line.

Parameters

fileName File path where the contents of the DataSet should be written. If the file already exists the contents will be appended to the file.
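The sketch below imitates the output format described above with standard java.io classes: one line per record, tab-separated fields, and the file opened in append mode. It is a model under assumptions, not screen-scraper's code: the column order (an explicit list of keys here) and the handling of values that are neither strings nor integers (written as empty fields here) are illustrative choices, and appendTabDelimited is a hypothetical helper.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TabDelimitedWriter {

    // Appends one tab-delimited line per record. Only String and Integer values
    // are written; anything else (or a missing key) becomes an empty field.
    static void appendTabDelimited(String fileName, List<String> columns,
                                   List<Map<String, Object>> records) throws IOException {
        // FileWriter(fileName, true) opens in append mode, creating the file if needed.
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName, true))) {
            for (Map<String, Object> record : records) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < columns.size(); i++) {
                    if (i > 0) {
                        line.append('\t');
                    }
                    Object value = record.get(columns.get(i));
                    if (value instanceof String || value instanceof Integer) {
                        line.append(value);
                    }
                }
                out.println(line);
            }
            // PrintWriter swallows write errors, so surface them explicitly.
            if (out.checkError()) {
                throw new IOException("Could not write to " + fileName);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> record = new HashMap<>();
        record.put("CITY", "Los Angeles");
        record.put("ZIP", 90001);
        appendTabDelimited("extracted_data.txt", Arrays.asList("CITY", "ZIP"),
                Collections.singletonList(record));
    }
}
```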

Return Values

Returns void. If the file cannot be written to then an error will be thrown.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Write DataSet Contents to a File

// Writes the data found in the current data set to the file
// "extracted_data.txt".

dataSet.writeToFile( "C:/site_data/extracted_data.txt" );

log

Overview

This object contains various methods used to log information about a running scraping session to log files, the workbench "Log" tab, and the web interface.

addAutoProgressBar

void log.addAutoProgressBar ( String name, String ... values ) (enterprise edition only)
void log.addAutoProgressBar ( String name, String[][] values ) (enterprise edition only)
void log.addAutoProgressBar ( String name, Collection<String> values ) (enterprise edition only)
void log.addAutoProgressBar ( String name, String[][] values, int keyIndex ) (enterprise edition only)
void log.addAutoProgressBar ( String name, DataSet values, String key ) (enterprise edition only)

Description

Creates an automatic progress bar and adds it to the progress bars. These progress bars match their progress to a value from a session variable and a list of values. When web messages are output with the webDebug, webInfo, webWarn, or webError methods, a progress bar will be drawn to give a visual representation of the current progress of the scrape.

Note that when using auto progress bars, it is best not to also use manually managed progress bars, because the two can conflict: whenever an auto progress bar finds no session variable set for its monitored key, it deletes itself and all of its child progress bars (including manual ones). As long as you keep that in mind, it is safe to use both types together.

Parameters

name The name of the progress bar, which should match the session variable where the value for updating this bar will be stored
values The values this progress bar can have, in the order they will be queried. For example, if the session variable can be "1" or "2", the values should also be "1" and "2"
keyIndex (optional) The index in each inner array of the value that will be set in the session variable. This is only applicable when a 2D array is given. When a 2D array is given but no index is given, 0 is used.
key (optional) The key in the DataRecords that will be used for the session variable matching name. Used only with the DataSet method option
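When the values come from a 2D array, only one element of each inner array is compared with the session variable: the one at keyIndex, or index 0 by default. The sketch below shows that selection in isolation, so you can see which values the session variable is expected to take. It models the documented parameter semantics only (not screen-scraper's implementation), and valuesAt is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class AutoProgressValues {

    // Picks the element at keyIndex from each inner array: these are the values
    // the session variable is expected to take, in order.
    static List<String> valuesAt(String[][] values, int keyIndex) {
        List<String> keys = new ArrayList<>();
        for (String[] row : values) {
            keys.add(row[keyIndex]);
        }
        return keys;
    }

    // Mirrors the documented default: no index given means index 0.
    static List<String> valuesAt(String[][] values) {
        return valuesAt(values, 0);
    }

    public static void main(String[] args) {
        String[][] states = { {"AZ", "Arizona"}, {"CA", "California"} };
        System.out.println(valuesAt(states));    // [AZ, CA]
        System.out.println(valuesAt(states, 1)); // [Arizona, California]
    }
}
```

With this hypothetical states array and the default index, the monitored session variable would need to hold "AZ" or "CA", not the full state names.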

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.31a	Available in enterprise edition.
5.5.43a	Moved from session to log class.

Examples

Create an auto progress bar to track the search

// Searching over each letter of the alphabet
String[] letters = new String[26];
for(char c = 'a'; c <= 'z'; c++)
letters[ c - 'a' ] = "" + c;

// Using this approach is more convenient when values will get changed in various scripts
log.addAutoProgressBar("SEARCH_LETTER", letters);

for(int i = 0; i < letters.length; i++)
{
session.setVariable("SEARCH_LETTER", letters[ i ]);
session.scrapeFile("Search");
log.logMonitoredValues("Completed Letter");
}

addMonitoredPostfix

void log.addMonitoredPostfix ( String postfix ) (enterprise edition only)

Description

Watches for all session variables whose keys end with the postfix specified, and will output their values when monitored variables are logged.

Parameters

postfix The postfix to monitor

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.42a	Moved from session to log class.

Examples

Watch all variables ending with _PARAM and log their values

log.addMonitoredPostfix("_PARAM");

// Log the current value of all session variables whose names end with _PARAM
log.logMonitoredValues();

addMonitoredPrefix

void log.addMonitoredPrefix ( String prefix ) (enterprise edition only)

Description

Watches for all session variables whose keys begin with the prefix specified, and will output their values when monitored variables are logged.
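The matching behind addMonitoredPrefix (and addMonitoredPostfix above) comes down to a string comparison on variable names. The sketch below models the session's variables as a Map (an assumption, so it runs outside screen-scraper) and selects the ones a prefix or postfix would pick up; it does not reproduce how screen-scraper formats the logged output. The documentation does not say whether matching is case-sensitive; this sketch assumes it is.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a Map stands in for the session's variables.
class MonitoredVariables {

    // Variables whose names start with prefix, sorted by name.
    static Map<String, Object> withPrefix(Map<String, Object> variables, String prefix) {
        Map<String, Object> matches = new TreeMap<>();
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.put(entry.getKey(), entry.getValue());
            }
        }
        return matches;
    }

    // Same idea for addMonitoredPostfix: names ending with postfix.
    static Map<String, Object> withPostfix(Map<String, Object> variables, String postfix) {
        Map<String, Object> matches = new TreeMap<>();
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            if (entry.getKey().endsWith(postfix)) {
                matches.put(entry.getKey(), entry.getValue());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new TreeMap<>();
        session.put("SEARCH_LETTER", "a");
        session.put("SEARCH_PAGE", 2);
        session.put("ZIP_PARAM", "90001");
        System.out.println(withPrefix(session, "SEARCH_")); // {SEARCH_LETTER=a, SEARCH_PAGE=2}
        System.out.println(withPostfix(session, "_PARAM")); // {ZIP_PARAM=90001}
    }
}
```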

Parameters

prefix The prefix to monitor

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.42a	Moved from session to log class.

Examples

Watch all variables starting with SEARCH_ and log their values

log.addMonitoredPrefix("SEARCH_");

// Log the current value of all session variables whose names start with SEARCH_
log.logMonitoredValues();

addMonitoredValue

Object log.addMonitoredValue ( String name, Object value ) (enterprise edition only)

Description

Adds a specific name and value to be logged by the web message methods or the logMonitoredValues method.

Parameters

name The name for the value being monitored
value The value to associate with the given name

Return Value

The previous value associated with the name, or null if there wasn't one

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.42a	Moved from session to log class.

Examples

Add and log a value

// Setting a value this way will persist it across scripts.
// That way a future script could log the set, and any other values set.
log.addMonitoredValue("The dataSet", dataSet);

// Each time this method is called, it will log the above dataSet
log.logMonitoredValues();

addMonitoredVariable

void log.addMonitoredVariable ( String key ) (enterprise edition only)

Description

Watches the value of a session variable and outputs it each time monitored values are logged.

Parameters

key The key in the session corresponding to a value

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.42a	Moved from session to log class.

Examples

Watch a variable and log its value

log.addMonitoredVariable("NAME");

// Log the current value of NAME, as well as any other currently monitored values
log.logMonitoredValues();

addProgressBar

ProgressBar log.addProgressBar ( String title ) (enterprise edition only)
ProgressBar log.addProgressBar ( String title, String total ) (enterprise edition only)
ProgressBar log.addProgressBar ( String title, double total ) (enterprise edition only)
ProgressBar log.addProgressBarIfNotStopped ( String title ) (enterprise edition only)
ProgressBar log.addProgressBarIfNotStopped ( String title, String total ) (enterprise edition only)
ProgressBar log.addProgressBarIfNotStopped ( String title, double total ) (enterprise edition only)

Description

Adds a new progress bar. If no progress bar exists, this will be set as the root, otherwise it will be the child of the lowest progress bar. When web messages are output with the webDebug, webInfo, webWarn, or webError methods, a progress bar will be drawn to give a visual representation of the current progress of the scrape. The addProgressBarIfNotStopped versions remove the progress bar if the scrape has not been stopped, which is useful for determining when a scrape was stopped.

Parameters

title The title for the new progress bar
total (optional) The total for the new progress bar. This should be the total number of things this is tracking the progress of. For example, if used when iterating over each letter of the alphabet for a search, the total would be 26 (one for each letter).

Return Value

This method returns a reference to the new progress bar, which can be used to update the current progress

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.31a	Available in enterprise edition.
5.5.43a	Moved from session to log class.

Examples

Track the progress of a search over the alphabet

appendStatusMessage

boolean log.appendStatusMessage ( String message ) (enterprise edition only)

Description

Appends a status message to be displayed in the web interface.

Parameters

message The message to be appended.

Return Values

None

Change Log

Version	Description
5.5.32a	Available in Enterprise edition.
5.5.43a	Moved from session to log class.

Examples

Append a status message

if( scrapeableFile.getExtractorPatternTimedOut() )
{
log.appendStatusMessage( "Extractor pattern timed out." );
}

cacheFile

File log.cacheFile ( String outputFilenameAndPath, File fileToCache ) (professional and enterprise editions only)

Description

Adds a file to the cache. This can be used to add anything to the cache, from a text file to an image that was downloaded, or any other file that would be useful.

Parameters

outputFilenameAndPath The name of the file in the cache, including any directory it should be placed in
fileToCache The file that should be cached. This cannot be a directory

Return Value

A File that represents the cached file.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Cache a file

// Set the path in the first parameter so it will show up in a subdirectory in the final output
log.cacheFile("images/products/" + dataRecord.get("PRODUCT_NAME") + ".jpg", new File("output/downloadedImage.jpg"));

cacheScrapeableFile

File log.cacheScrapeableFile ( ScrapeableFile scrapeableFile ) (professional and enterprise editions only)

Description

Caches the HTML and headers of the scrapeable file. This will include both the request and response headers.

Parameters

scrapeableFile The scrapeable file to cache.

Return Value

A File that represents the cached file.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Cache the current file

// Note that this will cause a duplicate file, as with caching enabled this will happen automatically.
// It may be useful in some cases if file manipulation is going to be performed on the returned File
log.cacheScrapeableFile(scrapeableFile);

cacheText

File log.cacheText ( String name, String content, String encoding ) (professional and enterprise editions only)
File log.cacheText ( String name, String content ) (professional and enterprise editions only)

Description

Adds text to the cache. This will create a new text file in the cache and store the given content in it.

Parameters

name The name of the file in the cache, including any directory it should be placed in
content The content to place in the cache
encoding The encoding to use for the text, or null to use the default encoding for the session

Return Value

A File that represents the cached file.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Cache the extracted section for a DataRecord

log.cacheText("Datarecord.html", dataRecord.get("DATARECORD"), "UTF-8");

debug

void log.debug ( Object message )

Description

Write a message to the log at the debug level.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).
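Because the message parameter is typed Object and converted with String.valueOf, passing null or a non-string value does not throw. The standalone sketch below shows what that conversion produces for a few kinds of values; asLogText is a hypothetical helper that only mirrors the documented conversion. A Hashtable-based object such as a dataRecord comes out in {key=value} form (matching the join example's output), while arrays print their type and hash code rather than their contents.

```java
import java.util.Arrays;
import java.util.Hashtable;

class LogMessageText {

    // Mirrors the documented conversion: because the parameter is typed Object,
    // String.valueOf(Object) is the overload that runs, and null becomes "null".
    static String asLogText(Object message) {
        return String.valueOf(message);
    }

    public static void main(String[] args) {
        Hashtable<String, Object> record = new Hashtable<>();
        record.put("CITY", "Los Angeles");

        System.out.println(asLogText(null));   // null
        System.out.println(asLogText(42));     // 42
        System.out.println(asLogText(record)); // {CITY=Los Angeles}
        // Arrays do not override toString(): this prints the type and hash code.
        System.out.println(asLogText(new int[] {1, 2}));
        // Convert arrays explicitly to log their contents.
        System.out.println(asLogText(Arrays.toString(new int[] {1, 2}))); // [1, 2]
    }
}
```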

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

When the workbench is running, this will be found under the log tab for the scraping session. When screen-scraper is running in server mode, the message will get sent to the corresponding .log file found in screen-scraper's log folder. When screen-scraper is invoked from the command line, the message will get sent to standard out.

Examples

Write to Log

// Sends the message to the log.
log.debug( "Inserting extracted data into the database." );

enableCaching

void log.enableCaching ( String description, boolean saveLogs, boolean zipCachedFiles ) (professional and enterprise editions only)

Description

Enables caching for this scrape. When caching is enabled, each time a scrapeable file is downloaded it will be saved to the file system. Once the session is completed the cache will be either zipped or the directory renamed, depending on the conditions that were specified when the cache was enabled. Optionally this will save the log files to the cached location, and will save everything from the error.log file that was added while the cache was enabled.

Parameters

description A description to use in the cached file name
saveLogs True if logs should be included in the cache
zipCachedFiles True if the cached files should be zipped once the scrape ends.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.32a	Renamed from enableCache to enableCaching
5.5.43a	Moved from session to log class.

Examples

Cache the pages requested by the scrape

// No special description is needed, but we want logs to be saved, and the output to be a zipped file
log.enableCaching("", true, true);

endCaching

void log.endCaching ( ) (professional and enterprise editions only)

Description

Ends caching for the scrape. This method is called automatically once all scripts have run and all files have been scraped, but it can also be called from a script to end caching early (thereby caching only a portion of the scrape). It only affects saving downloaded content to the file system, not reading it back in during a scrape.

Parameters

This method takes no parameters

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.32a	Renamed from endCache to endCaching.
5.5.43a	Moved from session to log class.

Examples

Cache the pages requested by the scrape

// End the cache manually before the scrape ends
log.endCaching();

error

void log.error ( Object message )

Description

Writes message to the log at the Error level.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

Examples

Write to Log

// Sends the message to the log.
log.error( "Inserting extracted data into the database." );

getCachingEnabled

boolean log.getCachingEnabled ( ) (professional and enterprise editions only)

Description

Returns whether or not the cache is enabled for the scrape. When enabled, it simply means that each ScrapeableFile will save the content it downloads from the server to the file system so it can be viewed later, generally for debugging purposes.

Parameters

This method takes no parameters

Return Value

Returns true if caching is currently enabled for this session

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.32a	Available in enterprise and professional editions (returns false in basic edition, but doesn't throw an Exception). Renamed from getCacheEnabled to getCachingEnabled.
5.5.43a	Moved from session to log class.

Examples

Log the cache state

if(log.getCachingEnabled())
{
    session.log("Currently caching the session.");
}

getProgressBar

ProgressBar log.getProgressBar ( int index ) (enterprise edition only)
ProgressBar log.getProgressBar ( String title ) (enterprise edition only)

Description

Returns the specified progress bar. If an index is given, returns the progress bar at that index (0 is the root, 1 is the first child, etc.). If a title is given, returns the most recently added progress bar with that title.

Parameters

index (optional) The desired ProgressBar's index
title (optional) The title to search for

Return Value

The ProgressBar indicated, or null if none was found matching the required criteria
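The two lookup rules (by position from the root, or by title with the newest match winning, and null when nothing matches) can be modeled in a few lines of plain Java. This is a hypothetical sketch of the documented behavior, not screen-scraper's ProgressBar class. It assumes the bars can be treated as an ordered list whose first element is the root.

```java
import java.util.ArrayList;
import java.util.List;

public class ProgressBarLookup {
    // Hypothetical stand-in for a ProgressBar: just a title and a counter.
    static class Bar {
        final String title;
        int value;

        Bar(String title) {
            this.title = title;
        }
    }

    // Bars in the order they were added; index 0 is the root.
    private final List<Bar> bars = new ArrayList<>();

    Bar add(String title) {
        Bar bar = new Bar(title);
        bars.add(bar);
        return bar;
    }

    // Lookup by index: null when no bar exists at that position.
    Bar get(int index) {
        return (index >= 0 && index < bars.size()) ? bars.get(index) : null;
    }

    // Lookup by title: walk from the newest bar back toward the root,
    // so the most recently added bar with a matching title is returned.
    Bar get(String title) {
        for (int i = bars.size() - 1; i >= 0; i--) {
            if (bars.get(i).title.equals(title)) {
                return bars.get(i);
            }
        }
        return null;
    }
}
```

Because a title lookup returns only the newest match, two bars with the same title can't both be reached by title. That is one reason the example below recommends keeping a reference to the bar instead.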

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.31a	Available in enterprise edition.
5.5.43a	Moved from session to log class.

Examples

Track the progress of a search over the alphabet

import com.screenscraper.util.ProgressBar;

ProgressBar bar = log.addProgressBar("Letter", 26);
for(char c = 'a'; c <= 'z'; c++)
{
    session.setVariable("SEARCH_LETTER", c);
    session.scrapeFile("Search");
    bar.add(1);

    // Professional and Enterprise editions
    log.webInfo("Completed Search on: " + c);

    // Basic edition (this method is also available in Professional and Enterprise editions)
    log.logMonitoredValues("Completed Search on: " + c);
}

// Now that the search is complete, remove the progress bar
log.removeProgressBar(bar);

// Increment the value of the Category progress bar (created in a separate script).
// Saving a reference to the bar as a session variable is generally recommended over this lookup.
log.getProgressBar("Category").add(1);

info

void log.info ( Object message )

Description

Writes message to the log at the Info level.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

Examples

Write to Log

// Sends the message to the log.
log.info( "Inserting extracted data into the database." );

log

void log.log ( Object message )

Description

Write message to the log.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

Examples

Write to Log

// Sends the message to the log.
log.log( "Inserting extracted data into the database." );

logDataRecord

void log.logDataRecord ( DataRecord record )
void log.logDataRecord ( DataRecord record, int logLevel )
void log.logDataRecordDebug ( DataRecord record ) (professional and enterprise editions only)
void log.logDataRecordInfo ( DataRecord record ) (professional and enterprise editions only)
void log.logDataRecordWarn ( DataRecord record ) (professional and enterprise editions only)
void log.logDataRecordError ( DataRecord record ) (professional and enterprise editions only)

Description

Logs all the values in a Data Record to the log, with one line per value. If a value in the record is a List, Set, Map, Data Set, Scrapeable File, or Exception, it will have detailed output as well.

Parameters

record The Data Record to output to the log
logLevel (optional) The level to log the data record at, as an int.
Values are 1 (Debug), 2 (Info), 3 (Warn), or 4 (Error), or can be obtained from com.screenscraper.common.Notifiable.LEVEL_DEBUG, LEVEL_INFO, LEVEL_WARN, and LEVEL_ERROR.
When omitted, the session logging level is used.

Return Values

This method returns nothing

Change Log

Version	Description
5.5.26a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log a Data Record

// Log a scraped data record before saving it to a database
log.logDataRecord(dataRecord);

The output from the above call might look something like this:

DataRecord
--- A_FLOAT : 3.14159
--- A_LIST : List
------ Element 0 : Value 1
------ Element 1 : Value 2
------ Element 2 : Value 3
------ Element 3 : Set
--------- Element : A value
--------- Element : More value
--------- Element : Other stuff
--- A_MAP : Map
------ KEY_1 : 1
------ KEY_2 : 2
------ KEY_3 : 3
--- A_SET : Set Logged above as "------ Element 3 : "
--- A_STRING : Screen-Scraper
--- AN_INT : 5
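The following plain-Java sketch produces the same style of output for nested Maps, Lists, and Sets, which may help in reading the layout above. It illustrates the format only and is not screen-scraper's implementation. The record is modeled as an ordinary Map and entries are printed in that Map's iteration order (the real method's ordering is not specified here). The sample's special handling of an object that was already logged elsewhere (the A_SET line) is not modeled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RecordLogFormat {
    // Writes one line per value, prefixed with "---" per nesting level;
    // Maps, Lists, and Sets recurse one level deeper for their contents.
    static void append(StringBuilder out, int depth, String label, Object value) {
        String prefix = "---".repeat(depth);
        if (value instanceof Map) {
            out.append(prefix).append(' ').append(label).append(" : Map\n");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                append(out, depth + 1, String.valueOf(e.getKey()), e.getValue());
            }
        } else if (value instanceof List) {
            out.append(prefix).append(' ').append(label).append(" : List\n");
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                append(out, depth + 1, "Element " + i, list.get(i));
            }
        } else if (value instanceof Set) {
            out.append(prefix).append(' ').append(label).append(" : Set\n");
            for (Object element : (Set<?>) value) {
                append(out, depth + 1, "Element", element);
            }
        } else {
            out.append(prefix).append(' ').append(label).append(" : ").append(value).append('\n');
        }
    }

    // Formats a record modeled as a Map of field names to values.
    static String format(Map<String, Object> record) {
        StringBuilder out = new StringBuilder("DataRecord\n");
        for (Map.Entry<String, Object> e : record.entrySet()) {
            append(out, 1, e.getKey(), e.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("A_STRING", "Screen-Scraper");
        record.put("A_LIST", Arrays.asList("Value 1", "Value 2"));
        record.put("AN_INT", 5);
        System.out.print(format(record));
    }
}
```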

logException

void log.logException ( Exception exception )

Description

Logs an Exception, with a full stack trace, at the Error level

Parameters

exception The Exception to log

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log an exception

try
{
    // dataRecord.get returns an Object, so cast it to a String before parsing
    int result = Integer.parseInt((String) dataRecord.get("SCRAPED_VALUE"));
}
catch(Exception e)
{
    log.logException(e);
}

logMonitoredValues

void log.logMonitoredValues ( Object message )
void log.logMonitoredValues ( Object message, int logLevel )
void log.logMonitoredValuesDebug ( Object message ) (professional and enterprise editions only)
void log.logMonitoredValuesInfo ( Object message ) (professional and enterprise editions only)
void log.logMonitoredValuesWarn ( Object message ) (professional and enterprise editions only)
void log.logMonitoredValuesError ( Object message ) (professional and enterprise editions only)

Description

Logs the values of all currently monitored variables and the progress of the scrape (if known), with the message as a header at the top. Any additional values being watched are also logged. Values are logged at the specified level.

Parameters

message A message to output as a header for this log entry
logLevel (optional) The level to log at

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log the currently monitored values and progress bars

log.logMonitoredValues("Record Saved");

logMonitoredValuesClose

void log.logMonitoredValuesClose ( Object message ) (professional and enterprise editions only)

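Description

Logs the closing values of all currently monitored variables, along with the progress of the scrape (if known), using the message as a header, to indicate that the scrape is complete. This method writes only to the log; webClose is the variant that also reports to the web interface.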

Parameters

message A message to output as a header for this log entry

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log the currently monitored values at the end of the scrape

log.logMonitoredValuesClose("Scrape Completed");

logObjectByType

void log.logObjectByType ( Object object )
void log.logObjectByType ( Object object, int logLevel )
void log.logObjectByTypeDebug ( Object object ) (professional and enterprise editions only)
void log.logObjectByTypeInfo ( Object object ) (professional and enterprise editions only)
void log.logObjectByTypeWarn ( Object object ) (professional and enterprise editions only)
void log.logObjectByTypeError ( Object object ) (professional and enterprise editions only)

Description

Logs the Object according to its type. For example, Maps are logged as key-value pairs, Lists are logged with one element per line, and all elements of a Set are logged. Objects that are not a standard collection type are logged using String.valueOf().

Parameters

object The Object to write to the log
logLevel (optional) The level to log the object at, as an int

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log the dataSet

log.logObjectByType(dataSet);

logScreenScraperInformation

void log.logScreenScraperInformation ( )

Description

Logs useful information about the current instance of screen-scraper, as well as the Java VM and General Utility versions being used. The information is logged as an info message in the log and, when running in server mode, in the web interface.

Parameters

This method takes no parameters

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Log Current Info

log.logScreenScraperInformation();

removeMonitoredPostfix

void log.removeMonitoredPostfix ( String postfix ) (enterprise edition only)

Description

Stops watching for a postfix in session variables

Parameters

postfix Postfix to remove from monitoring

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Stop watching and logging the value of all session variables ending with _PARAM

log.removeMonitoredPostfix("_PARAM");

removeMonitoredPrefix

void log.removeMonitoredPrefix ( String prefix ) (enterprise edition only)

Description

Stops watching for a prefix in session variables
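In name-based monitoring, a variable is watched when its name starts with a monitored prefix or ends with a monitored postfix, and removing a prefix or postfix only changes which names match. The plain-Java sketch below is a hypothetical model of that rule, not screen-scraper's implementation. It assumes matching is a simple startsWith/endsWith test on the variable name.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class MonitoredNames {
    // Hypothetical model: prefixes and postfixes currently being watched.
    final Set<String> prefixes = new LinkedHashSet<>();
    final Set<String> postfixes = new LinkedHashSet<>();

    // A name is monitored if it starts with any prefix or ends with any postfix.
    boolean isMonitored(String name) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) return true;
        }
        for (String postfix : postfixes) {
            if (name.endsWith(postfix)) return true;
        }
        return false;
    }

    // Returns the variables that would be logged; the input map is not modified.
    Map<String, Object> watched(Map<String, Object> sessionVariables) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : sessionVariables.entrySet()) {
            if (isMonitored(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new LinkedHashMap<>();
        vars.put("SEARCH_LETTER", "a");
        vars.put("LOGIN_PARAM", "x");
        vars.put("NAME", "Bob");

        MonitoredNames monitor = new MonitoredNames();
        monitor.prefixes.add("SEARCH_");
        monitor.postfixes.add("_PARAM");
        System.out.println(monitor.watched(vars).keySet()); // [SEARCH_LETTER, LOGIN_PARAM]

        // The model's equivalent of removing the "SEARCH_" prefix.
        monitor.prefixes.remove("SEARCH_");
        System.out.println(monitor.watched(vars).keySet()); // [LOGIN_PARAM]
    }
}
```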

Parameters

prefix Prefix to remove from monitoring

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Stop watching and logging the value of all session variables starting with SEARCH_

log.removeMonitoredPrefix("SEARCH_");

removeMonitoredValue

Object log.removeMonitoredValue ( String name ) (enterprise edition only)

Description

Removes a specific name from the manually set values to be logged. This does not affect the value of any session variable.

Parameters

name The name for the value being monitored

Return Value

The previous value associated with the name, or null if there wasn't one

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Remove a value so it won't be logged by logMonitoredValues

log.removeMonitoredValue("The dataSet");

removeMonitoredVariable

void log.removeMonitoredVariable ( String key ) (enterprise edition only)

Description

Stops watching the specified variable

Parameters

key Key for the variable to stop watching

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Moved from session to log class.

Examples

Stop watching and logging the value of the session variable NAME

log.removeMonitoredVariable("NAME");

removeProgressBar

void log.removeProgressBar ( ProgressBar progressBar ) (enterprise edition only)
void log.removeProgressBarIfNotStopped ( ProgressBar progressBar ) (enterprise edition only)

Description

Removes the specified progress bar. The removeProgressBarIfNotStopped variant removes the progress bar only if the scrape has not been stopped. Because the bar is left in place when a scrape is stopped, this is useful for determining at what point the scrape was stopped.

Parameters

progressBar The ProgressBar to remove

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.31a	Available in enterprise edition.
5.5.43a	Moved from session to log class.

Examples

Track the progress of a search over the alphabet

import com.screenscraper.util.ProgressBar;

ProgressBar bar = log.addProgressBar("Letter", 26);
for(char c = 'a'; c <= 'z'; c++)
{
    session.setVariable("SEARCH_LETTER", c);
    session.scrapeFile("Search");
    bar.add(1);

    // Professional and Enterprise editions
    log.webInfo("Completed Search on: " + c);

    // Basic edition (this method is also available in Professional and Enterprise editions)
    log.logMonitoredValues("Completed Search on: " + c);
}

// Now that the search is complete, remove the progress bar
log.removeProgressBar(bar);

warn

void log.warn ( Object message )

Description

Writes message to the log at the Warn level.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

Examples

Write to Log

// Sends the message to the log.
log.warn( "Inserting extracted data into the database." );

webClose

void log.webClose ( Object object ) (professional and enterprise editions only)

Description

Logs closing values to indicate that the scrape is complete, showing what the monitored values were when everything finished. It logs at the highest level used during the scrape. For instance, if a webWarn had been logged during the scrape, this will log at the warning level. When running in Professional edition, this simply outputs to the log.

Using this method is preferred over logMonitoredValuesClose (which only logs to the log), because if at a later point the scrape is run in server mode for enterprise edition, a useful message is output in the web interface without needing to modify the scrape.
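The rule that the closing message uses the highest level logged during the scrape amounts to keeping a running maximum. The plain-Java sketch below models only that rule and does not call screen-scraper. It reuses the 1 to 4 level numbering listed under logDataRecord. The starting level before anything is logged (Debug here) is an assumption of this sketch, not documented behavior.

```java
public class CloseLevel {
    // Level numbers as documented for logDataRecord:
    // 1 = Debug, 2 = Info, 3 = Warn, 4 = Error.
    static final String[] NAMES = {"", "Debug", "Info", "Warn", "Error"};

    // Assumption for this sketch: start at Debug before anything is logged.
    private int highest = 1;

    // Record the level of each web message, keeping only the highest seen.
    void logged(int level) {
        highest = Math.max(highest, level);
    }

    // The level a closing message would be logged at under this rule.
    String closeLevel() {
        return NAMES[highest];
    }

    public static void main(String[] args) {
        CloseLevel tracker = new CloseLevel();
        tracker.logged(2); // like a webInfo call
        tracker.logged(3); // like a webWarn call
        tracker.logged(2); // a later webInfo does not lower the level
        System.out.println(tracker.closeLevel()); // Warn
    }
}
```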

Parameters

object The message to display as a header

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Log monitored variables at the end of the scrape

log.webClose("Scrape Completed");

webDebug

void log.webDebug ( Object object ) (professional and enterprise editions only)
void log.webDebug ( Object object, boolean saveMessage ) (professional and enterprise editions only)
void log.webDebug ( Object object, Object loggable ) (professional and enterprise editions only)
void log.webDebug ( Object object, boolean saveMessage, Object loggable ) (professional and enterprise editions only)

Description

Logs a debug message to the web interface status message area. Uses the message header as the top of the message, and then logs all currently monitored session variables underneath as well as the current progress (if known) of the scrape. Also outputs the message to the log. When running in Professional edition, this simply outputs to the log.

Using this method is preferred over logMonitoredValues (which only logs to the log), because if at a later point the scrape is run in server mode for enterprise edition, a useful message is output in the web interface without needing to modify the scrape.

Parameters

object The message to display as a header
saveMessage (optional) Whether or not to save this message and continue to display it below future web messages. By default debug messages are not saved.
loggable (optional) An additional object to log, most likely a DataRecord. This will only be logged with this message, and not 'monitored' like other values

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Log monitored variables and progress

log.webDebug("Record Saved");

webError

void log.webError ( Object object ) (professional and enterprise editions only)
void log.webError ( Object object, boolean saveMessage ) (professional and enterprise editions only)
void log.webError ( Object object, Object loggable ) (professional and enterprise editions only)
void log.webError ( Object object, boolean saveMessage, Object loggable ) (professional and enterprise editions only)

Description

Logs an error message to the web interface status message area. Uses the message header as the top of the message, and then logs all currently monitored session variables underneath as well as the current progress (if known) of the scrape. Also outputs the message to the log. When running in Professional edition, this simply outputs to the log.

Using this method is preferred over logMonitoredValues (which only logs to the log), because if at a later point the scrape is run in server mode for enterprise edition, a useful message is output in the web interface without needing to modify the scrape.

Parameters

object The message to display as a header
saveMessage (optional) Whether or not to save this message and continue to display it below future web messages. By default error messages are saved.
loggable (optional) An additional object to log, most likely a DataRecord. This will only be logged with this message, and not 'monitored' like other values

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Log monitored variables and progress

log.webError("Record Saved");

webInfo

void log.webInfo ( Object object ) (professional and enterprise editions only)
void log.webInfo ( Object object, boolean saveMessage ) (professional and enterprise editions only)
void log.webInfo ( Object object, Object loggable ) (professional and enterprise editions only)
void log.webInfo ( Object object, boolean saveMessage, Object loggable ) (professional and enterprise editions only)

Description

Logs an info message to the web interface status message area. Uses the message header as the top of the message, and then logs all currently monitored session variables underneath as well as the current progress (if known) of the scrape. Also outputs the message to the log. When running in Professional edition, this simply outputs to the log.

Using this method is preferred over logMonitoredValues (which only logs to the log), because if at a later point the scrape is run in server mode for enterprise edition, a useful message is output in the web interface without needing to modify the scrape.

Parameters

object The message to display as a header
saveMessage (optional) Whether or not to save this message and continue to display it below future web messages. By default info messages are not saved.
loggable (optional) An additional object to log, most likely a DataRecord. This will only be logged with this message, and not 'monitored' like other values

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Log monitored variables and progress

log.webInfo("Record Saved");

webWarn

void log.webWarn ( Object object ) (professional and enterprise editions only)
void log.webWarn ( Object object, boolean saveMessage ) (professional and enterprise editions only)
void log.webWarn ( Object object, Object loggable ) (professional and enterprise editions only)
void log.webWarn ( Object object, boolean saveMessage, Object loggable ) (professional and enterprise editions only)

Description

Logs a warning message to the web interface status message area. Uses the message header as the top of the message, and then logs all currently monitored session variables underneath as well as the current progress (if known) of the scrape. Also outputs the message to the log. When running in Professional edition, this simply outputs to the log.

Using this method is preferred over logMonitoredValues (which only logs to the log), because if at a later point the scrape is run in server mode for enterprise edition, a useful message is output in the web interface without needing to modify the scrape.

Parameters

object The message to display as a header
saveMessage (optional) Whether or not to save this message and continue to display it below future web messages. By default warn messages are saved.
loggable (optional) An additional object to log, most likely a DataRecord. This will only be logged with this message, and not 'monitored' like other values

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.
5.5.43a	Moved from session to log class.

Examples

Log monitored variables and progress

log.webWarn("Record Saved");

RunnableScrapingSession

Overview

This is a class that can be instantiated within a script in order to run a scraping session.

Also see:

Using RunnableScrapingSession Class example scraping sessions
running scraping sessions within scraping sessions documentation

The "Maximum number of concurrent running scraping sessions" setting in the settings dialog box controls how many scraping sessions can run simultaneously.

RunnableScrapingSession

RunnableScrapingSession RunnableScrapingSession ( String name ) (professional and enterprise editions only)
RunnableScrapingSession RunnableScrapingSession ( String name, ScrapingSession inheritedScrapingSession ) (professional and enterprise editions only)
RunnableScrapingSession RunnableScrapingSession ( String name, ScrapingSession inheritedScrapingSession, boolean inheritHttpState ) (professional and enterprise editions only)

Description

Initiates a RunnableScrapingSession object using the name of an existing scraping session.

Parameters

name The name of the scraping session to be run, as a string.
inheritedScrapingSession (optional) Scraping session whose session variables should be copied to the new scraping session. If it is omitted, no session variables are passed to the new scrape.
inheritHttpState (optional) Whether HTTP state information, like cookies, should be inherited. This can be important if you have logged into a site and want the runnable scraping sessions to also be logged in.

Return Values

Returns a RunnableScrapingSession. On failure an error will be thrown.

Change Log

Version	Description
5.0	inheritHttpState added as optional parameter.
4.5	Available for professional and enterprise editions.

Class Location

com.screenscraper.scraper

Examples

Creating RunnableScrapingSession

// Creates a new runnable session for the scraping session "My Session".
myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Creates a new runnable session for the scraping session "My Session"
// and passes it the current scraping session from which it will inherit
// session variables and logging.
myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session", session );

Catching Error

// If you renamed a scrape and are worried about someone not having the new one
// you can use the thrown error to identify a problem that can be solved using
// the older name
try {
    // Attempt to create the scrape using the new name
    myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session - New" );
} catch ( error ) {
    session.logWarn( error.toString() );
    session.logWarn( "Attempting to start scrape with old name." );
    myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );
}

getName

String runnableScrapingSession.getName ( ) (professional and enterprise editions only)

Description

Retrieve the name of the scraping session in the runnableScrapingSession.

Parameters

This method does not receive any parameters.

Return Values

Returns a string with the name of the scraping session.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Retrieve Scrape Name

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Stores the name of the scraping session in the variable sessionName.
sessionName = runnableScrapingSession.getName();

getTimeout

int runnableScrapingSession.getTimeout ( ) (professional and enterprise editions only)

Description

Get the timeout of the session in the runnableScrapingSession.

Parameters

This method does not receive any parameters.

Return Values

Returns an integer representing the timeout length in minutes.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Write Timeout to Log

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Outputs the value of the timeout of the runnable scraping session
// to the log.
session.log( "Session timeout: " + runnableScrapingSession.getTimeout() );

getVariable

Object runnableScrapingSession.getVariable ( String variableName ) (professional and enterprise editions only)

Description

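Retrieves the value of a session variable. Call this method only after the scrape method has returned. With lazy scraping (the default), scrape returns before the session has finished, so call setDoLazyScrape( false ) first, as in the example below.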

Parameters

variableName Name of the variable, as a string.

Return Values

Returns the value of the session variable (an object, boolean, int, string, etc.), or null if the variable doesn't exist.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Write Variable to Log

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Ensure scrape will be run before the script continues
runnableScrapingSession.setDoLazyScrape( false );

// Start the scrape
runnableScrapingSession.scrape();

// Outputs the value of the variable FOO to the log.
session.log( "FOO: " + runnableScrapingSession.getVariable( "FOO" ) );

scrape

void runnableScrapingSession.scrape() (professional and enterprise editions only)

Description

Runs the scraping session.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

By default, the script continues executing without waiting for the scraping session to finish. Call setDoLazyScrape( false ) before scrape to make the script wait until the scrape finishes before continuing.

Examples

Start Scrape in Separate Thread

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Tells the session to start scraping.
runnableScrapingSession.scrape();

// Script continues execution without waiting for end of scrape

Start Scrape in Same Thread

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Turn off LazyScrape
runnableScrapingSession.setDoLazyScrape( false );

// Tells the session to start scraping.
runnableScrapingSession.scrape();

// Script halts execution until the scrape is finished

setDoLazyScrape

void runnableScrapingSession.setDoLazyScrape ( boolean doLazyScrape ) (professional and enterprise editions only)

Description

Indicate whether or not the scraping session should run concurrently with (at the same time as) other scraping sessions. The default for doLazyScrape is true.
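The practical difference between the two modes is whether scrape() returns immediately (work started on another thread) or only after the work is done (work run on the calling thread). The plain-Java sketch below models just that difference with a Runnable standing in for the scrape. It is a hypothetical illustration, not how RunnableScrapingSession is implemented. It also shows why results should only be read after a non-lazy scrape, or after explicitly waiting for a lazy one.

```java
public class LazyScrapeSketch {
    // Hypothetical stand-in for a runnable scraping session: the "scrape"
    // is just a Runnable, and doLazyScrape decides how it is executed.
    private final Runnable work;
    private boolean doLazyScrape = true; // the documented default is true
    private Thread worker;

    LazyScrapeSketch(Runnable work) {
        this.work = work;
    }

    void setDoLazyScrape(boolean doLazyScrape) {
        this.doLazyScrape = doLazyScrape;
    }

    void scrape() {
        if (doLazyScrape) {
            // Lazy: start the work on another thread and return immediately.
            worker = new Thread(work);
            worker.start();
        } else {
            // Not lazy: run the work on the calling thread, so scrape()
            // returns only once the work has finished.
            work.run();
        }
    }

    // Lets a caller wait for a lazy scrape that was started earlier.
    void awaitCompletion() {
        if (worker == null) {
            return;
        }
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        LazyScrapeSketch blocking = new LazyScrapeSketch(() -> log.append("done"));
        blocking.setDoLazyScrape(false);
        blocking.scrape();
        System.out.println(log); // done
    }
}
```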

Parameters

doLazyScrape If lazy (concurrent) scraping should be used, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

We recommend not setting this value to false! When running scraping sessions in the workbench, it will cause the interface to freeze up until sessions have completed.

If you'd like to run multiple scraping sessions serially (one after another), the best option is to set the Maximum number of concurrent running scraping sessions to 1 in the settings window.

Examples

Turn off LazyScrape

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Indicates that the runnable scraping session should not be run
// in a separate thread.
runnableScrapingSession.setDoLazyScrape( false );

// Start the scrape
runnableScrapingSession.scrape();

setTimeout

void runnableScrapingSession.setTimeout ( int timeout ) (professional and enterprise editions only)

Description

Sets the timeout of the session. That is, after the given number of minutes have passed the session will automatically terminate.

Parameters

timeout An integer representing the timeout length in minutes.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before scrape.

Examples

Set Scrape Timeout

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Sets the timeout of the session to 60 minutes.
runnableScrapingSession.setTimeout( 60 );

runnableScrapingSession.scrape();

setVariable

void runnableScrapingSession.setVariable ( String identifier, Object value ) (professional and enterprise editions only)

Description

Set the value of a session variable.

Parameters

identifier Name of the variable, as a string.
value What to store in the variable: object, boolean, int, string, etc.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Set Session Variable

runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );

// Sets the session variable "LOGIN_USERNAME" with the value
// "my_username".
runnableScrapingSession.setVariable( "LOGIN_USERNAME", "my_username" );

// Start the scrape
runnableScrapingSession.scrape();

scrapeableFile

Overview

The scrapeableFile object refers to the file currently being requested from a given server. It holds both the request for the file and the server's response, and it can be manipulated to meet any necessary requirements: GET and POST parameters, referer information, cookies, FILE parameters, HTTP headers, character set, and so on.

addGETHTTPParameter

void scrapeableFile.addGETHTTPParameter ( String key, String value, int sequence ) (professional and enterprise editions only)

Description

Dynamically adds a GET parameter to the URL of the current scrapeable file. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab, and designating the type as GET. Once the scraping session is completed the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.
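The replace-by-sequence rule behaves like a map keyed by sequence number. The plain-Java sketch below is a hypothetical model, not screen-scraper's code. It assumes GET parameters appear in the URL in sequence order and are form-encoded (spaces become "+"), and it uses a TreeMap so that adding a parameter with an existing sequence replaces the old one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class GetParameterSketch {
    // Parameters keyed by sequence number. The TreeMap keeps them sorted by
    // sequence, and put() on an existing sequence replaces that parameter.
    private final TreeMap<Integer, String[]> params = new TreeMap<>();

    void add(String key, String value, int sequence) {
        params.put(sequence, new String[] {key, value});
    }

    // Builds the query string in sequence order, URL-encoding keys and values.
    String queryString() {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<Integer, String[]> e : params.entrySet()) {
            String key = URLEncoder.encode(e.getValue()[0], StandardCharsets.UTF_8);
            String value = URLEncoder.encode(e.getValue()[1], StandardCharsets.UTF_8);
            query.add(key + "=" + value);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        GetParameterSketch file = new GetParameterSketch();
        file.add("page", "1", 1);
        file.add("searchTerm", "LP player", 3);
        file.add("page", "2", 1); // same sequence: replaces page=1
        System.out.println(file.queryString()); // page=2&searchTerm=LP+player
    }
}
```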

Parameters

key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).

Return Values

None

Change Log

Version	Description
5.5.32a	Available in Professional and Enterprise editions.

Examples

Add a GET HTTP parameter to a scrapeable file

scrapeableFile.addGETHTTPParameter( "searchTerm", "LP player", 3 );
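To make the role of the sequence argument concrete, the sketch below shows, in plain Java, one way sequenced GET parameters could be turned into a query string: entries are ordered by sequence, and putting a parameter at an existing sequence replaces the old one, as described above. It is a hypothetical illustration, not screen-scraper's implementation; the class QueryStringSketch is invented for this example, and UTF-8 form encoding via java.net.URLEncoder (which encodes a space as "+") is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical illustration only: GET parameters keyed by their sequence.
// Not screen-scraper's implementation.
class QueryStringSketch {
    // Appends the parameters to baseUrl in ascending sequence order.
    // Each map value is a {key, value} pair.
    static String buildUrl(String baseUrl, TreeMap<Integer, String[]> params) {
        if (params.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<Integer, String[]> entry : params.entrySet()) {
            String key = URLEncoder.encode(entry.getValue()[0], StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue()[1], StandardCharsets.UTF_8);
            query.add(key + "=" + value);
        }
        // Continue an existing query string with "&"; otherwise start one with "?".
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + query;
    }
}
```

Because a TreeMap keeps one entry per key, calling put with a sequence that is already present replaces that parameter, which mirrors the replacement rule in the description.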

addHTTPHeader

void scrapeableFile.addHTTPHeader ( String key, String value ) (professional and enterprise editions only)

Description

Add an HTTP header to be sent along with the request.

Parameters

key Name of the header, as a string.
value Value of the header, as a string.

Return Values

Returns void. If you are not using the professional or enterprise edition, it will throw an error.

Change Log

Version	Description
5.0	Available for professional and enterprise edition.
4.5	Available for enterprise edition.

In certain rare cases it may be necessary to explicitly add a custom header to an HTTP request. This may be required when a site uses AJAX and the POST payload of a request is sent as XML (e.g., using the setRequestEntity method). This method must be invoked before the HTTP request is made (e.g., in a script run "Before file is scraped" for a scrapeable file).

Examples

Add AJAX header

// In a script called "Before file is scraped"

// Add and set AJAX-Method header to true.
scrapeableFile.addHTTPHeader( "AJAX-Method", "true" );

addHTTPParameter

void scrapeableFile.addHTTPParameter ( HTTPParameter parameter )

Description

Dynamically add an HTTPParameter to the current scrapeable file.

Parameters

parameter HTTPParameter object.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

The HTTPParameter constructor is as follows: HTTPParameter( String key, String value, int sequence, String type ). Valid types for the constructor are GET, POST, and FILE; if the type is omitted, it defaults to GET. Calling this method will have no effect unless it's invoked before the file is scraped.

Examples

Add GET HTTP Parameter

// This would be in a script called "Before file is scraped"

// Create HTTP parameter "page" with a value of "3" in the first location (GET is default)
httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1);

// Adds a new GET HTTP parameter to the current file.
scrapeableFile.addHTTPParameter( httpParameter );

Add POST HTTP Parameter

// This would be in a script called "Before file is scraped"

// Create HTTP parameter "page" with a value of "3" in the first location
httpParameter = new com.screenscraper.common.HTTPParameter("page", "3", 1, "POST");

// Adds a new POST HTTP parameter to the current file.
scrapeableFile.addHTTPParameter( httpParameter );

addPOSTHTTPParameter

void scrapeableFile.addPOSTHTTPParameter ( String key, String value ) (professional and enterprise editions only)
void scrapeableFile.addPOSTHTTPParameter ( String key, String value, int sequence ) (professional and enterprise editions only)

Description

Dynamically adds a POST parameter to the existing set of POST parameters. If a parameter with the given sequence already exists, it will be replaced by the one created from this method call. If the version of the method that doesn't take a sequence is used, the new POST parameter will be given a sequence just higher than the highest existing sequence. Calling this method is the equivalent in the workbench of adding a parameter under the "Parameters" tab and designating its type as POST. Once the scraping session is completed, the original HTTP parameters (those under the "Parameters" tab in the workbench) will be restored.

Parameters

key The key portion of the parameter. For example, if the parameter were foo=bar, the key portion would be "foo".
value The value portion of the parameter. For example, if the parameter were foo=bar, the value portion would be "bar".
sequence The sequence of the parameter (equivalent to the value under the "Sequence" column in the workbench).

Return Values

None

Change Log

Version	Description
5.5.32a	Available in Professional and Enterprise editions.

Examples

Add a POST HTTP parameter to a scrapeable file

// Adds a POST parameter to the end of the existing set.
scrapeableFile.addPOSTHTTPParameter( "EVENTTARGET", session.getv( "EVENTTARGET" ) );

// Replaces the existing POST parameter with a sequence of 2 with a new one.
scrapeableFile.addPOSTHTTPParameter( "VIEWSTATE", session.getv( "VIEWSTATE" ), 2 );
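The sequencing rules described above (an explicit sequence replaces any parameter already at that position, while the two-argument form lands just after the highest existing sequence) can be modeled with a sorted map. This is a hypothetical sketch for illustration only: PostParameterModel is not a screen-scraper class, and it assumes that "just higher" means the highest existing sequence plus one, and that an empty set starts at 1.

```java
import java.util.TreeMap;

// Hypothetical model of the addPOSTHTTPParameter sequencing rules.
// Not screen-scraper's implementation.
class PostParameterModel {
    // sequence -> {key, value}, kept in ascending sequence order
    private final TreeMap<Integer, String[]> params = new TreeMap<>();

    // Like the three-argument form: replaces any parameter at this sequence.
    void add(String key, String value, int sequence) {
        params.put(sequence, new String[] {key, value});
    }

    // Like the two-argument form: assumed to use highest sequence + 1 (or 1 if empty).
    void add(String key, String value) {
        int next = params.isEmpty() ? 1 : params.lastKey() + 1;
        params.put(next, new String[] {key, value});
    }

    // Returns the key stored at the given sequence, or null if there is none.
    String keyAt(int sequence) {
        String[] pair = params.get(sequence);
        return pair == null ? null : pair[0];
    }

    int size() {
        return params.size();
    }
}
```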

extractData

DataSet scrapeableFile.extractData ( String text, String extractorPatternName ) (professional and enterprise editions only)

Description

Manually apply an extractor pattern to a string.

Parameters

text The string to which the extractor pattern will be applied.
extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.

Return Values

Returns DataSet on success. Failures will be written out to the log as errors.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

An example of how to manually extract data is available.

Examples

Extract DataSet

// Applies the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable productData.

DataSet productData = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );

Loop Through DataRecords

// Expanded example using the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable myDataSet, which has multiple dataRecords.
// Each myDataRecord has a PRICE and a PRODUCT_ID. 

myDataSet = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
for ( i = 0; i < myDataSet.getNumDataRecords(); i++ )
{
    myDataRecord = myDataSet.getDataRecord( i );

    session.setVariable( "PRICE", myDataRecord.get( "PRICE" ) );
    session.setVariable( "PRODUCT_ID", myDataRecord.get( "PRODUCT_ID" ) );
}

Extractor Pattern from another Scrapeable File

// Apply extractor pattern "PRODUCT" from "Another scrapeable file"
// to the variable productDescriptionText

DataSet productData = scrapeableFile.extractData( productDescriptionText, "Another scrapeable file:PRODUCT" );

Extractor Pattern from another Scraping Session

// Apply extractor pattern "PRODUCT" from "Another scrapeable file"
// in "Other scraping session" to the variable productDescriptionText

DataSet productData = scrapeableFile.extractData( productDescriptionText,
"Other scraping session:Another scrapeable file:PRODUCT" );

extractOneValue

String scrapeableFile.extractOneValue ( String text, String extractorPatternName ) (professional and enterprise editions only)
String scrapeableFile.extractOneValue ( String text, String extractorPatternName, String extractorTokenName ) (professional and enterprise editions only)

Description

Manually retrieve the value of a single extractor token.

Parameters

text The string to which the extractor pattern will be applied.
extractorPatternName Name of extractor pattern in the scrapeable file, as a string. Optionally the scraping session and scrapeable file where the extractor pattern can be found can be specified in the form [scraping session:][scrapeable file:]extractor pattern.
extractorTokenName (optional) Extractor token name, as a string, whose matched value should be returned. If omitted, the matched value for the first extractor token in the data set is returned.

Return Values

Returns the match from the last data record, as a string, on success. On failure it returns null and writes an error to the log.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

To get the value from the first data record instead, use extractData and call getDataRecord on the resulting DataSet.

Examples

Extract Value

// Applies the extractor pattern "Product Name" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the first token found in the extractor pattern
// or null if no token is found.

productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name" );

Extract Value of Specified Token

// Applies the extractor pattern "Product Name" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the token "NAME" found in the extractor pattern
// or null if no token is found.

productName = scrapeableFile.extractOneValue( productDescriptionText, "Product Name", "NAME" );

Extractor Pattern from another Scrapeable File

// Apply extractor pattern "Product Name" from "Another scrapeable file"
// to the variable productDescriptionText and return the value of "NAME"

String productName = scrapeableFile.extractOneValue( productDescriptionText, "Another scrapeable file:Product Name", "NAME" );

Extractor Pattern from another Scraping Session

// Apply extractor pattern "Product Name" from "Another scrapeable file"
// in "Other scraping session" to the variable productDescriptionText
// return the first "NAME"

String productName = scrapeableFile.extractOneValue( productDescriptionText,
"Other scraping session:Another scrapeable file:Product Name",
"NAME" );

getASPXValues

DataRecord scrapeableFile.getASPXValues ( boolean onlyStandard ) (professional and enterprise editions only)

Description

Gets the ASP.NET values from the scrapeable file's response. The standard values are __VIEWSTATE, __EVENTTARGET, __EVENTVALIDATION, and __EVENTARGUMENT. Values are stored in the returned DataRecord as ASPX_VIEWSTATE, ASPX_EVENTTARGET, and so on.

Parameters

onlyStandard If true, only the four standard values are retrieved; if false, any value whose name begins with __ is retrieved.

Return Values

A DataRecord object with each ASPX name stored as ASPX_[NAME] and mapped to its value. Note that when onlyStandard is false, any parameter whose name starts with __ will be returned in this DataRecord.

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Get the .NET values for a page

DataRecord aspx = scrapeableFile.getASPXValues(true);
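For illustration only, the naming convention above (a hidden field such as __VIEWSTATE becomes ASPX_VIEWSTATE) and the effect of onlyStandard can be sketched with a regular expression over a page's input tags. This is not screen-scraper's implementation; AspxFieldSketch is a hypothetical class, and its pattern assumes the name attribute appears before the value attribute inside each input tag, which real pages do not guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of collecting ASP.NET "__" fields; not screen-scraper's code.
class AspxFieldSketch {
    private static final Set<String> STANDARD = Set.of(
            "__VIEWSTATE", "__EVENTTARGET", "__EVENTVALIDATION", "__EVENTARGUMENT");

    // Assumes name="..." appears before value="..." within the same <input> tag.
    private static final Pattern FIELD = Pattern.compile(
            "<input[^>]*\\bname=\"(__[A-Za-z0-9_]+)\"[^>]*\\bvalue=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    static Map<String, String> collect(String html, boolean onlyStandard) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(html);
        while (m.find()) {
            String name = m.group(1);
            if (onlyStandard && !STANDARD.contains(name)) {
                continue; // skip non-standard "__" fields
            }
            // "__VIEWSTATE" -> "ASPX_VIEWSTATE"
            result.put("ASPX_" + name.substring(2), m.group(2));
        }
        return result;
    }
}
```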

getAuthenticationPreemptive

boolean scrapeableFile.getAuthenticationPreemptive ( )

Description

Retrieve the authentication expectation of the request.

Parameters

This method does not receive any parameters.

Return Values

Returns, as a boolean, whether the scrapeable file expects to have to authenticate and so will send the authentication information with the initial request instead of waiting for the server to request it.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Write Expectation Status to Log

// Log expectation of authentication
if ( scrapeableFile.getAuthenticationPreemptive() )
{
    session.log( "Expecting Authentication" );
}

getCharacterSet

String scrapeableFile.getCharacterSet ( )

Description

Get the character set used to render the page response.

Parameters

This method does not receive any parameters.

Return Values

Returns the character set applied to the scraped page, as a string. If a character set has not been specified, it defaults to the character set specified in the settings dialog box.

Change Log

Version	Description
4.5	Available for all editions.

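getCurrentURL

String scrapeableFile.getCurrentURL ( )

Description

Get the current URL of the scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns the current URL, as a string.

Change Log

Version	Description
4.5	Available for all editions.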

Examples

Collect URL

// In script called "After file is scraped"

// Stores the current URL in the variable currentURL.
currentURL = scrapeableFile.getCurrentURL();

getExtractorPatternTimedOut

boolean scrapeableFile.getExtractorPatternTimedOut () (professional and enterprise editions only)

Description

Indicates whether or not the most recent extractor pattern application timed out.

Parameters

None

Return Values

true or false

Change Log

Version	Description
5.5.36a	Available in all editions.

Examples

Find out about the last extractor pattern attempt

if( scrapeableFile.getExtractorPatternTimedOut() )
{
    session.log( "Most recent extractor pattern timed out." );
}

getForceNonBinary

boolean scrapeableFile.getForceNonBinary ( )

Description

Determine whether or not the contents of this response are being forced to be recognized as non-binary.

Parameters

This method does not receive any parameters.

Return Values

Returns true if the scrapeable file is being forced to be treated as non-binary; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Check Binary Status of File

// Determine if the file is being forced
// to be recognized as non-binary

forced = scrapeableFile.getForceNonBinary();

getHTTPResponseHeader

String scrapeableFile.getHTTPResponseHeader ( String header ) (professional and enterprise editions only)

Description

Gets the value of the specified header from the scrapeable file's response, or returns null if it couldn't be found.

Parameters

header The header name (case-insensitive) to get

Return Value

The value of the header, or null if not found

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Log the Content-Type

session.log( scrapeableFile.getHTTPResponseHeader( "Content-Type" ) );

getHTTPResponseHeaderSection

String scrapeableFile.getHTTPResponseHeaderSection ( ) (professional and enterprise editions only)

Description

Gets the header section of the HTTP Response

Parameters

This method takes no parameters

Return Value

A String containing the HTTP Response Headers

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Log the headers

// Split the headers into lines
String[] headers = scrapeableFile.getHTTPResponseHeaderSection().split("[\\r\\n]");
for(int i = 0; i < headers.length; i++)
{
session.log(headers[i]);
}
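As a rough illustration of how the raw header section relates to the map returned by getHTTPResponseHeaders and to the case-insensitive lookup of getHTTPResponseHeader, the sketch below parses a header block in plain Java. HeaderSectionSketch is hypothetical and not screen-scraper's code; it skips lines without a colon (such as the status line), splits on CRLF or LF, and, unlike a full HTTP parser, simply lets a repeated header overwrite the earlier value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical parser for a raw HTTP response header section.
// Not screen-scraper's implementation.
class HeaderSectionSketch {
    static Map<String, String> parse(String headerSection) {
        // Case-insensitive keys, so "content-type" finds "Content-Type".
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerSection.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            // Skip lines without a "name:" prefix, e.g. "HTTP/1.1 200 OK" or blank lines.
            if (colon <= 0) {
                continue;
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(name, value); // a repeated header overwrites the earlier value
        }
        return headers;
    }
}
```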

getHTTPResponseHeaders

Map<String, String> scrapeableFile.getHTTPResponseHeaders ( ) (professional and enterprise editions only)

Description

Gets the headers of the HTTP Response as a map, and returns them.

Parameters

This method takes no parameters

Return Value

A Map from header name to header value for the response headers.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Get the Content-Type header

Map headers = scrapeableFile.getHTTPResponseHeaders();
Iterator it = headers.keySet().iterator();
while ( it.hasNext() )
{
    String next = (String) it.next();
    // Match the header name case-insensitively
    if ( next.equalsIgnoreCase( "Content-Type" ) )
    {
        session.log( "Content-Type was: " + headers.get( next ) );
    }
}

getLastTidyAttemptFailed

boolean scrapeableFile.getLastTidyAttemptFailed ()

Description

Indicates whether or not the most recent attempt to tidy the HTML failed.

Parameters

None

Return Values

true or false

Change Log

Version	Description
5.5.36a	Available in all editions.

Examples

Find out about the last HTML tidy attempt

if( scrapeableFile.getLastTidyAttemptFailed() )
{
    session.log( "Most recent tidy attempt failed." );
}

getMaxRequestAttemptsReached

boolean scrapeableFile.getMaxRequestAttemptsReached () (professional and enterprise editions only)

Description

Indicates whether or not the maximum attempts to request a given scrapeable file were reached.

Parameters

None

Return Values

true or false

Change Log

Version	Description
5.5.36a	Available in all editions.

Examples

Find out about the last request attempt

if( scrapeableFile.getMaxRequestAttemptsReached() )
{
    session.log( "Maximum request attempts were reached." );
}

getMaxResponseLength

int scrapeableFile.getMaxResponseLength ( )

Description

Retrieve the limit, in kilobytes, on the amount of data the scrapeable file will retrieve; any data beyond this limit is not retrieved.

Parameters

This method does not receive any parameters.

Return Values

Returns the current kilobyte limit on the response, as an integer.

Change Log

Version	Description
5.0	Added for professional and enterprise editions.

Examples

Log Response Size Limit

// Log Limit
session.log( "Max Response Length: " + scrapeableFile.getMaxResponseLength() + " KB" );

getName

String scrapeableFile.getName ( )

Description

Get the name of the scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns the name of the scrapeable file, as a string.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Scrapeable File Name to Log

// Outputs the name of the scrapeable file to the log.

session.log( "Current scrapeable file: " + scrapeableFile.getName() );

getNonTidiedHTML

String scrapeableFile.getNonTidiedHTML ( ) (enterprise edition only)

Description

Retrieve the non-tidied HTML of the scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns the non-tidied contents of the scrapeable file, as a string. On failure it returns null.

Change Log

Version	Description
4.5	Available for enterprise edition.

By default, non-tidied HTML is not retained. For this method to return anything other than null, you must use setRetainNonTidiedHTML to force non-tidied HTML to be retained.

Examples

Write Untidied HTML to Log if Retained

// Outputs the non-tidied HTML from the scrapeable file
// to the log based on whether it was retained or not.

if (scrapeableFile.getRetainNonTidiedHTML())
{
    session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
}
else
{
    session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
}

getRedirectURLs

String[] scrapeableFile.getRedirectURLs ( ) (professional and enterprise editions only)

Description

Gets an array of strings containing the redirect URLs for the current scrapeable file request attempt.

Parameters

This method does not receive any parameters.

Return Values

Returns the array of strings; may be empty.

Change Log

Version	Description
6.0.24a	Available in Professional and Enterprise editions.

getRetainNonTidiedHTML

boolean scrapeableFile.getRetainNonTidiedHTML ( ) (enterprise edition only)

Description

Determine whether the scrapeable file is set to retain non-tidied HTML.

Parameters

This method does not receive any parameters.

Return Values

Returns true if the scrapeable file is set to retain its non-tidied contents; otherwise, it returns false.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Write Untidied HTML to Log if Retained

// Outputs the non-tidied HTML from the scrapeable file
// to the log if it was retained otherwise just a message.

if (scrapeableFile.getRetainNonTidiedHTML())
{
    session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
}
else
{
    session.log( "The non-tidied HTML was not retained or the file has not yet been scraped." );
}

getRetryPolicy

RetryPolicy scrapeableFile.getRetryPolicy ( ) (professional and enterprise editions only)

Description

Returns the retry policy. Note that in any "After file is scraped" script this is null.

Parameters

This method takes no parameters.

Return Value

The Retry Policy that will be used by this scrapeable file

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Check for a retry policy

if ( scrapeableFile.getRetryPolicy() != null )
{
    session.log( scrapeableFile.getName() + ": Retry policy has been set for this scrapeable file." );
}

getStatusCode

int scrapeableFile.getStatusCode ( ) (professional and enterprise editions only)

Description

Determine the HTTP status code sent by the server.

Parameters

This method does not receive any parameters.

Return Values

Returns integer corresponding to the HTTP status code of the response.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Write warning to log on 404 error

// Check for a 404 response (file not found).
if( scrapeableFile.getStatusCode() == 404 )
{
    url = scrapeableFile.getCurrentURL();
    session.log( "Warning! The server returned a 404 response for the url ( " + url + " )." );
}

getUserAgent

String scrapeableFile.getUserAgent ( )

Description

Retrieve the name of the user agent making the request.

Parameters

This method does not receive any parameters.

Return Values

Returns the user agent, as a string.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Write User Agent to Log

// write to log
session.log( scrapeableFile.getUserAgent( ) );

inputOutputErrorOccurred

boolean scrapeableFile.inputOutputErrorOccurred ( )

Description

Determine whether an input or output error occurred when requesting the file.

Parameters

This method does not receive any parameters.

Return Values

Returns true if an error has occurred; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

This method should be run after the scrapeable file has been scraped.

Examples

End scrape on Error

// Check for error 
if (scrapeableFile.inputOutputErrorOccurred())
{
    // Log error occurrence
    session.log( "Input/output error occurred." );
    // End scrape
    session.stopScraping();
}

noExtractorPatternsMatched

boolean scrapeableFile.noExtractorPatternsMatched ( )

Description

Determine whether any extractor patterns associated with the scrapeable file found a match.

Parameters

This method does not receive any parameters.

Return Values

Returns true if none of the extractor patterns associated with the scrapeable file matched; otherwise, it returns false.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Warning if no Extractor Patterns matched

// If no patterns matched, outputs a message indicating such
// to the session log.

if( scrapeableFile.noExtractorPatternsMatched() )
{
    session.log( "Warning! No extractor patterns matched." );
}

removeAllHTTPParameters

void scrapeableFile.removeAllHTTPParameters ( ) (professional and enterprise editions only)

Description

Remove all of the HTTP parameters from the current scrapeable file.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Delete HTTP Parameters

// Removes all of the HTTP parameters from the current scrapeable file.
scrapeableFile.removeAllHTTPParameters();

removeHTTPHeader

void scrapeableFile.removeHTTPHeader ( String key ) (enterprise edition only)
void scrapeableFile.removeHTTPHeader ( String key, String value ) (enterprise edition only)

Description

Remove an HTTP header from a scrapeable file.

Parameters

key The name of the HTTP header to be removed, as a string.
value (optional) The value of the HTTP header that is to be removed, as a string. If this is left off then all headers of the specified key will be removed.

Return Values

Returns void.

Change Log

Version	Description
5.0.5a	Introduced for enterprise edition.

Examples

Remove All Values of a Header

// Delete all Cookie headers for this scrapeable file.
// To clear cookies for the whole session instead,
// use session.clearCookies.
scrapeableFile.removeHTTPHeader( "Cookie" );

removeHTTPParameter

void scrapeableFile.removeHTTPParameter ( int sequence )
void scrapeableFile.removeHTTPParameter ( String key ) (professional and enterprise editions only)

Description

Dynamically removes an HTTPParameter. The order of the remaining parameters is adjusted immediately.

Parameters

sequence The ordered location of the parameter.
key The key identifying the HTTP parameter to be removed.

Return Values

Returns void.

Change Log

Version	Description
5.5.32a	Added the method call that takes a String; available for Professional and Enterprise editions.
4.5	Available for all editions.

If you call this method more than once in the same script, or use it together with the addHTTPParameter method, keep track of how the list is reordered before calling either method again.

Calling this method will have no effect unless it's invoked before the file is scraped.

This method can be used for both GET and POST parameters.

Examples

Remove HTTP parameter

// In a script called "Before file is scraped"

// Removes the eighth HTTP parameter from the current file.
scrapeableFile.removeHTTPParameter( 8 );
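The caution about reordering can be illustrated with an ordinary list. This hypothetical sketch assumes, as the description states, that sequences are 1-based positions that close up immediately after each removal; SequenceShiftSketch models only that bookkeeping and is not screen-scraper's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a parameter's sequence is its 1-based position in the list.
// Not screen-scraper's implementation.
class SequenceShiftSketch {
    // Applies removals in the order given; after each removal, the
    // parameters that followed it shift down by one sequence.
    static List<String> removeInOrder(List<String> params, int... sequences) {
        List<String> remaining = new ArrayList<>(params);
        for (int sequence : sequences) {
            remaining.remove(sequence - 1); // remove(int index), not remove(Object)
        }
        return remaining;
    }
}
```

Under this model, removing sequences 2 and then 3 from four parameters deletes the second and fourth originals, while removing from the highest sequence down (3, then 2) deletes the second and third.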

resequenceHTTPParameter

void scrapeableFile.resequenceHTTPParameter ( String key, int sequence ) (professional and enterprise editions only)

Description

Resequences an HTTP parameter.

Parameters

key The key identifying the HTTP parameter to be resequenced.
sequence The new sequence the parameter should have.

Return Values

None

Change Log

Version	Description
5.5.32a	Available in Professional and Enterprise editions.

Examples

Resequence an HTTP parameter

// Give the "VIEWSTATE" HTTP parameter a sequence of 3.
scrapeableFile.resequenceHTTPParameter( "VIEWSTATE", 3 );

resolveRelativeURL

String scrapeableFile.resolveRelativeURL ( String urlToResolve ) (professional and enterprise editions only)

Description

Resolves a relative URL to an absolute URL based on the current URL of this scrapeable file.

Parameters

urlToResolve Relative file path, as a string.

Return Values

Returns the complete (absolute) URL, as a string. On failure it returns the relative path unchanged and writes an error to the log.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Resolve relative URL into an absolute URL

// Assuming the URL of the current scrapeable file is
// "https://www.screen-scraper.com/path/to/file/"
// the method call would result in the URL
// "https://www.screen-scraper.com/path/to/file/thisfile.php"
// being assigned to the "fullURL" variable.

fullURL = scrapeableFile.resolveRelativeURL( "thisfile.php" );
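Relative references are conventionally resolved by the rules in RFC 3986. As an illustration (not necessarily how screen-scraper implements this method), java.net.URI gives the same result for the example above and also shows why a trailing slash on the base URL matters. RelativeUrlSketch is a hypothetical wrapper for this example.

```java
import java.net.URI;

// Illustration of relative-URL resolution using java.net.URI.
class RelativeUrlSketch {
    static String resolve(String baseUrl, String relative) {
        // URI.resolve merges the paths and normalizes "." and ".." segments.
        return URI.create(baseUrl).resolve(relative).toString();
    }
}
```

Without the trailing slash, the last path segment ("file") is treated as a document name and is replaced, so "thisfile.php" resolves to ".../path/to/thisfile.php".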

saveFileBeforeTidying

void scrapeableFile.saveFileBeforeTidying ( String filePath ) (professional and enterprise editions only)

Description

Write non-tidied contents of the scrapeable file response to a text file.

Parameters

filePath File path, as a string, where the file should be saved.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

Because the response headers are also saved in the file, the saved file will not be valid if it is anything other than a text file (e.g., images, PDFs).

Examples

Save Untidied Request and Response

// In script called "Before file is scraped"

// Causes the non-tidied HTML from the scrapeable file
// to be output to the file path.

scrapeableFile.saveFileBeforeTidying( "C:/non-tidied.html" );

saveFileOnRequest

void scrapeableFile.saveFileOnRequest ( String filePath ) (enterprise edition only)

Description

Save the file returned from a scrapeable file request.

Parameters

filePath File path, as a string, where the file should be saved.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

This method must be called from a scrapeable file before the file is scraped. Do not call this method from a script which is invoked by other means such as after an extractor pattern match or from within another script.

It is preferable to use downloadFile; however, at times you may have to send POST parameters in order to access a file. If that is the case, you would use this method.

This method cannot save local file requests to another location.

Examples

Save requested file

// In script called "Before file is scraped"

// When the current file is requested it will be saved to the
// local file system as "sample.pdf".

scrapeableFile.saveFileOnRequest( "C:/downloaded_files/sample.pdf" );

setAuthenticationPreemptive

void scrapeableFile.setAuthenticationPreemptive ( boolean preemptiveAuthentication )

Description

Set the authentication expectation of the request.

Parameters

preemptiveAuthentication Whether the scrapeable file expects to have to authenticate and so should send the authentication information with the initial request instead of waiting for the server to request it, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Set Preemptive Authentication

// Set expectation of authentication
scrapeableFile.setAuthenticationPreemptive( false );

setCharacterSet

void scrapeableFile.setCharacterSet ( String characterSet ) (professional and enterprise editions only)

Description

Set the character set used in a specific scrapeable file's response renderings. This can be particularly helpful when the page renders characters incorrectly.

Parameters

characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

This method must be called before the file is scraped.

If you are having trouble with characters displaying incorrectly, we encourage you to read about how to go about finding a solution using one of our FAQs.

Examples

Set Character Set of Scrapeable File

// In script called "Before file is scraped"

// Sets the character set to be applied to the last response.
scrapeableFile.setCharacterSet( "ISO-8859-1" );
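The following plain-Java sketch (it does not use screen-scraper) shows the kind of garbling a mismatched character set produces: text sent as UTF-8 but decoded as ISO-8859-1 turns "é" into "Ã©". CharsetSketch is a hypothetical helper for illustration; if a page shows symptoms like this, setting the character set that matches the server's actual encoding is the usual remedy.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how decoding bytes with the wrong character set garbles text.
class CharsetSketch {
    // Encodes text as UTF-8 (as a server might send it),
    // then decodes those bytes using the given character set.
    static String decodeUtf8BytesAs(String text, Charset decodeAs) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeAs);
    }
}
```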

setContentType

void scrapeableFile.setContentType ( String contentType ) (professional and enterprise editions only)

Description

Set the content type of the POST payload. This is particularly helpful when scraping some sites' implementations of AJAX, where the payload is explicitly set as XML.

Parameters

contentType Desired content type of the POST payload, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

This method is usually used in connection with setRequestEntity as that method specifies the content of the POST data.

Examples

Set Content Type for XML payload in AJAX

// In script called "Before file is scraped"

// Sets the type of the POST entity to XML.
scrapeableFile.setContentType( "text/xml" );

// Set content of POST data
scrapeableFile.setRequestEntity( "<person><name>John Smith</name></person>" );

setForceMultiPart

void scrapeableFile.setForceMultiPart ( boolean forceMultiPart ) (professional and enterprise editions only)

Description

Set content type header to multipart/form-data.

Parameters

forceMultiPart Boolean representing whether the request contains multipart data (e.g. images, files) as opposed to plain text. The default is false.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

Occasionally a site will expect a multi-part request when a file is not being sent in the request.

If you include a file upload parameter under the parameters tab of the scrapeable file the request will automatically be multi-part.

Examples

Specify that Request contains Files

// In script called "Before file is scraped"

// Will cause the request to be made as a multi-part request.
scrapeableFile.setForceMultiPart( true );

setForceNonBinary

void scrapeableFile.setForceNonBinary ( boolean forceNonBinary )

Description

Set whether or not the contents of this response should be forced to be treated as non-binary. Default forceNonBinary value is false.

Parameters

forceNonBinary Whether or not the scrapeable file should be forced to be non-binary.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

This is provided in the case where screen-scraper misidentifies a non-binary file as a binary file. It doesn't happen often but is possible.

Examples

Force File to be Treated as Non-Binary

// Force file to be recognized as non-binary
scrapeableFile.setForceNonBinary( true );

setForcePOST

void scrapeableFile.setForcePOST ( Boolean forcePOST ) (professional and enterprise editions only)

Description

Set whether a POST request should be forced.

Parameters

forcePOST Whether a POST request should be forced, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
6.0.14a	Available in Professional and Enterprise editions.

setForcedRequestType

void scrapeableFile.setForcedRequestType ( ScrapeableFile.RequestType type ) (professional and enterprise editions only)

Description

Sets the request type to use.

Parameters

type The type of request to issue, or null to let screen-scraper decide.

ScrapeableFile.RequestType is an enum with the following values:
- GET
- POST
- HEAD
- DELETE
- OPTIONS
If the method sets the request to one of these types, all parameters set as GET in the parameters tab will be appended to the URL (as usual) and all parameters set as POST will be used to build the request entity. If there are POST values on a type that doesn't support a request entity, an exception will be thrown when the request is issued.
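The parameter routing described above can be sketched in plain Java as follows. This is a hypothetical model, not screen-scraper's implementation: the class name and helpers are illustrative, and treating POST as the only listed type that accepts a request entity is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class ForcedRequestTypeSketch {
    // Mirrors the values listed above for ScrapeableFile.RequestType.
    enum RequestType { GET, POST, HEAD, DELETE, OPTIONS }

    // Assumption: of the listed types, only POST carries a request entity.
    static boolean supportsEntity(RequestType type) {
        return type == RequestType.POST;
    }

    // URL-encodes name=value pairs and joins them with '&'.
    static String encode(Map<String, String> params) {
        StringJoiner pairs = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            pairs.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return pairs.toString();
    }

    // GET parameters are appended to the URL regardless of the forced type.
    static String buildUrl(String baseUrl, Map<String, String> getParams) {
        if (getParams.isEmpty()) {
            return baseUrl;
        }
        return baseUrl + (baseUrl.contains("?") ? "&" : "?") + encode(getParams);
    }

    // POST parameters become the request entity; types without an entity reject them.
    static String buildEntity(RequestType type, Map<String, String> postParams) {
        if (postParams.isEmpty()) {
            return null; // nothing to send in the body
        }
        if (!supportsEntity(type)) {
            throw new IllegalStateException(type + " requests cannot carry POST parameters");
        }
        return encode(postParams);
    }

    public static void main(String[] args) {
        Map<String, String> get = new LinkedHashMap<>();
        get.put("q", "john smith");
        Map<String, String> post = new LinkedHashMap<>();
        post.put("page", "2");
        System.out.println(buildUrl("http://www.foo.com/search", get));
        System.out.println(buildEntity(RequestType.POST, post));
        try {
            buildEntity(RequestType.DELETE, post);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```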

Return Values

Returns void.

Change Log

Version	Description
6.0.55a	Available in Professional and Enterprise editions.

Examples

Set the Request Type

// Force the request to be issued as a POST
scrapeableFile.setForcedRequestType( ScrapeableFile.RequestType.POST );

setLastScrapedData

void scrapeableFile.setLastScrapedData(String) (enterprise edition only)

Description

Overwrite the content of the "last response".

Parameters

String Desired new content of the last response

Return Values

Returns void.

This method must be called from an extractor pattern before the pattern is run.

Examples

Replace new line characters with a space

// Replace each newline in the last response with a space
newLastResponse = scrapeableFile.getContentAsString().replaceAll( "\\n", " " );
scrapeableFile.setLastScrapedData( newLastResponse );
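The regular expression in this example is easy to misread. In Java source, "\\n" is the regex \n, which matches only a line feed, so responses with Windows-style line endings keep a stray carriage return. The hypothetical helpers below compare that with "\\r?\\n"; whether a given site's responses actually contain carriage returns is an assumption you would need to check.

```java
class NewlineReplaceSketch {
    // In Java source, "\\n" is the two-character regex \n, which matches a line feed.
    static String replaceLineFeeds(String text) {
        return text.replaceAll("\\n", " ");
    }

    // "\\r?\\n" also consumes an optional carriage return, so a CRLF pair becomes one space.
    static String replaceAnyLineBreak(String text) {
        return text.replaceAll("\\r?\\n", " ");
    }

    public static void main(String[] args) {
        String crlf = "<td>John\r\nSmith</td>";
        // prints true: a carriage return is left behind
        System.out.println(replaceLineFeeds(crlf).contains("\r"));
        System.out.println(replaceAnyLineBreak(crlf));
    }
}
```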

setMaxResponseLength

void scrapeableFile.setMaxResponseLength ( int maxKBytes ) (professional and enterprise editions only)

Description

Limit how much of the response the scrapeable file downloads. This is useful for very large responses where the desired information is found in the first portion of the response, and it makes the scraping process more efficient by downloading only the information you need.
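Conceptually, capping a download means reading from the response stream until the limit is reached and then stopping. The following plain-Java sketch illustrates that idea; it is not screen-scraper's code, the class name is hypothetical, and treating a kilobyte as 1024 bytes is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class MaxResponseLengthSketch {
    // Assumption: "kilobytes" means 1024-byte units.
    static byte[] readAtMost(InputStream in, int maxKBytes) throws IOException {
        long limit = (long) maxKBytes * 1024;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        long total = 0;
        while (total < limit) {
            // Never ask for more than the bytes remaining under the limit.
            int toRead = (int) Math.min(buffer.length, limit - total);
            int n = in.read(buffer, 0, toRead);
            if (n == -1) {
                break; // the response ended before reaching the limit
            }
            out.write(buffer, 0, n);
            total += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bigResponse = new byte[200_000];
        byte[] kept = readAtMost(new ByteArrayInputStream(bigResponse), 50);
        System.out.println("Kept " + kept.length + " of " + bigResponse.length + " bytes");
    }
}
```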

Parameters

maxKBytes Kilobytes to be downloaded, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Limit Response Size

// In script called "Before file is scraped"

// Only download the first 50 KB
scrapeableFile.setMaxResponseLength(50);

setReferer

void scrapeableFile.setReferer ( String url ) (professional and enterprise editions only)

Description

Set the Referer HTTP header.

Parameters

url URL of the referer, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Set Referer

// In script called "Before file is scraped"

// Sets the given URL as the Referer HTTP header
// for the current scrapeable file.

scrapeableFile.setReferer( "http://www.foo.com/" );

setRequestEntity

void scrapeableFile.setRequestEntity ( String requestEntity ) (professional and enterprise editions only)

Description

Set the POST payload data. This is particularly helpful when scraping sites whose AJAX implementation sends the payload explicitly as XML.

Parameters

requestEntity Desired content of the POST payload, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

This method is usually used in connection with setContentType, as that method specifies the content type of the POST data.

Though you can set plain text POST data using this method, it is preferable to use the addHTTPParameter method for this task.
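To show the raw request that setContentType and setRequestEntity together describe, here is a sketch using the standard java.net.http client (Java 11 or later). It only builds the request and never sends it; the URL is a placeholder and the class is hypothetical, not part of screen-scraper.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class XmlPostSketch {
    // Builds (but does not send) a POST whose body is raw XML, the combination that
    // setContentType( "text/xml" ) plus setRequestEntity( ... ) describes.
    static HttpRequest buildXmlPost(String url, String xml) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildXmlPost("http://www.foo.com/ajax",
                "<person><name>John Smith</name></person>");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```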

Examples

Set POST data as XML (see the setContentType example above, which pairs setContentType with setRequestEntity)

setRetainNonTidiedHTML

void scrapeableFile.setRetainNonTidiedHTML ( boolean retainNonTidiedHTML ) (enterprise edition only)

Description

Set whether or not non-tidied HTML is to be retained for the current scrapeable file.

Parameters

retainNonTidiedHTML Whether the non-tidied HTML should be retained, as a boolean. The default is false.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

If, after the file is scraped, you want to be able to use getNonTidiedHTML, this method has to be called before the file is scraped.

Examples

Retain Non-tidied HTML

// In script called "Before file is scraped"

// Tells screen-scraper to retain non-tidied HTML for the current
// scrapeable file.

scrapeableFile.setRetainNonTidiedHTML( true );

setRetryPolicy

void scrapeableFile.setRetryPolicy ( RetryPolicy policy ) (professional and enterprise editions only)

Description

Sets a Retry Policy that will be run to check if a page should be re-downloaded or not. The policy will be checked after all the extractors have run, and will check for an error on the page based on a set of conditions. If the policy shows an error on the page, it can run scripts or other code to attempt to remedy the situation, and then it will rescrape the file.

The file will be re-downloaded without rerunning any of the scripts that run before the file is downloaded, and the retry happens before any of the scripts marked to run after the file is scraped. Any changes that need to be made to session variables, headers, etc. should be made in the script or runnable that the policy executes. The policy can also specify that session variables should be restored to their previous values before the file is rescraped. If it does, they will be reset after the policy's error-checking step but before the policy runs the code that makes changes before a retry.

The retry policy should be set in a script run 'Before file is scraped', but can also be set by a script on an extractor pattern. If it is set on an extractor pattern, session variables will not be restored if a retry is required.
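The retry flow described above can be summarized as a loop. The Java sketch below is a hypothetical model of that flow, not the RetryPolicy interface itself: session variables are modeled as a plain Map, and interpreting "retry up to N times" as one initial download plus N retries is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

class RetrySketch {
    // Downloads once, then retries up to maxRetries more times while pageHasError reports a problem.
    // Before each retry it optionally restores the session variable snapshot, then runs the remedy.
    // Returns true if some attempt produced a page without errors.
    static boolean scrapeWithRetries(Map<String, Object> sessionVars, Runnable downloadAndExtract,
                                     BooleanSupplier pageHasError, Runnable remedy,
                                     int maxRetries, boolean restoreVariables) {
        Map<String, Object> snapshot = new HashMap<>(sessionVars);
        for (int retries = 0; ; retries++) {
            downloadAndExtract.run();               // download plus extractor patterns
            if (!pageHasError.getAsBoolean()) {
                return true;                        // page looks fine, no retry needed
            }
            if (retries >= maxRetries) {
                return false;                       // out of retries
            }
            if (restoreVariables) {                 // reset after the error check, before the remedy
                sessionVars.clear();
                sessionVars.putAll(snapshot);
            }
            remedy.run();                           // e.g. switch to a new proxy
        }
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        int[] downloads = {0};
        boolean ok = scrapeWithRetries(vars, () -> downloads[0]++, () -> downloads[0] < 3,
                () -> System.out.println("Remedy before retry"), 5, true);
        System.out.println("Succeeded: " + ok + " after " + downloads[0] + " downloads");
    }
}
```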

Parameters

policy The policy that should be run. See RetryPolicyFactory for standard policies, or create one by implementing the RetryPolicy interface.

Return Values

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Set a basic retry policy

import com.screenscraper.util.retry.RetryPolicyFactory;

// Use a policy that will retry up to 5 times, and on each failed attempt to load
// the page, it will execute the "Get new Proxy" script

scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new Proxy"));

setUserAgent

void scrapeableFile.setUserAgent ( String userAgent ) (professional and enterprise editions only)

Description

Explicitly state the user agent making the request.

Parameters

userAgent User agent name, as a string. There are many possible user agents; a list is maintained by User-Agents.org. The default is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322).

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method must be called before the file is scraped.

Examples

Set User Agent

// In script called "Before file is scraped"

// Causes screen-scraper to identify itself as Firefox
// running on Linux.

scrapeableFile.setUserAgent( "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826" );

wasErrorOnRequest

boolean scrapeableFile.wasErrorOnRequest ( )

Description

Determine if an error occurred with the request. Errors are considered to be server timeouts as well as any status code outside of the range 200-399.
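The error rule stated above reduces to a single boolean expression. This hypothetical Java helper restates it for clarity; screen-scraper's own check may be implemented differently.

```java
class RequestErrorRuleSketch {
    // The rule described above: a timeout, or any status code outside 200-399, is an error.
    static boolean isRequestError(boolean timedOut, int statusCode) {
        return timedOut || statusCode < 200 || statusCode > 399;
    }

    public static void main(String[] args) {
        int[] codes = {200, 301, 399, 404, 500};
        for (int code : codes) {
            System.out.println(code + " -> error? " + isRequestError(false, code));
        }
    }
}
```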

Parameters

This method does not receive any parameters.

Return Values

Returns true for server timeouts as well as any status code outside of the range 200-399; otherwise, it returns false.

Change Log

Version	Description
4.5	Available for all editions.

This method must be called after the file is scraped.

If you want to know what the status code was, you can use getStatusCode.

Examples

Check for Request Errors

// In script called "After file is scraped"

// If an error occurred when the file was requested, an error
// message indicating such gets output to the log.

if( scrapeableFile.wasErrorOnRequest() )
{
    session.log( "Connection error occurred." );
}

session

Overview

This object refers to the scraping session that is currently running. To make the methods easier to sort through, they have been grouped by related functionality, and each group is named to make it easy to find when needed.

Anonymization

Overview

The following methods are provided to help you set up an anonymous scraping session. If you are using your own proxy server pool, you will use these methods to allow screen-scraper to interact with and manage your proxy pool. If you are using automatic anonymization, the only method you will use is currentProxyServerIsBad, as screen-scraper will manage the servers using the anonymization settings from your setup.

See an example of Anonymization via Manual Proxy Pools.

currentProxyServerIsBad

void session.currentProxyServerIsBad ( ) (professional and enterprise editions only)

Description

Remove the current proxy server from the proxy pool. This is only used with anonymization, and indicates that the current server in the pool is bad and should be removed.
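To picture what removing a bad server from a pool means, here is a minimal round-robin pool in Java. It is a hypothetical stand-in, not screen-scraper's ProxyServerPool class, and it does not model automatic anonymization creating a replacement server.

```java
import java.util.ArrayList;
import java.util.List;

class ProxyPoolSketch {
    private final List<String> proxies;   // "host:port" strings
    private int current = 0;

    ProxyPoolSketch(List<String> proxies) {
        this.proxies = new ArrayList<>(proxies);
    }

    String currentProxy() {
        return proxies.isEmpty() ? null : proxies.get(current);
    }

    // Round-robin to the next proxy in the pool.
    void next() {
        if (!proxies.isEmpty()) {
            current = (current + 1) % proxies.size();
        }
    }

    // Drop the current proxy; the one after it (wrapping around) becomes current.
    void currentProxyIsBad() {
        if (proxies.isEmpty()) {
            return;
        }
        proxies.remove(current);
        if (current >= proxies.size()) {
            current = 0;
        }
    }

    int size() {
        return proxies.size();
    }

    public static void main(String[] args) {
        ProxyPoolSketch pool = new ProxyPoolSketch(List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        pool.currentProxyIsBad();
        System.out.println("Now using " + pool.currentProxy() + " (" + pool.size() + " left)");
    }
}
```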

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

If you are using automatic anonymization or manual proxy pools, a new proxy server will be created as a result of the method call.

When checking whether a request you have made is invalid, it is best not to rely on the HTTP status code (e.g. 404) alone, as status codes are not always accurate. It is recommended that you also scrape a known string (e.g. "Not found") from the response HTML that confirms the status code.
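The recommendation above, confirming a status code with a known string from the page, can be expressed as a small Java helper. The method name is hypothetical, and the marker text is whatever your target site actually shows; "Not found" is only the example from the text.

```java
class NotFoundCheckSketch {
    // Treat a response as "not found" only when the status code and the page text agree,
    // because status codes alone are not always accurate.
    static boolean isConfirmedNotFound(int statusCode, String responseText, String marker) {
        return statusCode == 404 && responseText != null && responseText.contains(marker);
    }

    public static void main(String[] args) {
        System.out.println(isConfirmedNotFound(404, "<h1>Not found</h1>", "Not found"));
        System.out.println(isConfirmedNotFound(404, "<h1>Welcome</h1>", "Not found"));
    }
}
```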

Examples

Flag Proxy Server

// Indicates that the current proxy server is bad.
session.currentProxyServerIsBad();

getCurrentProxyServerFromPool

ProxyServer session.getCurrentProxyServerFromPool ( )

Description

Get the current proxy server from the proxy server pool.

Parameters

This method does not receive any parameters.

Return Values

Returns the current proxy server being used.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Proxy Server Description to Log

// Get Proxy Server
proxyServer = session.getCurrentProxyServerFromPool();

// Log Server Description
session.log( "Proxy Server: " + proxyServer.getDescription() );

getProxyServerPool

ProxyServerPool session.getProxyServerPool ( )

Description

Get the proxy server pool object associated with the scraping session, which allows proxies to be cycled through.

Parameters

This method does not receive any parameters.

Return Values

Returns the ProxyServerPool associated with the session, or null if none has been set.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Check if ProxyServerPool object exists.

// If no ProxyServerPool exists yet,
// create a new ProxyServerPool object.
if ( session.getProxyServerPool() == null )
{
    // The ProxyServerPool object will
    // control how screen-scraper interacts with proxy servers.

    proxyServerPool = new ProxyServerPool();

    // We give the current scraping session a reference to
    // the proxy pool. This step should ideally be done right
    // after the object is created (as in the previous step).

    session.setProxyServerPool( proxyServerPool );
}

getTerminateProxiesOnCompletion

boolean session.getTerminateProxiesOnCompletion ( )

Description

Determine whether proxies are set to be terminated when the scrape ends.

Parameters

This method does not receive any parameters.

Return Values

Returns true if proxies will be terminated when the scrape ends; otherwise, it returns false.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Check Termination Setting

// Log whether proxies are being terminated or not
if ( session.getTerminateProxiesOnCompletion() )
{
    session.log( "Anonymous Proxies are set to be terminated with the scrape." );
}
else
{
    session.log( "Anonymous Proxies are set to continue running after the scrape is finished." );
}

getUseProxyFromPool

boolean session.getUseProxyFromPool ( )

Description

Determine whether proxies are being used from the proxy pool.

Parameters

This method does not receive any parameters.

Return Values

Returns true if a proxy pool is being used; otherwise, it returns false.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Turn On Proxy Pool Usage If Not Running

// If proxies are not being used from the pool, turn pool usage on
if ( !session.getUseProxyFromPool() )
{
    session.setUseProxyFromPool( true );
}

setProxyServerPool

void session.setProxyServerPool ( ProxyServerPool proxyServerPool )

Description

Associate a proxy pool with a scraping session.

Parameters

proxyServerPool A ProxyServerPool object.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Associate Proxy Pool with Scraping Session (see the example under setUseProxyFromPool below)

setTerminateProxiesOnCompletion

void session.setTerminateProxiesOnCompletion ( boolean terminateProxies )

Description

Manually set whether proxies are terminated when the scrape ends.

Parameters

terminateProxies Whether proxies should be terminated at the end of the session or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Make Sure Proxies are Deleted on Scrape Completion

// Check whether proxies are already set to be terminated
if ( session.getTerminateProxiesOnCompletion() )
{
    session.log( "Anonymous Proxies are set to be terminated with the scrape." );
}
else
{
    // Set proxies to be terminated with the scrape
    session.setTerminateProxiesOnCompletion( true );
    session.log( "Anonymous Proxies updated to be terminated with the scrape." );
}

setUseProxyFromPool

void session.setUseProxyFromPool ( boolean useProxyFromPool )

Description

Set whether proxies from the proxy server pool should be used when making scrapeable file requests.

Parameters

useProxyFromPool Whether proxies in the proxyServerPool should be used, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Anonymize Scrapeable Files

// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.

proxyServerPool = new ProxyServerPool();

// We give the current scraping session a reference to
// the proxy pool. This step should ideally be done right
// after the object is created (as in the previous step).

session.setProxyServerPool( proxyServerPool );

// ... Proxy Server Pool setup ...

// This is the switch that tells the scraping session to make
// use of the proxy servers. Note that this can be turned on
// and off during the course of the scrape. You may want to
// anonymize some pages, but not others.
session.setUseProxyFromPool( true );

External Proxy Settings

Overview

If your connection already goes through a proxy server, screen-scraper must be given that proxy's settings and credentials in order to reach the internet. These methods let you tell screen-scraper manually how to get through your external proxy.

If you always go through the same external proxy, you will probably want to set the credentials in screen-scraper's proxy settings so that you don't have to specify them in all of your scrapes.
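For comparison, plain Java code routes a connection through an external HTTP proxy with java.net.Proxy. The sketch below is not part of screen-scraper's API and the class name is hypothetical; it shows a host and a string port (the form the setters in this group take) turned into a proxy definition. Proxy credentials would typically be supplied separately, for example through java.net.Authenticator, which this sketch leaves out.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

class ExternalProxySketch {
    // The port arrives as a String, so it is parsed and range-checked here.
    static Proxy httpProxy(String host, String port) {
        int portNumber = Integer.parseInt(port.trim());
        if (portNumber < 1 || portNumber > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        // createUnresolved defers the DNS lookup until a connection is actually made.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, portNumber));
    }

    public static void main(String[] args) {
        Proxy proxy = httpProxy("proxy.domain.com", "80");
        // A connection would then be opened with url.openConnection( proxy ).
        System.out.println(proxy);
    }
}
```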

getExternalNTProxyDomain

String session.getExternalNTProxyDomain ( )

Description

Retrieve the external NT proxy domain.

Parameters

This method does not receive any parameters.

Return Values

Returns the external NT domain, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Log External NT Proxy Settings

// Log External Proxy Settings
session.log( "Username: " + session.getExternalNTProxyUsername( ) );
session.log( "Password: " + session.getExternalNTProxyPassword( ) );
session.log( "Domain: " + session.getExternalNTProxyDomain( ) );
session.log( "Host: " + session.getExternalNTProxyHost( ) );

getExternalNTProxyHost

String session.getExternalNTProxyHost ( )

Description

Retrieve the external NT proxy host.

Parameters

This method does not receive any parameters.

Return Values

Returns the external NT host, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Log External NT Proxy Settings (see the example under getExternalNTProxyDomain above)

getExternalNTProxyPassword

String session.getExternalNTProxyPassword ( )

Description

Retrieve the external NT proxy password.

Parameters

This method does not receive any parameters.

Return Values

Returns the external NT password, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Log External NT Proxy Settings (see the example under getExternalNTProxyDomain above)

getExternalNTProxyUsername

String session.getExternalNTProxyUsername ( )

Description

Retrieve the external NT proxy username.

Parameters

This method does not receive any parameters.

Return Values

Returns the external NT username, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Log External NT Proxy Settings (see the example under getExternalNTProxyDomain above)

getExternalProxyHost

String session.getExternalProxyHost ( )

Description

Retrieve the external proxy host.

Parameters

This method does not receive any parameters.

Return Values

Returns the external host, as a string.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Log External Proxy Settings

// Log External Proxy Settings
session.log( "Username: " + session.getExternalProxyUsername( ) );
session.log( "Password: " + session.getExternalProxyPassword( ) );
session.log( "Host: " + session.getExternalProxyHost( ) );
session.log( "Port: " + session.getExternalProxyPort( ) );

getExternalProxyPassword

String session.getExternalProxyPassword ( )

Description

Retrieve the external proxy password.

Parameters

This method does not receive any parameters.

Return Values

Returns the external password, as a string.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Log External Proxy Settings (see the example under getExternalProxyHost above)

getExternalProxyPort

String session.getExternalProxyPort ( )

Description

Retrieve the external proxy port.

Parameters

This method does not receive any parameters.

Return Values

Returns the external port, as a string.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Log External Proxy Settings (see the example under getExternalProxyHost above)

getExternalProxyUsername

String session.getExternalProxyUsername ( )

Description

Retrieve the external proxy username.

Parameters

This method does not receive any parameters.

Return Values

Returns the external username, as a string.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Log External Proxy Settings (see the example under getExternalProxyHost above)

setExternalNTProxyDomain

void session.setExternalNTProxyDomain ( String domain )

Description

Manually set external NT proxy domain.

Parameters

domain Domain for the external NT proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.

If you are using NTLM (Windows NT) authentication you'll need to designate settings for both the standard external proxy as well as the external NT proxy.

Examples

Manually Set Up External NT Proxy

// Set up External NT Proxy
session.setExternalNTProxyUsername( "guest" );
session.setExternalNTProxyPassword( "guestPassword" );
session.setExternalNTProxyDomain( "Group" );
session.setExternalNTProxyHost( "proxy.domain.com" );

setExternalNTProxyHost

void session.setExternalNTProxyHost ( String host )

Description

Manually set external NT proxy host/domain.

Parameters

host Host/domain for the external NT proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.

If you are using NTLM (Windows NT) authentication you'll need to designate settings for both the standard external proxy as well as the external NT proxy.

Examples

Manually Set Up External NT Proxy (see the example under setExternalNTProxyDomain above)

setExternalNTProxyPassword

void session.setExternalNTProxyPassword ( String password )

Description

Manually set external NT proxy password.

Parameters

password Password for the external NT proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.

If you are using NTLM (Windows NT) authentication you'll need to designate settings for both the standard external proxy as well as the external NT proxy.

Examples

Manually Set Up External NT Proxy (see the example under setExternalNTProxyDomain above)

setExternalNTProxyUsername

void session.setExternalNTProxyUsername ( String username )

Description

Manually set external NT proxy username.

Parameters

username Username for the external NT proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external NT proxy settings instead.

If you are using NTLM (Windows NT) authentication you'll need to designate settings for both the standard external proxy as well as the external NT proxy.

Examples

Manually Set Up External NT Proxy (see the example under setExternalNTProxyDomain above)

setExternalProxyHost

void session.setExternalProxyHost ( String host )

Description

Manually set external proxy host/domain.

Parameters

host Host/domain for the external proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.

Examples

Manually Set Up External Proxy

// Set up External Proxy
session.setExternalProxyUsername( "guest" );
session.setExternalProxyPassword( "guestPassword" );
session.setExternalProxyHost( "proxy.domain.com" );
session.setExternalProxyPort( "80" );

setExternalProxyPassword

void session.setExternalProxyPassword ( String password )

Description

Manually set external proxy password.

Parameters

password Password for the external proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.

Examples

Manually Set Up External Proxy (see the example under setExternalProxyHost above)

setExternalProxyPort

void session.setExternalProxyPort ( String port )

Description

Manually set external proxy port.

Parameters

port Port for the external proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.

Examples

Manually Set Up External Proxy (see the example under setExternalProxyHost above)

setExternalProxyUsername

void session.setExternalProxyUsername ( String username )

Description

Manually set external proxy username.

Parameters

username Username for the external proxy, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

If you are using this method in all of your scripts, you might want to set the value in screen-scraper's external proxy settings instead.

Examples

Manually Set Up External Proxy (see the example under setExternalProxyHost above)

Logging

Overview

Logging is a great tool for making sure your scrapes are working correctly and for troubleshooting problems that arise. Although logging large amounts of information may slow down a scrape, the best remedy is not to remove logging calls but to lower the logging verbosity when running the scrape in a production environment. Be aware that doing so can make some problems harder to troubleshoot should they arise.

Several logging methods are provided so that you can log information according to its importance.
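The logging methods map onto an ordered set of levels, from logDebug (most verbose) to logError (least verbose). The Java sketch below models how a verbosity setting would filter messages; the class is hypothetical, and the threshold comparison is an assumption about how such filtering generally works, not a description of screen-scraper's logger.

```java
class LogLevelSketch {
    // Ordered from most verbose to least verbose, matching logDebug, logInfo, logWarn, logError.
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Assumption: a message is written when its level is at or above the configured threshold.
    static boolean isWritten(Level messageLevel, Level threshold) {
        return messageLevel.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        Level production = Level.WARN; // a quieter setting for production runs
        for (Level level : Level.values()) {
            System.out.println(level + " written? " + isWritten(level, production));
        }
    }
}
```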

getLogFileName

String session.getLogFileName ( ) (professional and enterprise editions only)

Description

Get the name of the current log file.

Parameters

This method does not receive any parameters.

Return Values

Returns the name of the log file, as a string.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method can be very helpful when screen-scraper is running in server mode and you need to track which log file contains the scrape of a given record, or to track down the location of errors in larger scrapes.

Examples

Get Log's File Name

// Output the name of the log file to the session log.
logName = session.getLogFileName();
session.log( "Log file: " + logName );

log

void session.log ( Object message )

Description

Write message to the log.

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for all editions.

When the workbench is running, this will be found under the log tab for the scraping session. When screen-scraper is running in server mode, the message will get sent to the corresponding .log file found in screen-scraper's log folder. When screen-scraper is invoked from the command line, the message will get sent to standard out.

Examples

Write to Log

// Sends the message to the log.
session.log( "Inserting extracted data into the database." );

logCurrentDateAndTime

void session.logCurrentDateAndTime ( ) (professional and enterprise editions only)

Description

Write the current date and time to the log (at the most verbose level). It is formatted to be human-readable.

Parameters

This method does not receive any parameters.

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Log Date and Time

// Output the current date and time to the log.
session.logCurrentDateAndTime();

logCurrentTime

void session.logCurrentTime ( ) (professional and enterprise editions only)

Description

Write the current time to the log (at the most verbose level). The time is formatted to be human-readable.

Parameters

This method does not receive any parameters.

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Log Formatted Time

// Output the current time to the log.
session.logCurrentTime();

logDebug

void session.logDebug ( Object message ) (professional and enterprise editions only)

Description

Write a message to the log at the debug level (most verbose).

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for professional and enterprise editions.

Examples

Write to Log at Debug level

// Sends the message to the lowest level of logging.
session.logDebug( "Index: " + session.getVariable( "INDEX" ) );

log() [session] - Sends a message to the log as a debugging message
logInfo() [session] - Sends a message to the log as an informative message
logWarn() [session] - Sends a message to the log as a warning
logError() [session] - Sends a message to the log as an error message
debug() [log] - Sends a message to the log as a debug message

logElapsedRunningTime

void session.logElapsedRunningTime ( ) (professional and enterprise editions only)

Description

Write the scrape's running time to the log (at the most verbose level). It is formatted to be human-readable, broken down into days, hours, minutes, and seconds.
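The day/hour/minute/second breakdown is simple integer arithmetic on the elapsed milliseconds. This hypothetical Java helper shows it; the exact text format screen-scraper writes to the log is not reproduced here.

```java
class ElapsedTimeSketch {
    // Breaks a duration in milliseconds into days, hours, minutes, and seconds.
    static String format(long elapsedMillis) {
        long totalSeconds = elapsedMillis / 1000;
        long days = totalSeconds / 86_400;
        long hours = (totalSeconds % 86_400) / 3_600;
        long minutes = (totalSeconds % 3_600) / 60;
        long seconds = totalSeconds % 60;
        return days + "d " + hours + "h " + minutes + "m " + seconds + "s";
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // ... scraping work would happen here ...
        System.out.println("Running time: " + format(System.currentTimeMillis() - start));
    }
}
```

For example, 90,061,000 milliseconds is 90,061 seconds, which this helper renders as 1d 1h 1m 1s.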

Parameters

This method does not receive any parameters.

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Log Time the Scrape has been Running

// Output the running time to the log.
session.logElapsedRunningTime();

logError

void session.logError ( Object message ) (professional and enterprise editions only)

Description

Write a message to the log at the error level (least verbose).

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for professional and enterprise editions.

Examples

Write to Log at Error level

// Sends the message to the highest level of logging.
session.logError( "Error parsing date: " + session.getVariable( "DATE" ) );

log() [session] - Sends a message to the log as a debugging message
logDebug() [session] - Sends a message to the log as a debugging message
logInfo() [session] - Sends a message to the log as an informative message
logWarn() [session] - Sends a message to the log as a warning
error() [log] - Sends a message to the log as an error message

logInfo

void session.logInfo ( Object message ) (professional and enterprise editions only)

Description

Write a message to the log at the info level (second most verbose).

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for professional and enterprise editions.

Examples

Write to Log at Info level

// Sends the message to the second lowest level of logging.
session.logInfo( "Traversing search results pages..." );

log() [session] - Sends a message to the log as a debugging message
logDebug() [session] - Sends a message to the log as a debugging message
logWarn() [session] - Sends a message to the log as a warning
logError() [session] - Sends a message to the log as an error message
info() [log] - Sends a message to the log as an info message

logVariables

void session.logVariables ( ) (professional and enterprise editions only)

Description

Write all session variables to the log.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Log All Session Variables

// Write Variables to Log
session.logVariables();

logWarn

void session.logWarn ( Object message ) (professional and enterprise editions only)

Description

Write a message to the log at the warn level (third most verbose).

Parameters

message Message to be written to the log after being converted to a String using String.valueOf( message ).

Return Values

Returns void. If an error occurs, an error will be thrown.

Change Log

Version	Description
5.5	Now accepts any Object as a message
4.5	Available for professional and enterprise editions.

Examples

Write to Log at Warn level

// Sends the message to the third level of logging.
session.logWarn( "Warning! Received a 404 response." );

log() [session] - Sends a message to the log as a debugging message
logDebug() [session] - Sends a message to the log as a debugging message
logInfo() [session] - Sends a message to the log as an informative message
logError() [session] - Sends a message to the log as an error message
warn() [log] - Sends a message to the log as a warning message

Web Interface Interactions

Overview

These methods are used in connection with screen-scraper's web interface. Using them provides the interface with more detailed information about the state of a running scrape. If you are not running your scrapes through the web interface, these methods are not particularly helpful.

Because the web interface is an enterprise edition feature, these methods are only available to enterprise edition users.

addToNumDuplicateRecordsScraped

void session.addToNumDuplicateRecordsScraped ( Object value ) (enterprise edition only)

Description

Add to the count of duplicate records scraped (as opposed to new or error records).

Parameters

value Value to be added to the count. Usually an integer; if it is given a string (e.g. "10"), it will try to convert it to an integer before adding.
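To make the conversion rule concrete, here is a small standalone Java model of it. It is a sketch under assumptions rather than screen-scraper's implementation: the class CountValues and its toCount method are hypothetical names, and rejecting a non-numeric string with a NumberFormatException is an assumption, since the documentation only says the method will try to convert the string. The same rule is described for the other addToNum...RecordsScraped methods below.

```java
// Hypothetical model of how an addToNum...RecordsScraped argument might be coerced.
// Not screen-scraper's implementation; the names here are illustrative only.
final class CountValues {
    private CountValues() {}

    /** Converts a count argument (a Number or a numeric String) into an int. */
    static int toCount(Object value) {
        if (value instanceof Number) {
            // Integers, Longs, etc. are used directly.
            return ((Number) value).intValue();
        }
        if (value instanceof String) {
            // Strings such as "10" are parsed; surrounding whitespace is ignored.
            // Assumption: a non-numeric string raises NumberFormatException.
            return Integer.parseInt(((String) value).trim());
        }
        throw new IllegalArgumentException("Unsupported count value: " + value);
    }
}
```

In this model, toCount("10") and toCount(10) both yield 10.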

Return Values

Returns void.

Change Log

Version	Description
7.0	Available for enterprise edition.

Examples

Record Duplicate Records Scraped

// Adds 10 to the count of duplicate records scraped.
session.addToNumDuplicateRecordsScraped(10);

Have the session record each time a scraped record is already in the database

// In script called "After each pattern match"
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

dm = session.getv("_DM");
Connection con = dm.getConnection();

try
{
    // Check whether a record with this ID is already stored.
    String sql = "SELECT id FROM table WHERE did = ?";
    PreparedStatement pstmt = con.prepareStatement(sql);
    pstmt.setString(1, (String) dataRecord.get("ID"));
    ResultSet rs = pstmt.executeQuery();
    if (rs.next())
    {
        session.log("---Already in DB");
        session.addToNumDuplicateRecordsScraped(1);
    }
    else
    {
        session.scrapeFile("Results");
    }
    rs.close();
    pstmt.close();
}
catch (Exception e)
{
    session.logError("Error checking for duplicate: " + e);
    session.setFatalErrorOccurred(true);
    session.setErrorMessage(e.toString());
}
finally
{
    con.close();
}

addToNumErrorRecordsScraped

void session.addToNumErrorRecordsScraped ( Object value ) (enterprise edition only)

Description

Add to the count of error records (as opposed to duplicate or new records).

Parameters

value Value to be added to the count. Usually an integer; if it is given a string (e.g. "10"), it will try to convert it to an integer before adding.

Return Values

Returns void.

Change Log

Version	Description
7.0	Available for enterprise edition.

Examples

Record Error Records Scraped

// Adds 10 to the count of error records scraped.
session.addToNumErrorRecordsScraped(10);

Have the session record each time a dataRecord is missing a vital datum

// In script called "After each pattern match"
if (sutil.isNullOrEmptyString(dataRecord.get("VITAL_DATUM")))
{
    session.logError("Missing VITAL_DATUM");
    session.addToNumErrorRecordsScraped(1);
}

addToNumNewRecordsScraped

void session.addToNumNewRecordsScraped ( Object value ) (enterprise edition only)

Description

Add to the count of new records scraped (as opposed to duplicate or error records).

Parameters

value Value to be added to the count. Usually an integer; if it is given a string (e.g. "10"), it will try to convert it to an integer before adding.

Return Values

Returns void.

Change Log

Version	Description
7.0	Available for enterprise edition.

Examples

Record New Records Scraped

// Adds 10 to the value of new records scraped.
session.addToNumNewRecordsScraped(10);

Have the session record each time a new record is saved to the database

// In script called "After each pattern match"
dm = session.getv("_DM");
dm.addData("db_table", dataRecord);
dm.commit("db_table");
// Count the record only when flush() reports success.
if (dm.flush())
{
    session.addToNumNewRecordsScraped(1);
}

addToNumRecordsScraped

void session.addToNumRecordsScraped ( Object value ) (enterprise edition only)

Description

Add to the number of records scraped.

Parameters

value Value to be added to the count. Usually an integer; if it is given a string (e.g. "10"), it will try to convert it to an integer before adding.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Record Number of Records Scraped

// Adds 10 to the value of the number of records scraped.
session.addToNumRecordsScraped( 10 );

Have the session count every DataRecord in the DataSet

// In script called "After file is scraped"

// Adds number of DataRecords in DataSet
// to the value of the number of records scraped.

session.addToNumRecordsScraped( dataSet.getNumDataRecords() );

appendErrorMessage

void session.appendErrorMessage ( String errorMessage ) (enterprise edition only)

Description

Append an error message to any existing error messages.

Parameters

errorMessage Error message that should be added, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

User Specified Error

// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );

// Append an error message.
session.appendErrorMessage( "An error occurred in the scraping session." );

getErrorMessage

String session.getErrorMessage ( ) (enterprise edition only)

Description

Get the current error message.

Parameters

This method does not receive any parameters.

Return Values

Returns current error message, as a string.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Write Error Message to the Log

// Output the current error message to the log.
session.log( "Error message: " + session.getErrorMessage() );

getFatalErrorOccurred

boolean session.getFatalErrorOccurred ( ) (enterprise edition only)

Description

Determine the fatal error status of the scrape.

Parameters

This method does not receive any parameters.

Return Values

Returns whether a fatal error has occurred, as a boolean.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Write Fatal Error State to Log

// Output the "fatal error" state to the log.
session.log( "Fatal error occurred: " + session.getFatalErrorOccurred() );

getNumRecordsScraped

int session.getNumRecordsScraped ( ) (enterprise edition only)

Description

Get the number of records that have been scraped.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of records scraped, as an integer.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Write Number of Records Scraped to Log

// Outputs the number of records that have been scraped to the log.
session.log( "Num records scraped so far: " + session.getNumRecordsScraped() );

resetNumRecordsScraped

void session.resetNumRecordsScraped ( ) (enterprise edition only)

Description

Reset the count of scraped records.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Reset Count

// Clear number of records scraped
session.resetNumRecordsScraped();

setErrorMessage

void session.setErrorMessage ( String errorMessage ) (enterprise edition only)

Description

Set the current error message.

Parameters

errorMessage Desired error message, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Specify an Error Message

// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );

// Set the error message.
session.setErrorMessage( "An error occurred in the scraping session." );

Web Interface Feedback

// Use the error message as a status message, without flagging an error.
// This "hijacks" the error message so the web interface shows which
// page is being scraped. Skip it if a fatal error has occurred, so the
// real error message is not overwritten.

if ( !session.getFatalErrorOccurred() )
{
    session.setErrorMessage( "Scraping Page: " + session.getv( "PAGE" ) );
}

setFatalErrorOccurred

void session.setFatalErrorOccurred ( boolean fatalErrorOccurred ) (enterprise edition only)

Description

Set the fatal error status of the scrape.

Parameters

fatalErrorOccurred Desired fatal error status to set, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Set Fatal Error Flag

// Set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );

setNumRecordsScraped

void session.setNumRecordsScraped ( Object value ) (enterprise edition only)

Description

Set the number of records that have been scraped.

Parameters

value Value to which the count of records scraped will be set.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Set the Number of Records Scraped

// Sets the value of the number of records scraped to 10.
session.setNumRecordsScraped( 10 );

addEventCallback

void session.addEventCallback ( EventFireTime eventTime, EventHandler callback ) (professional and enterprise editions only)
void session.addEventCallbackWithPriority ( EventFireTime eventTime, EventHandler callback, int priority ) (professional and enterprise editions only)

Description

Add a callback (an EventHandler) that will be executed when the given event fires.

Note: callbacks added with session.addEventCallback are given a priority of 0.

Parameters

eventTime The time to execute a callback.
callback The callback to execute.
priority The priority for this callback. Lower numbers are higher priority.
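The priority rule can be illustrated with a standalone model of a callback registry. This is a hypothetical sketch, not screen-scraper's internals: CallbackRegistry and its methods are illustrative names, and callbacks are represented only by names. It assumes that "higher priority" means "runs earlier"; the order of callbacks that share a priority is not documented, so the model's tie behavior (registration order, from a stable sort) should not be relied on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of priority ordering for event callbacks.
// Lower priority numbers run first; add() mirrors addEventCallback (priority 0).
final class CallbackRegistry {
    /** One registered callback: a name (for tracing) and its priority. */
    static final class Entry {
        final String name;
        final int priority;

        Entry(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Mirrors addEventCallback: the priority defaults to 0. */
    void add(String name) {
        addWithPriority(name, 0);
    }

    /** Mirrors addEventCallbackWithPriority. */
    void addWithPriority(String name, int priority) {
        entries.add(new Entry(name, priority));
    }

    /** Returns callback names in the order this model would run them. */
    List<String> executionOrder() {
        List<Entry> sorted = new ArrayList<>(entries);
        // List.sort is stable, so ties keep registration order in this model only.
        sorted.sort(Comparator.comparingInt(e -> e.priority));
        List<String> names = new ArrayList<>();
        for (Entry e : sorted) {
            names.add(e.name);
        }
        return names;
    }
}
```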

Return Values

Returns void.

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

Set a handler to run after the scripts set to run at the end of the session have run.

// handler is an EventHandler (see EventHandler below).
// addEventCallback always uses the default priority of 0.
session.addEventCallback(SessionEventFireTime.AfterEndScripts, handler);

// To use a different (or variable) priority, use addEventCallbackWithPriority.
// A priority of 0 is still allowed here if you want it.
session.addEventCallbackWithPriority(SessionEventFireTime.AfterEndScripts, handler, 3);

More Examples

EventFireTime

EventFireTime is an interface that defines the methods every fire time must have, which lets the addEventCallback method accept different types of fire times.

Several enums implementing this interface are provided, one for each part of a scrape to which you can attach event handlers. They are described below.

ExtractorPatternEventFireTime

Enum

BeforeExtractorPattern Before an extractor is applied (including before any scripts on it run). The returned value should be a boolean and indicates whether the extractor should be run or not. Any non-boolean result is the same as true. Also note that regardless of whether the extractor will be run or not, the event for after extractor pattern will still be fired.
AfterExtractorPatternAppliedButBeforeScripts After an extractor is applied, but before any scripts on it run (including the "After each pattern match" scripts).
AfterEachExtractorMatch After each match of an extractor. This fires before any of the "After each pattern match" scripts are run.
AfterExtractorPattern After an extractor is applied (including after any scripts on it have run).
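The return-value rule for BeforeExtractorPattern can be written as a tiny helper. This is an illustrative model only (ExtractorGate and shouldApplyExtractor are hypothetical names): a Boolean false skips the extractor, and every other result, including null or the string "false", counts as true.

```java
// Hypothetical helper modelling how a BeforeExtractorPattern return value is read:
// only a Boolean false skips the extractor; any non-boolean result counts as true.
final class ExtractorGate {
    private ExtractorGate() {}

    static boolean shouldApplyExtractor(Object handlerResult) {
        if (handlerResult instanceof Boolean) {
            return (Boolean) handlerResult;
        }
        // null, strings, numbers, etc. are all treated like true.
        return true;
    }
}
```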

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

How to use the EventFireTime with the session.addEventCallback method.

session.addEventCallback(ExtractorPatternEventFireTime.AfterEachExtractorMatch, handler);

ScrapeableFileEventFireTime

Enum

BeforeScrapeableFile Before a scrapeable file is launched (including before any scripts on it run).
BeforeHttpRequest Fired right before the HTTP request (after any "before scrapeable file" scripts), and fired again each time the request is retried. If the handler returns a non-null String, that String is used as the response instead of issuing a request. This response is still passed into the AfterHttpRequest event, but it does not go through any tidying.
AfterHttpRequest Fired right after the HTTP response is received and tidied (if tidying is set), but before anything else happens. The return value is the data that should be used as the response.
AfterScrapeableFile After a scrapeable file is completed (including after any scripts on it have run).
OnHttpRedirect* Called when a redirect is about to occur. Return true if the redirect should occur or false if it should not; any non-boolean result leaves the behavior unchanged.

*Note: When using the Async HTTP client, ScrapeableFileEventData.getRedirectRequestBuilder() gives you access to the request builder, which can be used to modify the request before it is sent. With the Apache HTTP client, getRedirectRequestBuilder() always returns null.
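The return-value rules for BeforeHttpRequest and OnHttpRedirect can be sketched as two standalone helpers. This is a hypothetical model, not screen-scraper code: ScrapeableFileRules, responseFor and followRedirect are illustrative names, and the Supplier stands in for actually issuing the HTTP request.

```java
import java.util.function.Supplier;

// Hypothetical model of two ScrapeableFileEventFireTime return-value rules.
// Not screen-scraper code; names are illustrative.
final class ScrapeableFileRules {
    private ScrapeableFileRules() {}

    /**
     * BeforeHttpRequest: a non-null String returned by the handler is used as the
     * response and no request is issued. Otherwise the request goes out as usual.
     */
    static String responseFor(Object beforeRequestResult, Supplier<String> issueRequest) {
        if (beforeRequestResult instanceof String) {
            return (String) beforeRequestResult;
        }
        return issueRequest.get();
    }

    /**
     * OnHttpRedirect: true or false decides whether the redirect happens;
     * any non-boolean result leaves the default decision unchanged.
     */
    static boolean followRedirect(Object redirectResult, boolean defaultDecision) {
        if (redirectResult instanceof Boolean) {
            return (Boolean) redirectResult;
        }
        return defaultDecision;
    }
}
```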

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

How to use the EventFireTime with the session.addEventCallback method.

session.addEventCallback(ScrapeableFileEventFireTime.BeforeScrapeableFile, handler);

getRedirectToURL

String scrapeableFileEventData.getRedirectToURL ( )

Description

Returns the RedirectToURL value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the RedirectToURL value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the redirect URL

public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
    String url = data.getRedirectToURL();

    // do something

    return data.getLastReturnValue();
}

ScriptEventFireTime

Enum

AfterScript After a script is executed
BeforeScript Before a script is executed
OnScriptEnd Runs when the script finishes executing. Unlike AfterScript, which fires once the script is done running, this runs after all of the developer code has run but while the script engine is still active. The return value is a string of code to inject and execute, or null (or the empty string) to do nothing beyond executing the script code.
OnScriptError Executes when an error occurs in a script.
OnScriptStart Runs when the script begins to execute. Unlike BeforeScript, which fires while preparations are made to launch a script, this runs after all of the default pre-script code has been executed by the script engine, but before the developer code in the script. The return value is a string of code to inject and execute, or null (or the empty string) to do nothing beyond executing the script code.
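The effect of the injected strings returned by OnScriptStart and OnScriptEnd can be modeled by assembling the code that ends up running. This is a sketch under assumptions: ScriptInjection and effectiveScript are hypothetical names, and joining the pieces with newlines is an assumption, since the documentation does not say how the injected code is combined with the script.

```java
// Hypothetical model of OnScriptStart / OnScriptEnd injection.
// Assumption: injected code is joined to the developer's script with newlines;
// screen-scraper's real mechanism is not described in this documentation.
final class ScriptInjection {
    private ScriptInjection() {}

    /** null or "" means "inject nothing", as described for both fire times. */
    private static boolean hasCode(String injected) {
        return injected != null && !injected.isEmpty();
    }

    static String effectiveScript(String onStart, String developerCode, String onEnd) {
        StringBuilder sb = new StringBuilder();
        if (hasCode(onStart)) {
            // Runs after the default pre-script code but before the developer code.
            sb.append(onStart).append('\n');
        }
        sb.append(developerCode);
        if (hasCode(onEnd)) {
            // Runs after the developer code while the engine is still active.
            sb.append('\n').append(onEnd);
        }
        return sb.toString();
    }
}
```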

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

How to use the EventFireTime with the session.addEventCallback method.

session.addEventCallback(ScriptEventFireTime.OnScriptEnd, handler);

SessionEventFireTime

Enum

AfterEndScripts After the scrape finishes and all of the scripts set to run at the end of the session have run.
NumRecordsSavedModified Called whenever ScrapingSession.addToNumRecordsScraped(Object) is called. The handler's return value is the amount actually added.
StopScrapingCalled When the session is stopped, either by calling the stopScraping method or by clicking the stop scraping button in the workbench.
SessionVariableSet* Called whenever a session variable is set, before the value is actually set. The variable value passed in is the new value to be set, and the handler's return value is the value actually set.
SessionVariableRetrieved* Called whenever a session variable is retrieved, after the value is retrieved. The variable value passed in is the current value, and the handler's return value is the value actually returned.

*Note: Calling a setVariable or getVariable method inside these handlers WILL fire those events again, so take care to avoid infinite recursion.
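One way to follow the note above is to guard the handler against re-entry. The standalone Java model below is hypothetical (HookedVariables, SetHook and LastChangedHook are illustrative names, not screen-scraper classes): the store calls its hook on every set, just as SessionVariableSet fires on every set, and the hook records the name of the last variable changed by setting another variable, which would recurse forever without the busy flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variable store whose hook fires on every set, like SessionVariableSet.
final class HookedVariables {
    interface SetHook {
        /** Returns the value that should actually be stored. */
        Object onSet(HookedVariables vars, String name, Object value);
    }

    private final Map<String, Object> values = new HashMap<>();
    private SetHook hook;

    void setHook(SetHook hook) {
        this.hook = hook;
    }

    void set(String name, Object value) {
        // The hook runs before storing, every time, including nested sets.
        Object toStore = (hook == null) ? value : hook.onSet(this, name, value);
        values.put(name, toStore);
    }

    Object get(String name) {
        return values.get(name);
    }
}

/** Records the last variable changed, guarding against its own re-entry. */
final class LastChangedHook implements HookedVariables.SetHook {
    private boolean busy = false;
    int calls = 0;

    public Object onSet(HookedVariables vars, String name, Object value) {
        calls++;
        if (busy) {
            // Re-entered through our own vars.set(...) below: pass the value through.
            return value;
        }
        busy = true;
        try {
            vars.set("LAST_CHANGED", name); // fires this hook again
        } finally {
            busy = false;
        }
        return value;
    }
}
```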

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

How to use the EventFireTime with the session.addEventCallback method.

session.addEventCallback(SessionEventFireTime.AfterEndScripts, handler);

StringOperationEventFireTime

Enum

HttpParameterEncodeKey Called when an http parameter key (GET or POST) is encoded. The input string will be the value that is already encoded, and the return value should be the value to actually use.
HttpParameterEncodeValue Called when an http parameter value (GET or POST) is encoded. The input string will be the value that is already encoded, and the return value should be the value to actually use.
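As an example of HttpParameterEncodeValue, a handler might rewrite an already-encoded value for a server that expects %20 rather than + for spaces. The sketch below only shows the string transformation such a handler could return; it assumes the incoming value is form-encoded the way java.net.URLEncoder does it, which the documentation does not state, and ParamEncoding is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical transform a HttpParameterEncodeValue handler might apply.
// Assumption: the incoming value is form-encoded, so spaces appear as "+".
final class ParamEncoding {
    private ParamEncoding() {}

    static String plusToPercent20(String alreadyEncoded) {
        // A literal "+" in the original text is encoded as "%2B", so every "+"
        // left in a form-encoded string stands for a space.
        return alreadyEncoded.replace("+", "%20");
    }

    /** Form-encodes a raw value, standing in for the encoding done before the event. */
    static String formEncode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```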

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

How to use the EventFireTime with the session.addEventCallback method.

session.addEventCallback(StringOperationEventFireTime.HttpParameterEncodeKey, handler);

EventHandler

EventHandler EventHandler ( ) (professional and enterprise editions only)

Description

Creates an EventHandler callback object, which will be called when the event fires.

Change Log

Version	Description
6.0.55a	Introduced for pro and enterprise editions.

Examples

Define a handler for the session.addEventCallback to use.

// Create an EventHandler object which will be called when the event triggers
EventHandler handler = new EventHandler()
{
    /**
     * Returns the name of the handler. This method doesn't need to be implemented
     * but helps with debugging (on error executing the callback it will output this)
     */
    public String getHandlerName()
    {
        return "A test event handler";
    }

    /**
     * Processes the event, and potentially returns a useful value modifying something
     * in the internal code
     *
     * @param fireTime The fire time of the event. This helps when using the same handler
     * for multiple event times, to determine which was called
     * @param data The actual data from the event. Based on the event time this
     * will be a different type. It could be SessionEventData, ScrapeableFileEventData,
     * ScriptEventData, StringEventData, etc... It will match the fire time class name
     *
     * @return A value indicating how to proceed (or sometimes the value is ignored)
     */
    public Object handleEvent(EventFireTime fireTime, AbstractEventData data)
    {
        // While you can specifically grab any data from the data object,
        // if this is a method that has a return value that matters,
        // it's best to get it as the last return value, so that multiple
        // events can be chained together. The input data object
        // will always have the original values for all the other getters
        Object returnValue = data.getLastReturnValue();

        // Do stuff...

        // The EventFireTime values describe in the documentation what the return
        // value will do, or say nothing about it if the value is ignored.
        // If you don't intend to modify the return, always return data.getLastReturnValue();
        return returnValue;
    }
};

getHandlerName

String getHandlerName ( )

Description

Returns the name of the handler. This method doesn't need to be implemented but helps with debugging.

Parameters

This method does not receive any parameters.

Return Values

Returns the name of the handler, as a string.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

handleEvent

Object handleEvent ( EventFireTime fireTime, AbstractEventData data )

Description

Processes the event, and potentially returns a useful value modifying something in the internal code as defined by the EventFireTime used to launch this event.

Parameters

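fireTime The fire time of the event that triggered the callback. This is useful when the same handler is registered for several fire times.
data The event data. Its concrete class depends on the fire time (for example SessionEventData or ScrapeableFileEventData).

Return Values

Returns a value whose effect depends on the EventFireTime; see the description of each fire time. For some fire times the value is ignored.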

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Implement handleEvent (and, optionally, getHandlerName) in an EventHandler.

EventHandler handler = new EventHandler()
{
    public String getHandlerName()
    {
        // Any descriptive name; it is output if executing the callback fails.
        return "My event handler";
    }

    /**
     * Processes the event, and potentially returns a useful value modifying something
     * in the internal code
     *
     * @param fireTime The fire time of the event. This helps when using the same handler
     * for multiple event times, to determine which was called
     * @param data The actual data from the event. Based on the event time this
     * will be a different type. It could be SessionEventData, ScrapeableFileEventData,
     * ScriptEventData, StringEventData, etc... It will match the fire time class name
     *
     * @return A value indicating how to proceed (or sometimes the value is ignored)
     */
    public Object handleEvent(EventFireTime fireTime, AbstractEventData data)
    {
        // While you can specifically grab any data from the data object,
        // if this is a method that has a return value that matters,
        // it's best to get it as the last return value, so that multiple
        // events can be chained together. The input data object
        // will always have the original values for all the other getters
        Object returnValue = data.getLastReturnValue();

        // Do stuff...

        // The EventFireTime values describe in the documentation what the return
        // value will do, or say nothing about it if the value is ignored.
        // If you don't intend to modify the return, always return data.getLastReturnValue();
        return returnValue;
    }
};

AbstractEventData

AbstractEventData is an abstract class that gives access to the data values available in screen-scraper when an event fires.

In practice you do not use AbstractEventData directly; instead, use one of the classes that extend it, which are described below.

getLastReturnValue

Object getLastReturnValue ( )

Description

Returns the LastReturnValue for the object: the value returned by the previous callback for this event. It is null if no callbacks have fired yet for this event; null is also the event's default return value.
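The chaining that getLastReturnValue enables can be modeled with plain functions: each callback receives the previous callback's result and returns the value to pass on, starting from null. This is an illustrative sketch, not screen-scraper's dispatcher; ReturnChain is a hypothetical name, and a UnaryOperator stands in for an EventHandler reading data.getLastReturnValue().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of callbacks for one event chained through a
// "last return value". Not screen-scraper code.
final class ReturnChain {
    private final List<UnaryOperator<Object>> callbacks = new ArrayList<>();

    /** Each callback receives the previous callback's return value. */
    void add(UnaryOperator<Object> callback) {
        callbacks.add(callback);
    }

    /** Starts from null (no callback has fired yet) and threads each result on. */
    Object fire() {
        Object last = null;
        for (UnaryOperator<Object> callback : callbacks) {
            last = callback.apply(last);
        }
        return last;
    }
}
```

In this model a callback that returned null instead of its input would discard the earlier result, which is why the examples here return data.getLastReturnValue() when they do not mean to change it.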

Parameters

This method does not receive any parameters.

Return Values

Returns the LastReturnValue for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Pass Along the Last Return Value

// In practice AbstractEventData is just the abstract class.
// You must actually use one of the classes that extend it.
public Object handleEvent(EventFireTime fireTime, AbstractEventData data) {
// While you can specifically grab any data from the data object,
// if this is a method that has a return value that matters,
// it's best to get it as the last return value, so that multiple
// events can be chained together. The input data object
// will always have the original values for all the other getters
Object returnValue = data.getLastReturnValue();

// do something

// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return data.getLastReturnValue();
}

setLastReturnValue

void setLastReturnValue ( Object lastReturnValue )

Description

Sets the LastReturnValue for the object.

Parameters

lastReturnValue The new value for the LastReturnValue

Return Values

Returns void.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

// In practice AbstractEventData is just the abstract class.
// You must actually use one of the classes that extend it.
public Object handleEvent(EventFireTime fireTime, AbstractEventData data) {

Object foo = null; // replace null with the value to pass to later callbacks
data.setLastReturnValue(foo);

// do something

// The EventFireTime values describe in the documentation what the return
// value will do, or says nothing about it if the value is ignored
// If you don't intend to modify the return, always return data.getLastReturnValue();
return data.getLastReturnValue();
}

ExtractorPatternEventData

ExtractorPatternEventData extends AbstractEventData

This contains the data for various extractor pattern operations.

Inherits getLastReturnValue() and setLastReturnValue() from AbstractEventData.

extractorPatternTimedOut

boolean extractorPatternEventData.extractorPatternTimedOut ( )

Description

Returns the status of the extractor pattern timeout. Returns true if and only if the extractor pattern was applied and timed out while doing so. Otherwise it will return false.

Parameters

This method does not receive any parameters.

Return Values

Returns a boolean value representing the status of the extractor pattern timeout.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Determine if an extractor pattern has timed out.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    if (data.extractorPatternTimedOut()) {
        // do something
    }

    return data.getLastReturnValue();
}

getDataRecord

DataRecord extractorPatternEventData.getDataRecord ( )

Description

Returns the DataRecord value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the DataRecord value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current DataRecord.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    DataRecord dr = data.getDataRecord();

    // do something

    return data.getLastReturnValue();
}

getDataSet

DataSet extractorPatternEventData.getDataSet ( )

Description

Returns the DataSet value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the DataSet value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current DataSet.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    DataSet ds = data.getDataSet();

    // do something

    return data.getLastReturnValue();
}

getExtractorPattern

ExtractorPattern extractorPatternEventData.getExtractorPattern ( )

Description

Returns the ExtractorPattern value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ExtractorPattern value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current ExtractorPattern.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    ExtractorPattern pattern = data.getExtractorPattern();

    // do something

    return data.getLastReturnValue();
}

getScrapeableFile

ScrapeableFile extractorPatternEventData.getScrapeableFile ( )

Description

Returns the ScrapeableFile value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ScrapeableFile value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current ScrapeableFile.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    ScrapeableFile sf = data.getScrapeableFile();

    // do something

    return data.getLastReturnValue();
}

getSession

ScrapingSession extractorPatternEventData.getSession ( )

Description

Returns the Session value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the Session value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current Session.

public Object handleEvent(EventFireTime fireTime, ExtractorPatternEventData data) {
    ScrapingSession _session = data.getSession();

    // do something

    return data.getLastReturnValue();
}

ScrapeableFileEventData

ScrapeableFileEventData extends AbstractEventData

This contains the data for various scrapeable file operations.

Inherits getLastReturnValue() and setLastReturnValue() from AbstractEventData.

getHttpResponseData

String scrapeableFileEventData.getHttpResponseData ( )

Description

Returns the HttpResponseData for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the HttpResponseData for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the HttpResponseData

public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
    String responseData = data.getHttpResponseData();

    // do something

    return data.getLastReturnValue();
}

getRedirectRequestBuilder

ScrapingRequest.Builder scrapeableFileEventData.getRedirectRequestBuilder ( )

Description

Returns the RedirectRequestBuilder for the object. Use it to add headers or otherwise adjust the redirect request. It may be null, depending on the HTTP client in use and whether that client supports modifying the redirect; with the Apache HTTP client it is always null.

Parameters

This method does not receive any parameters.

Return Values

Returns the RedirectRequestBuilder for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the Request Builder in order to modify it.

public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
    ScrapingRequest.Builder builder = data.getRedirectRequestBuilder();

    // do something

    return data.getLastReturnValue();
}

getScrapeableFile

ScrapeableFile scrapeableFileEventData.getScrapeableFile ( )

Description

Returns the ScrapeableFile value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ScrapeableFile value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current ScrapeableFile.

public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
    ScrapeableFile sf = data.getScrapeableFile();

    // do something

    return data.getLastReturnValue();
}

getSession

ScrapingSession scrapeableFileEventData.getSession ( )

Description

Returns the Session value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the Session value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current Session.

public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data) {
    ScrapingSession _session = data.getSession();

    // do something

    return data.getLastReturnValue();
}

ScriptEventData

ScriptEventData extends AbstractEventData

This contains the data for various script operations.

Inherits getLastReturnValue() and setLastReturnValue() from AbstractEventData.

getDataRecord

DataRecord scriptEventData.getDataRecord ( )

Description

Returns the DataRecord value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the DataRecord value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current DataRecord.

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    DataRecord dr = data.getDataRecord();

    // do something

    return data.getLastReturnValue();
}

getDataSet

DataSet scriptEventData.getDataSet ( )

Description

Returns the DataSet value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the DataSet value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current DataSet.

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    DataSet ds = data.getDataSet();

    // do something

    return data.getLastReturnValue();
}

getScrapeableFile

ScrapeableFile scriptEventData.getScrapeableFile ( )

Description

Returns the ScrapeableFile value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ScrapeableFile value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current ScrapeableFile.

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    ScrapeableFile sf = data.getScrapeableFile();

    // do something

    return data.getLastReturnValue();
}

getScriptException

java.lang.Exception scriptEventData.getScriptException ( )

Description

Returns the ScriptException for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ScriptException for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the script exception

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    java.lang.Exception e = data.getScriptException();

    // do something

    return data.getLastReturnValue();
}

getScriptName

String scriptEventData.getScriptName ( )

Description

Returns the ScriptName value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the ScriptName value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the script name

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    String name = data.getScriptName();

    // do something

    return data.getLastReturnValue();
}

getSession

ScrapingSession scriptEventData.getSession ( )

Description

Returns the Session value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the Session value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current Session.

public Object handleEvent(EventFireTime fireTime, ScriptEventData data) {
    ScrapingSession _session = data.getSession();

    // do something

    return data.getLastReturnValue();
}

SessionEventData

SessionEventData extends AbstractEventData

This contains the data for various session operations.

Inherits the following methods from AbstractEventData

getIncrementRecordsAmount

Object sessionEventData.getIncrementRecordsAmount ( )

Description

Returns the IncrementRecordsAmount value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the IncrementRecordsAmount value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current increment records amount.

public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
Object recordsAmt = data.getIncrementRecordsAmount();

// do something
}

getSession

ScrapingSession sessionEventData.getSession ( )

Description

Returns the Session value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the Session value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the current Session.

public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
ScrapingSession _session = data.getSession();

// do something
}

getVariableName

String sessionEventData.getVariableName ( )

Description

Returns the VariableName value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the VariableName value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the variable name.

public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
String name = data.getVariableName();

// do something
}

getVariableValue

Object sessionEventData.getVariableValue ( )

Description

Returns the VariableValue value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the VariableValue value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the variable value.

public Object handleEvent(EventFireTime fireTime, SessionEventData data) {
Object value = data.getVariableValue();

// do something
}

StringEventData

StringEventData extends AbstractEventData

This contains the data for various string operations.

Inherits the following methods from AbstractEventData

getInput

String stringEventData.getInput ( )

Description

Returns the Input value for the object.

Parameters

This method does not receive any parameters.

Return Values

Returns the Input value for the object.

Change Log

Version	Description
6.0.55a	Available for all editions.

Examples

Get the input string.

public Object handleEvent(EventFireTime fireTime, StringEventData data) {
String str = data.getInput();

// do something
}

addToVariable

void session.addToVariable ( String variable, int value ) (professional and enterprise editions only)

Description

Add to the value of a session variable.

Parameters

variable Key of the variable, as a string.
value Value to be added to the variable, as an integer.

Return Values

Returns void. If the variable doesn't exist, or is not a string or integer, a message will be added to the log. If it cannot add to the variable for any other reason it will write an error to the log.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Increment Variable

// Increments the session variable "PAGE_NUM" by one.
session.addToVariable( "PAGE_NUM", 1 );
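The following is a minimal plain-Java sketch of the behaviour described above, not screen-scraper's implementation. It models session variables as a Map and assumes that a String value is parsed as an integer and that the sum is stored back as an Integer. The boolean return value exists only so the outcome can be checked; the real method returns void and writes to the log instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of session.addToVariable. Session variables live in a Map;
// String values are ASSUMED to be parsed as integers before the amount is added.
class AddToVariableSketch {
    static boolean addToVariable(Map<String, Object> vars, String name, int amount) {
        Object current = vars.get(name);
        int base;
        if (current instanceof Integer) {
            base = (Integer) current;
        } else if (current instanceof String) {
            try {
                base = Integer.parseInt(((String) current).trim());
            } catch (NumberFormatException e) {
                System.err.println("Variable " + name + " is not numeric: " + current);
                return false;
            }
        } else {
            // Missing variable or unsupported type: report it, as the docs describe.
            System.err.println("Cannot add to variable " + name + ": " + current);
            return false;
        }
        vars.put(name, base + amount);
        return true;
    }
}
```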

breakpoint

void session.breakpoint ( ) (professional and enterprise editions only)

Description

Pauses the scrape and displays the breakpoint window. If the scrape is running in server mode, logVariables is called in place of breakpoint so that the scrape does not halt.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Examples

Open BreakPoint Window

// Causes the breakpoint window to be displayed.
session.breakpoint();

clearAllSessionVariables

void session.clearAllSessionVariables ( )

Description

Remove all session variables.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Clear Session Variables

// Clear all session variables.
session.clearAllSessionVariables();

clearCookies

void session.clearCookies ( ) (enterprise edition only)

Description

Clear stored cookies.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Clear Cookies

// Clear all current cookies.
session.clearCookies();

clearVariables

void session.clearVariables ( Map variables ) (professional and enterprise editions only)
void session.clearVariables ( Collection variables ) (professional and enterprise editions only)

Description

Clears the value of all session variables that match the keys in the Map, or the elements of the Collection. A key of DATARECORD is ignored.

This method accepts a Map or a Collection, rather than a List or Set, so that it works easily with the setSessionVariables method.

Parameters

Map The map to use when clearing the session variables.
Collection The collection to use when clearing the session variables.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Changed from session.removeSessionVariablesInMap to session.clearVariables.

Examples

Clear the ASPX values for a .NET site after scraping the next page

DataRecord aspx = scrapeableFile.getASPXValues();

session.setSessionVariables(aspx);
session.scrapeFile("Next Results");
session.clearVariables(aspx);
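As a rough illustration of which variables are affected, the sketch below models the session as a plain Map. It assumes that clearing means removing the variable (the real method may instead set it to null) and shows that DATARECORD is left alone whether the names come from a Map or a Collection.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of session.clearVariables. Every session variable named by
// a key of the map (or an element of the collection) is removed, except DATARECORD.
class ClearVariablesSketch {
    static void clearVariables(Map<String, Object> sessionVars, Collection<String> names) {
        for (String name : names) {
            if (!"DATARECORD".equals(name)) {
                sessionVars.remove(name);
            }
        }
    }

    // Map overload: only the keys matter, the values are ignored.
    static void clearVariables(Map<String, Object> sessionVars, Map<String, ?> variables) {
        clearVariables(sessionVars, variables.keySet());
    }
}
```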

convertHTMLEntitiesInVariable

void session.convertHTMLEntitiesInVariable ( String variable )

Description

Decodes the HTML entities in the value of a session variable.

Parameters

variable Session variable whose HTML Entities will be converted to characters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Decode HTML Entities In Variable

// Set variable
session.setv( "LOCATION", "Angela&#39;s Room" );

// Convert HTML entities
session.convertHTMLEntitiesInVariable( "LOCATION" );

// Write to Log
session.log( session.getv( "LOCATION" ) ); //logs Angela's Room
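To show the kind of conversion involved, here is a minimal, self-contained entity decoder. It is an illustration only: it handles decimal and hexadecimal numeric entities plus five common named entities and leaves anything else unchanged, whereas the built-in method presumably recognizes many more named entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal HTML entity decoder illustrating what convertHTMLEntitiesInVariable
// does to a value. Handles numeric entities and five named ones only.
class HtmlEntitySketch {
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'");
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z]+);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                // Unknown named entities are left untouched.
                replacement = NAMED.getOrDefault(body, m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric entity to its character, or keeps the original text
    // if the number is not a valid Unicode code point.
    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
        } catch (NumberFormatException e) {
            return original;
        }
    }
}
```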

downloadFile

boolean session.downloadFile ( String url, String fileName ) (professional and enterprise editions only)
boolean session.downloadFile ( String url, String fileName, int maxNumAttempts ) (professional and enterprise editions only)
boolean session.downloadFile ( String url, String fileName, int maxNumAttempts, boolean doLazy ) (enterprise edition only)

Description

Downloads the file to the local file system.

Parameters

url URL reference to the desired file, as a string.
fileName Local file path where the file should be saved, as a string.
maxNumAttempts (optional) Maximum number of times the file will be requested before giving up, as an integer. Defaults to 3.
doLazy (optional) Whether the file should be downloaded in a separate thread, as a boolean. Defaults to false.

Return Values

Returns true if the file was downloaded successfully; otherwise returns false.

Change Log

Version	Description
4.5	Available for professional and enterprise editions. Lazy scrape only available for enterprise edition.

If the file can only be retrieved by sending POST data, use saveFileOnRequest with a scrapeable file instead.

Using this method in a script takes the place of requesting the target URL as a scrapeable file.

Examples

Download File in a Separate Thread

// Downloads the image pointed to by the URL to the local C: drive.
// A maximum number of 5 attempts will be made to download the file,
// and the file will be downloaded in its own thread.

session.downloadFile( "http://www.foo.com/imgs/puppy_image.gif", "C:/images/puppy.gif", 5, true );
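The maxNumAttempts parameter describes a simple retry loop. The sketch below shows that loop in plain Java; the BooleanSupplier is a hypothetical stand-in for a single download attempt and is not part of the screen-scraper API. The lazy (separate thread) variant is not modelled.

```java
import java.util.function.BooleanSupplier;

// Sketch of the retry behaviour described for maxNumAttempts: the request is
// repeated until it succeeds or the attempt limit is reached.
class DownloadRetrySketch {
    static boolean downloadWithRetries(BooleanSupplier attempt, int maxNumAttempts) {
        for (int i = 1; i <= maxNumAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true; // success: stop retrying
            }
        }
        return false; // every attempt failed
    }
}
```

With the default of 3, a download that only succeeds on its third request still returns true, while one that needs a fourth request returns false.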

executeScript

void session.executeScript ( String scriptName ) (professional and enterprise editions only)

Description

Manually start the execution of a script.

Parameters

scriptName Name of the script to execute, as a string. The script has to be on the same instance of screen-scraper as the scraping session.

Return Values

Returns void. If the script doesn't exist a message will be written to the log. If the called script has an error in it a warning will be written to the log.

Change Log

Version	Description
5.0	Scripts called using this method are now exported with the scraping session.
4.5	Available for professional and enterprise editions.

Examples

Execute Script

// Executes the script "My Script".
session.executeScript( "My Script" );

executeScriptWithContext

void session.executeScriptWithContext ( String scriptName ) (professional and enterprise editions only)

Description

Executes the named script, but preserves the current context (dataRecord, scrapeableFile, etc.).

Parameters

scriptName The name of the script to execute.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Execute a script, but preserve the context

// Execute the 'Do more stuff' script, but give it access to the scrapeableFile this script has access to.
session.executeScriptWithContext("Do more stuff");

getCharacterSet

String session.getCharacterSet ( )

Description

Get the general character set being used in page response renderings.

Parameters

This method does not receive any parameters.

Return Values

Returns the character set applied to the scraping session's files, as a string. If a character set has not been specified then it will default to the character set specified in the settings dialog box.

Change Log

Version	Description
4.5	Available for all editions.

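getDebugMode

boolean session.getDebugMode ( )

Description

Returns whether debug mode is enabled for the scrape.

Parameters

This method does not receive any parameters.

Return Value

True if debug mode is enabled, false otherwise.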

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set some hardcoded values to use when the scrape is being developed

// Comment out the line below for production
session.setDebugMode(true);

if(session.getDebugMode())
{
session.setVariable("SEARCH_TERM", "DVDs");
session.setVariable("USERNAME", "some user");
session.setVariable("PASSWORD", "the password");
}

getDefaultRetryPolicy

RetryPolicy session.getDefaultRetryPolicy ( ) (professional and enterprise editions only)

Description

Gets the default retry policy to be used by each scrapeable file when one wasn't set for it.

Parameters

This method takes no parameters.

Return Value

The default retry policy, or null if there isn't one.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Check for a default RetryPolicy

if(session.getDefaultRetryPolicy() == null)
{
session.logWarn("No default retry policy specified");
}

getElapsedRunningTime

long session.getElapsedRunningTime ( ) (professional and enterprise editions only)

Description

Get how long the current session has been running.

Parameters

This method does not receive any parameters.

Return Values

Returns number of milliseconds the scrape has been running, as a long (8-byte integer).

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

Version	Description
5.0	Added for all editions.

Examples

Check If More Scripts Can Be Run

getRetainNonTidiedHTML

boolean session.getRetainNonTidiedHTML ( ) (enterprise edition only)

Description

Determine whether or not non-tidied HTML is to be retained for all scrapeable files in this scraping session.

Parameters

This method does not receive any parameters.

Return Values

Returns whether non-tidied HTML is to be retained for all scrapeable files or not, as a boolean.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Determine if Non-tidied HTML is Being Retained

// Logs a message indicating whether non-tidied HTML
// is being retained for the scrapeable files.

if (session.getRetainNonTidiedHTML())
{
session.log( "All scrapeable files will retain non-tidied HTML" );
}
else
{
session.log( "Non-tidied HTML will not be retained." );
}

getScrapeableSessionID

int session.getScrapeableSessionID ( ) (enterprise edition only)

Description

Get the unique identifier for the scraping session.

Parameters

This method does not receive any parameters.

Return Values

Returns unique session id for the scraping session, as an integer.

Change Log

Version	Description
5.0	Added for enterprise edition.

Examples

Retrieve Unique ID

// Get Unique ID
int i = session.getScrapeableSessionID();

getStartTime

long session.getStartTime ( )

Description

Retrieve the time at which the scrape started.

Parameters

This method does not receive any parameters.

Return Values

Returns the start time of the scrape in milliseconds, as a long.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get Session Start Time

// Retrieves the start time and places it
// in the variable "start".

start = session.getStartTime();
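The value returned is a raw millisecond count. The sketch below assumes it is an epoch timestamp, as produced by System.currentTimeMillis(), and shows how it might be turned into a readable start time and an elapsed duration using java.time. For the elapsed figure alone, getElapsedRunningTime presumably returns the same value directly.

```java
import java.time.Duration;
import java.time.Instant;

// ASSUMES the start time is epoch milliseconds (like System.currentTimeMillis()).
class StartTimeSketch {
    static Duration elapsedSince(long startMillis, long nowMillis) {
        return Duration.ofMillis(nowMillis - startMillis);
    }

    // Builds a log-friendly description, e.g. for session.log(...).
    static String describe(long startMillis, long nowMillis) {
        Duration d = elapsedSince(startMillis, nowMillis);
        return "Started at " + Instant.ofEpochMilli(startMillis)
                + ", running for " + d.toMinutes() + " min " + d.toSecondsPart() + " s";
    }
}
```

In a script this would be called as describe( session.getStartTime(), System.currentTimeMillis() ).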

getTimeZone

TimeZone session.getTimeZone ( )

Description

Gets the current time zone of the scraping session.

Parameters

This method takes no parameters.

Return Value

The time zone this scrape is set to.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Get the current Time Zone in use

TimeZone currentTimeZone = session.getTimeZone();
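The returned object is a standard java.util.TimeZone, so it can be passed to the usual Java date APIs. The sketch below formats a timestamp in a given zone; in a script the zone would come from session.getTimeZone(), while here it is passed in so the example runs on its own.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Formats a timestamp in the supplied time zone (e.g. the scrape's time zone).
class TimeZoneSketch {
    static String format(long epochMillis, TimeZone zone) {
        // Locale.ROOT keeps the Gregorian calendar and ASCII digits.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(new Date(epochMillis));
    }
}
```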

getVariable

Object session.getVariable ( String identifier )

Description

Retrieve the value of a saved session variable.

Parameters

identifier The name of the variable whose value is to be retrieved, as a string.

Return Values

Returns the value of the session variable. This will be a string unless you have used setVariable to place something other than a string into a session variable.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Retrieve Session Variable

// Places the session variable "CITY_CODE" in the local
// variable "cityCode".

cityCode = session.getVariable( "CITY_CODE" );

getv

Object session.getv ( String identifier )

Description

Retrieve the value of a saved session variable (alias of getVariable).

Parameters

identifier The name of the variable whose value is to be retrieved, as a string.

Return Values

Returns the value of the session variable. This will be a string unless you have used setVariable to place something other than a string into a session variable.

Change Log

Version	Description
4.5	Added for all editions.

Examples

Retrieve Session Variable

// Places the session variable "CITY_CODE" in the local
// variable "cityCode".

cityCode = session.getv( "CITY_CODE" );

isRunningFromCommandLine

boolean session.isRunningFromCommandLine ( )

Description

Returns whether the scrape is currently running from the command line. This is a convenience method for doing something different in a script when running from the command line as opposed to other modes.

Parameters

This method does not receive any parameters.

Return Values

Returns true if and only if the scrape is currently running in the command line.

Change Log

Version	Description
6.0.37a	Introduced for all editions.

Examples

Run Code Only from the Command Line

if (session.isRunningFromCommandLine()) {
// do something only done in the command line
}

isRunningInServer

boolean session.isRunningInServer ( )

Description

Returns whether the scrape is currently running in the server. This is a convenience method for doing something different in a script when running in the server as opposed to other modes.

Parameters

This method does not receive any parameters.

Return Values

Returns true if and only if the scrape is currently running in the server.

Change Log

Version	Description
6.0.37a	Introduced for all editions.

Examples

Run Code Only in the Server

if (session.isRunningInServer()) {
// do something only done in the server
}

isRunningInWorkbench

boolean session.isRunningInWorkbench ( )

Description

Returns whether the scrape is currently running in the workbench. This is a convenience method for doing something different in a script when running in the workbench as opposed to other modes.

Parameters

This method does not receive any parameters.

Return Values

Returns true if and only if the scrape is currently running in the workbench.

Change Log

Version	Description
6.0.37a	Introduced for all editions.

Examples

Run Code Only in the Workbench

if (session.isRunningInWorkbench()) {
// do something only done in workbench
}

loadStateFromString

boolean session.loadStateFromString ( String stateXML ) (professional and enterprise editions only)

Description

Loads a state previously saved by invoking the session.saveStateToString method.

Parameters

stateXML A string representing session state.

Return Values

None

Change Log

Version	Description
5.5.30a	Available in Professional and Enterprise editions.

Examples

Load state in from a file

import org.apache.commons.io.FileUtils;

File f = new File( "session_state.xml" );
sessionState = FileUtils.readFileToString( f, session.getCharacterSet() );

session.loadStateFromString( sessionState );

loadVariables

void session.loadVariables ( String fileToReadFrom ) (enterprise edition only)

Description

Load session variables from a file.

Parameters

fileToReadFrom File path of the file that contains the session variables, as a string.

Return Values

Returns void. If there is a problem retrieving the file contents an I/O error will be written to the log.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Load Session Variables from File

// Reads in variables from the file located at "C:\myvars.txt".
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.

session.loadVariables( "C:/myvars.txt" );

Sample Variables File

BIRTHDAY=12%2F25
NAME=Santa
AGE=Unknown
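The sample suggests a simple NAME=value format with URL-encoded values (note the %2F in BIRTHDAY). The sketch below parses text in that format, assuming one pair per line and UTF-8 encoding; it illustrates the file format only and is not screen-scraper's own parser.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Parses NAME=value lines, ASSUMING values are URL-encoded as in the sample file.
class VariablesFileSketch {
    static Map<String, String> parse(String fileText) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String line : fileText.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank or malformed lines
            }
            String name = line.substring(0, eq).trim();
            String value = URLDecoder.decode(line.substring(eq + 1), StandardCharsets.UTF_8);
            vars.put(name, value);
        }
        return vars;
    }
}
```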

saveStateToString

String session.saveStateToString ( boolean saveCookies, boolean saveVariables ) (professional and enterprise editions only)

Description

Saves the current state of the scraping session to a string. An example use case for this method would be a scraping session that logs in to a site, extracts some information, and then is stopped, saving its state out to a file. A second scraping session could then be run, loading the state back in from the file, which would keep the session logged in so that other information could be obtained without logging in once again. By default the scraping session will save out information such as the URL to use as a referer. More information can be saved using the boolean flags described below.

Parameters

saveCookies Whether or not cookies should be saved.
saveVariables Whether or not session variables should be saved.

Return Values

Returns the current state of the scraping session, as a string.

Change Log

Version	Description
5.5.30a	Available in Professional and Enterprise editions.

Examples

Save out state to a file

// Put the current state in a local variable.
sessionState = session.saveStateToString( true, true );

// Write the state out to a file.
sutil.writeValueToFile( sessionState, "session_state.xml", session.getCharacterSet() );
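If you prefer to avoid sutil and Commons IO, the state string can be written and read back with java.nio alone, as sketched below. The state content here is a placeholder; in a script it would come from session.saveStateToString, and the charset from Charset.forName( session.getCharacterSet() ). Files.writeString and Files.readString require Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Saves a state string to a file and loads it back using the same character set.
class StateFileSketch {
    static void save(String state, Path file, Charset charset) throws IOException {
        Files.writeString(file, state, charset);
    }

    static String load(Path file, Charset charset) throws IOException {
        return Files.readString(file, charset);
    }
}
```

Using the same character set on both sides matters: reading with a different charset can corrupt non-ASCII characters in saved variables.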

saveVariables

void session.saveVariables ( String fileToSaveTo ) (enterprise edition only)

Description

Saves all current string and integer variables to a file.

Parameters

fileToSaveTo File path where the file should be saved, as a string.

Return Values

Returns void. If there is a problem writing the file an I/O error will be written to the log.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Save Session Variables to File System

// Saves the current session variables out to C:\myvars.txt.
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.

session.saveVariables( "C:/myvars.txt" );
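The description says only string and integer variables are saved. The sketch below applies that rule and writes NAME=value lines, assuming the same URL-encoded format as the loadVariables sample file. Keys are sorted here only to make the output predictable; the real method's ordering is not documented.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Keeps only String and Integer values, as saveVariables describes, and emits
// NAME=value lines with URL-encoded values (ASSUMED file format).
class SaveVariablesSketch {
    static String toFileText(Map<String, Object> vars) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<>(vars).entrySet()) {
            Object value = e.getValue();
            if (value instanceof String || value instanceof Integer) {
                out.append(e.getKey()).append('=')
                   .append(URLEncoder.encode(value.toString(), StandardCharsets.UTF_8))
                   .append('\n');
            }
        }
        return out.toString();
    }
}
```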

scrapeFile

void session.scrapeFile ( String scrapeableFileIdentifier )

Description

Manually scrape a scrapeable file.

Parameters

scrapeableFileIdentifier Name of the scrapeable file, as a string.

Return Values

Returns void. If there is a problem accessing the scrapeable file a message will be written to the log.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Scrape File Manually

// Causes the scrapeable file "Login" to be requested.
session.scrapeFile( "Login" );

scrapeString

boolean session.scrapeString ( String scrapeableFileName, String content ) (professional and enterprise editions only)

Description

Invokes a scrapeable file using a string of content instead of a web page or local file.

Parameters

scrapeableFileName The scrapeable file to be invoked.
content The content to load.

Return Values

None

Change Log

Version	Description
5.5.13a	Available in all editions.

Examples

Invoke a scrapeable file using a string

content = session.getv( "PARTIAL_PAGE_CONTENT" );
session.scrapeString( "My Scrapeable File", content );

sendDataToClient

void session.sendDataToClient ( String key, Object value ) (enterprise edition only)

Description

Send data to the external script that initiated the scrape. This isn't currently supported by all drivers (e.g., remote scraping session); check the documentation for the language of the external script for more information.

Parameters

key Name of the information being sent, as a string.
value Data to be processed by external script, supported types are Strings, Integers, DataRecords, and DataSets.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Send dataRecord to Client

// Causes the current DataRecord object to be sent to the client
// for processing.

session.sendDataToClient( "MyDataRecord", dataRecord );

setCharacterSet

void session.setCharacterSet ( String characterSet )

Description

Set the general character set used in page response renderings. This can be particularly helpful when the pages render characters incorrectly.

Parameters

characterSet Java recognized character set, as a string. Java provides a list of supported character sets in its documentation.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

This method must be invoked before the session starts.

If you are having trouble with characters displaying incorrectly, we encourage you to read about how to go about finding a solution in one of our FAQs.

Examples

Set Character Set of All Scrapeable Files

// In script called "Before scraping session begins"

// Sets the character set to be applied to the responses
// of all scrapeable files in the session.

session.setCharacterSet( "ISO-8859-1" );
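Characters render incorrectly when response bytes are decoded with the wrong character set. The plain-Java sketch below demonstrates this with a single accented character, and shows Charset.isSupported as one way to check a name before passing it to session.setCharacterSet; the check is ordinary Java, not a screen-scraper feature.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Demonstrates the effect of the charset choice and validates charset names.
class CharsetSketch {
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false; // illegal or null charset name
        }
    }

    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }
}
```

For example, the UTF-8 bytes for "é" decode correctly as UTF-8 but appear as "Ã©" when decoded as ISO-8859-1.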

setConnectionTimeout

void session.setConnectionTimeout ( int timeout )

Description

Set the timeout value for scrapeable files in the session.

Parameters

timeout The length of the timeout in seconds, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0.1a	Introduced for all editions.

Examples

Set Connection Timeout

// set connection timeout to 15 seconds
session.setConnectionTimeout( 15 );

setCookie

void session.setCookie ( String domain, String key, String value ) (professional and enterprise editions only)

Description

Manually set a cookie in the current session state.

Parameters

domain The domain to which the cookie pertains, as a string.
key The name of the cookie, as a string.
value The value of the cookie, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for professional and enterprise editions.

This method should rarely be needed, as screen-scraper manages cookies automatically. It may be necessary when cookies are set via JavaScript.

Examples

Manually Set Cookie

// Sets a cookie associated with "mydomain.com", using the
// key "user" and the value "John Smith".

session.setCookie( "mydomain.com", "user", "John Smith" );

setDebugMode

void session.setDebugMode ( boolean debugMode )

Description

Sets the debug state for the scrape. Enabling debug mode simply outputs a warning periodically while the scrape runs, to help prevent running a production scrape in debug mode.

Parameters

debugMode True to enable debug mode, false to disable it.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

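Set some hardcoded values to use when the scrape is being developed

// Enable debug mode while developing; comment out this line for production.
session.setDebugMode(true);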

setDefaultRetryPolicy

void session.setDefaultRetryPolicy ( RetryPolicy retryPolicy ) (professional and enterprise editions only)

Description

Sets a retry policy that will affect all files in the scrape. This policy will be used by all scrapeable files that do not have a retry policy set for them. If a retry policy was manually set for them, this one will not be used.

Parameters

retryPolicy The retry policy to use by default, if no other retry policy is set.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Create a default RetryPolicy

import com.screenscraper.util.retry.RetryPolicyFactory;

// Use a retry policy that will rotate the proxy if there was an error on request
session.setDefaultRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Get new proxy"));

setKeyStoreFilePath

void session.setKeyStoreFilePath ( String filePath ) (professional and enterprise editions only)

Description

Sets the path to the keystore file. Some web sites require a special type of authentication that requires the use of a keystore file. See our blog entry on Using Client Certificates for more detail. Calling this method is the equivalent of setting the corresponding value under the "Advanced" tab for the scraping session in the workbench.

Parameters

filePath The path to the keystore file.

Return Values

None

Change Log

Version	Description
5.5.10a	Available in all editions.

Examples

Set the path to the keystore file

// Set the path.
session.setKeyStoreFilePath( "~/key_files/my_key.crt" );

// Output the current path.
session.log( "Keystore file path is: " + session.getKeyStoreFilePath() );

setKeyStorePassword

void session.setKeyStorePassword ( String password ) (professional and enterprise editions only)

Description

Sets the password for the keystore file. Some web sites require a special type of authentication that requires the use of a keystore file. See our blog entry on Using Client Certificates for more detail. Calling this method is the equivalent of setting the corresponding value under the "Advanced" tab for the scraping session in the workbench.

Parameters

password The password for the keystore file.

Return Values

None

Change Log

Version	Description
5.5.10a	Available in all editions.

Examples

Set the keystore password

// Set the password.
session.setKeyStorePassword( "My_password" );

// Output the current password.
session.log( "Keystore password is: " + session.getKeyStorePassword() );

setLoggingLevel

void session.setLoggingLevel ( int loggingLevel )

Description

Set the logging level of the scrape.

Parameters

loggingLevel Level of logging that should be used, as an integer. Use the level constants defined on the Notifiable interface (e.g., Notifiable.LEVEL_WARN) rather than literal integers, in case the level values ever change.

Return Values

Returns void.

Change Log

Version	Description
5.0.1a	Introduced for all editions.

Examples

Set Logging Level

// get logging level
logLevel = session.getLoggingLevel();

if (logLevel < Notifiable.LEVEL_WARN )
{
session.setLoggingLevel( Notifiable.LEVEL_WARN );
}

setMaxConcurrentFileDownloads

void session.setMaxConcurrentFileDownloads ( int maxConcurrentFileDownloads ) (professional and enterprise editions only)

Description

Set the maximum number of concurrent file downloads to allow.

Parameters

maxConcurrentFileDownloads The maximum number of downloads to allow, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for professional and enterprise editions.

Examples

Set Max for Concurrent File Downloads

// Limit the number of concurrent file downloads to 10
session.setMaxConcurrentFileDownloads( 10 );

setMaxHTTPRequests

void session.setMaxHTTPRequests ( int maxAttempts ) (professional and enterprise editions only)

Description

Set the number of attempts that scrapeable files should make to get the requested page.

Parameters

maxAttempts The number of attempts that will be made, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.

Examples

Set the Retry Value

// Set retries for files
session.setMaxHTTPRequests( 3 );

setMaxScriptsOnStack

void session.setMaxScriptsOnStack ( int maxScriptsOnStack ) (enterprise edition only)

Description

Set the maximum number of scripts that can be running concurrently. The default value for maxScriptsOnStack is 50.

Parameters

maxScriptsOnStack Number of scripts to be allowed to run concurrently, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for enterprise edition.

Before raising the number of scripts that can be on the stack, make sure your scrape is not using more of the stack than it should. One common cause is using recursion where iteration would work. This is discussed in more detail on our blog or in the Tips, Tricks, and Samples section of this site.

Examples

Allocate More Resources to Scrape

// Allow for 100 scripts (instead of 50)
session.setMaxScriptsOnStack(100);

setRandomizeUserAgent

void session.setRandomizeUserAgent ( boolean randomizeUserAgent ) (professional and enterprise editions only)

Description

Causes the "User-Agent" header sent by screen-scraper to be randomized. The user agent strings from which screen-scraper will select are found in the "resource\conf\user_agents.txt" file.

Parameters

randomizeUserAgent true or false

Return Values

None

Change Log

Version	Description
5.5.34a	Available in Professional and Enterprise editions.

Examples

Randomize the user-agent header

session.setRandomizeUserAgent( true );

// You can also access the current value like so:
session.log( "Randomize user agent: " + session.getRandomizeUserAgent() );
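As a rough picture of what randomizing the user agent involves, the sketch below reads a list of user-agent strings and picks one at random. It assumes one user agent per line with blank lines ignored; how screen-scraper actually reads user_agents.txt and chooses an entry is not documented here. String.lines() requires Java 11 or later.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical illustration of choosing a random User-Agent from a text list.
class UserAgentSketch {
    static List<String> parseLines(String fileText) {
        return fileText.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    static String pick(List<String> agents, Random random) {
        if (agents.isEmpty()) {
            throw new IllegalArgumentException("no user agents available");
        }
        return agents.get(random.nextInt(agents.size()));
    }
}
```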

setRetainNonTidiedHTML

void session.setRetainNonTidiedHTML ( boolean retainNonTidiedHTML ) (enterprise edition only)

Description

Set whether or not non-tidied HTML is to be retained for all scrapeable files.

Parameters

retainNonTidiedHTML Whether the non-tidied HTML should be retained, as a boolean. The default is false.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

To use getNonTidiedHTML after a file has been scraped, this method must be called before the file is scraped.

Examples

Retain Non-tidied HTML

// Tell screen-scraper to retain non-tidied HTML for all
// scrapeable files.

session.setRetainNonTidiedHTML( true );

setSessionVariables

void session.setSessionVariables ( Map variables ) (professional and enterprise editions only)
void session.setSessionVariables ( Map variables, boolean ignoreLowerCaseKeys)(professional and enterprise editions only)

Description

Sets the value of all session variables that match the keys in the Map to the values in the Map. This will ignore a key of DATARECORD.

Parameters

Map The map to use when setting the session variables.
ignoreLowerCaseKeys True if keys containing any lowercase characters should be ignored. This would include a key such as A_KEy.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.
5.5.43a	Changed from session.setSessionVariablesFromMap to session.setSessionVariables.

Examples

Set the ASPX values for a .NET site before scraping the next page

DataRecord aspx = scrapeableFile.getASPXValues();

session.setSessionVariables(aspx);
session.scrapeFile("Next Results");
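The key-filtering rules described above can be sketched in plain Java. This is a hypothetical illustration, not screen-scraper's implementation: the class and method names are invented, a plain Map stands in for the session, and the DATARECORD comparison is assumed to be case-sensitive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SessionVariableFilterSketch {
    // Hypothetical: returns the entries setSessionVariables would apply, per the rules above.
    static Map<String, Object> keysToApply(Map<String, Object> variables, boolean ignoreLowerCaseKeys) {
        Map<String, Object> applied = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            String key = entry.getKey();
            // A key of DATARECORD is always skipped (assumed case-sensitive match).
            if ("DATARECORD".equals(key)) {
                continue;
            }
            // With ignoreLowerCaseKeys, any key containing a lowercase letter is skipped.
            if (ignoreLowerCaseKeys && key.chars().anyMatch(Character::isLowerCase)) {
                continue;
            }
            applied.put(key, entry.getValue());
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new LinkedHashMap<>();
        vars.put("VIEWSTATE", "abc");
        vars.put("A_KEy", "skipped when ignoring lowercase keys");
        vars.put("DATARECORD", "always skipped");
        System.out.println(keysToApply(vars, true).keySet()); // [VIEWSTATE]
    }
}
```

Uppercase keys such as those returned by scrapeableFile.getASPXValues() pass through either way; the flag only matters when the map also carries mixed-case keys you do not want copied into the session.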

setStatusMessage

void session.setStatusMessage ( String message ) (enterprise edition only)

Description

Sets a status message to be displayed in the web interface.

Parameters

message The message to be set.

Return Values

None

Change Log

Version	Description
5.5.32a	Available in Enterprise edition.

Examples

Set a status message

if( scrapeableFile.getMaxRequestAttemptsReached() )
{
session.setStatusMessage( "Maximum requests reached for scrapeable file: " + scrapeableFile.getName() );

// Output the current status message.
session.log( "Current status message: " + session.getStatusMessage() );
}

setStopScrapingOnExtractorPatternTimeout

void session.setStopScrapingOnExtractorPatternTimeout ( boolean stopScrapingOnExtractorPatternTimeout ) (professional and enterprise editions only)

Description

If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if an extractor pattern timeout occurs.

Parameters

stopScrapingOnExtractorPatternTimeout true or false

Return Values

None

Change Log

Version	Description
5.5.36a	Available in Professional and Enterprise editions.

Examples

Indicate that the scraping session should be stopped when an extractor pattern timeout occurs

session.setStopScrapingOnExtractorPatternTimeout( true );

// You can also access the current value like so:
session.log( "Stop scraping on extractor pattern timeout: " + session.getStopScrapingOnExtractorPatternTimeout() );

setStopScrapingOnMaxRequestAttemptsReached

void session.setStopScrapingOnMaxRequestAttemptsReached ( boolean stopScrapingOnMaxRequestAttemptsReached ) (professional and enterprise editions only)

Description

If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if the maximum attempts to request a file is reached.

Parameters

stopScrapingOnMaxRequestAttemptsReached true or false

Return Values

None

Change Log

Version	Description
5.5.36a	Available in Professional and Enterprise editions.

Examples

Indicate that the scraping session should be stopped if the maximum attempts to request a file is reached

session.setStopScrapingOnMaxRequestAttemptsReached( true );

// You can also access the current value like so:
session.log( "Stop scraping on max attempts reached: " + session.getStopScrapingOnMaxRequestAttemptsReached() );

setStopScrapingOnScriptError

void session.setStopScrapingOnScriptError ( boolean stopScrapingOnScriptError ) (professional and enterprise editions only)

Description

If this method is passed the value of true, it will cause screen-scraper to stop the current scraping session if a script error occurs.

Parameters

stopScrapingOnScriptError true or false

Return Values

None

Change Log

Version	Description
5.5.36a	Available in Professional and Enterprise editions.

Examples

Indicate that the scraping session should be stopped if a script error occurs

session.setStopScrapingOnScriptError( true );

// You can also access the current value like so:
session.log( "Stop scraping on script error: " + session.getStopScrapingOnScriptError() );

setTimeZone

void session.setTimeZone ( String timeZone )
void session.setTimeZone ( TimeZone timeZone )

Description

Sets the time zone used by methods that return a time formatted as a string, such as sutil.getCurrentDate.

Parameters

timeZone The new time zone to use, either as a time zone ID string (such as "America/Denver") or as a TimeZone object. If null is given, the local time zone will be used.

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set the time zone

session.setTimeZone("America/Denver");
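IDs such as "America/Denver" are standard Java time zone IDs. The plain-Java sketch below (hypothetical class name, not screen-scraper code) shows how the same instant prints differently depending on the zone, with a null ID falling back to the local zone as described above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class TimeZoneSketch {
    // Formats a Date in the given zone; a null zone ID falls back to the JVM's local zone.
    static String format(Date date, String pattern, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(zoneId == null ? TimeZone.getDefault() : TimeZone.getTimeZone(zoneId));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01 00:00 UTC
        System.out.println(format(epoch, "yyyy-MM-dd HH:mm", "America/Denver")); // 1969-12-31 17:00
        System.out.println(format(epoch, "yyyy-MM-dd HH:mm", "UTC"));            // 1970-01-01 00:00
    }
}
```

Note that TimeZone.getTimeZone silently returns GMT for an unrecognized ID, so a misspelled zone name will not raise an error.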

setUseServerCharacterSet

void session.setUseServerCharacterSet ( boolean useServerCharacterSet ) (professional and enterprise editions only)

Description

If this method is passed the value of true, it will cause screen-scraper to utilize whatever character set is specified by the server in its "Content-Type" response header. If no such header exists, screen-scraper will default to either the character set indicated for the scraping session or the global character set (in that order).

Parameters

useServerCharacterSet true or false

Return Values

None

Change Log

Version	Description
5.5.11a	Available in all editions.

Examples

Indicate that the server character set should be used

session.setUseServerCharacterSet( true );

// You can also access the current value like so:
session.log( "Use server character set: " + session.getUseServerCharacterSet() );

setUserAgent

void session.setUserAgent ( String userAgent ) (professional and enterprise editions only)

Description

Sets the user agent to be used for all requests.

Parameters

userAgent The user agent to send with each request, as a string.

Return Values

None

Change Log

Version	Description
5.5.23a	Available in Professional and Enterprise editions.

Examples

Set the user agent

session.setUserAgent( "Opera/9.64(Windows NT 5.1; U; en) Presto/2.1.1" );

// You can also access the current value like so:
session.log( "Session user agent: " + session.getUserAgent() );

setVariable

void session.setVariable ( String identifier, Object value )

Description

Set the value of a session variable.

Parameters

identifier Name of the session variable, as a string.
value Value of the session variable. This can be any Java object, including (but not limited to) a String, DataSet, or DataRecord.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Session Variable

// Sets the session variable "CITY_CODE" with the value found
// in the first dataRecord (at index 0) pointed to by the
// identifier "CITY_CODE".

session.setVariable( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) );

setv

void session.setv ( String identifier, Object value )

Description

Set the value of a session variable (alias of setVariable).

Parameters

identifier Name of the session variable, as a string.
value Value of the session variable. This can be any Java object, including (but not limited to) a String, DataSet, or DataRecord.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Set Session Variable

// Sets the session variable "CITY_CODE" with the value found
// in the first dataRecord (at index 0) pointed to by the
// identifier "CITY_CODE".

session.setv( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) );

shouldStopScraping

boolean session.shouldStopScraping ( )

Description

Determine if the scrape has been stopped. This can be done using the stop button in the workbench or the stop scraping button on the web interface (for enterprise users).

Parameters

This method does not receive any parameters.

Return Values

Returns true if the scrape has been requested to stop; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for enterprise edition.

Examples

Stop Iterator if Scrape is Stopped

for (int i = 0; i < dataSet.getNumDataRecords(); ++i)
{
// check during every iteration to see if we should exit early.
// Without this check, the iteration will continue even
// if the stop scraping button were to be pressed.
if ( session.shouldStopScraping() )
{
break;
}

session.setVariable( "URL", dataSet.get( i, "NEXT_PAGE_URL" ) );
session.scrapeFile( "NEXT_PAGE" );
}

stopScraping

void session.stopScraping ( )

Description

Stop the current scraping session.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Stop Scrape on Scrapeable File Request Error

// Stops scraping if an error response was received
// from the server.
if( scrapeableFile.wasErrorOnRequest() )
{
session.stopScraping();
}

waitForFileDownloadsToComplete

void session.waitForFileDownloadsToComplete() (enterprise edition only)

Description

Waits for any pending file downloads to complete before returning. This should be used in tandem with the session.downloadFile method call that takes the "doLazy" parameter.

Parameters

None

Return Values

None

Change Log

Version	Description
5.5.43a	Available in Enterprise edition.

Examples

Download files concurrently

// Download five image files concurrently.
for( i = 0; i < 5; i++ )
{
session.downloadFile( "http://www.mysite.com/images/image" + i + ".jpg", "output/image" + i + ".jpg", 5, true );
}

// Wait for all of the images to finish downloading before continuing.
session.waitForFileDownloadsToComplete();

sutil

Overview

The sutil class provides general functions used to manipulate and work with extracted data. It also allows you to get information regarding screen-scraper such as its memory usage or version.

Images

Overview

In the course of a scrape, you might want to gather images associated with the other information being collected. These methods are provided not only to download the images but also to gather size information and resize them to your desired size.

These methods are only available to enterprise edition users.

getImageHeight

int sutil.getImageHeight ( String imagePath ) (enterprise edition only)

Description

Get the height of an image.

Parameters

imagePath File path to the image, as a string.

Return Values

Returns the height in pixels of the image file, as an integer. If the file doesn't exist or is not an image an error will be thrown and -1 will be returned.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

Examples

Write Image Height to Log

// Output the height of the image to the log.
session.log( "Image height: " + sutil.getImageHeight( "C:/my_image.jpg" ) );

getImageWidth

int sutil.getImageWidth ( String imagePath ) (enterprise edition only)

Description

Get the width of an image.

Parameters

imagePath File path to the image, as a string.

Return Values

Returns the width in pixels of the image file, as an integer. If the file doesn't exist or is not an image an error will be thrown and -1 will be returned.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

Examples

Write Image Width to Log

// Output the width of the image to the log.
session.log( "Image width: " + sutil.getImageWidth( "C:/my_image.jpg" ) );
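Outside of screen-scraper, the same pixel dimensions can be read with the standard javax.imageio API. The sketch below is a plain-Java analogue, not how sutil is implemented; the class name is hypothetical, and returning -1 for a missing or unreadable file simply mirrors the documented return value.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageSizeSketch {
    // Returns {width, height} in pixels, or {-1, -1} if the file is missing or not a readable image.
    static int[] dimensions(String imagePath) {
        try {
            BufferedImage img = ImageIO.read(new File(imagePath));
            if (img == null) {
                // ImageIO.read returns null when no registered reader recognizes the format.
                return new int[] {-1, -1};
            }
            return new int[] {img.getWidth(), img.getHeight()};
        } catch (IOException e) {
            // Thrown for example when the file does not exist or cannot be read.
            return new int[] {-1, -1};
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small 40x25 PNG to a temporary file, then read its size back.
        File tmp = File.createTempFile("sketch", ".png");
        tmp.deleteOnExit();
        ImageIO.write(new BufferedImage(40, 25, BufferedImage.TYPE_INT_RGB), "png", tmp);
        int[] size = dimensions(tmp.getPath());
        System.out.println(size[0] + "x" + size[1]);
    }
}
```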

resizeImage

Overview

Internally, only one function is used to resize all images; however, to facilitate the resizing of images, we have provided you with three methods. Each method will help you specify what measurement is most important (width or height) and whether the image should retain its aspect ratio.

resizeImageFixHeight() [sutil] - Resize image, retaining aspect ratio, based on specified height.
resizeImageFixWidth() [sutil] - Resize image, retaining aspect ratio, based on specified width.
resizeImageFixWidthAndHeight() [sutil] - Resize image to a specified size (will not check aspect ratio).
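The aspect-ratio arithmetic behind the first two methods can be sketched as follows: fixing one side scales the other side by the same factor. This is a hypothetical helper, and rounding to the nearest pixel is an assumption; screen-scraper may round differently.

```java
class ResizeMath {
    // Width that keeps the aspect ratio when the height is fixed (resizeImageFixHeight case).
    static int widthForHeight(int origWidth, int origHeight, int newHeight) {
        return (int) Math.round((double) origWidth * newHeight / origHeight);
    }

    // Height that keeps the aspect ratio when the width is fixed (resizeImageFixWidth case).
    static int heightForWidth(int origWidth, int origHeight, int newWidth) {
        return (int) Math.round((double) origHeight * newWidth / origWidth);
    }

    public static void main(String[] args) {
        // A 400x300 image fixed to 100 px high becomes about 133x100;
        // fixed to 100 px wide it becomes 100x75.
        System.out.println(widthForHeight(400, 300, 100)); // 133
        System.out.println(heightForWidth(400, 300, 100)); // 75
    }
}
```

resizeImageFixWidthAndHeight skips this calculation entirely, which is why it can distort images whose proportions differ from the target size.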

resizeImageFixHeight

void sutil.resizeImageFixHeight ( String originalFile, String newFile, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image, retaining aspect ratio, based on specified height.

Parameters

originalFile File path of the image to be resized, as a string.
newFile File path where the new image should be created, as a string.
newHeightSize The height of the resized image in pixels, as an integer.
deleteOriginalFile Whether the original file should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

Examples

Resize Image to Specified Height

// Resizes a JPG to 100 pixels high, maintaining the
// aspect ratio. After the image is resized, the original
// will be deleted.

sutil.resizeImageFixHeight( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, true );

resizeImageFixWidth

void sutil.resizeImageFixWidth ( String originalFile, String newFile, int newWidthSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image, retaining aspect ratio, based on specified width.

Parameters

originalFile File path of the image to be resized, as a string.
newFile File path where the new image should be created, as a string.
newWidthSize The width of the resized image in pixels, as an integer.
deleteOriginalFile Whether the original file should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

Examples

Resize Image to Specified Width

// Resizes a JPG to 100 pixels wide, maintaining the
// aspect ratio. After the image is resized, the original
// will be deleted.

sutil.resizeImageFixWidth( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, true );

resizeImageFixWidthAndHeight

void sutil.resizeImageFixWidthAndHeight ( String originalFile, String newFile, int newWidthSize, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only)

Description

Resize image to a specified size.

Parameters

originalFile File path of the image to be resized, as a string.
newFile File path where the new image should be created, as a string.
newWidthSize The width of the resized image in pixels, as an integer.
newHeightSize The height of the resized image in pixels, as an integer.
deleteOriginalFile Whether the original file should be deleted after the image is resized, as a boolean.

Return Values

Returns void. If an error is encountered it will be thrown.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

This method can distort the image if the aspect ratio of the original image differs from that of the target size.

Examples

Resize Image to Specified Size

// Resizes a JPG to 100x100 pixels.
// After the image is resized, the original
// will be deleted.

sutil.resizeImageFixWidthAndHeight( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, 100, true );

DecodedImage

Overview

To be used in conjunction with the ImageDecoder class.

This class represents decoded images. The objects can be queried for the text that was in the image, as well as any error that occurred while the image was being decoded. When the returned text is incorrect, there is a method that can be used to report it as bad. This can be used for sites like decaptcher.com, where refunds are given for incorrectly interpreted images.

getError

String getError ( )

Description

Gets any error message, or returns null if there was no error

Parameters

This method takes no parameters

Return Value

The error message returned

Error messages

OK Nothing went wrong
BALANCE_ERROR Insufficient funds with paid service
NETWORK_ERROR General network error (timeout, lost connection, server busy, etc...)
INVALID_LOGIN Credentials are invalid
GENERAL_ERROR General error, possibly image was bad or the site couldn't resolve it. See the error message for details
UNKNOWN Unknown error

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Convert an image to text

import com.screenscraper.util.images.*;

// Assuming an ImageDecoder was created in a different location and saved in "IMAGE_DECODER"
ImageDecoder decoder = session.getVariable("IMAGE_DECODER");
DecodedImage result = decoder.decodeFile("someFile.jpg");

if(result.wasError())
{
session.logWarn("Error converting image to text: " + result.getError());
}
else
{
session.log("Decoded Text: " + result.getResult());
}

// If the result was bad
result.reportAsBad();

getResult

Object getResult ( )

Description

Gets the result from decoding the image. Most likely this will be a String, but each implementation could return a specific object type.

Parameters

This method takes no parameters

Return Value

The text extracted from the image, or null if there was an error

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

See the "Convert an image to text" example under getError.

reportAsBad

void reportAsBad ( )

Description

Reports the image as incorrectly decoded (for example, so that a paid service can issue a refund). Some decoder implementations do nothing when this method is called.

Parameters

This method takes no parameters

Return Value

This method returns void.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

See the "Convert an image to text" example under getError.

wasError

boolean wasError ( )

Description

Returns true if there was an error, false otherwise. Also returns false if the image has not been resolved yet

Parameters

This method takes no parameters

Return Value

True if there was an error, false otherwise

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

See the "Convert an image to text" example under getError.

ImageDecoder

Overview

Class to convert images to text for interacting with CAPTCHA challenges. There are currently two implementations:

ManualDecoder: Creates a pop-up window for a user to enter in the text they read from the image
DecaptcherDecoder: Interface for the paid service decaptcher.com

When a reference to an image is passed to an instance of this class, it returns a DecodedImage object that can be queried for the resulting text, errors, and can report an image as poorly converted.


DecaptcherDecoder

void DecaptcherDecoder (ScrapingSession session, String username, String password, int port)
void DecaptcherDecoder (ScrapingSession session, String username, String password, String port)
void DecaptcherDecoder (ScrapingSession session, String username, String password, String port, String apiUrl)
void DecaptcherDecoder (ScrapingSession session, String username, String password, int port, String apiUrl)

Description

Requires an account with decaptcher.com.

Type of ImageDecoder in the com.screenscraper.util.images package that uses the decaptcher.com service to convert images to text. As of version 5.5.40a, the constructor requires the port assigned to your decaptcher.com account; the apiUrl parameter is optional and overrides the default service URL.

Parameters

session The currently running scraping session (the session object).
username Username used to log in to decaptcher.com service.
password Password used to log in to decaptcher.com service.
port The port given by De-captcher.com to access your account on their site.
apiUrl (optional) URL used to access decaptcher.com service. This setting will override the default URL.

Return Values

Returns void. If it runs into any problems accessing the decaptcher.com service an error will be thrown.

Change Log

Version	Description
5.5.29a	Available in all editions
5.5.40a	Added the port parameter. The service now requires the correct port in order to authenticate.

Examples

Initialization script

import com.screenscraper.util.images.*;

ImageDecoder decoder;

decoder = new DecaptcherDecoder(session, "username", "password", 12345, "http://api.de-captcher.com");

session.setVariable("IMAGE_DECODER", decoder);

ManualDecoder

void ManualDecoder (ScrapingSession session)

Description

Type of ImageDecoder in the com.screenscraper.util.images package that uses a popup window prompting the user to enter the text read from an image. Useful for debugging purposes, as the input text should always be correct (so long as it is typed correctly). Helpful during testing to avoid costs associated with paid-for CAPTCHA decoding services such as decaptcher.com.

Parameters

session The currently running scraping session (the session object).

Return Values

Returns void. If it runs into any problems decoding an image an error will be thrown.

Change Log

Version	Description
5.5.29a	Available in all editions

Examples

Initialize script

import com.screenscraper.util.images.*;

ImageDecoder decoder;

decoder = new ManualDecoder(session);

session.setVariable("IMAGE_DECODER", decoder);

decodeFile

DecodedImage decodeFile ( String file )
DecodedImage decodeFile ( File file )

Description

Converts the image given to a DecodedImage that will handle it. Does not delete the file.

Parameters

file The image file

Return Value

A DecodedImage used to get the text, errors, and possibly report a result as bad.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

image = decoder.decodeFile("path to the image file");

decodeURL

DecodedImage decodeURL ( String url )

Description

Converts the image at the given URL to a DecodedImage that will handle it. Temporarily saves the file in the screen-scraper root folder, but deletes it once it has been decoded. By default, this will use the scraping session's HttpClient to request the URL.

Parameters

url The url to the image

Return Value

A DecodedImage used to get the text, errors, and possibly report a result as bad.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

DecodedImage image = decoder.decodeURL(dataRecord.get("IMAGE_URL"));

applyXPathExpression

convertDateToString

String sutil.convertDateToString ( Date date ) (professional and enterprise editions only)
String sutil.convertDateToString ( Date date, String format ) (professional and enterprise editions only)

Description

Converts the given Date to a string in the specified format, or in the "MM/dd/yyyy HH:mm:ss.SS zzz" format if no format is given.

Parameters

date The date to convert
format (optional) A String representation (as a SimpleDateFormat) for the output

Return Values

A String representing the date given

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

// Log the current time
Date now = new Date();
session.log(sutil.convertDateToString(now, "MM/dd/yyyy HH:mm:ss zzz"));
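Because the format argument follows SimpleDateFormat conventions, the default pattern can be reproduced in plain Java. The sketch below is a hypothetical analogue; passing the time zone explicitly is an assumption made so the output is predictable, since this page does not say which zone convertDateToString uses.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateToStringSketch {
    static final String DEFAULT_FORMAT = "MM/dd/yyyy HH:mm:ss.SS zzz";

    // Formats date with the given SimpleDateFormat pattern, or DEFAULT_FORMAT when format is null.
    static String convertDateToString(Date date, String format, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(format == null ? DEFAULT_FORMAT : format, Locale.US);
        fmt.setTimeZone(zone);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(convertDateToString(new Date(0L), null, utc));
        System.out.println(convertDateToString(new Date(1234L), "ss.SS", utc)); // 01.234
    }
}
```

In SimpleDateFormat, "SS" means milliseconds padded to at least two digits, not hundredths of a second, so 1,234 ms past the minute prints as "01.234".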

convertHTMLEntities

String sutil.convertHTMLEntities ( String value )

Description

Decode HTML Entities.

Parameters

value String whose HTML Entities will be converted to characters.

Return Values

Returns string with decoded HTML entities.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Decode HTML Entities

// Returns "Angela's Room"
String decoded = sutil.convertHTMLEntities( "Angela&#39;s Room" );

convertStringToDate

Date sutil.convertStringToDate ( String dateString, String format ) (professional and enterprise editions only)

Description

Converts a String to a Date object using the given format. If null is given as a format, "MM/dd/yyyy HH:mm:ss.SS zzz" is used

Parameters

dateString The date string
format The format of the date, following SimpleDateFormat formatting.

Return Values

The Date object matching the date given in the String, or null if it couldn't be parsed with the given format

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

// Convert an input value to a date for later use
Date lastUpdate = sutil.convertStringToDate(session.getVariable("LAST_RUN_DATE"), "yyyy-MM-dd");

if(lastUpdate == null)
{
session.logError("No last run specified, stopping scrape");
session.stopScraping();
}
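The "null if it couldn't be parsed" behavior can be sketched in plain Java by catching SimpleDateFormat's ParseException. This is a hypothetical analogue, not sutil's code; disabling lenient parsing (so impossible dates such as 2012-02-30 are rejected) is a choice made for this sketch only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class StringToDateSketch {
    static final String DEFAULT_FORMAT = "MM/dd/yyyy HH:mm:ss.SS zzz";

    // Parses dateString with the given pattern (or DEFAULT_FORMAT when format is null);
    // returns null instead of throwing when the string does not match.
    static Date parse(String dateString, String format) {
        if (dateString == null) {
            return null;
        }
        SimpleDateFormat fmt = new SimpleDateFormat(format == null ? DEFAULT_FORMAT : format);
        fmt.setLenient(false); // sketch-only choice: reject out-of-range fields
        try {
            return fmt.parse(dateString);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

As in the script above, callers should check for null before using the result.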

convertUTFWhitespace

String sutil.convertUTFWhitespace (String input ) (enterprise edition only)

Description

Replaces the Unicode variants of whitespace with a regular space character.

Parameters

input The input string.

Return Values

Returns the converted string.

Change Log

Version	Description
6.0.55a	Available in all editions.

Examples

Tidying a string from a site that has non-uniform ways of returning strings.

// useful when tidying a string
String cleanedInput = sutil.convertUTFWhitespace(input);
cleanedInput = cleanedInput.replaceAll("\\s{2,}", " ").trim();
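One way to approximate this in plain Java is to replace every character in the Unicode "space separator" category (Zs) with an ordinary space. This is a hypothetical sketch; the exact set of characters sutil.convertUTFWhitespace handles is not documented here.

```java
class WhitespaceSketch {
    // Replaces every Unicode space separator (category Zs, e.g. U+00A0 no-break space,
    // U+2009 thin space, U+3000 ideographic space) with an ordinary space.
    static String convertUTFWhitespace(String input) {
        return input == null ? null : input.replaceAll("\\p{Zs}", " ");
    }
}
```

This step matters for the cleanup shown above because Java's \s class does not match a no-break space by default, so without the conversion "\\s{2,}" would leave such characters in place.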

dateIsWithinDays

boolean sutil.dateIsWithinDays ( Date date1, Date date2, int days ) (professional and enterprise editions only)

Description

Checks to see if one date is within a certain number of days of another.

Parameters

date1 The first date.
date2 The second date.
days The maximum number of days that can be between the two dates.

Return Values

True if the two dates are no more than the given number of days apart; false otherwise.

Change Log

Version	Description
5.5.13a	Available in all editions.

Examples

Check the proximity of one date to another

date1 = sutil.convertStringToDate( "2012-02-15", "yyyy-MM-dd" );
date2 = sutil.convertStringToDate( "2012-02-24", "yyyy-MM-dd" );

days = 5;
session.log( "First date is within 5 days of second date: " + sutil.dateIsWithinDays( date1, date2, days ) );

days = 15;
session.log( "First date is within 15 days of second date: " + sutil.dateIsWithinDays( date1, date2, days ) );
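In the example above the dates are 9 days apart, so the first check is false and the second is true. The same comparison can be sketched with java.time; note that this hypothetical helper takes LocalDate values rather than the java.util.Date objects sutil uses, and treats the limit as inclusive, which is an assumption.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class WithinDaysSketch {
    // True when the two dates are at most `days` calendar days apart, in either order.
    static boolean dateIsWithinDays(LocalDate date1, LocalDate date2, int days) {
        return Math.abs(ChronoUnit.DAYS.between(date1, date2)) <= days;
    }

    public static void main(String[] args) {
        LocalDate a = LocalDate.of(2012, 2, 15);
        LocalDate b = LocalDate.of(2012, 2, 24);
        System.out.println(dateIsWithinDays(a, b, 5));  // false
        System.out.println(dateIsWithinDays(a, b, 15)); // true
    }
}
```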

equalsIgnoreCase

boolean sutil.equalsIgnoreCase ( String stringOne, String stringTwo )

Description

Compare two strings ignoring case.

Parameters

stringOne First string.
stringTwo Second string.

Return Values

Returns true if the values of the two strings are equal when case is not considered; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Compare Two Strings (Case Insensitive)

// Compare strings without regard to case
sutil.equalsIgnoreCase( "aBc123","ABc123" );

formatNumber

String sutil.formatNumber ( String number ) (professional and enterprise editions only)
String sutil.formatNumber ( String number, int decimals, boolean padDecimals ) (professional and enterprise editions only)

Description

Returns a number formatted so that it can be parsed as a Float, such as xxxxxxxxx.xxxx. It attempts to determine whether the number is formatted in European or American style; if it cannot tell, it defaults to American. A trailing k is converted to thousands (000), m to millions, and b to billions. It assumes you won't have a number like 3.123k or 3.765m, although 3.54m is fine: if you wanted all three of those digits, you would have written 3765k or 3,765k.

Parameters

number String containing the number.
decimals (optional) The maximum number of decimal places to include in the result. When this value is omitted, any existing decimals are retained, but none are added.
padDecimals (optional) Sets whether or not to pad the decimals (convert 5.1 to 5.10 if 2 decimals are specified)

Return Values

Returns the number as a String that can be parsed as a Float, such as 3750.00.

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Format a scraped abbreviated number as a dollar amount

// Format a number to two decimal places
String dollars = sutil.formatNumber("3.75k", 2, true);
// This would set dollars to the String "3750.00"

// Format the amount without cents.
String dollarsNoCents = sutil.formatNumber("3.75m");
// This would set dollarsNoCents to the String "3750000"

Format a European number to be inserted in a MySQL statement

String number = sutil.formatNumber("3.275,10", 2, false);
// number would now be "3275.1"
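The suffix expansion and decimal handling can be sketched with BigDecimal. This hypothetical helper covers only American-style input (commas as thousands separators) and does not attempt the European-style detection or the 3.123k heuristic described above; dropping trailing zeros when padDecimals is false or decimals is omitted is an assumption chosen to match the example outputs on this page.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class FormatNumberSketch {
    // Expands a trailing k/m/b suffix and strips commas (American style only).
    // decimals may be null, meaning "keep whatever decimals remain".
    static String formatNumber(String number, Integer decimals, boolean padDecimals) {
        String s = number.trim().toLowerCase().replace(",", "");
        String suffix = s.isEmpty() ? "" : s.substring(s.length() - 1);
        long multiplier = suffix.equals("k") ? 1_000L
                : suffix.equals("m") ? 1_000_000L
                : suffix.equals("b") ? 1_000_000_000L
                : 1L;
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1);
        }
        BigDecimal value = new BigDecimal(s).multiply(BigDecimal.valueOf(multiplier));
        if (decimals != null) {
            value = value.setScale(decimals, RoundingMode.HALF_UP);
        }
        if (decimals == null || !padDecimals) {
            value = value.stripTrailingZeros();
        }
        // toPlainString avoids scientific notation such as 3.75E+6.
        return value.toPlainString();
    }
}
```

For example, formatNumber("3.75k", 2, true) gives "3750.00" and formatNumber("3.75m", null, false) gives "3750000", matching the examples above.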

formatUSPhoneNumber

String sutil.formatUSPhoneNumber ( String number ) (professional and enterprise editions only)

Description

Converts a String to a US formatted phone number, as +1 (123) 456-7890x2. Expects a 7 digit or 10+ digit phone number. The extension is optional, and will be any digits found after an x. This allows for extensions listed as ext, x, or extension.

Parameters

number String containing the phone number. The only digits in this String should be the digits of the phone number.

Return Values

Returns a String formatted as a phone number, such as +1 (123) 456-7890x2, or null if the input was null

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Format a scraped phone number

// Formats the phone number extracted
String phone = sutil.formatUSPhoneNumber(dataRecord.get("PHONE_NUMBER"));

// If the extracted value had been "13334445678 ext. 23", the returned value would be "+1 (333) 444-5678x23"

formatUSZipCode

String sutil.formatUSZipCode ( String zip ) (professional and enterprise editions only)

Description

Formats and returns a US-style zip code, such as 12345-6789. If the given zip code isn't 5 or 9 digits, a warning is logged, but the first 5 digits are still placed before the hyphen and anything else (if any) after it.

Parameters

zip String to format as a zip code, either 5 or 9 digits

Return Values

Zip code formatted String, such as 12345-6789 or 12345

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

// Format a number to a nicer looking zip code
String zip = sutil.formatUSZipCode(" 001011458");

// zip would be "00101-1458"
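The formatting rule described above can be sketched in plain Java. This is a hypothetical helper, not sutil's code: it keeps only the digits, which is why the leading space in " 001011458" disappears, and it prints the warning to standard error instead of the screen-scraper log. Returning null for null input is an assumption.

```java
class ZipSketch {
    // Keeps only digits, then puts the first five before a hyphen and any remainder after it.
    static String formatUSZipCode(String zip) {
        if (zip == null) {
            return null;
        }
        String digits = zip.replaceAll("\\D", "");
        if (digits.length() != 5 && digits.length() != 9) {
            System.err.println("Warning: expected 5 or 9 digits but found " + digits.length());
        }
        if (digits.length() <= 5) {
            return digits;
        }
        return digits.substring(0, 5) + "-" + digits.substring(5);
    }
}
```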

getCurrentDate

String sutil.getCurrentDate ( String format )

Description

Returns the current date in the specified format, or in the "MM/dd/yyyy HH:mm:ss.SS zzz" format if null is given. Uses the session's time zone.

Parameters

format The format for the output string

Return Values

A String representing the date and time this method was invoked

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

// Log the current time
session.log(sutil.getCurrentDate(null));

getInstallDir

String sutil.getInstallDir ( )

Description

Retrieve the file path of the screen-scraper installation.

Parameters

This method does not receive parameters.

Return Values

Returns the installation directory file path, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Download to screen-scraper Directory

url = "http://www.foo.com/imgs/puppy_image.gif";

// Get installation file path
path = sutil.getInstallDir() + "images/puppy.gif";

// Download to screen-scraper directory
session.downloadFile( url, path );

getMemoryUsage

int sutil.getMemoryUsage ( ) (enterprise edition only)

Description

Get memory usage of screen-scraper.

Parameters

This method does not receive any parameters.

Return Values

Returns the average percentage of memory used by screen-scraper over the past 30 seconds, as an integer.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

For tips on optimizing screen-scraper's memory usage so that it can run faster, see our FAQ on optimization.

Examples

Stop Scrape on Memory Leak

// Stop scrape if memory is low
if( sutil.getMemoryUsage() > 98 )
{
session.log( "Memory is critically low. Stopping the scraping session." );
session.stopScraping();
}

getMimeType

String sutil.getMimeType ( String path )

Description

Get the mime-type of a local file.

Parameters

path File path to the local file, as a string.

Return Values

Returns the mime-type of the file, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get File Mime Type

// Get mime-type
sutil.getMimeType( "c:/image/puppy.gif" );

getNumRunnableScrapingSessions

int sutil.getNumRunnableScrapingSessions ( ) (enterprise edition only)

Description

Get the number of runnable scraping sessions.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of runnable scraping sessions in this instance of screen-scraper, as an integer.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get the Number of Runnable Scrapes

// Write the number of runnable scrapes to the log
session.log( "Number of Runnable Scrapes: " + sutil.getNumRunnableScrapingSessions() );

getNumRunningScrapingSessions

int sutil.getNumRunningScrapingSessions ()
int sutil.getNumRunningScrapingSessions ( String scrapingSessionName )

Description

Gets the number of scraping sessions that are currently being run.

Parameters

scrapingSessionName Narrows the scope to a given scraping session, if this parameter is passed in.

Return Values

An int representing the number of running scraping sessions.

Change Log

Version	Description
5.5.42a	Available in Enterprise edition.

Examples

session.log( "Num running scraping sessions: " + sutil.getNumRunningScrapingSessions( session.getName() ) );
if( sutil.getNumRunningScrapingSessions( session.getName() ) > 1 )
{
session.log( "SESSION ALREADY RUNNING." );
session.stopScraping();
return;
}

getOptionSet

DataSet sutil.getOptionSet ( String options ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, String ignoreLabel, boolean tidyRecords ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, String[] ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)
DataSet sutil.getOptionSet ( String options, Collection<String> ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)

Description

Gets a DataSet containing each of the options of a <select> tag. Each returned DataRecord contains the text found between the option tags (possibly with HTML tags removed), a value indicating whether it was the selected option, and the value to submit for that option. Note that this method only looks for option tags, so passing in text that contains more than one select tag will produce incorrect output.

Parameters

options The text containing the options HTML from the select tag
ignoreLabels (or ignoreLabel) (optional) Text value(s) to exclude from the output set. Typically this includes strings such as "Please select a category".
tidyRecords (optional) Whether the TEXT value should be tidied before being stored in the resulting DataRecords.

Return Values

A DataSet with one record per option, with values stored under the following keys:
VALUE : The value the browser would submit for this option
TEXT : The text that was between the tags
SELECTED : A boolean that is true if this option was selected by default

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Search each option from a dropdown menu

String options = dataRecord.get("ITEM_OPTIONS");

// We don't want the value for "Select an option" because that doesn't go to a search results page
DataSet items = sutil.getOptionSet(options, "Select an option", true);

for(int i = 0; i < items.getNumDataRecords(); i++)
{
DataRecord next = items.getDataRecord(i);
session.setVariable("ITEM_VALUE", next.get("VALUE"));
session.log("Now scraping results for " + next.get("TEXT"));
session.scrapeFile("Search Results");
}

getRadioButtonSet

DataSet sutil.getRadioButtonSet ( String buttons, String buttonName ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, String ignoreLabel ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, Collection<String> ignoreLabels ) (professional and enterprise editions only)
DataSet sutil.getRadioButtonSet ( String buttons, String buttonName, Collection<String> ignoreLabels, boolean tidyRecords ) (professional and enterprise editions only)

Description

Gets all the options from a radio button group. The values are returned in a DataSet, and any labels that are to be ignored are excluded from it. Radio buttons do not require a label, so some buttons may not have one; without a label tag it is difficult to determine with a regular expression exactly what text to extract as the label.

Parameters

buttons The text containing the buttons
buttonName The name of the buttons that should be grabbed, as a Regex pattern
ignoreLabels (or ignoreLabel) (optional) Any labels that should be excluded from the resulting set
tidyRecords (optional) Whether the TEXT value should be tidied before being stored in the resulting DataRecords.

Return Values

A DataSet containing one record for each extracted radio button, with values stored under the following keys:
VALUE : The value the browser would submit for this radio button
TEXT : The text that represents this button, or null if no label could be found for it
SELECTED : A boolean that is true if this button was selected by default
ID : The ID of the radio button, or null if no ID was found

Change Log

Version	Description
5.5.29a	Available in all editions.

getScreenScraperEdition

Change Log

Version	Description
5.0	Added for all editions.

Examples

Write Edition to Log

// Write the current edition to the log.
session.log("Current edition: " + sutil.getScreenScraperEdition());

getScreenScraperVersion

String sutil.getScreenScraperVersion ( )

Description

Get version of screen-scraper instance.

Parameters

This method does not receive any parameters.

Return Values

Returns the version number, as a string.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Write Version to Log

// Write the current version to log.
session.log("Current version: " + sutil.getScreenScraperVersion());

isInt

boolean sutil.isInt ( String string )

Description

Determine if the value of a string is an integer.

Parameters

string String to be tested for containing an integer.

Return Values

Returns true if the string is an integer; otherwise, it returns false. If it is passed an object that is not a string, including an integer, an error will be thrown.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Check String Value

// Does the GUESTS variable contain an integer
if ( !sutil.isInt( session.getv( "GUESTS" ) ) )
{
session.logWarn( "Could not get the number of guests!" );
}
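As a rough plain-Java sketch of what "is an integer" means here, the check below accepts exactly the strings that Integer.parseInt accepts. That is an assumption: screen-scraper's own rules for whitespace, a leading plus sign, or values outside the int range may differ. IsIntSketch is a hypothetical name.

```java
class IsIntSketch {
    // Returns true when the whole string parses as a Java int.
    // Assumptions: no trimming is done, and null counts as "not an integer".
    static boolean isInt(String s) {
        if (s == null) {
            return false;
        }
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("42"));  // true
        System.out.println(isInt("4.2")); // false
    }
}
```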

isNullOrEmptyString

boolean sutil.isNullOrEmptyString ( Object object )

Description

Determine if an object's value is null or empty.

Parameters

object The object whose value will be tested.

Return Values

Returns true if the value of the object is null or an empty string; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Warning for Empty Variable

// Give warning and stop scrape if variable is empty
if ( sutil.isNullOrEmptyString( session.getv( "NAME" ) ) )
{
session.log( "The NAME variable was blank." );
session.stopScraping();
}
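A minimal plain-Java sketch of the same test, assuming that "empty" means a zero-length result from toString() and that whitespace-only strings are not treated as empty. NullOrEmptySketch is a hypothetical name, not part of the screen-scraper API.

```java
class NullOrEmptySketch {
    // True for null or for any object whose toString() is the empty string.
    // Assumption: a whitespace-only string such as "  " is NOT treated as empty.
    static boolean isNullOrEmptyString(Object object) {
        return object == null || object.toString().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmptyString(null));  // true
        System.out.println(isNullOrEmptyString(""));    // true
        System.out.println(isNullOrEmptyString("Bob")); // false
    }
}
```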

isPlatformLinux

boolean sutil.isPlatformLinux ( )

Description

Determine if operating system is a Linux platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Linux; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Check Linux Platform

url = "http://www.foo.com/imgs/puppy_image.gif";

// Determine download location based on platform
if ( sutil.isPlatformLinux() )
{
session.downloadFile( url, "/home/user/images/puppy.gif" );
}
else if ( sutil.isPlatformMac() )
{
session.downloadFile( url, "/Volumes/Documents/images/puppy.gif" );
}
else if ( sutil.isPlatformWindows() )
{
session.downloadFile( url, "c:/images/puppy.gif" );
}
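The platform checks can be approximated in plain Java from the os.name system property. This is a sketch under the assumption that screen-scraper does something similar; PlatformSketch and its matching rules are hypothetical. Note that on an operating system that matches none of the three (FreeBSD, for example), the example above silently skips the download, so a final else branch that logs a warning may be worth adding.

```java
import java.util.Locale;

class PlatformSketch {
    // os.name is e.g. "Linux", "Mac OS X", or "Windows 10"; lower-cased for matching.
    private static final String OS =
            System.getProperty("os.name", "").toLowerCase(Locale.ROOT);

    static boolean isLinux()   { return OS.contains("linux"); }
    static boolean isMac()     { return OS.contains("mac"); }
    static boolean isWindows() { return OS.startsWith("windows"); }

    public static void main(String[] args) {
        System.out.println("os.name=" + OS + " linux=" + isLinux()
                + " mac=" + isMac() + " windows=" + isWindows());
    }
}
```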

isPlatformMac

boolean sutil.isPlatformMac ( )

Description

Determine if operating system is a Mac platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Mac; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Check Mac Platform

See the Check Linux Platform example under isPlatformLinux, which also calls isPlatformMac.

isPlatformWindows

boolean sutil.isPlatformWindows ( )

Description

Determine if operating system is a Windows platform.

Parameters

This method does not receive parameters.

Return Values

Returns true if the operating system is Windows; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Check Windows Platform

See the Check Linux Platform example under isPlatformLinux, which also calls isPlatformWindows.

makeGETRequest

String sutil.makeGETRequest ( String url )

Description

Retrieve the response contents of a GET request.

Parameters

url URL encoded version of page request, as a string. Java provides a URLEncoder to aid in URL encoding of a string.

Return Values

Returns contents of the response, as a string.

Change Log

Version	Description
5.0	Added for all editions.

This method will use any proxy settings that have been specified in the Settings dialog box.

Examples

Retrieve Page Contents

// Stores the contents returned by a GET request
// to the basic_form.php tutorial page

pageContents = sutil.makeGETRequest("http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World");
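The url parameter is expected to be URL encoded already. Here is a minimal sketch of building such a URL with Java's URLEncoder, assuming only the query-string value needs encoding; QueryUrlSketch and buildUrl are hypothetical names. URLEncoder performs form encoding (a space becomes +), which suits query-string values but not path segments.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryUrlSketch {
    // Encodes only the parameter value; the base URL and parameter name are assumed safe.
    static String buildUrl(String base, String name, String value) {
        try {
            return base + "?" + name + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(
                "http://www.screen-scraper.com/tutorial/basic_form.php",
                "text_string", "Hello World"));
    }
}
```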

makeGETRequestNoSessionProxy

String sutil.makeGETRequestNoSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings indicated in the "Settings" dialog box, if any.

Parameters

urlString The URL to request, as a string.

Return Values

Returns the contents of the response, as a string.

Throws

java.lang.Exception If an error occurs while making the request.

Change Log

Version	Description
6.0.6a	Introduced for all editions.

makeGETRequestUseSessionProxy

String sutil.makeGETRequestUseSessionProxy ( String urlString )

Description

Makes a GET request and returns the result as a string. This method will use the proxy settings attached to the current scraping session.

Parameters

urlString The URL to request, as a string.

Return Values

Returns the contents of the response, as a string.

Throws

java.lang.Exception If an error occurs while making the request.

Change Log

Version	Description
6.0.6a	Introduced for all editions.

makeHEADRequest

String[][] sutil.makeHEADRequest ( String url )

Description

Retrieve the response header contents.

Parameters

url URL encoded version of page request, as a string. Java provides a URLEncoder to aid in URL encoding of a string.

Return Values

Returns the response headers, as a two-dimensional array. Each row holds a header name at index 0 and its value at index 1.

Change Log

Version	Description
5.0	Added for all editions.

This method will use any proxy settings that have been specified in the Settings dialog box.

Examples

Log Response Headers

// Log HEAD contents

// Get head contents
headerArray = sutil.makeHEADRequest("http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World");

// Loop through HEAD contents
for (int i=0; i<headerArray.length; i++)
{
// Write header to log
session.log(headerArray[i][0] + ": " + headerArray[i][1]);
}

/* Example Log:
Date: Fri, 04 Jun 2010 17:18:11 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.1.6
Connection: close
Content-Type: text/html; charset=UTF-8
*/
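For comparison, here is a plain-JDK sketch of a HEAD request that returns headers in the same {name, value} row layout that the example reads through headerArray[i][0] and headerArray[i][1]. HeadRequestSketch is hypothetical, it does not apply screen-scraper's proxy settings, and the exact rows sutil.makeHEADRequest returns (for example, how repeated headers or the status line are handled) may differ.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class HeadRequestSketch {
    // Sends a HEAD request and returns the response headers as {name, value} rows.
    static String[][] head(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            conn.getResponseCode(); // sends the request and reads the status line
            return toRows(conn.getHeaderFields());
        } finally {
            conn.disconnect();
        }
    }

    // Flattens a header map into rows; a header with several values yields several rows.
    // HttpURLConnection stores the status line under a null key, which is skipped here.
    static String[][] toRows(Map<String, List<String>> fields) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            if (e.getKey() == null) {
                continue;
            }
            for (String value : e.getValue()) {
                rows.add(new String[] { e.getKey(), value });
            }
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeadRequestSketch <url>");
            return;
        }
        for (String[] row : head(args[0])) {
            System.out.println(row[0] + ": " + row[1]);
        }
    }
}
```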

mergeDataRecords

DataRecord sutil.mergeDataRecords ( DataRecord first, DataRecord second, boolean saveNonEmptyString ) (professional and enterprise editions only)

Description

Merges two data records by copying the values from the second record over those of the first, and returns a new DataRecord with the result. Neither original record is modified.

Parameters

first The first DataRecord, into which the values from the second record will be copied
second The second DataRecord, whose values will be copied into the first
saveNonEmptyString True if blank values should not overwrite non-blank values, whether the non-blank value is in the first or second record. If both records contain a non-blank value for the same key, the value in the first record is kept and the value in the second record is discarded. If false, all values in the second record overwrite the corresponding values in the first record.

Return Values

A new DataRecord with the merged values

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Combine values from the current dataRecord with a previous one

DataRecord previous = session.getVariable("_DATARECORD");

// true: blank values in dataRecord will not overwrite non-blank values in previous
session.setVariable("_DATARECORD", sutil.mergeDataRecords(previous, dataRecord, true));
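The merge rules can be modeled with plain Java maps standing in for DataRecord. This sketch assumes "blank" means null or a whitespace-only string, and that a key present only in the second record is always copied; MergeSketch is a hypothetical name and this is not screen-scraper's implementation.

```java
import java.util.HashMap;
import java.util.Map;

class MergeSketch {
    // Assumption: null and whitespace-only strings count as blank.
    static boolean isBlank(Object v) {
        return v == null || v.toString().trim().isEmpty();
    }

    // Returns a new map; neither input map is modified.
    static Map<String, Object> merge(Map<String, Object> first,
                                     Map<String, Object> second,
                                     boolean saveNonEmptyString) {
        Map<String, Object> result = new HashMap<>(first);
        for (Map.Entry<String, Object> e : second.entrySet()) {
            Object existing = result.get(e.getKey());
            if (!saveNonEmptyString || isBlank(existing)) {
                // Overwrite unconditionally, or because the first value is blank or missing
                result.put(e.getKey(), e.getValue());
            }
            // Otherwise the first record's non-blank value is kept
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("NAME", "Ann");
        Map<String, Object> second = new HashMap<>();
        second.put("NAME", "");
        System.out.println(merge(first, second, true).get("NAME")); // Ann
    }
}
```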

nullToEmptyString

String sutil.nullToEmptyString ( Object object )

Description

Get an object in string format.

Parameters

object Object to be returned in string format.

Return Values

Returns an empty string if the value of the object is null; otherwise, returns the value of the toString method of the object.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Get String Value of Variable

// Always Specify Suffix even if not selected
suffix = sutil.nullToEmptyString( session.getv( "SUFFIX" ) );
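In plain Java, the behavior described above matches java.util.Objects.toString(Object, String), which returns the fallback when the object is null and calls toString() otherwise. The wrapper class here is hypothetical and only shows the equivalence.

```java
import java.util.Objects;

class NullToEmptySketch {
    // Returns "" for null, otherwise o.toString().
    static String nullToEmptyString(Object o) {
        return Objects.toString(o, "");
    }

    public static void main(String[] args) {
        System.out.println("[" + nullToEmptyString(null) + "]"); // []
        System.out.println(nullToEmptyString(42));               // 42
    }
}
```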

parseName

Name sutil.parseName ( String name ) (pro and enterprise editions only)

Description

Attempts to parse a string into a name. The parser is not perfect and works best on English-formatted names (for example, "John Smith Jr." or "Guerrero, Antonio K"). This method uses the parser's standard settings; to get more control over how the name is parsed, use the EnglishNameParser class.

Parameters

name The name to be parsed.

Return Values

Returns the parsed name, as a Name object.

Change Log

Version	Description
6.0.59a	Available for professional and enterprise editions.

Examples

How to use the name parser

String nameRaw = "John Fred Doe";
DataRecord dr = new DataRecord();

log.debug( "Name raw: " + nameRaw );
if( nameRaw!=null )
{
try
{
Name name = sutil.parseName( nameRaw );
log.debug( "First name: " + name.getFirstName() );
log.debug( "Middle name: " + name.getMiddleName() );
log.debug( "Last name: " + name.getLastName() );
//log.debug( "Suffix: " + name.getSuffix() );

dr.put( "FIRST_NAME", name.getFirstName() );
dr.put( "MIDDLE_NAME", name.getMiddleName() );
dr.put( "LAST_NAME", name.getLastName() );
//dr.put( "SUFFIX", name.getAllSuffixString() );
}
catch( Exception e )
{
// The parser may throw an exception if it can't
// parse the name. If this occurs we want to know about it.
log.warn( "Error parsing name: " + e.getMessage() );
}
}

parseUSAddress

Address sutil.parseUSAddress ( String address ) (pro and enterprise editions only)

Description

Attempts to parse a string into an address. The parser is not perfect and works best on US addresses. Other address formats can likely be parsed with the USAddressParser class by providing different constraints in the builder; this method is provided for convenience when working with US addresses.

Parameters

address The address to be parsed.

Return Values

Returns the parsed address, as an Address object.

Change Log

Version	Description
6.0.59a	Available for professional and enterprise editions.

Examples

How to use the address parser

import com.screenscraper.util.parsing.address.Address;

String addressRaw = "123 Main St, Springfield, IL 62701"; // example value; replace with the address to parse

DataRecord dr = new DataRecord();

try
{
Address address = sutil.parseUSAddress( addressRaw );
log.debug( "Street: " + address.getStreet() );
log.debug( "Suite or Apartment: " + address.getSuiteOrApartment() );
log.debug( "City: " + address.getCity() );
log.debug( "State: " + address.getState() );
log.debug( "Zip: " + address.getZipCode() );

// if all of these four are blank then save only the raw address
// else save what we can
if(
sutil.isNullOrEmptyString( address.getStreet() )
&&
sutil.isNullOrEmptyString( address.getState() )
&&
sutil.isNullOrEmptyString( address.getCity() )
&&
sutil.isNullOrEmptyString( address.getZipCode() )
)
{
dr.put( "ADDRESS", addressRaw );
}
else
{
dr.put( "ADDRESS", address.getStreet() );
dr.put( "ADDRESS2", address.getSuiteOrApartment() );
dr.put( "STATE", address.getState() );
dr.put( "CITY", address.getCity() );
dr.put( "ZIP", address.getZipCode() );
}
session.setv( "DR_ADDRESS", dr );
}
catch( Exception e )
{
// If there was a parsing error, notify so it can be dealt with
log.warn( "Exception parsing address: " + e.getMessage() );
}

pause

void sutil.pause ( long milliseconds ) (professional and enterprise editions only)

Description

Pause scraping session.

Parameters

milliseconds Length of the pause, in milliseconds.

Return Values

Returns void.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for professional and enterprise editions.

Pausing the scraping session also pauses the execution of the scripts including the one that initiates the pause.

Examples

Pause Scrape on Server Overload

// It should be noted that a status code of 503 is not
// always a temporary overloading of a server.

// Check status code
if (scrapeableFile.statusCode() == 503)
{
// Pause Scraping for 5 seconds
sutil.pause( 5000 );

// Continue/Rescrape file
...
}

randomPause

void sutil.randomPause ( long min, long max ) (professional and enterprise editions only)

Description

Pauses for a random amount of time. It is also set up to stop immediately if the stop scrape button is clicked, and to allow breakpoints to be triggered while it is pausing.

Parameters

min The minimum duration of the pause, in milliseconds
max The maximum duration of the pause, in milliseconds

Return Values

Returns void.

Change Log

Version	Description
5.5.29a	Available in professional and enterprise editions.

Examples

Wait for between 2 and 4 seconds

sutil.randomPause(2000, 4000);
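Here is a plain-Java sketch of a random pause, assuming the duration is chosen uniformly and that both min and max are inclusive. Unlike sutil.randomPause, Thread.sleep does not react to the stop scrape button or to breakpoints. RandomPauseSketch is a hypothetical name.

```java
import java.util.concurrent.ThreadLocalRandom;

class RandomPauseSketch {
    // Picks a duration in [min, max]; nextLong's upper bound is exclusive, hence max + 1.
    static long pickDuration(long min, long max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return ThreadLocalRandom.current().nextLong(min, max + 1);
    }

    static void randomPause(long min, long max) throws InterruptedException {
        Thread.sleep(pickDuration(min, max));
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        randomPause(10, 20);
        System.out.println("Paused for about " + (System.currentTimeMillis() - start) + " ms");
    }
}
```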

reformatDate

String sutil.reformatDate ( String date, String dateFormatFrom, String dateFormatTo ) (professional and enterprise editions only)
String sutil.reformatDate ( String date, String dateFormatTo ) (enterprise edition only)

Description

Change a date format.

Parameters

date Date that is being reformatted, as a string.
dateFormatFrom (optional) The format of the date that is being reformatted. The date format follows Sun's SimpleDateFormat.
dateFormatTo The format that the date is being changed to. If dateFormatFrom is used, this should also follow Sun's SimpleDateFormat. If dateFormatFrom is left off, the date format should follow PHP's date format. In the latter method you can also use timestamp as the value of this parameter and it will return the timestamp corresponding to the date. Note also how PHP treats dashes and dots: "Dates in the m/d/y or d-m-y formats are disambiguated by looking at the separator between the various components: if the separator is a slash (/), then the American m/d/y is assumed; whereas if the separator is a dash (-) or a dot (.), then the European d-m-y format is assumed."

Return Values

Returns formatted date according to the specified format, as a string.

Change Log

Version	Description
5.0	Moved from session to sutil.
4.5	Available for professional and enterprise editions. Unspecified source format available for enterprise edition.

The date formats are not the same for the two methods. Read carefully.

Examples

Reformat Date from Specified Format

// Reformats the date shown to the format "2010-01-01".
// This uses Sun's Date Formats

sutil.reformatDate( "01/01/2010", "dd/MM/yyyy", "yyyy-MM-dd" );

Reformat Date from Unspecified Format

// Reformats the date shown to the format "2010-01-01".
// This uses PHP's Date Formats

sutil.reformatDate( "01/01/2010", "Y-m-d" );
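The three-argument form follows SimpleDateFormat patterns, so its behavior can be sketched with the JDK directly: parse with the source pattern, then format with the target pattern. ReformatDateSketch is hypothetical, and disabling lenient parsing is a choice made here (so impossible dates throw a ParseException), not necessarily what screen-scraper does.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class ReformatDateSketch {
    // Parses date with the "from" pattern and re-formats it with the "to" pattern.
    // Both formatters use the default time zone, so the calendar date is preserved.
    static String reformat(String date, String from, String to) throws ParseException {
        SimpleDateFormat parser = new SimpleDateFormat(from);
        parser.setLenient(false); // reject impossible dates such as 13/45/2010
        Date parsed = parser.parse(date);
        return new SimpleDateFormat(to).format(parsed);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(reformat("01/01/2010", "dd/MM/yyyy", "yyyy-MM-dd")); // 2010-01-01
    }
}
```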

sendMail

void sutil.sendMail ( String subject, String body, String recipients ) (enterprise edition only)
void sutil.sendMail ( String subject, String body, String recipients, String attachments, String headers ) (enterprise edition only)
void sutil.sendMail ( String subject, String body, String recipients, String contentType, String attachments, String headers ) (enterprise edition only)

Description

Send an email using SMTP mail server specified in the settings.

Parameters

subject Subject line of the email, as a string.
body The content of the email, as a string.
recipients Comma-delimited list of email addresses to which the email will be sent, as a string.
contentType The content type, as a valid MIME type.
attachments Comma-delimited list of local file paths to files that should be attached, as a string. If you do not have any attachments, use the value null.
headers Tab-delimited SMTP headers to be used when sending the email, as a string. If you do not have any headers to send, use the value null.

Return Values

Returns void. If it runs into any problems while attempting to send the email an error will be thrown.

Change Log

Version	Description
6.0.35a	Now supports alternate content types.
5.0	Moved from session to sutil.
4.5	Available for enterprise edition.

Examples

Send Email at End of Scrape

// In script called "After scraping session ends"

// Sends an email message with the parameters shown.
String message = "The '" + session.getName() + "' scrape is now finished.";
// Replace the placeholder address with the real recipient(s)
sutil.sendMail( "Status Report: Scrape Finished", message, "recipient@example.com", null, null );

sortSet

List sutil.sortSet ( Set set )
List sutil.sortSet ( Set set, boolean ignoreCase )
List sutil.sortSet ( Set set, Comparator comparator )

Description

Sorts the elements in a set into an ordered list.

Parameters

set The set whose elements should be sorted
ignoreCase (optional) True if case is irrelevant when sorting strings
comparator (optional) The Comparator used to compare objects in the set to determine order

Return Values

This method returns a sorted list of elements that are in the set.

Change Log

Version	Description
5.5.26a	Available in all editions.

Examples

Output all the values in a DataRecord in alphabetical order

// Generally when a sorted set or map is needed, a data structure should be chosen that stores the values
// in a sorted way, such as TreeSet or TreeMap. However, sometimes the set or map is returned by a library
// and may not have sorted values, although sorted values are needed.

List keys = sutil.sortSet(dataRecord.keySet(), true);

for(int i = 0; i < keys.size(); i++)
{
key = keys.get(i);
session.log(key + " : " + dataRecord.get(key));
}
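Here is a plain-Java sketch of the ignoreCase behavior, assuming sortSet copies the set into a list and sorts it. SortSetSketch is a hypothetical name. String.CASE_INSENSITIVE_ORDER provides the case-insensitive ordering, while natural ordering sorts all uppercase letters before lowercase ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortSetSketch {
    // Copies the set into a new list and sorts the list; the set itself is untouched.
    static List<String> sortSet(Set<String> set, boolean ignoreCase) {
        List<String> list = new ArrayList<>(set);
        Comparator<String> order = ignoreCase
                ? String.CASE_INSENSITIVE_ORDER
                : Comparator.<String>naturalOrder();
        list.sort(order);
        return list;
    }

    public static void main(String[] args) {
        Set<String> keys = new HashSet<>(Arrays.asList("banana", "Apple", "Cherry"));
        System.out.println(sortSet(keys, true)); // [Apple, banana, Cherry]
    }
}
```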

startsWithUpperCase

boolean sutil.startsWithUpperCase ( String start, String string )

Description

Determine if one string is the start of another, without regard to case.

Parameters

start Value to be checked as the start, as a string.
string Value to be searched in, as a string.

Return Values

Returns true if string starts with start when case is not considered; otherwise, it returns false.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Does String Start With Another String (Case Insensitive)

// Check for RTMP URLs
boolean isRtmp = sutil.startsWithUpperCase( "rtmp", session.getv( "URL" ) );
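The same case-insensitive prefix test can be written in plain Java with String.regionMatches. This is a sketch, not screen-scraper's implementation. Returning false for null arguments is an assumption, and StartsWithSketch is a hypothetical name.

```java
class StartsWithSketch {
    // Compares the first start.length() characters of string with start, ignoring case.
    // regionMatches returns false when string is shorter than start.
    static boolean startsWithIgnoreCase(String start, String string) {
        if (start == null || string == null) {
            return false; // assumption
        }
        return string.regionMatches(true, 0, start, 0, start.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("rtmp", "RTMP://media.example/app")); // true
    }
}
```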

stringToFloat

float sutil.stringToFloat ( String str ) (professional and enterprise editions only)

Description

Parse string into a floating point number.

Parameters

str String to be transformed into a float.

Return Values

Returns the string's value as a floating point number.

Change Log

Version	Description
5.0.1a	Introduced for professional and enterprise editions.

Examples

Parse a String into a Float

// Parse Float from String
gpa = sutil.stringToFloat( session.getv( "GPA" ) );

stripHTML

XmlNode sutil.stripHTML (String content ) (enterprise edition only)

Description

Strips HTML from a string, replacing some tags with corresponding text-only equivalents.

Parameters

content The content to be stripped.

Return Values

Returns the stripped content.

Change Log

Version	Description
6.0.20a	Available in only the Enterprise edition.

Examples

Strip HTML from a String

// input holds the HTML text to be cleaned
String cleanedInput = sutil.stripHTML( input );
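Here is a rough regex-based sketch of HTML stripping in plain Java, assuming that <br> and </p> become new lines and every other tag is dropped. sutil.stripHTML may convert other tags and handle entities differently. Regular expressions also mishandle edge cases such as a > inside an attribute value, comments, and script blocks. StripHtmlSketch is a hypothetical name.

```java
class StripHtmlSketch {
    static String stripHtml(String html) {
        String s = html.replaceAll("(?i)<br\\s*/?>", "\n"); // line breaks become new lines
        s = s.replaceAll("(?i)</p\\s*>", "\n");              // paragraph ends become new lines
        return s.replaceAll("<[^>]+>", "");                  // drop every remaining tag
    }

    public static void main(String[] args) {
        System.out.println(stripHtml("Hello<br>World <b>bold</b>"));
    }
}
```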

tidyDataRecord

DataRecord sutil.tidyDataRecord ( DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)

Description

Tidies the DataRecord by performing actions based on the values of the settings map given. If no settings are given, the values obtained from sutil.getDefaultTidySettings() are used. Each value in the record that is a string will be tidied; keys are not modified. The record given is not modified; instead, a new record with the tidied values is returned.

Parameters

record The DataRecord to tidy (values in the record will not be overwritten with the tidied values)
scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links

settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

The tidy settings and their default values are given below. If a key is missing in the settings map, that operation will not be performed.

Map Key	Default Value	Description of operation performed
trim	true	Trims whitespace from values
convertNullStringToLiteral	true	Converts the string 'null' (without quotes) to the null literal (unless it has quotes around it, such as "null")
convertLinks	true	Preserves links by converting <a href="link">text</a> to text (link), will try to resolve urls if scrapeableFile isn't null. Note that if there isn't a start and end <a> tag, this will do nothing
removeTags	true	Remove html tags, and attempts to convert line break HTML tags such as <br> to a new line in the result
removeSurroundingQuotes	true	Remove quotes from values surrounded by them -- "value" becomes value
convertEntities (professional and enterprise editions only)	true	Convert html entities
removeNewLines	false	Remove all new lines from the text. Replaces them with a space
removeMultipleSpaces	true	Convert multiple spaces to a single space, and preserve new lines
convertBlankToNull	false	Convert blank strings to null literal

ignoreLowerCaseKeys (optional) True if values with keys containing lowercase characters should be ignored

Return Values

A new DataRecord containing all the tidied values and any values that were not Strings in the original record. The values that were Strings but were not tidied as well as the DATARECORD value will not be in the returned record.

Change Log

Version	Description
5.5.26a	Available in all editions.
5.5.28a	Now uses a Map for the settings, rather than bit flags.

Examples

Tidy all values in an extracted DataRecord

DataRecord tidied = sutil.tidyDataRecord(dataRecord);

// Run code here to save the tidied record

tidyString

String sutil.tidyString ( String value ) (professional and enterprise editions only)
String sutil.tidyString ( String value, Map<String, Boolean> settings ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value, Map<String, Boolean> settings ) (professional and enterprise editions only)

Description

Tidies the string by performing actions based on the values of the settings map.

Parameters

value The String to tidy

settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

The tidy settings and their default values are given below. If a key is missing in the settings map, that operation will not be performed.

Map Key	Default Value	Description of operation performed
trim	true	Trims whitespace from values
convertNullStringToLiteral	true	Converts the string 'null' (without quotes) to the null literal (unless it has quotes around it, such as "null")
convertLinks	true	Preserves links by converting <a href="link">text</a> to text (link), will try to resolve urls if scrapeableFile isn't null. Note that if there isn't a start and end <a> tag, this will do nothing
removeTags	true	Remove html tags, and attempts to convert line break HTML tags such as <br> to a new line in the result
removeSurroundingQuotes	true	Remove quotes from values surrounded by them -- "value" becomes value
convertEntities (professional and enterprise editions only)	true	Convert html entities
removeNewLines	false	Remove all new lines from the text. Replaces them with a space
removeMultipleSpaces	true	Convert multiple spaces to a single space, and preserve new lines
convertBlankToNull	false	Convert blank strings to null literal

scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links

Return Values

The tidied string

Change Log

Version	Description
5.5.26a	Available in all editions.
5.5.28a	Now uses a Map for the settings, rather than bit flags.

Examples

Tidy a comment extracted from a website

Assuming the extracted text's HTML code was:
  <a href="http://www.somelink.com">This</a> was great because of these reasons: 
1 - Some reason 
2 - Another reason 
3 - Final reason

String comment = sutil.tidyString(scrapeableFile, dataRecord.get("COMMENT"));

The output text would be:

This (http://www.somelink.com) was great because of these reasons:
1 - Some reason
2 - Another reason
3 - Final reason

Run only specific operations

Map settings = new HashMap();
settings.put("convertEntities", true);
settings.put("trim", true);
String text = sutil.tidyString(" A String to tidy", settings);
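To make the settings map semantics concrete, here is a hypothetical sketch that implements a few of the operations from the table in plain Java. A key that is absent (or false) skips the operation, as the table describes. The order in which operations run and the exact matching rules are assumptions, and operations such as convertLinks, removeTags, and convertEntities are omitted. TidySketch is not part of screen-scraper.

```java
import java.util.HashMap;
import java.util.Map;

class TidySketch {
    // Applies a subset of the tidy operations; a missing or false key skips that operation.
    static String tidy(String value, Map<String, Boolean> settings) {
        if (value == null) {
            return null;
        }
        String s = value;
        if (on(settings, "removeMultipleSpaces")) {
            s = s.replaceAll("[ \\t]+", " "); // collapse spaces and tabs, keep new lines
        }
        if (on(settings, "trim")) {
            s = s.trim();
        }
        // Checked before quotes are removed, so a quoted "null" stays a string
        if (on(settings, "convertNullStringToLiteral") && s.equals("null")) {
            return null;
        }
        if (on(settings, "removeSurroundingQuotes") && s.length() >= 2
                && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        if (on(settings, "convertBlankToNull") && s.isEmpty()) {
            return null;
        }
        return s;
    }

    private static boolean on(Map<String, Boolean> settings, String key) {
        return Boolean.TRUE.equals(settings.get(key));
    }

    public static void main(String[] args) {
        Map<String, Boolean> settings = new HashMap<>();
        settings.put("trim", true);
        settings.put("removeMultipleSpaces", true);
        settings.put("removeSurroundingQuotes", true);
        System.out.println(tidy("  \"a   b\"  ", settings)); // a b
    }
}
```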

unzipFile

void sutil.unzipFile ( String zippedFile )

Description

Unzip a zipped file. Contents will appear in the same directory as the zipped file.

Parameters

zippedFile File path to the zipped file, as a string.

Return Values

Returns void. If a file input/output error is experienced it will be thrown.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Unzip File

// Unzips contents of "c:/mydir/myzip.zip"
// to "c:/mydir/"

sutil.unzipFile( "c:/mydir/myzip.zip" );
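Here is a plain-Java sketch of the same behavior using java.util.zip, extracting every entry into the directory that contains the zip file. UnzipSketch is hypothetical. The check that rejects entries such as ../evil.txt ("zip slip") is a precaution added here and is not documented for sutil.unzipFile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class UnzipSketch {
    // Extracts every entry into the directory that contains the zip file.
    static void unzip(String zippedFile) throws IOException {
        Path zip = Paths.get(zippedFile).toAbsolutePath().normalize();
        Path targetDir = zip.getParent();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path out = targetDir.resolve(entry.getName()).normalize();
                // Refuse entries that would land outside the target directory
                if (!out.startsWith(targetDir)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the stream reports EOF at the entry's end
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UnzipSketch <file.zip>");
            return;
        }
        unzip(args[0]);
    }
}
```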

writeValueToFile

void sutil.writeValueToFile ( Object value, String file, String charSet )

Description

Write to a file.

Parameters

value The string to be written.
file File path where the value should be created/written, as a string. If the file already exists it will be overwritten.
charSet Character set of the file, as a string. Java provides a list of supported character sets in its documentation.

Return Values

Returns void.

Change Log

Version	Description
5.0	Added for all editions.

Examples

Write To File

// Writes "abc",123 to file myfile.csv using character set UTF-8
sutil.writeValueToFile( "\"abc\",123", "myfile.csv", "UTF-8" );

Write To File Using Default Character Set

// Writes "abc",123 to file myfile.csv
// using screen-scraper's character set

sutil.writeValueToFile("\"abc\",123","myfile.csv", null);
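Here is a plain-Java sketch of writing a value with an explicit character set. It assumes that a null charSet can be approximated by the JVM's default charset; screen-scraper's own character set setting may differ. WriteValueSketch is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

class WriteValueSketch {
    // Writes String.valueOf(value) to file, creating it or replacing any existing content.
    // Assumption: a null charSet falls back to the JVM default charset.
    static void writeValueToFile(Object value, String file, String charSet) throws IOException {
        Charset cs = (charSet == null) ? Charset.defaultCharset() : Charset.forName(charSet);
        Files.write(Paths.get(file), String.valueOf(value).getBytes(cs));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WriteValueSketch <file>");
            return;
        }
        writeValueToFile("\"abc\",123", args[0], "UTF-8");
    }
}
```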

Proxy Server API

Overview

screen-scraper provides three built-in objects for proxy sessions. These objects are: proxySession, request, and response. See the Variable scope section for details on which objects are available based on when scripts are run.

proxySession

Overview

This object gives you the ability to control interactions with the proxy session. It is only for use in scripts that are associated with proxy sessions.

getVariable

Object proxySession.getVariable ( String identifier )

Description

Retrieve the value of the proxy session variable.

Parameters

identifier Name of the session variable, as a string.

Return Values

Returns the value of the session variable.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Retrieve Session Variable

// Places the proxy variable "CITY_CODE" in
// the local variable "cityCode"

cityCode = proxySession.getVariable( "CITY_CODE" );

log

void proxySession.log ( String message )

Description

Write to the log.

Parameters

message Message to be written to the log, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write to Log

// Writes "Inserting request parameters into the database."
// to the proxy session log

proxySession.log( "Inserting request parameters into the database." );

setVariable

void proxySession.setVariable ( String identifier, Object value )

Description

Set the value of a proxy session variable.

Parameters

identifier Name of the session variable, as a string.
value The value to be assigned to the session variable.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Session Variable

// Sets the proxy session variable "CITY_CODE" to the
// "CITY_CODE" value of the first record (index 0) in the dataSet
proxySession.setVariable( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) );

request

A request object references a page request made through the proxy session. Through this object you can control various aspects of the request.

Scripts run in the scraping engine use the scrapeable file to manipulate server requests.

addHTTPHeader

void request.addHTTPHeader ( String key, String value )

Description

Manually add an HTTP header.

Parameters

key Name of the HTTP header, as a string.
value Value to be associated with the header, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Add HTTP Header

// Add a Cookie header with the value someCookieValue
request.addHTTPHeader( "Cookie" , "someCookieValue");

addPOSTParameter

void request.addPOSTParameter ( String key, String value )

Description

Add POST parameter to HTTP request.

Parameters

key Name of the POST parameter, as a string.
value Value of the POST parameter, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Add POST Parameter

// Add the selectedState parameter to the POST variables
// with a value of AK (Alaska)

request.addPOSTParameter( "selectedState" , "AK");
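For a typical HTML form post, the request body that addPOSTParameter and removePOSTParameter modify is encoded as application/x-www-form-urlencoded. Assuming that encoding, the plain Java sketch below (not the screen-scraper API; the FormBodySketch class and the second "city" parameter are hypothetical) shows how a set of parameters ends up as a body string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative sketch only: serializes POST parameters the way an
// application/x-www-form-urlencoded body is usually written.
// This is plain Java, not the screen-scraper request object.
final class FormBodySketch {

    // Joins key=value pairs with '&', URL-encoding each key and value as UTF-8.
    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("selectedState", "AK");
        params.put("city", "Juneau Area");     // hypothetical second parameter
        System.out.println(encode(params));    // selectedState=AK&city=Juneau+Area
    }
}
```

Removing selectedState from the map before calling encode yields city=Juneau+Area, which mirrors the effect of removePOSTParameter. URLEncoder.encode(String, Charset) requires Java 10 or later.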

getURLAsString

String request.getURLAsString ( )

Description

Retrieve the URL of the request.

Parameters

This method does not receive any parameters.

Return Values

Returns the URL of the request, as a string.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Retrieve Request URL

// Retrieve the URL String
url = request.getURLAsString();

removeHTTPHeader

void request.removeHTTPHeader ( String key, String value )

Description

Manually remove an HTTP header. Both the key and the value must be specified, because HTTP allows multiple headers with the same key.

Parameters

key Name of the HTTP header, as a string.
value Value of the header to be removed, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove HTTP Header

// Remove the Cookie header with the value someCookieValue
request.removeHTTPHeader( "Cookie" , "someCookieValue");
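Because HTTP allows several headers with the same name, removing one by name alone would be ambiguous. The hypothetical HeaderListSketch class below is a plain Java model, not screen-scraper's implementation: headers are stored as an ordered list of name/value pairs, and remove deletes only the pairs whose name and value both match. Matching names case-insensitively follows the HTTP rule that header field names are case-insensitive; whether request.removeHTTPHeader does the same is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative model only (not screen-scraper's implementation): headers kept
// as an ordered list of name/value pairs, because HTTP permits repeated names.
final class HeaderListSketch {
    private final List<Map.Entry<String, String>> headers = new ArrayList<>();

    void add(String key, String value) {
        headers.add(Map.entry(key, value));
    }

    // Removes only entries whose name (case-insensitively) and value both match,
    // so other headers that share the same name are left in place.
    void remove(String key, String value) {
        headers.removeIf(h -> h.getKey().equalsIgnoreCase(key) && h.getValue().equals(value));
    }

    // Returns every value currently stored under the given header name.
    List<String> valuesOf(String key) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> h : headers) {
            if (h.getKey().equalsIgnoreCase(key)) {
                values.add(h.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        HeaderListSketch h = new HeaderListSketch();
        h.add("Cookie", "someCookieValue");
        h.add("Cookie", "otherCookieValue");
        h.remove("Cookie", "someCookieValue");
        System.out.println(h.valuesOf("Cookie"));   // [otherCookieValue]
    }
}
```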

removePOSTParameter

void request.removePOSTParameter ( String key )

Description

Remove POST parameter from HTTP request.

Parameters

key Name of the POST parameter, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove POST Parameter

// Removes the POST parameter selectedState
request.removePOSTParameter( "selectedState" );

setRequestLine

void request.setRequestLine ( String requestMethod, String url, String httpVersion )

Description

Manually set the request line.

Parameters

requestMethod HTTP request method (for example, GET or POST), as a string.
url A valid URI, as a string.
httpVersion HTTP version (for example, HTTP/1.1), as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Request Line

// Sets the request line on the request
request.setRequestLine( "GET" , "http://somesite.com/somepage.html", "HTTP/1.1");
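The three arguments to setRequestLine correspond to the three space-separated parts of an HTTP/1.1 request line (RFC 7230, section 3.1.1). When a client talks to a proxy, the request target is normally the absolute URL, as in the example above. The plain Java sketch below, using a hypothetical RequestLineSketch class, builds and splits such a line; it is illustrative only and not part of screen-scraper.

```java
// Illustrative sketch (plain Java): an HTTP/1.1 request line is the method,
// the request target, and the HTTP version separated by single spaces.
final class RequestLineSketch {

    // Joins the three parts exactly as they appear on the wire.
    static String build(String method, String target, String httpVersion) {
        return method + " " + target + " " + httpVersion;
    }

    // Splits a request line back into its three parts; returns null if malformed.
    static String[] parse(String line) {
        String[] parts = line.split(" ");
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        String line = build("GET", "http://somesite.com/somepage.html", "HTTP/1.1");
        System.out.println(line);            // GET http://somesite.com/somepage.html HTTP/1.1
        System.out.println(parse(line)[0]);  // GET
    }
}
```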

response

The response object provides a means of editing the responses received by the proxy server.

Scripts run in the scraping engine use the scrapeable file to manipulate server responses.

addHTTPHeader

void response.addHTTPHeader ( String key, String value )

Description

Add HTTP header to response.

Parameters

key Name of the header, as a string.
value Value associated with the header, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Add HTTP Header

// Adds the HTTP Header Set-Cookie with a value
// of someCookieValue

response.addHTTPHeader( "Set-Cookie" , "someCookieValue");

getContentAsString

String response.getContentAsString ( )

Description

Retrieve the content of the response.

Parameters

This method does not receive any parameters.

Return Values

Returns the content of the response, as a string.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get the Response Text

// Retrieve the contents of the response
content = response.getContentAsString();

getStatusLine

String response.getStatusLine ( )

Description

Retrieve the status line of the response.

Parameters

This method does not receive any parameters.

Return Values

Returns the status line of the response, as a string.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Get the Status Line Text

// Retrieve the status line of the response
statusLine = response.getStatusLine();
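A common use of getStatusLine is to check the status code. Assuming the returned string follows the HTTP/1.x status-line format "HTTP-version status-code reason-phrase" (RFC 7230, section 3.1.2), the hypothetical plain Java helper below extracts the code. It trims the input in case a trailing line break is present, which is an assumption rather than documented behavior.

```java
// Illustrative sketch (plain Java, not part of screen-scraper): pulls the
// numeric status code out of an HTTP/1.x status line.
final class StatusLineSketch {

    // Returns the three-digit status code, or -1 if the line is not well formed.
    static int statusCode(String statusLine) {
        // Trim a possible trailing CRLF, then split into at most three parts,
        // because the reason phrase itself may contain spaces.
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/") || parts[1].length() != 3) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusCode("HTTP/1.1 200 OK"));          // 200
        System.out.println(statusCode("HTTP/1.1 404 Not Found"));   // 404
    }
}
```

In a proxy script you could apply the same logic to the value returned by response.getStatusLine(), for example to log any response whose code is not 200.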

removeHTTPHeader

void response.removeHTTPHeader ( String key, String value )

Description

Remove HTTP header from response.

Parameters

key Name of the header, as a string.
value Value of the header to be removed, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Remove HTTP Header

// Remove the HTTP Header Set-Cookie that has a
// value of someCookieValue

response.removeHTTPHeader( "Set-Cookie" , "someCookieValue");

setContentAsString

void response.setContentAsString ( String content )

Description

Manually set the response content.

Parameters

content Response text, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Change the Response Text

// Supply your own content to the response
response.setContentAsString( "<html> ... </html>");

setStatusLine

void response.setStatusLine ( String statusLine )

Description

Manually set the status line.

Parameters

statusLine New status line declaration, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Status Line

// Set the status line to HTTP/1.1 200 OK
response.setStatusLine( "HTTP/1.1 200 OK" );

Utilities API

Overview

There are many classes that can be very helpful in getting your scripts to run correctly. Many of these were initially developed in-house to speed up coding and were offered to the public once they proved stable. For all of these classes you will need to import their packages; unlike the built-in screen-scraper objects, they are not imported automatically.

Classes

CsvWriter (com.screenscraper.csv): For recording data into a CSV file (helpful for Excel).
DataManagerFactory (com.screenscraper.datamanager): Facilitates the creation of an SqlDataManager.
ProxyServerPool (com.screenscraper.util): For setting up anonymization using your own proxies.
RetryPolicy and RetryPolicyFactory (com.screenscraper.util.retry): Objects that tell a scrapeable file how to check for errors and, optionally, what to do before retrying the download.
SqlDataManager (com.screenscraper.datamanager): Facilitates writing of data into a SQL database.
XmlWriter (com.screenscraper.xml): Oftentimes you want to write extracted data directly to an XML file. This class facilitates doing that.

Apache Lang Library

Overview

The Apache Commons Lang library provides enhancements to Java's standard java.lang classes and can be particularly useful for completing common tasks. Because it is not a library that we maintain, we do not document its methods here (they may change without our notice), but we invite you to look over its API documentation to see how to use it.

CSVReader

Overview

The CSVReader is not part of screen-scraper, but it is a very useful, well put together class that we have used extensively. It is part of the opencsv package, which also provides the underpinnings of our own CsvWriter. Because it is not a class that we maintain, we do not document its methods here (they may change without our notice), but we invite you to look over its API or brief documentation to see how to use it.

Using CSVReader

To use the CSVReader, simply import it in your script, as you would any other utility class. The opencsv.jar file is already included in the default installation of the Professional and Enterprise Editions of screen-scraper.

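// Import the opencsv classes and FileReader
import au.com.bytecode.opencsv.*;
import java.io.FileReader;

// Read the file one row at a time; each row is a String array
CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String[] nextLine;
while ((nextLine = reader.readNext()) != null)
    session.log( "First column: " + nextLine[0] );
reader.close();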

CsvWriter

Overview

The CsvWriter was created to work particularly well with the screen-scraper objects. It is simple to use and eases the task of keeping track of everything when creating a CSV file.

The most used methods are documented here but if you would like more information you can read the JavaDoc for the CsvWriter.

CsvWriter

CsvWriter CsvWriter ( String filePath ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, boolean addTimeStamp ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, boolean addTimeStamp ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, char escapechar ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, String lineEnd ) (professional and enterprise editions only)
CsvWriter CsvWriter ( String filePath, char separator, char quotechar, char escapechar, String lineEnd ) (professional and enterprise editions only)

Description

Create a CSV file writer.

Parameters

filePath File path where the CSV file should be created or saved, as a string.
addTimeStamp (optional) If true, a time stamp is added to the file name; otherwise, filePath is used unchanged.
separator (optional) The character used to separate fields in the CSV file. The default is char 44 (comma).
quotechar (optional) The character used to quote fields. The default is char 34 (straight double quote).
escapechar (optional) The character used to escape quotes. The default is char 34 (straight double quote).
lineEnd (optional) The end-of-line sequence, as a string. The default is the newline character ("\n").
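To make the default separator, quotechar, and escapechar values concrete, the plain Java sketch below formats one row by hand. It is not the CsvWriter source: the CsvRowSketch class is hypothetical, and it assumes that every field is quoted and that an embedded quote (or escape) character is preceded by the escape character, which with the defaults turns " into "".

```java
// Illustrative sketch of what separator, quotechar, escapechar and lineEnd
// mean for a single row. NOT the CsvWriter source; assumes every field is
// quoted and embedded quote/escape characters are prefixed by escapechar.
final class CsvRowSketch {

    static String row(String[] fields, char separator, char quotechar, char escapechar, String lineEnd) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(separator);            // separator between fields only
            }
            sb.append(quotechar);
            for (char c : fields[i].toCharArray()) {
                if (c == quotechar || c == escapechar) {
                    sb.append(escapechar);       // escape embedded quote/escape chars
                }
                sb.append(c);
            }
            sb.append(quotechar);
        }
        return sb.append(lineEnd).toString();
    }

    public static void main(String[] args) {
        String[] fields = {"Acme", "12\" ruler"};
        // Prints: "Acme","12"" ruler" followed by a newline
        System.out.print(row(fields, ',', '"', '"', "\n"));
    }
}
```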

Return Values

Returns a CsvWriter object. If an error is encountered, an exception is thrown.

Change Log

Version	Description
5.0	Available for Professional and Enterprise editions.
4.5.18a	Introduced in alpha version.

Class Location

com.screenscraper.csv.CsvWriter

Examples

Create CsvWriter

// Import class
import com.screenscraper.csv.*;

// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter("output.csv", true);

// Save in session variable for general access
session.setVariable( "WRITER", writer);

close

void csvWriter.close ( )

Description

Write any buffered contents to the file and close it.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

Close CsvWriter

// Retrieve CsvWriter from session variable
writer = session.getv( "WRITER" );

// Write buffer and close file
writer.close();

flush

void csvWriter.flush ( )

Description

Write the buffer contents to the file.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

Write Data Record to CSV

// Retrieve CsvWriter from session variable
writer = session.getv( "WRITER" );

// Write dataRecord to the file (headers already set)
writer.write(dataRecord);

// Flush record to file (write it now)
writer.flush();

setHeader

void csvWriter.setHeader ( String[ ] header )

Description

Set the header row of the CSV document. If the document already exists, the headers are not written. This method also creates a data record mapping that simplifies writing records to the file.

Parameters

header Headers of csv file, as a one-dimensional array of strings.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

If you want to use the data record mapping, the extractor token names should be in all caps, with all spaces replaced by underscores.

Examples

Add Headers to CSV File

// Create CsvWriter with timestamp
CsvWriter writer = new CsvWriter("output.csv", true);

// Create Headers Array
String[] header = {"Brand Name", "Product Title"};

// Set Headers
writer.setHeader(header);

// Write out to file
writer.flush();

// Save in session variable for general access
session.setVariable( "WRITER", writer);

write

void csvWriter.write ( DataRecord dataRecord )

Description

Write a data record to the CsvWriter object.

Parameters

dataRecord The data record containing the mapped token matches (see setHeader). Note that the token names in the data record should be in all caps, and spaces should be replaced with underscores. For example, if one of your headers is "Product ID", the corresponding data record token should be "PRODUCT_ID". This is in keeping with the recommended naming convention for extractor pattern tokens.
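The naming rule above is mechanical, so you can check which token name a header expects with a one-line transformation. The hypothetical TokenNameSketch class below is plain Java that applies the stated rule (upper case, spaces replaced with underscores); it is not taken from CsvWriter itself.

```java
import java.util.Locale;

// Sketch of the header-to-token naming rule: "Product ID" -> "PRODUCT_ID".
// Plain Java for illustration; not the CsvWriter code.
final class TokenNameSketch {

    static String tokenNameFor(String header) {
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return header.toUpperCase(Locale.ROOT).replace(' ', '_');
    }

    public static void main(String[] args) {
        String[] header = {"Brand Name", "Product Title", "Product ID"};
        for (String h : header) {
            System.out.println(h + " -> " + tokenNameFor(h));
        }
        // Brand Name -> BRAND_NAME
        // Product Title -> PRODUCT_TITLE
        // Product ID -> PRODUCT_ID
    }
}
```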

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for all editions.
4.5.18a	Introduced in alpha version.

Examples

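Write Data Record to CSV

// Retrieve CsvWriter from session variable
writer = session.getv( "WRITER" );

// Write dataRecord to the file (headers already set)
writer.write(dataRecord);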

DataManagerFactory

Overview

This class is used to instantiate a data manager object. This is done to simplify the process of creating a data manager of a given type. Currently it only creates SqlDataManagers. A SQL data manager can be created without the use of this class, but it is simplified greatly through its use.

This class should no longer be used. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

This class is only available for Professional and Enterprise editions of screen-scraper.

getMsSqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getMsSqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a MsSQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (host name or IP address, optionally followed by a colon and port number), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error occurs, an exception is thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

To create the MsSQL data manager you need the appropriate JDBC driver. Download the MsSQL JDBC driver and place it in the lib/ext folder of the screen-scraper installation directory.

Examples

Create MsSQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get MsSQL datamanager
dm = DataManagerFactory.getMsSqlDataManager( session, host, database, username, password, parameters);

getMySqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getMySqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a MySQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (host name or IP address, optionally followed by a colon and port number), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error occurs, an exception is thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

To create the MySQL data manager you need the appropriate JDBC driver. Download the MySQL JDBC driver and place it in the lib/ext folder of the screen-scraper installation directory.

Examples

Create MySQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get MySQL datamanager
dm = DataManagerFactory.getMySqlDataManager( session, host, database, username, password, parameters);

getOracleDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getOracleDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create an Oracle data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (host name or IP address, optionally followed by a colon and port number), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error occurs, an exception is thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

To create the Oracle data manager you need the appropriate JDBC driver. Download the Oracle JDBC driver and place it in the lib/ext folder of the screen-scraper installation directory.

Examples

Create an Oracle Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:1521";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get Oracle datamanager
dm = DataManagerFactory.getOracleDataManager( session, host, database, username, password, parameters);

getPostreSqlDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getPostreSqlDataManager ( ScrapingSession session, String host, String database, String username, String password, String parameters ) (professional and enterprise editions only)

Description

Create a PostgreSQL data manager object.

Parameters

session The scraping session that the data manager should be attached to.
host The database host (host name or IP address, optionally followed by a colon and port number), as a string.
database The name of the database, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.
parameters URL encoded query string, as a string.

Return Values

Returns a SqlDataManager object. If an error occurs, an exception is thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

To create the PostgreSQL data manager you need the appropriate JDBC driver. Download the PostgreSQL JDBC driver and place it in the lib/ext folder of the screen-scraper installation directory.

Examples

Create a PostgreSQL Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:5432";
database = "mydb";
username = "user";
password = "pwrd";
parameters = null;

// Get PostgreSQL datamanager
dm = DataManagerFactory.getPostreSqlDataManager( session, host, database, username, password, parameters);

getSqliteDataManager

This method is no longer supported. Use an org.apache.commons.dbcp.BasicDataSource or a com.screenscraper.datamanager.SshDataSource instead. See the SqlDataManager.buildSchemas page for examples.

SqlDataManager dataManagerFactory.getSqliteDataManager ( ScrapingSession session, String file, String username, String password ) (professional and enterprise editions only)

Description

Create a SQLite data manager object.

Parameters

session The scraping session that the data manager should be attached to.
file The file path of the SQLite file, as a string.
username Username that is being used to access the database, as a string.
password The username's associated password, as a string.

Return Values

Returns a SqlDataManager object. If an error occurs, an exception is thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

To create the SQLite data manager you need the appropriate JDBC driver. Download the SQLite JDBC driver and place it in the lib/ext folder of the screen-scraper installation directory.

Examples

Create a SQLite Data Manager

// Import classes
import com.screenscraper.datamanager.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
file = "c:/db/mydb.sqlite";
username = "user";
password = "pwrd";

// Get Sqlite datamanager
dm = DataManagerFactory.getSqliteDataManager( session, file, username, password);

ProxyServerPool

Overview

The proxy server pool object is used to aid with manual anonymization of scrapes. An example of how to set up manual proxy pools is available in the documentation; if you are new to the process, you will likely want to read that page first.

You should also refer to the methods available in the Anonymous API.

ProxyServerPool

ProxyServerPool ProxyServerPool ( )

Description

Create a ProxyServerPool object.

Parameters

This method does not receive any parameters.

Return Values

Returns a ProxyServerPool.

Change Log

Version	Description
4.5	Available for all editions.

Class Location

com.screenscraper.util.ProxyServerPool

Examples

Creating ProxyServerPool

import com.screenscraper.util.*;

// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.

proxyServerPool = new ProxyServerPool();

filter

void proxyServerPool.filter ( int timeout )

Description

Filter the pool, treating any proxy that does not respond within the given timeout as bad.

Parameters

timeout Number of seconds before timeout, as an integer.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Set Up Timeout for Bad Proxies

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// This method call tells screen-scraper to filter the list of
// proxy servers using 7 seconds as a timeout value. That is,
// if a server doesn't respond within 7 seconds, it's deemed
// to be invalid.

proxyServerPool.filter( 7 );

getNumProxyServers

int proxyServerPool.getNumProxyServers ( )

Description

Retrieve the number of available proxy servers.

Parameters

This method does not receive any parameters.

Return Values

Returns the number of available proxy servers, as an integer.

Change Log

Version	Description
4.5	Available for all editions.

Examples

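Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// Populate the pool from a file and filter out bad proxies
proxyServerPool.populateFromFile( "proxies.txt" );
proxyServerPool.filter( 7 );

// Write good proxies to file, then log how many remain
proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );
session.log( "Good proxies written: " + proxyServerPool.getNumProxyServers() );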

outputProxyServersToLog

void proxyServerPool.outputProxyServersToLog ( )

Description

Write list of proxies to log.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Write good proxies to file
proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

// You might also want to write out the list of proxy servers
// to screen-scraper's log.

proxyServerPool.outputProxyServersToLog();

populateFromFile

void proxyServerPool.populateFromFile ( String filePath )

Description

Add proxy servers to pool using a text file.

Parameters

filePath Path to the file containing the proxy servers, as a string. The file should list one host:port entry per line.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Creating ProxyServerPool

import com.screenscraper.util.*;

// Create a new ProxyServerPool object. This object will
// control how screen-scraper interacts with proxy servers.

proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
// containing a list of proxy servers. The format is very
// simple--you should have a proxy server on each line of
// the file, with the host separated from the port by a colon.
// For example:
// one.proxy.com:8888
// two.proxy.com:3128
// 203.0.113.10:8080
// Omit the leading comment slashes in the actual file.

proxyServerPool.populateFromFile( "proxies.txt" );
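To validate a proxies file before handing it to populateFromFile, you might parse it yourself. The plain Java sketch below follows the format described above (one host:port entry per line); the ProxyListSketch class is hypothetical, and skipping blank lines, trimming whitespace, and rejecting malformed entries or out-of-range ports are choices of this sketch, not documented ProxyServerPool behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for a proxies file: one host:port entry per line.
// Plain Java, not the ProxyServerPool code.
final class ProxyListSketch {

    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    static List<HostPort> parse(List<String> lines) {
        List<HostPort> result = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                                   // ignore blank lines
            }
            int colon = line.lastIndexOf(':');              // host and port split on the last colon
            if (colon <= 0 || colon == line.length() - 1) {
                throw new IllegalArgumentException("Expected host:port, got: " + line);
            }
            int port = Integer.parseInt(line.substring(colon + 1));  // NumberFormatException if not numeric
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException("Port out of range: " + line);
            }
            result.add(new HostPort(line.substring(0, colon), port));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("one.proxy.com:8888", "two.proxy.com:3128", "", "203.0.113.10:8080");
        System.out.println(parse(lines));   // [one.proxy.com:8888, two.proxy.com:3128, 203.0.113.10:8080]
    }
}
```

To check a real file, pass java.nio.file.Files.readAllLines(java.nio.file.Path.of("proxies.txt")) to parse; an IllegalArgumentException (NumberFormatException is a subclass) points at the offending line.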

setAutomaticProxyCycling

void proxyServerPool.setAutomaticProxyCycling ( boolean cycleProxies ) (professional and enterprise editions only)

Description

Enables or disables automatic proxy cycling (enabled by default). When set to true, each call to the getNextProxy method cycles as normal through all available proxies. When set to false, the proxy most recently selected from the pool is reused each time the next proxy is requested.

Parameters

cycleProxies Whether to cycle through the available proxies automatically, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.5.17a	Available in Professional and Enterprise editions.

Examples

// Assuming a ProxyServerPool object was created previously, and
// stored in the PROXY_SERVER_POOL session variable.
pool = session.getv( "PROXY_SERVER_POOL" );

// This will cause the current proxy server to be reused until the
// value is set back to true.
pool.setAutomaticProxyCycling( false );

// The corresponding getter will indicate what the current value is.
session.log( "Automatically cycling proxies: " + pool.getAutomaticProxyCycling() );
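The difference between the two cycling modes can be modelled with a small pool. The ProxyCyclingSketch class below is hypothetical plain Java, not the ProxyServerPool implementation, and its round-robin order is an assumption made for illustration; it only shows that with cycling on each request advances to another proxy, while with cycling off the current proxy is handed out again.

```java
import java.util.List;

// Hypothetical model of automatic proxy cycling; not the ProxyServerPool code.
// Cycling on: each call advances round-robin. Cycling off: the current proxy is reused.
final class ProxyCyclingSketch {
    private final List<String> proxies;
    private int index = -1;          // no proxy selected yet
    private boolean cycle = true;    // cycling is on by default

    ProxyCyclingSketch(List<String> proxies) {
        this.proxies = proxies;
    }

    void setCycling(boolean cycleProxies) {
        this.cycle = cycleProxies;
    }

    String next() {
        if (index < 0 || cycle) {
            index = (index + 1) % proxies.size();   // advance, wrapping at the end
        }
        return proxies.get(index);
    }

    public static void main(String[] args) {
        ProxyCyclingSketch pool = new ProxyCyclingSketch(List.of("a:1", "b:2", "c:3"));
        System.out.println(pool.next());   // a:1
        System.out.println(pool.next());   // b:2
        pool.setCycling(false);
        System.out.println(pool.next());   // b:2 (reused)
        pool.setCycling(true);
        System.out.println(pool.next());   // c:3
    }
}
```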

setNumProxiesToValidateConcurrently

void proxyServerPool.setNumProxiesToValidateConcurrently ( int numProxies )

Description

Set the number of proxies that can be tested concurrently.

Parameters

numProxies Number of proxies to be validated concurrently, as an integer.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

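Test Proxies in Pool in Multiple Threads

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Filter the pool using a 7 second timeout
proxyServerPool.filter( 7 );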
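setNumProxiesToValidateConcurrently bounds how many proxy checks run at once. One common way to get that kind of bound in Java is a fixed-size thread pool, sketched below. The ConcurrentValidationSketch class and its caller-supplied check are hypothetical: the check stands in for a real network test, and this is not necessarily how ProxyServerPool is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical model of bounded concurrent validation; not the ProxyServerPool code.
// At most numProxies checks run at the same time because the thread pool has
// exactly that many threads.
final class ConcurrentValidationSketch {

    static List<String> validate(List<String> proxies, int numProxies, Predicate<String> check)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numProxies);
        try {
            // Submit one check per proxy; the pool queues any beyond numProxies.
            List<Future<Boolean>> results = new ArrayList<>();
            for (String proxy : proxies) {
                results.add(pool.submit(() -> check.test(proxy)));
            }
            // Collect results in the original order, keeping proxies that passed.
            List<String> good = new ArrayList<>();
            for (int i = 0; i < proxies.size(); i++) {
                if (results.get(i).get()) {      // waits for that proxy's check
                    good.add(proxies.get(i));
                }
            }
            return good;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> proxies = List.of("one.proxy.com:8888", "bad.proxy.com:1", "two.proxy.com:3128");
        // Stand-in check: treat any proxy whose name starts with "bad" as unresponsive.
        List<String> good = validate(proxies, 25, p -> !p.startsWith("bad"));
        System.out.println(good);   // [one.proxy.com:8888, two.proxy.com:3128]
    }
}
```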

setRepopulateThreshold

void proxyServerPool.setRepopulateThreshold ( int repopulateThreshold )

Description

Set the number of remaining proxy servers at which the pool requests more proxies.

Parameters

repopulateThreshold Lowest number of proxies before more proxies are requested, as an integer.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Write good proxies to file
proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

// Write Proxy Servers to log
proxyServerPool.outputProxyServersToLog();

// As a scraping session runs, screen-scraper will filter out
// proxies that become non-responsive. If the number of proxies
// gets down to a specified level, screen-scraper can repopulate
// itself. That's what this method call controls.

proxyServerPool.setRepopulateThreshold( 5 );

writeProxyPoolToFile

void proxyServerPool.writeProxyPoolToFile ( String path )

Description

Write the list of proxies, after invalid proxies have been removed, to a file.

Parameters

path File path to where the file should be written, as a string.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for all editions.

Examples

Write Good Proxies to File

import com.screenscraper.util.*;

// Create a new ProxyServerPool object.
proxyServerPool = new ProxyServerPool();

// Must be set on the session before other calls are made
session.setProxyServerPool(proxyServerPool);

// This tells the pool to populate itself from a file
proxyServerPool.populateFromFile( "proxies.txt" );

// Validate up to 25 proxies at a time.
proxyServerPool.setNumProxiesToValidateConcurrently( 25 );

// Set timeout interval
proxyServerPool.filter( 7 );

// Once filtering is done, it's often helpful to write the good
// set of proxies out to a file. That way you may not have to
// filter again the next time.

proxyServerPool.writeProxyPoolToFile( "good_proxies.txt" );

RetryPolicy

Overview

Retry policies are objects that tell a scrapeable file how to check for errors and, optionally, what to do before retrying the download. For example, a policy can execute scripts or run Runnables when the page loads incorrectly. Typically these actions request a new proxy, output helpful information, or simply stop the scrape. RetryPolicy is an interface and can be implemented to create a custom retry policy; alternatively, the RetryPolicyFactory class can be used to create some standard policies.

This policy is checked AFTER all the extractors have been run. This allows checks on whether extractor patterns matched, and also allows a page to have its 'error status' based on another page (since extractor patterns can execute scripts that scrape other files, and those files can set a variable that acts as a flag for a previous retry policy). It can also cause problems if a page's extractors should not run until error checking has occurred, so build your scrape with this order in mind.
This interface is in the com.screenscraper.util.retry package.
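The ordering described above (download, then extractors, then the error check, then any recovery before a retry) can be sketched as a simple loop. RetryFlowSketch below is a hypothetical plain Java model, not screen-scraper code: its Policy interface copies only isError and runOnError from the RetryPolicy interface, and the maxAttempts limit and running runOnError after every failed attempt are assumptions of the sketch.

```java
// Hypothetical model of the retry flow; not screen-scraper code. Each attempt
// requests the page and runs its extractors first; only then is the policy
// asked whether the page loaded incorrectly.
final class RetryFlowSketch {

    /** Stand-in for the two RetryPolicy methods this sketch uses. */
    interface Policy {
        boolean isError() throws Exception;
        void runOnError() throws Exception;
    }

    /** Returns the number of attempts made; maxAttempts is an assumption of this sketch. */
    static int scrape(Runnable requestAndExtract, Policy policy, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            requestAndExtract.run();      // download the page, then run extractors
            if (!policy.isError()) {
                return attempt;           // page is fine: stop
            }
            policy.runOnError();          // e.g. switch proxy before retrying
        }
        return maxAttempts;               // gave up after the last failed attempt
    }

    public static void main(String[] args) throws Exception {
        int[] downloads = {0};
        int[] recoveries = {0};
        Policy failTwice = new Policy() {
            public boolean isError() { return downloads[0] <= 2; }   // first two loads "fail"
            public void runOnError() { recoveries[0]++; }
        };
        int attempts = scrape(() -> downloads[0]++, failTwice, 5);
        System.out.println(attempts + " attempts, " + recoveries[0] + " recoveries");   // 3 attempts, 2 recoveries
    }
}
```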

Interface Implementation

If you need a custom retry policy, you can implement your own. Make sure that any references it holds to a scrapeableFile point to the correct scrapeableFile. This can be tricky if you use the session.setDefaultRetryPolicy method; when using the scrapeableFile.setRetryPolicy method, the scrapeableFile will be the correct object. The interface is given below.

To help you create custom retry policies that can access the scraping session and the scrapeable file currently being checked, there is an AbstractRetryPolicy class in the same package as the interface. This class defines some default behavior and adds protected fields for the session and scrapeable file, which are set before the policy is run. If you extend this abstract class, you can access the session and scrapeable file through this.scrapingSession and this.theScrapeableFile. Because of some quirks in the interpreter, always reference these fields with the 'this.' prefix; doing so avoids problems that arise in a few specific cases.

import java.util.Map;

public interface RetryPolicy
{
/**
* Checks to see if the page loaded incorrectly
*
* @return True on errors, false otherwise
* @throws Exception If something goes wrong while executing this method
*/
public boolean isError() throws Exception;

/**
* Runs this code when the page had an error. This could include things such as rotating the proxy.
*
* @throws Exception If something goes wrong while executing this method
*/
public void runOnError() throws Exception;

/**
* Returns a map that can be used to output an error message indicating which checks failed. For instance,
* you could add the key "Status Code" with the value 200, or the key "Valid Page" with the value false
*
* @return Map of keys, or null if no values are indicated
*
* @throws Exception If something goes wrong while executing this method
*/
public Map getErrorChecksMap() throws Exception;

/**
* Returns true if the session variables should be reset before attempting to rescrape the file, if there was an error.
* This can be useful especially if extractors null session variables when they don't match, but the value is needed
* to rescrape the file.
*
* @return True if session variables should be reset if there was an error, false otherwise.
*/
public boolean resetSessionVariablesBeforeRescrape();

/**
* Returns true if the referrer should be reset before attempting to rescrape the file,
* if there was an error. Resetting it keeps the referrer from pointing at
* the page you just requested.
*
* @return True if the referrer should be reset if there was an error, false otherwise.
*/
public boolean resetReferrerBeforeRescrape();

/**
* Returns true if errors should be logged to the log/web interface when they occur
*
* @return True if errors should be logged to the log/web interface when they occur
*/
public boolean shouldLogErrors();

/**
* Return the maximum number of times this policy allows for a retry before terminating in an error
*
* @return The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error
*/
public int getMaxRetryAttempts();

/**
* This will be called if all the retry attempts for the scrapeable file failed.
* In other words, if the policy said to retry 25 times, after 25 failures this
* method will be called. Note that {@link #runOnError()} will be called just before this,
* as it is called after each time the scrapeable file fails to load
* correctly, including the last time it fails to load.
* 
* This should only contain code that handles the final error. Any proxy rotating, cookie
* clearing, etc. should generally be done in the {@link #runOnError()}
* method, especially since it will still be called after the final error.
*/
public void runOnAllAttemptsFailed();
}

getErrorChecksMap

Map getErrorChecksMap ( )

Description

Returns a map that can be used to output an error message indicating which checks failed. For instance, you could add the key "Status Code" with the value 200, or the key "Valid Page" with the value false.

Parameters

This method takes no parameters

Return Value

Map of keys, or null if no values are indicated

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

import com.screenscraper.util.retry.RetryPolicy;
import java.util.HashMap;
import java.util.Map;

// Copy the script's built-in objects into variables that the
// policy's methods reference when screen-scraper calls them later
_log = log;
_session = session;
_scrapeableFile = scrapeableFile;

RetryPolicy policy = new RetryPolicy()
{
    // Results of the checks, reported through getErrorChecksMap()
    Map errorMap = new HashMap();

    public boolean isError() throws Exception
    {
        boolean wasError = _scrapeableFile.wasErrorOnRequest();
        errorMap.put("Was Error On Request", wasError);
        return wasError;
    }

    public void runOnError() throws Exception
    {
        // Runs after each failure, just before the page is requested again
        _session.executeScript("Rotate Proxy");
    }

    public Map getErrorChecksMap() throws Exception
    {
        return errorMap;
    }

    public boolean resetSessionVariablesBeforeRescrape()
    {
        return true;
    }

    public boolean shouldLogErrors()
    {
        return true;
    }

    public int getMaxRetryAttempts()
    {
        return 5;
    }

    public boolean resetReferrerBeforeRescrape()
    {
        return false;
    }

    public void runOnAllAttemptsFailed()
    {
        _log.logError("Failed to fix errors with the retry policy, stopping scrape");
        _session.stopScraping();
    }
};

scrapeableFile.setRetryPolicy(policy);

getMaxRetryAttempts

int getMaxRetryAttempts ( )

Description

Return the maximum number of times this policy allows for a retry before terminating in an error

Parameters

This method takes no parameters

Return Value

The maximum number of times to allow the ScrapeableFile to be rescraped before resulting in an error

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

isError

boolean isError ( )

Description

Checks to see if the page loaded incorrectly

Parameters

This method takes no parameters

Return Value

True on errors, false otherwise

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

resetReferrerBeforeRescrape

boolean resetReferrerBeforeRescrape ( )

Description

Returns true if the referrer should be reset before attempting to rescrape the file, if there was an error. Resetting it keeps the referrer from pointing at the page you just requested.

Parameters

This method takes no parameters

Return Value

True if the referrer should be reset if there was an error, false otherwise.

Change Log

Version	Description
6.0.36a	Available in all editions.

Examples

Create a custom RetryPolicy

resetSessionVariablesBeforeRescrape

boolean resetSessionVariablesBeforeRescrape ( )

Description

Returns true if the session variables should be reset before attempting to rescrape the file, if there was an error. This is especially useful if extractors set session variables to null when they don't match but the value is needed to rescrape the file.
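The snapshot-and-restore idea behind this method can be sketched with a plain Map standing in for the session's variables. This is a hypothetical illustration, not screen-scraper code; it assumes that "reset" means restoring the values the variables had before the failed attempt, and SEARCH_TERM is an illustrative variable name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of resetSessionVariablesBeforeRescrape(); a Map stands in
// for the session's variables. Not screen-scraper code.
class SessionResetSketch {

    // Simulates an extractor that failed to match and set its variable to null
    static void failedExtractor(Map<String, Object> vars) {
        vars.put("SEARCH_TERM", null);
    }

    // Returns the SEARCH_TERM value the retry would see
    static Object valueOnRetry(boolean resetBeforeRescrape) {
        Map<String, Object> vars = new HashMap<>();
        vars.put("SEARCH_TERM", "widgets");                   // needed to build the request
        Map<String, Object> snapshot = new HashMap<>(vars);   // taken before the attempt

        failedExtractor(vars);                                // the attempt fails

        if (resetBeforeRescrape) {
            vars.clear();
            vars.putAll(snapshot);                            // restore pre-attempt values
        }
        return vars.get("SEARCH_TERM");
    }
}
```

Without the reset, the retry would build its request from a null value, which is the situation this method is meant to prevent.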

Parameters

This method takes no parameters

Return Value

True if session variables should be reset if there was an error, false otherwise.

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

runOnAllAttemptsFailed

void runOnAllAttemptsFailed ( )

Description

This will be called if all the retry attempts for the scrapeable file failed. In other words, if the policy said to retry 25 times, after 25 failures this method will be called. Note that runOnError will be called just before this, as it is called after each time the scrapeable file fails to load correctly, including the last time it fails to load.

This should only contain code that handles the final error. Any proxy rotating, cookie clearing, etc. should generally be done in the runOnError method, especially since it will still be called after the final error.

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version	Description
6.0.37a	Available in all editions.

Examples

Create a custom RetryPolicy

runOnError

void runOnError ( )

Description

Called when the page had an error; use it for tasks such as rotating the proxy. This code is executed just before the page is downloaded again.

Parameters

This method takes no parameters

Return Value

This method returns void

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

shouldLogErrors

boolean shouldLogErrors ( )

Description

Returns true if errors should be logged to the log/web interface when they occur

Parameters

This method takes no parameters

Return Value

True if errors should be logged to the log/web interface when they occur

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Create a custom RetryPolicy

RetryPolicyFactory

Overview

Class used to create simple Retry Policies. See the RetryPolicy page for more details on what a RetryPolicy does. This class is found in the com.screenscraper.util.retry package.

getBasicPolicy

RetryPolicy RetryPolicyFactory.getBasicPolicy ( int retries, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getBasicPolicy ( int retries, Runnable runnableOnFail )

Description

Policy that retries if there was an error on the request, as indicated by its status code. Executes the given script or Runnable before each retry.

Parameters

retries Maximum number of times to retry before failing
scriptOnFail/runnableOnFail What to run (script or Runnable) if the policy detects an error on the page. This runs just before the page is downloaded again. The script or Runnable is executed in the current thread, so the scrapeable file will not be downloaded again until it has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a basic retry policy

import com.screenscraper.util.retry.RetryPolicyFactory;
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getBasicPolicy(5, "Rotate Proxy"));

getEmptyPolicy

RetryPolicy RetryPolicyFactory.getEmptyPolicy ( )

Description

Policy that never reports an error. Useful when the session has a session-wide retry policy but a particular scrapeable file should not use it.

Parameters

This method takes no parameters

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
6.0.25a	Available in all editions.

Examples

Set an empty retry policy

import com.screenscraper.util.retry.RetryPolicyFactory;
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getEmptyPolicy());

getMatchingRegexPolicy

RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex )
RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getMatchingRegexPolicy ( int retries, String regex, Runnable runnableOnFail )

Description

Policy that requires a Regular Expression to match the page content (including headers) in order to be considered valid.

Parameters

retries Maximum number of times to retry before failing
regex A regular expression that must match the page content for the page to be considered valid
scriptOnFail/runnableOnFail (optional) What to run (script or Runnable) if the policy detects an error on the page. This runs just before the page is downloaded again. The script or Runnable is executed in the current thread, so the scrapeable file will not be downloaded again until it has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a matching regex policy

import com.screenscraper.util.retry.RetryPolicyFactory;
// Require the response to contain the text "Google.com". In a regex an unescaped . matches any character, so it is escaped (written \\. in a Java string)
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getMatchingRegexPolicy(5, "Google\\.com", "Rotate Proxy"));
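The escaping note in the example above can be checked with java.util.regex. The helper below is hypothetical and assumes the policy treats the page as valid when the pattern is found anywhere in the content (Matcher.find semantics); the policy's exact matching mode is not documented here.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes find() semantics (match anywhere in the content).
class MatchingRegexSketch {

    // A "matching regex" check reports an error when the pattern is NOT found.
    static boolean isError(String regex, String pageContent) {
        return !Pattern.compile(regex).matcher(pageContent).find();
    }
}
```

With the unescaped pattern "Google.com", a page containing "Googlexcom" would also count as valid, which is why the example escapes the dot.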

getMissingRegexPolicy

RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, String scriptOnFail )
RetryPolicy RetryPolicyFactory.getMissingRegexPolicy ( int retries, String regex, Runnable runnableOnFail )

Description

Policy that requires a Regular Expression NOT to match the page content (including headers) in order to be considered valid. In other words, if the Regular Expression matches, it means that the page should be rescraped.

Parameters

retries Maximum number of times to retry before failing
regex A regular expression that must NOT match the page content for the page to be considered valid
scriptOnFail/runnableOnFail (optional) What to run (script or Runnable) if the policy detects an error on the page. This runs just before the page is downloaded again. The script or Runnable is executed in the current thread, so the scrapeable file will not be downloaded again until it has finished executing.

Return Value

The RetryPolicy to set in the ScrapeableFile

Change Log

Version	Description
5.5.29a	Available in all editions.

Examples

Set a missing regex policy

import com.screenscraper.util.retry.RetryPolicyFactory;
// Require the response to not contain the text "Google.com". In a regex an unescaped . matches any character, so it is escaped (written \\. in a Java string)
scrapeableFile.setRetryPolicy(RetryPolicyFactory.getMissingRegexPolicy(5, "Google\\.com", "Rotate Proxy"));
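The missing-regex policy inverts that check: finding the pattern is what marks the page as bad, which suits detecting block or error pages. The sketch below makes the same find() assumption as a hypothetical helper, and the "access denied" pattern is only an illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes find() semantics (match anywhere in the content).
class MissingRegexSketch {

    // A "missing regex" check reports an error when the pattern IS found.
    static boolean isError(String regex, String pageContent) {
        return Pattern.compile(regex).matcher(pageContent).find();
    }
}
```

The inline (?i) flag in the test pattern makes the match case insensitive, so "ACCESS DENIED" is caught as well.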

SqlDataManager

Overview

This object simplifies your interactions with a JDBC-compliant SQL database. It works with various types of databases and can even run multi-threaded, so scrapes can continue without waiting for queries to finish. View an example of how to use the SqlDataManager.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Prefer a more traditional approach? See an example of Working with MySQL databases.

To use the SqlDataManager, you must install the appropriate JDBC driver: download the driver and place it in the lib/ext folder of the screen-scraper installation directory.

Event Callbacks

Overview

Add an event callback to SqlDataManager object.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Before adding an event listener to the SqlDataManager, you must build the schemas of any tables you will use, because events are tied to table operations such as inserting data.

Parameters

schema Case insensitive schema (table) name
when The event associated with the schema that should trigger the callback
- onCreate Triggered whenever the DataManager creates a new DataNode, such as the first addData since the last commit
- onAddData Triggered after dm.addData is called
- onWrite Triggered immediately before the DataNode is written (DataWriter.write). Applies to both inserts and updates
- onInsert Triggered immediately before the data is going to be inserted as a new row in the database as opposed to updating an existing row
- onUpdate Triggered immediately before existing database values are going to be updated as opposed to inserted as a new row
- onWriteError Triggered if an exception was thrown when trying to write to the database
- afterWrite Triggered immediately after the DataNode is written. At this point any values written are in the DataNode, including autogenerated keys
listener A callback interface that must be implemented by the client. There is a single method public void handleEvent(DataManagerEvent event) that needs to be implemented. The DataManagerEvent has a method getDataNode() to retrieve the relevant DataNode.
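The callback mechanism amounts to a registry keyed by schema name and event. The sketch below models that pattern in plain Java; EventCallbackSketch, FireTime, Listener and fire are illustrative names, not the DataManager API (the real listener receives a DataManagerEvent). It uses a case-insensitive map to mirror the case-insensitive schema parameter and returns the listener that was passed in, as the documented method does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-table event callbacks; not the DataManager API.
class EventCallbackSketch {

    // Same event names as the list above
    enum FireTime { onCreate, onAddData, onWrite, onInsert, onUpdate, onWriteError, afterWrite }

    interface Listener {
        void handleEvent(String schema, Map<String, Object> row);
    }

    // Schema names are case insensitive, so the outer map ignores case
    private final Map<String, Map<FireTime, List<Listener>>> listeners =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    Listener addEventListener(String schema, FireTime when, Listener listener) {
        listeners.computeIfAbsent(schema, k -> new EnumMap<>(FireTime.class))
                 .computeIfAbsent(when, k -> new ArrayList<>())
                 .add(listener);
        return listener; // the same object that was passed in
    }

    // Called by the (simulated) data manager when an event occurs on a table
    void fire(String schema, FireTime when, Map<String, Object> row) {
        Map<FireTime, List<Listener>> byTime = listeners.get(schema);
        if (byTime == null) {
            return;
        }
        for (Listener listener : byTime.getOrDefault(when, Collections.emptyList())) {
            listener.handleEvent(schema, row);
        }
    }
}
```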

Return Values

Returns the same DataManagerEventListener object that was passed in.

Change Log

Version	Description
5.5	Available for professional and enterprise editions.

Class Locations

com.screenscraper.datamanager.DataManager
com.screenscraper.datamanager.DataManagerEventListener
com.screenscraper.datamanager.DataManagerEventSource.EventFireTime

Examples

Register a callback that logs database write errors on the 'person' table to the web interface

import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.SqlDataManager;
import org.apache.commons.dbcp.BasicDataSource;

// BasicDataSource
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );
ds.setUrl( "jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8" );
ds.setMaxActive( 10 );

// Create Data Manager
dm = new SqlDataManager( ds, session );
dm.buildSchemas();
_session = session;

// Log any write errors on the 'person' table to the screen-scraper web interface
dm.addEventListener("person", DataManagerEventSource.EventFireTime.onWriteError,
    new DataManagerEventListener() {
        public void handleEvent(DataManagerEvent event) {
            DataNode n = event.getDataNode();
            _session.webError("Database Write Error", n.getObjectMap());
        }
    }
);

addData

void sqlDataManager.addData ( String table, Map data ) (professional and enterprise editions only)
void sqlDataManager.addData ( String table, String columnName, Object value ) (professional and enterprise editions only)

Description

Add data to fields, in preparation for insertion into a database.

When adding data in a many-to-many relation, if setAutoManyToMany is set to false, a null row should be inserted into the relating table so the datamanager will link the keys correctly between related tables. For example, dm.addData("many_to_many", null);

Before adding data for the first time, you must build the schemas of any tables you will use. If your database engine does not natively support foreign keys, you must also add them with addForeignKey (InnoDB, for example, is a MySQL engine that does support them).

Parameters

table Name of the database table that the data corresponds to, as a string.
data (use either this or columnName and value) A Map whose keys are field names and whose values are the data to add for those fields. This can be a dataRecord object.
columnName (requires value) The name of the column/field in the database table that the data is being added for, as a string.
value (requires columnName) The value being inserted into the column/field.

The SqlDataManager will attempt to convert a value that is given to the correct format for the database. For example, if the database requires an int for a column named age, dm.addData("table", "age", "32") will convert the String "32" to an int before adding it to the database. See the table below the examples for other types of java objects and how they map to SQL types.

The table and columnName parameters are not case sensitive. The same is true for the key values in the data map.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Add Data from Data Record

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add DataRecord Information into person table
dm.addData( "person", dataRecord );

// Create and add query to buffer
dm.commit( "person" );

Add Data In a Specific Field

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add DataRecord Information into person table
dm.addData( "person", dataRecord );

// Add Specific Other Data
dm.addData( "person", "date_collected", "2010-07-13" );

// Create and add query to buffer
dm.commit( "person" );

Java Object and SQL Type Mappings

Since the DataManager is designed with screen-scraper in mind, every input accepts a String in addition to its corresponding Java object type, but the String must be parseable into that type. For example, if a column is defined as an integer in the database, the String must be parseable by Integer.parseInt(String value). Here is the mapping of SQL types (based on java.sql.Types) to Java objects:

SQL Type		Java Object
java.sql.Types.CHAR		String
java.sql.Types.VARCHAR		String
java.sql.Types.LONGVARCHAR		String
java.sql.Types.LONGNVARCHAR		String
java.sql.Types.NUMERIC		BigDecimal
java.sql.Types.DECIMAL		BigDecimal
java.sql.Types.TINYINT		Integer
java.sql.Types.SMALLINT		Integer
java.sql.Types.INTEGER		Integer
java.sql.Types.BIGINT		Long
java.sql.Types.REAL		Float
java.sql.Types.FLOAT		Double
java.sql.Types.DOUBLE		Double
java.sql.Types.BIT		Boolean
java.sql.Types.BINARY		ByteArray
java.sql.Types.VARBINARY		ByteArray
java.sql.Types.LONGVARBINARY		ByteArray
java.sql.Types.DATE		SQLDate or Long
java.sql.Types.TIME		SQLTime or Long
java.sql.Types.TIMESTAMP		SQLTime or Long
java.sql.Types.ARRAY		Object
java.sql.Types.BLOB		ByteArray
java.sql.Types.CLOB		Object
java.sql.Types.JAVA_OBJECT		Object
java.sql.Types.OTHER		Object
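The rule above (a String must be parseable into the column's Java type) can be sketched as a switch on java.sql.Types. This covers only some rows of the table and is a hypothetical illustration, not the SqlDataManager's converter; fromString is an illustrative name.

```java
import java.math.BigDecimal;
import java.sql.Types;

// Hypothetical String-to-Java conversion for a few rows of the mapping table.
class SqlTypeConversionSketch {

    static Object fromString(String value, int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
            case Types.LONGNVARCHAR:
                return value;
            case Types.NUMERIC:
            case Types.DECIMAL:
                return new BigDecimal(value);
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
                // e.g. "32" for an INTEGER age column; throws NumberFormatException otherwise
                return Integer.valueOf(Integer.parseInt(value));
            case Types.BIGINT:
                return Long.valueOf(Long.parseLong(value));
            case Types.REAL:
                return Float.valueOf(Float.parseFloat(value));
            case Types.FLOAT:
            case Types.DOUBLE:
                return Double.valueOf(Double.parseDouble(value));
            default:
                throw new IllegalArgumentException("SQL type not covered by this sketch: " + sqlType);
        }
    }
}
```

A String such as "thirty-two" for an INTEGER column cannot be parsed, so it is rejected rather than silently stored.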

addForeignKey

void sqlDataManager.addForeignKey ( String table, String columnName, String foreignTable, String foreignColumnName ) (professional and enterprise editions only)

Description

Manually set up a table connection (key matching).

If SqlDataManager.buildSchemas is called, any foreign keys manually added before that point will be overridden or erased.

Parameters

table Name of the child table, the one containing the foreign key column, as a string.
columnName Column/field name of the foreign key in table, as a string.
foreignTable Name of the parent table, the one containing the primary key that columnName refers to, as a string.
foreignColumnName Column/field name of that primary key in foreignTable, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

If the database defines foreign keys, they are followed automatically. If the database does not support foreign key references, you will need to build the table connections using this method.

Examples

Setup Table Connections

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas
dm.buildSchemas();

// Setup table connections
// parameter order: "child_table", "child_column", "parent_table", "parent_column"
dm.addForeignKey( "job", "person_id", "person", "id");
dm.addForeignKey( "address", "person_id", "person", "id");

addSessionVariables

void sqlDataManager.addSessionVariables ( String table ) (professional and enterprise editions only)

Description

Manually add session variable data to fields, in preparation for insertion into a database.

Parameters

table Name of the database table that the data corresponds to, as a string.

The keys from the session will be matched in a case insensitive way to the column names of the database.
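Case-insensitive matching of session variable names to column names can be sketched as below. It is a hypothetical illustration, not the SqlDataManager's code: matchColumns is an illustrative name, and it assumes that variables with no matching column are simply ignored.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of matching session variables to table columns, ignoring case.
class SessionColumnMatchSketch {

    static Map<String, Object> matchColumns(Map<String, Object> sessionVars, List<String> columns) {
        // Index the column names by their lower-case form
        Map<String, String> columnByLowerName = new HashMap<>();
        for (String column : columns) {
            columnByLowerName.put(column.toLowerCase(Locale.ROOT), column);
        }
        // Keep only variables whose names match a column, keyed by the column's own spelling
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sessionVars.entrySet()) {
            String column = columnByLowerName.get(entry.getKey().toLowerCase(Locale.ROOT));
            if (column != null) {
                row.put(column, entry.getValue());
            }
        }
        return row;
    }
}
```

For example, session variables FIRST_NAME, Zip and PAGE matched against the columns first_name, zip and id yield values for first_name and zip only.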

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Add Data from Session Variables

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into person table
dm.addSessionVariables( "person" );

// Create and add query to buffer
dm.commit( "person" );

addSessionVariablesOnCommit

void sqlDataManager.addSessionVariablesOnCommit ( boolean automate ) (professional and enterprise editions only)

Description

Automatically add the matching session variables to a table when that table is committed.

Parameters

automate If true then session variables whose names match field names (case insensitive) will be automatically added to queries when the fields are committed.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Automate Session Variables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Write Information to Database
// automatically using session variables
dm.addSessionVariablesOnCommit( true );

buildSchemas

void sqlDataManager.buildSchemas ( ) (professional and enterprise editions only)
void sqlDataManager.buildSchemas ( List tables ) (professional and enterprise editions only)

Description

Collect the database schema information, including foreign key relations between tables.

Schemas must be built for any tables that will be used by this DataManager before data can be added.

Parameters

tables (optional) A list of table names, as strings, for which to build schemas.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Build Database Schema using a BasicDataSource

Build Database Schema using an SshDataSource

import com.screenscraper.datamanager.sql.*;

// SshDataSource
// Placeholder SSH login (user@host) and password; replace with your own
ds = new SshDataSource( "sshUser@ssh.example.com", "ssPass" );
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );

// Accepted values for the first parameter of setUrl are:
// SshDataSource.MYSQL
// SshDataSource.MSSQL
// SshDataSource.ORACLE
// SshDataSource.POSTGRESQL
ds.setUrl( SshDataSource.MYSQL, 3306, "database" );

// Create Data Manager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

clearAllData

void sqlDataManager.clearAllData ( ) (professional and enterprise editions only)

Description

Clear all data from the data manager without writing it to the database. This includes all data previously committed but not yet written.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Clear Data Manager

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Clear information from the datamanager
dm.clearAllData();

clearSessionVariables

void sqlDataManager.clearSessionVariables ( String table ) (professional and enterprise editions only)

Description

Clear session variables corresponding to the fields of a specific table (case insensitive).

Parameters

table Name of the table whose field names will be used to clear session variables.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Clear Session Variables

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Clear session variables for people table
dm.clearSessionVariables( "people" );

clearSessionVariablesOnCommit

void sqlDataManager.clearSessionVariablesOnCommit ( boolean clearVars ) (professional and enterprise editions only)

Description

Clear session variables corresponding to a committed table automatically.

Parameters

clearVars If true then session variables whose names match field names (case insensitive) will be automatically cleared when the table is committed.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Automate Session Variables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Write Information to Database
// automatically using session variables
dm.addSessionVariablesOnCommit( true );

// Clear session variables on commit
// to avoid carry over
dm.clearSessionVariablesOnCommit( true );

close

void sqlDataManager.close ( ) (professional and enterprise editions only)

Description

Close the data manager's connections.

If there is data that has not yet been written to the database when this method is called it will not be written.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Close Data Manager

// Get Data Manager
dm = session.getv( "DATAMANAGER" );

// Close Data Manager
dm.close();

commit

void sqlDataManager.commit ( String table ) (professional and enterprise editions only)

Description

Commit a prepared row of data to the queue. Once committed, the data can no longer be edited. When working with multiple tables related by a foreign key, it is important to commit rows in the correct order: commit the rows in each child table before the parent row, or they will not be linked correctly when written to the database.

This does not write the row of data to the database; it puts the row in the queue to be written later.

Parameters

table Name of the database table that the data corresponds to, as a string.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Commit Database Row

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into person table
dm.addSessionVariables( "person" );

// Create and add query to buffer
dm.commit( "person" );

commitAll

void sqlDataManager.commitAll ( ) (professional and enterprise editions only)

Description

Commit prepared rows of data for all tables to the queue. Once committed, the data can no longer be edited.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Commit Database Rows

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Add session variables into tables
dm.addSessionVariables( "person" );
dm.addSessionVariables( "address" );
dm.addSessionVariables( "jobs" );

// Create and add queries to buffer
dm.commitAll();

flush

boolean sqlDataManager.flush ( ) (professional and enterprise editions only)

Description

Write committed data to the database. Any data that has not been committed using either the commit or commitAll method will be lost and not written to the database.
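The distinction between committing and flushing can be modeled with two in-memory collections. This is a hypothetical sketch of the lifecycle described on this page, not the SqlDataManager's implementation: the written list stands in for the database, and committed rows are stored as text only so they can no longer be changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of addData -> commit -> flush; not the SqlDataManager.
class CommitFlushSketch {

    // One editable row per table, filled by addData
    private final Map<String, Map<String, Object>> pending = new HashMap<>();
    // Committed rows waiting to be written; stored as text so they can't be edited
    private final List<String> committed = new ArrayList<>();
    // Stands in for the database
    final List<String> written = new ArrayList<>();

    void addData(String table, String column, Object value) {
        pending.computeIfAbsent(table, t -> new HashMap<>()).put(column, value);
    }

    void commit(String table) {
        Map<String, Object> row = pending.remove(table);
        if (row != null) {
            committed.add(table + row);   // e.g. "person{name=Ada}"
        }
    }

    // The real flush() reports whether the write succeeded; this model always succeeds
    boolean flush() {
        written.addAll(committed);
        committed.clear();
        pending.clear();                  // uncommitted data is never written
        return true;
    }
}
```

In this model, a row added with addData but never committed is discarded by flush, matching the warning above.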

Parameters

This method does not receive any parameters.

Return Values

Returns true if the data was successfully written to the database; otherwise, returns false.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Write to Database

// Get data manager
dm = session.getv( "DATAMANAGER" );

// Write Information to Database
dm.flush();

getConnection

Connection sqlDataManager.getConnection ( ) (professional and enterprise editions only)

Description

Retrieve the connection object of the data manager. This can be helpful if you want to do something that the data manager cannot do easily, such as query the database.

Be sure to close the connection once it is no longer needed. Failure to do so could exhaust the connection pool used by the data manager, which will cause the scraping session to hang.

Parameters

This method does not receive any parameters.

Return Values

Returns a connection object matching the one used in the data manager.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Retrieve Database Connection

// Import SQL object
import java.sql.*;

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Retrieve connection
connection = dm.getConnection();

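try {
    // my_table is a placeholder; "table" itself is a reserved word in MySQL
    PreparedStatement ps = connection.prepareStatement( "UPDATE my_table SET status=?" );
    try {
        ps.setString( 1, session.getv( "STATUS" ) );
        ps.executeUpdate();
    } finally {
        // Release the statement
        ps.close();
    }
} finally {
    // Return the connection so the pool is not exhausted
    connection.close();
}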

getLastAutoIncrementKey

DataObject sqlDataManager.getLastAutoIncrementKey (String table) (professional and enterprise editions only)

Description

Retrieve the last autogenerated primary key, if any, for the given table.

Parameters

table The name of the database table, as a string (case insensitive).

Return Values

Returns a com.screenscraper.datamanager.DataObject containing the primary key.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Retrieve AutoIncrement Key

// Get datamanager
dm = session.getv( "DATAMANAGER" );

//Save some data
dm.addData("table", "column", "important data");
dm.commit("table");
dm.flush("table");

//Retrieve the key associated with the data we just saved as an Integer
key = dm.getLastAutoIncrementKey("table").getInt();

setAutoManyToMany

void sqlDataManager.setAutoManyToMany ( boolean enable ) (professional and enterprise editions only)

Description

Sets whether or not the data manager should automatically take care of many-to-many relationships.

Parameters

enable Whether the data manager should automatically run a commit for many-to-many tables when the connected tables are committed, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

If the many-to-many table holds more information than just the keys, leave this feature turned off so that you can add the extra data before committing.

Examples

Set Automatic Commits for Many-to-many Tables

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Set Automatic Commit on Many-to-many tables
dm.setAutoManyToMany( true );

setGlobalMergeEnabled

void sqlDataManager.setGlobalMergeEnabled ( boolean merge )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set global merge status. When conflicts exist in data, a merge of true will take the newer values and save them over previous null values.

When merging or updating values in a table, that table must have a primary key. When the primary key is set to autoincrement and its value was not set with the addData method, the DataManager will create a new row rather than update or merge with an existing row. One solution is to use an SqlDuplicateFilter, which defines the fields that identify an entry as a duplicate and automatically inserts the value of the autoincrement key when data is committed.

By default, if the data that you are inserting has the same primary key as data already in the database, the insert is ignored. This behavior can be modified by the dm.setGlobalUpdateEnabled and dm.setGlobalMergeEnabled methods of the DataManager, which allow for four different settings when inserting data:

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data
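The table above can be read as a per-column rule. The following Java sketch encodes that reading; the DuplicateRowPolicy class is hypothetical, it models only the documented outcomes rather than the DataManager's actual code, and it assumes the incoming value is non-null.

```java
// Hypothetical model of how the update and merge flags decide, column by column,
// what ends up stored when a row with a duplicate primary key is inserted.
final class DuplicateRowPolicy {

    /**
     * @param update   the update setting (global or per table)
     * @param merge    the merge setting (global or per table)
     * @param existing value currently in the database (null means NULL)
     * @param incoming newly scraped value (assumed non-null)
     * @return the value the column holds afterwards
     */
    static String resolve(boolean update, boolean merge, String existing, String incoming) {
        if (existing == null) {
            // Merge fills columns that are currently NULL
            return merge ? incoming : existing;
        }
        // Update overwrites columns that are currently NOT NULL
        return update ? incoming : existing;
    }

    public static void main(String[] args) {
        // false/false: the duplicate row is ignored
        System.out.println(resolve(false, false, "old", "new")); // old
        System.out.println(resolve(false, false, null, "new"));  // null
        // true/true: every column takes the new data
        System.out.println(resolve(true, true, "old", "new"));   // new
        System.out.println(resolve(true, true, null, "new"));    // new
    }
}
```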

Parameters

merge Whether to turn on global merge or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Global Database Merge

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Set Global Update
dm.setGlobalUpdateEnabled( true );

// Set Global Merge
dm.setGlobalMergeEnabled( true );

setGlobalUpdateEnabled

void sqlDataManager.setGlobalUpdateEnabled ( boolean update )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set update status globally. When conflicts exist in data, an update of true will take the newer values and save them over previous non-null values.

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

update Whether to turn on global update or not, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Global Database Update

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Build Schemas For all Tables
dm.buildSchemas();

// Set Global Update
dm.setGlobalUpdateEnabled( true );

setLoggingLevel

void sqlDataManager.setLoggingLevel ( Level level ) (professional and enterprise editions only)

Description

Set the error logging level. Currently only DEBUG and ERROR levels are supported. At the DEBUG level, all queries and results will be output to the log.

Parameters

level A log4j Level object, such as org.apache.log4j.Level.DEBUG or org.apache.log4j.Level.ERROR.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Logging Level

// Get MySQL datamanager
dm = session.getVariable( "DATAMANAGER" );

// Set Logging Level
dm.setLoggingLevel( org.apache.log4j.Level.ERROR );

// Build Schemas
dm.buildSchemas();

setMergeEnabled

void sqlDataManager.setMergeEnabled ( String table, boolean merge )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set merge status for a table. When conflicts exist in data, a merge of true will take the newer values and save them over previous null values.

By default, if the data that you are inserting has the same primary key as data already in the database, the insert is ignored. This behavior can be modified for a specific table by the dm.setUpdateEnabled and dm.setMergeEnabled methods of the DataManager, which allow for four different settings when inserting data:

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

table Name of the database table, as a string.
merge Whether to turn on merge for the table, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Database Table Merge

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Set Merge
dm.setMergeEnabled( "person", true );

setMultiThreadWrite

void sqlDataManager.setMultiThreadWrite ( int numThreads ) (professional and enterprise editions only)

Description

Set the number of threads the data manager can use at once to write to the database. When set higher than one, the scraping session can continue to run and download pages while data is being written. This can decrease the time required to run a scrape, but it also makes debugging harder, as there is no guarantee about the order in which data will be written; it is best to leave this setting alone while developing a scrape. Also note that when more than one thread is used, the flush method always returns true, even if the write failed.

Parameters

numThreads The number of threads that the data manager can start and use to write data, as an integer.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Thread Count

// Import classes
import com.screenscraper.datamanager.*;
import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// Set Variables
host = "127.0.0.1:3306";
database = "mydb";
username = "user";
password = "pwrd";
parameters = "autoReconnect=true&useCompression=true";

// Build the BasicDataSource for the database connection
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( username );
ds.setPassword( password );
ds.setUrl( "jdbc:mysql://" + host + "/" + database + "?" + parameters );
ds.setMaxActive( 100 );

// Get MySQL datamanager
dm = new SqlDataManager( ds, session );

// Set number of threads that can be opened
// when interacting with the database
dm.setMultiThreadWrite(10);

// Build Schemas For all Tables
dm.buildSchemas();

setUpdateEnabled

void sqlDataManager.setUpdateEnabled ( String table, boolean update )

This feature is only available for Professional and Enterprise editions of screen-scraper.

Description

Set update status for a given table. When conflicts exist in data, an update of true will take the newer values and save them over previous non-null values.

Update	Merge	Resulting Action
false	false	Ignore row on duplicate
true	false	Update only values whose corresponding columns are currently NOT NULL in the database
false	true	Update only values whose corresponding columns are currently NULL in the database
true	true	Update all values to new data

Parameters

table The name of the database table, as a string.
update Whether to turn on update for the table, as a boolean.

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Set Database Table Update

// Get datamanager
dm = session.getv( "DATAMANAGER" );

// Set Update on person table
dm.setUpdateEnabled( "person", true );

SqlDataManager

SqlDataManager SqlDataManager ( BasicDataSource dataSource, ScrapingSession session ) (professional and enterprise editions only)

Description

Initiate a SqlDataManager object.

Before adding data to the SqlDataManager, you must build the schema of any tables you will use. If your database engine does not natively support foreign keys, you must also add them yourself (InnoDB for MySQL, for example, does support them natively).

Parameters

dataSource A BasicDataSource object.
session The scraping session to which the data manager should be associated.

Return Values

Returns a SqlDataManager. If an error is experienced it will be thrown.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Class Location

com.screenscraper.datamanager.sql.SqlDataManager

Examples

Create a SQL Data Manager

import com.screenscraper.datamanager.sql.*;
import org.apache.commons.dbcp.BasicDataSource;

// BasicDataSource
BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName( "com.mysql.jdbc.Driver" );
ds.setUsername( "user" );
ds.setPassword( "psswrd" );
ds.setUrl( "jdbc:mysql://127.0.0.1:3306/mydb?useUnicode=true&characterEncoding=UTF-8" );
ds.setMaxActive( 100 );

// Create Data Manager
dm = new SqlDataManager( ds, session );

Create a SQL Data Manager Over SSH Tunnel

SqlDuplicateFilter

Overview

SqlDuplicateFilters are designed to filter duplicates when more information than just a primary key might define a duplicate entry. For example, you might define a unique person by their SSN, driver's license number, or by their first name, last name, and phone number. It is also possible that a single person may have multiple phone numbers, and if any of them match then the duplicate constraint should be met. An SqlDuplicateFilter can check for conditions such as these and correctly recognize duplicate entries.

This feature is only available for Professional and Enterprise editions of screen-scraper.

Examples

Register a new duplicate filter

// Import classes
import com.screenscraper.datamanager.sql.*;

//Get the data manager
SqlDataManager dm = session.getVariable( "_DATAMANAGER" );

// Register a new duplicate filter
// Check for duplicate people, so register it for the people table
SqlDuplicateFilter nameFilter = SqlDuplicateFilter.register("people", dm);

//Add constraints to match when a first name, middle initial, and last name match a different row in the database
nameFilter.addConstraint( "people", "first_name" );
nameFilter.addConstraint( "people", "middle_initial" );
nameFilter.addConstraint( "people", "last_name" );

Match Duplicates across tables

Sometimes the data will need to be filtered across multiple tables, or different constraints might each indicate a duplicate. For example, a person might be a duplicate if their SSN matches OR if their driver's license number matches. Alternatively, they may be a duplicate when they have the same first name, last name, and phone number.

import com.screenscraper.datamanager.sql.SqlDuplicateFilter;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/

//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );

//register an SqlDuplicateFilter with the DataManager for the drivers license
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );

//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );

Duplicate filters are checked in the order they are added, so consider performance when creating them. If, for instance, most duplicates will match on the social security number, create that filter before the others. Also make sure to add database indexes on the columns you are selecting by; otherwise performance will rapidly degrade as your database grows.

Duplicates will be filtered by any one of the filters created. If multiple fields must all match for an entry to be a duplicate, create a single filter and add each of those fields as constraints, as shown in the third filter created above. In other words, constraints added to a single filter are ANDed together, while separate filters are ORed.
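The AND/OR semantics can be modelled in a few lines of Java. This is a hypothetical sketch, not how SqlDuplicateFilter actually queries the database: each filter is a list of column names, rows are flattened into maps (so the child phone table is treated as an ordinary column), and a missing value is assumed never to match.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of how duplicate-filter constraints combine:
// columns inside one filter are ANDed, separate filters are ORed.
final class DuplicateFilterModel {

    // True only if every column of the filter matches between the two rows (AND)
    static boolean filterMatches(List<String> filterColumns,
                                 Map<String, String> candidate,
                                 Map<String, String> existing) {
        for (String column : filterColumns) {
            String value = candidate.get(column);
            if (value == null || !value.equals(existing.get(column))) {
                return false;
            }
        }
        return true;
    }

    // True if any filter matches any existing row (OR); filters are tried in order
    static boolean isDuplicate(List<List<String>> filters,
                               Map<String, String> candidate,
                               List<Map<String, String>> existingRows) {
        for (List<String> filter : filters) {
            for (Map<String, String> row : existingRows) {
                if (filterMatches(filter, candidate, row)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> filters = List.of(
                List.of("ssn"),
                List.of("drivers_license"),
                List.of("first_name", "last_name", "phone_number"));
        List<Map<String, String>> existing = List.of(Map.of(
                "ssn", "111-22-3333", "drivers_license", "D123",
                "first_name", "Jane", "last_name", "Doe", "phone_number", "555-0100"));
        // Different SSN and license, but the name/phone filter matches
        Map<String, String> candidate = Map.of(
                "ssn", "999-88-7777", "drivers_license", "X999",
                "first_name", "Jane", "last_name", "Doe", "phone_number", "555-0100");
        System.out.println(isDuplicate(filters, candidate, existing)); // true
    }
}
```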

addConstraint

void sqlDuplicateFilter.addConstraint ( String table, String column ) (professional and enterprise editions only)

Description

Add a constraint that checks the value of new entries against the value of entries already in the database for a given column and table.

Parameters

table Name of the database table, either the same table the filter is registered to or one of its children.
column The column that will be checked in the table for a duplicate with new values

Return Values

Returns void.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Register a new duplicate filter

import com.screenscraper.datamanager.sql.SqlDuplicateFilter;

/*
Perform the setup of the SqlDataManager, as shown previously, and name the variable dm.
*/

//register an SqlDuplicateFilter with the DataManager for the social security number
SqlDuplicateFilter ssnDuplicate = SqlDuplicateFilter.register( "person", dm );
ssnDuplicate.addConstraint( "person", "ssn" );

//register an SqlDuplicateFilter with the DataManager for the driver's license
SqlDuplicateFilter licenseDuplicate = SqlDuplicateFilter.register( "person", dm );
licenseDuplicate.addConstraint( "person", "drivers_license" );

//register an SqlDuplicateFilter with the DataManager for the name/phone number
//where the person table has a child table named phone.
SqlDuplicateFilter namePhoneDuplicate = SqlDuplicateFilter.register( "person", dm );
namePhoneDuplicate.addConstraint( "person", "first_name" );
namePhoneDuplicate.addConstraint( "person", "last_name" );
namePhoneDuplicate.addConstraint( "phone", "phone_number" );

register

SqlDuplicateFilter SqlDuplicateFilter.register ( String table, SqlDataManager dataManager ) (professional and enterprise editions only)

Description

Create an SqlDuplicateFilter for a specific table and register it with the data manager.

Parameters

table Name of the database table with the primary key, as a string.
dataManager The data manager that will use this filter when adding entries to the database.

Return Values

Returns an SqlDuplicateFilter that can then be configured for duplicate entries.

Change Log

Version	Description
5.0	Available for professional and enterprise editions.

Examples

Register a new duplicate filter

Match Duplicates across tables

// Import classes
import com.screenscraper.datamanager.sql.*;

//Get the data manager
SqlDataManager dm = session.getVariable( "_DATAMANAGER" );

// Register a new duplicate filter
// Check for duplicate people, so register it for the people table
SqlDuplicateFilter personFilter = SqlDuplicateFilter.register("people", dm);

// Catch duplicates when a new entry has the same first name, last name, and phone number as another entry
// Note that phone is a child table of people
personFilter.addConstraint( "people", "first_name" );
personFilter.addConstraint( "people", "last_name" );
personFilter.addConstraint( "phone", "phone_number" );

XmlWriter

Overview

Oftentimes you want to write extracted data directly to an XML file. This class facilitates doing that. Before working with the methods below, you may wish to read our documentation about writing extracted data to XML, which contains examples of scripts that utilize these methods.

This feature is only available to Enterprise editions of screen-scraper.

XmlWriter

XmlWriter XmlWriter ( String fileName, String rootElementName ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText, Hashtable attributes ) (enterprise edition only)
XmlWriter XmlWriter ( String fileName, String rootElementName, String rootElementText, Hashtable attributes, String characterSet ) (enterprise edition only)

Description

Initiate an XmlWriter object.

Parameters

fileName The file path where the file will be created, as a string.
rootElementName The root element's name in the XML file, as a string.
rootElementText (optional) Any text to be added inside of the root node, as a string.
attributes (optional) Hashtable of attribute names and their associated values, for the root node.
characterSet (optional) The character set to use for the file, as a string.

Return Values

Returns an XmlWriter. If an error is experienced it will be thrown.

Change Log

Version	Description
4.5	Available for enterprise edition.
5.5.3a	Added the constructor that takes a character set.

Class Location

com.screenscraper.xml.XmlWriter

Examples

Create an XmlWriter

// Import package
import com.screenscraper.xml.*;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

addElement

Element XmlWriter.addElement ( String name ) (enterprise edition only)
Element XmlWriter.addElement ( String name, String text ) (enterprise edition only)
Element XmlWriter.addElement ( String name, String text, Hashtable attributes ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name, String text ) (enterprise edition only)
Element XmlWriter.addElement ( Element elementToAppendTo, String name, String text, Hashtable attributes ) (enterprise edition only)

Description

Add a node to the XML file.

Parameters

elementToAppendTo (optional) The XmlElement to which the node is being appended.
name The element's name, as a string.
text (optional) Any text to be added inside of the node, as a string.
attributes (optional) Hashtable of attribute names and their associated values, for the node.

Return Values

Returns the added element object.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Add Nodes to XML File

// Import package
import com.screenscraper.xml.*;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

// Add Student Node
student = xmlWriter.addElement( "student" );

// Add Name Node Under the Student
name = xmlWriter.addElement( student, "name", "John Smith" );

// Close XmlWriter
xmlWriter.close();

addElements

Element XmlWriter.addElements ( Element elementToAppendTo, String name, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( Element elementToAppendTo, String name, String text, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( Element elementToAppendTo, String name, String text, Hashtable attributes, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, String text, Hashtable subElements ) (enterprise edition only)
Element XmlWriter.addElements ( String name, String text, Hashtable attributes, Hashtable subElements ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, DataSet dataSet ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, String containingTagText, DataSet dataSet ) (enterprise edition only)
void XmlWriter.addElements ( String containingTagName, String containingTagText, Hashtable attributes, DataSet dataSet ) (enterprise edition only)

Description

Add multiple nodes under a single node (new or already in existence).

Parameters

For the variants that return an Element:

elementToAppendTo (optional) The XmlElement to which the node is being appended.
name The element's name, as a string.
text (optional--pass in null to omit) Any text to be added inside of the node, as a string.
attributes (optional--pass in null to omit) Hashtable of attribute names and their associated values, for the node.
subElements (optional--pass in null to omit) Hashtable of child nodes, with node names as keys and text as values.

For the variants that take a DataSet:

containingTagName The element's name, as a string.
containingTagText (optional--pass in null to omit) Any text to be added inside of the containing node, as a string.
attributes (optional--pass in null to omit) Hashtable of attribute names and their associated values, for the containing node.
dataSet A DataSet object.

Return Values

Returns the main added element object for the variants that create one; the variants that take a DataSet return void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

Add Nodes to XML File

// Import package
import com.screenscraper.xml.*;
import java.util.Hashtable;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

// Student Information
info = new Hashtable();
info.put("name", "John Smith");
info.put("phone", "555-0135");
info.put("gender", "male");

// Add Student Node
student = xmlWriter.addElements( "student", info );

// Close XmlWriter
xmlWriter.close();

close

void XmlWriter.close ( ) (enterprise edition only)

Description

Close the XmlWriter.

Parameters

This method does not receive any parameters.

Return Values

Returns void.

Change Log

Version	Description
4.5	Available for enterprise edition.

Examples

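Close XmlWriter

// Import package
import com.screenscraper.xml.*;

// Create XmlWriter
xmlWriter = new XmlWriter( "C:/students.xml", "students" );

// Close XmlWriter
xmlWriter.close();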

REST API

Overview

The REST API was first released in the stable version 5.0 (alpha 4.5.18a). It is not a true REST API but rather an API accessible via GET requests; for the sake of naming, however, we call it the screen-scraper REST API. It allows you to issue web interface commands through GET requests.

Using REST API

Every REST API request specifies what you want to do in the action GET parameter. Some actions require other parameters to be set as well. Some of the available actions and their parameters are listed below.

For any of this to work screen-scraper has to be running in server mode.

This feature is only available to Enterprise editions of screen-scraper.
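Because every request is a GET to the same endpoint with an action parameter plus action-specific parameters, the URLs can be assembled mechanically. A minimal Java sketch, assuming the default localhost:8779 address used in the examples below and Java 10 or later (for the URLEncoder overload that takes a Charset); the RestUrl class is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical REST API URL builder: /ss/rest?action=...&param=value...
final class RestUrl {
    static final String BASE = "http://localhost:8779/ss/rest";

    static String build(String action, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE)
                .append("?action=")
                .append(URLEncoder.encode(action, StandardCharsets.UTF_8));
        for (Map.Entry<String, String> p : params.entrySet()) {
            // Each name and value is form-encoded (space becomes "+", "/" becomes "%2F")
            url.append('&').append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
               .append('=').append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("scrapeable_session_id", "42");
        params.put("num_lines", "50");
        System.out.println(build("peek_scrapeable_session_log", params));
        // http://localhost:8779/ss/rest?action=peek_scrapeable_session_log&scrapeable_session_id=42&num_lines=50
    }
}
```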

Change General Settings for Scraping Sessions

http://localhost:8779/ss/rest?action=save_settings&default_timeout=89&default_repeat_days=9&default_repeat_hours=8&default_repeat_minutes=7&default_repeat_seconds=6&default_threshold_time=4&default_threshold_record_count=3

default_timeout The number of minutes the scraping session is allowed to run before a request to stop is inserted.
default_repeat_days The number of days that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
default_repeat_hours The number of hours that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
default_repeat_minutes The number of minutes that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
default_repeat_seconds The number of seconds that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
default_threshold_time The percentage of time whereby two runs of a scraping session may differ without being flagged as a possible error.
default_threshold_record_count The percentage of records scraped whereby two runs of a scraping session may differ without being flagged as a possible error.
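Since the four repeat settings are added together, the effective interval can be computed with java.time.Duration. The sketch below uses the values from the example URL above; the RepeatInterval class is hypothetical.

```java
import java.time.Duration;

// Hypothetical helper: the repeat settings are summed into one interval.
final class RepeatInterval {

    static Duration of(long days, long hours, long minutes, long seconds) {
        return Duration.ofDays(days).plusHours(hours).plusMinutes(minutes).plusSeconds(seconds);
    }

    public static void main(String[] args) {
        // default_repeat_days=9, hours=8, minutes=7, seconds=6
        Duration d = of(9, 8, 7, 6);
        System.out.println(d.getSeconds()); // 806826
        System.out.println(d);              // PT224H7M6S
    }
}
```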

Disable/Enable Scheduled Scraping Session

http://localhost:8779/ss/rest?action=disable_enable_scheduled_scraping_session&scheduled_scraping_session_id=110&enable=false

scheduled_scraping_session_id The id of the scheduled scraping session to enable or disable.
enable Whether the scheduled scrape should be enabled (true) or disabled (false).

Get Memory Usage

http://localhost:8779/ss/rest?action=get_memory_usage

Get Runnable Scraping Sessions

http://localhost:8779/ss/rest?action=get_runnable_scraping_sessions

Get Scheduled Scrapes

http://localhost:8779/ss/rest?action=get_scheduled_scraping_sessions

Get Scrapeable Sessions

http://localhost:8779/ss/rest?action=get_scrapeable_sessions

Get Session Variable from Scraping Session

http://localhost:8779/ss/rest?action=get_session_variable_from_scrapeable_session&scrapeable_session_id=3&key=foo

scrapeable_session_id The id of the scrapeable session.
key The name of the session variable.

Import a File

http://localhost:8779/ss/importFile

This call is a bit different from the others in that it needs to be a multi-part POST request (i.e., a file upload) to the above URL, with a single parameter that is a file. The parameter name should be fileToImport. The uploaded file can be either an exported scraping session or script (i.e., a ".sss" file).
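Java's built-in HttpClient has no multipart helper, so the request body has to be assembled by hand. The following is a hedged sketch requiring Java 11 or later: the ImportFileRequest class is hypothetical and the application/octet-stream part type is an assumption. Only the fileToImport parameter name and the /ss/importFile path come from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical builder for the multipart POST expected by /ss/importFile.
final class ImportFileRequest {

    // Builds a multipart/form-data body with a single file part named fileToImport
    static byte[] body(String boundary, String fileName, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"fileToImport\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n"; // assumed part type
        out.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // baseUrl is e.g. "http://localhost:8779"; a random boundary avoids clashes with file content
    static HttpRequest request(String baseUrl, String fileName, byte[] fileBytes) {
        String boundary = "ss-boundary-" + UUID.randomUUID();
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ss/importFile"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body(boundary, fileName, fileBytes)))
                .build();
    }
}
```

To send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while screen-scraper is running in server mode.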

Peek at a Scraping Session Log

http://localhost:8779/ss/rest?action=peek_scrapeable_session_log&scrapeable_session_id=42&num_lines=50

scrapeable_session_id The id of the scraping session.
num_lines The number of lines to show up in the log peek.

Reload Settings

http://localhost:8779/ss/rest?action=reload_settings

Delete a Scraping Session

http://localhost:8779/ss/rest?action=remove_scraping_session&scraping_session_name=ScrapeName

scraping_session_name The name of the scraping session to delete from the server.

Remove a Completed or Running Scraping Session

http://localhost:8779/ss/rest?action=remove_scrapeable_session&scrapeable_session_id=29

scrapeable_session_id The id of the scraping session as returned when the scrape was launched.

Remove Scheduled Scraping Session

http://localhost:8779/ss/rest?action=remove_scheduled_scraping_session&scheduled_scraping_session_id=0

scheduled_scraping_session_id The id of the scheduled scraping session.

Run Scraping Session

http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Shopping+Site&settable_session_variables=this%3Dthat%26foo%3Dbar

The response contains the scrapeable_session_id of the launched scrape, which you can pass to the other REST API actions to manipulate it.

scraping_session_name The name of the scraping session to run.
settable_session_variables URL encoded parameters string of session variables.
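The settable_session_variables value is itself a name=value query string joined with &, URL-encoded as a whole, which is why = and & appear as %3D and %26 in the example above. A minimal Java sketch (Java 10 or later); the RunScrapeUrl class is hypothetical, and encoding each name and value before joining them is an assumption intended to protect values that contain = or &.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a run_scraping_session URL.
final class RunScrapeUrl {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Joins the variables into "k1=v1&k2=v2", then encodes that whole string once more
    static String sessionVariables(Map<String, String> vars) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : vars.entrySet()) {
            joined.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return enc(joined.toString());
    }

    static String build(String baseUrl, String scrapingSessionName, Map<String, String> vars) {
        return baseUrl + "?action=run_scraping_session"
                + "&scraping_session_name=" + enc(scrapingSessionName)
                + "&settable_session_variables=" + sessionVariables(vars);
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>(); // keeps insertion order
        vars.put("this", "that");
        vars.put("foo", "bar");
        System.out.println(build("http://localhost:8779/ss/rest", "Shopping Site", vars));
        // http://localhost:8779/ss/rest?action=run_scraping_session&scraping_session_name=Shopping+Site&settable_session_variables=this%3Dthat%26foo%3Dbar
    }
}
```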

Set Scheduled Scraping Session Settings

http://localhost:8779/ss/rest?action=set_scheduled_scraping_session&scheduled_scraping_session_id=3&scraping_session_name=Shopping+Site&timeout=123&schedule_date=08%2F20%2F2009&schedule_time=11:22:33&repeat_days=4&repeat_hours=3&repeat_minutes=2&repeat_seconds=1&threshold_time=21&threshold_record_count=43&settable_session_variables=this%3Dthat%26foo%3Dbar

scheduled_scraping_session_id The id of the scheduled scraping session. If this parameter is empty or omitted a new scheduled scraping session will be created.
scraping_session_name Name of the scraping session to be scheduled.
timeout The number of minutes the scraping session is allowed to run before a request to stop is inserted.
schedule_date The calendar date when the scraping session is to run next. It is in the format month/day/year (MM/DD/YYYY) and should be URL encoded (%2F instead of /).
schedule_time The time of day when the scraping session is to run. This should be a 24-hour (military) time.
repeat_days The number of days that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
repeat_hours The number of hours that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
repeat_minutes The number of minutes that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
repeat_seconds The number of seconds that should pass from the start of the scraping session until it runs again (added with all other repeat time settings).
threshold_time The percentage of time whereby two runs of a scraping session may differ without being flagged as a possible error.
threshold_record_count The percentage of records scraped whereby two runs of a scraping session may differ without being flagged as a possible error.
settable_session_variables URL encoded parameters string of session variables.
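The settable_session_variables value is itself a query string, so it is encoded twice: once when the variables are joined into key=value pairs, and again when that whole string is placed into the outer URL. The following is a minimal sketch in plain Java (10 or later) of building this request URL from an external client; the class and method names are hypothetical, and encoding each variable's name and value before joining them is an assumption based on ordinary query-string conventions rather than something this page specifies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building a set_scheduled_scraping_session URL.
public class ScheduledSessionUrl {

    // Encodes a map as key1=value1&key2=value2, keeping insertion order.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    static String buildUrl(String hostAndPort, Map<String, String> settings,
                           Map<String, String> sessionVars) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "set_scheduled_scraping_session");
        params.putAll(settings);
        // The session variables become their own query string first; the outer
        // toQueryString call then encodes that string again as a single value.
        params.put("settable_session_variables", toQueryString(sessionVars));
        return "http://" + hostAndPort + "/ss/rest?" + toQueryString(params);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("scheduled_scraping_session_id", "3");
        settings.put("scraping_session_name", "Shopping Site");
        settings.put("schedule_date", "08/20/2009");   // encoded as 08%2F20%2F2009
        settings.put("schedule_time", "11:22:33");

        Map<String, String> sessionVars = new LinkedHashMap<>();
        sessionVars.put("this", "that");
        sessionVars.put("foo", "bar");

        System.out.println(buildUrl("localhost:8779", settings, sessionVars));
    }
}
```

Running main prints a URL ending in settable_session_variables=this%3Dthat%26foo%3Dbar, the same value as in the example URL above. Note that URLEncoder also encodes the colons in schedule_time as %3A; a standard URL decoder treats that the same as a literal colon.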

Set Session Variable on Scraping Session

http://localhost:8779/ss/rest?action=set_session_variable_on_scrapeable_session&scrapeable_session_id=3&key=foo&value=bap

scrapeable_session_id The id of the scrapeable session.
key The name of the session variable.
value The value to associate to the session variable.

Stop a Scraping Session

http://localhost:8779/ss/rest?action=stop_running_scraping_session&scrapeable_session_id=43

scrapeable_session_id The id of the running scraping session.

Stop all Scraping Sessions

http://localhost:8779/ss/rest?action=stop_all_running_scraping_session

Anonymization REST Interface

Controlling Anonymization Externally

Every request requires the email address you register when you sign up for the anonymization service. Pass it as a URL-encoded string in the URL query string using the key registered_email. Your password is also required and is passed to the server via the password parameter.

Managing Anonymous Proxies

Each call to the server is done via a GET request. The possible requests are described below:

Expect an average delay of around 20 seconds before receiving a response from the system for each request.

Update anonymization settings

https://www.screen-scraper.com/screen-scraper/proxy/update_settings/?registered_email=foo%40bar.com&password=mypass&ip_addresses_allowed_to_connect=123.45.67.89%2C98.75.54.321&max_running_proxies=5

ip_addresses_allowed_to_connect: This is a URL-encoded comma-delimited list of IP addresses that should be allowed to connect to the HTTP proxy servers.
max_running_proxies: The maximum number of proxies that should be allowed. It's important that this be set, as lag between terminating and spawning proxies could otherwise cause more proxies than desired to be spawned.
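Because every anonymization call is a GET request carrying the same two credentials, a client can build the query string in one place. Below is a minimal sketch in plain Java (11 or later) that builds, but does not send, the update_settings request with java.net.http.HttpRequest. The class and helper names are hypothetical, and the 60-second timeout is an arbitrary choice that leaves headroom over the roughly 20-second average response time mentioned above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;

// Hypothetical client-side helper for the anonymization REST calls.
public class AnonymizationRequests {

    // Base URL shared by the anonymization calls (taken from the examples above).
    static final String BASE_URL = "https://www.screen-scraper.com/screen-scraper/proxy/";

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Every call carries the registered email and the password.
    static String credentials(String email, String password) {
        return "registered_email=" + enc(email) + "&password=" + enc(password);
    }

    // Builds the update_settings GET request without sending it.
    static HttpRequest updateSettings(String email, String password,
                                      List<String> allowedIps, int maxRunningProxies) {
        String query = credentials(email, password)
                // The IPs are joined with commas, then URL-encoded (commas become %2C).
                + "&ip_addresses_allowed_to_connect=" + enc(String.join(",", allowedIps))
                + "&max_running_proxies=" + maxRunningProxies;
        return HttpRequest.newBuilder(URI.create(BASE_URL + "update_settings/?" + query))
                // Responses average around 20 seconds; 60 seconds is an arbitrary margin.
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
    }
}
```

To actually send it, HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) returns the server's response. Keeping request construction separate makes the generated URL easy to inspect before any call is made.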

Get the current number of running proxies

https://www.screen-scraper.com/screen-scraper/proxy/get_num_proxies/?registered_email=foo%40bar.com&password=mypass

Get a current list of running proxies

https://www.screen-scraper.com/screen-scraper/proxy/get_current_proxies/?registered_email=foo%40bar.com&password=mypass

Here's an example of what would be returned from this request:
ec2-75-101-238-93.compute-1.amazonaws.com:3128 i-61955e08
ec2-75-131-250-53.compute-1.amazonaws.com:3128 i-6e955e07

Each proxy gets its own line. The host and port are given first, then a space character, then the instance ID.

You'll use the instance ID if you want to report a proxy as bad (so that it will be terminated and one will be spawned in its place).
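Since each line is host:port, a space, then the instance ID, the response body can be split apart with plain string handling. A minimal Java sketch follows; the class names are hypothetical, and it assumes the format is exactly as described here (one proxy per line, no header line), throwing an exception on any line that does not match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the get_current_proxies response body.
public class ProxyListParser {

    // One running proxy as reported by get_current_proxies.
    static final class Proxy {
        final String host;
        final int port;
        final String instanceId;

        Proxy(String host, int port, String instanceId) {
            this.host = host;
            this.port = port;
            this.instanceId = instanceId;
        }
    }

    // Parses lines of the form "host:port instanceId", skipping blank lines.
    static List<Proxy> parse(String body) {
        List<Proxy> proxies = new ArrayList<>();
        for (String rawLine : body.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            int space = line.indexOf(' ');
            // Look for the host/port separator only before the space.
            int colon = space < 0 ? -1 : line.lastIndexOf(':', space);
            if (colon < 0) {
                throw new IllegalArgumentException("Unexpected proxy line: " + line);
            }
            String host = line.substring(0, colon);
            int port = Integer.parseInt(line.substring(colon + 1, space));
            String instanceId = line.substring(space + 1).trim();
            proxies.add(new Proxy(host, port, instanceId));
        }
        return proxies;
    }
}
```

Each parsed instanceId is the value to pass as instance_id when reporting a bad proxy, while host and port form the address an HTTP client would use as its proxy.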

Spawn proxies

https://www.screen-scraper.com/screen-scraper/proxy/spawn_instances/?registered_email=foo%40bar.com&password=mypass&num_instances=5

num_instances: The number of proxies to be spawned.

Terminate a single proxy and spawn a new one in its place

https://www.screen-scraper.com/screen-scraper/proxy/report_bad_proxy/?registered_email=foo%40bar.com&password=mypass&instance_id=i-2cfc3541

After terminating a proxy, it will take a minute or two to spawn one in its place. You'll want to query the server periodically in order to refresh your current pool of proxies.
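One way to follow this advice is to poll until the pool is back to the expected size, giving up after a fixed number of attempts. The sketch below is hypothetical: it takes the proxy count as an IntSupplier so it can wrap a call to get_num_proxies (assuming that call returns a plain number, which this page does not specify) or a count derived from the get_current_proxies response.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical polling helper for waiting on replacement proxies.
public class ProxyPoolPoller {

    // Repeatedly asks for the current proxy count until it reaches `target`.
    // Returns true once the target is reached, or false if the attempts run
    // out or the thread is interrupted while waiting.
    static boolean waitForProxies(IntSupplier currentCount, int target,
                                  Duration interval, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (currentCount.getAsInt() >= target) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }
}
```

For example, polling every 30 seconds for up to 10 attempts waits at most four and a half minutes, which comfortably covers the minute or two a replacement usually takes.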

Terminate all proxies

https://www.screen-scraper.com/screen-scraper/proxy/terminate_instances/?registered_email=foo%40bar.com&password=mypass

Alpha API

Overview

When writing scripts within screen-scraper, there are a number of objects and methods available to you. The stable objects and classes available to scripts are documented in the API section of our documentation. This section documents only those methods that are in a current alpha release. You are welcome to use them, but be aware that they are prone to change. We always strive for backward compatibility of stable features, but we do not guarantee compatibility for alpha features until they appear in a stable version.

Alpha methods and objects are only available if you have upgraded screen-scraper to an unstable version. They may change after their introduction if improvements are required or desired, or if their purpose changes.

The examples are given using Interpreted Java as the scripting language. This is in accordance with the stable API.

scrapeableFile

Changes to scrapeableFile

scrapeableFile.applyXPathExpression

void scrapeableFile.applyXPathExpression ( String xPathExpression ) (professional and enterprise editions only)

Description

Applies an XPath expression to the current HTML response. If tidying the response fails, this method will also fail.

Parameters

xPathExpression A valid XPath expression.

Return Values

An XmlNode. See example for usage.

Change Log

Version	Description
6.0.1a	Available in Professional and Enterprise editions.

Examples

Apply an XPath Expression

outputNode( node, indent )
{
    // e.g., in the case of a <td> tag the getName call
    // will return the text "td".
    if( node.getName()!=null )
    {
        openTag = new StringBuffer();
        openTag.append( indent + " <" + node.getName() );
        // The getAttributes method returns an ArrayList of
        // KeyValue objects.
        for( iter = node.getAttributes().iterator(); iter.hasNext(); )
        {
            attribute = iter.next();
            openTag.append( " " + attribute.getKey() + "=\"" + attribute.getValue() + "\"" );
        }
        openTag.append( ">" );
        session.log( openTag.toString() );
    }

    // e.g., in the case of <td>foo</td> the getValue method
    // call will return the text "foo".
    if( node.getValue()!=null )
    {
        session.log( indent + " " + node.getValue() );
    }

    // getChildNodes returns an ArrayList of XmlNode objects.
    for( iter = node.getChildNodes().iterator(); iter.hasNext(); )
    {
        childNode = iter.next();
        outputNode( childNode, indent + "--" );
    }

    if( node.getName()!=null )
    {
        session.log( indent + " </" + node.getName() + ">" );
    }
}

// Match all <td> tags.
node = scrapeableFile.applyXPathExpression( "//td" );

// Note that there is an equivalent sutil
// method call: sutil.applyXPathExpression( String content, String expression )
// If the content parameter doesn't contain a well-formed
// block of XML an exception will be thrown.

outputNode( node, "" );

API

Overview

screen-scraper Object APIs

Java Libraries/Classes of Note

Other screen-scraper APIs

Scraping Engine API

Overview

Objects

dataRecord

Overview

DataRecord

Description

Parameters

Return Values

Change Log

Class Location

Examples

Create New DataRecord

get

Description

Parameters

Return Values

Change Log

Examples

Retrieve DataRecord Information

put

Description

Parameters

Return Values

Change Log

Examples

Add/Change DataRecord Field

remove

Description

Parameters

Return Values

Change Log

Examples

Add/Change DataRecord Field

dataSet

Overview

DataSet

Description

Parameters

Return Values

Change Log

Class Location

Examples

Manually Create DataSet

Create DataSet from Array List

addDataRecord

Description

Parameters

Return Values

Change Log

Examples

Add Data Record to DataSet

See Also

clearDataRecords

Description

Parameters

Return Values

Change Log

Examples

Remove DataRecords from DataSet

See Also

deleteDataRecord

Description

Parameters

Return Values

Change Log

Examples

Remove one DataRecords from DataSet

See Also

findValue

Description

Parameters

Return Values

Change Log

Examples