Scripting in screen-scraper

Using Session Variables

Overview

Session variables allow you to persist values across the life of a scraping session.

Setting session variables

There are three ways to set session variables. The first is within a script, using the session.setVariable( String identifier, Object value ) method. The second is to designate that the value extracted by a specific token in an extractor pattern should be saved in a session variable (see using extractor patterns for more on this). The third applies when using RemoteScrapingSession objects from external sources (such as a PHP or ASP script): session variables can be set via their setVariable methods.
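As a rough illustration of the first approach, the sketch below sets a session variable in a Python script and reads it back with getVariable (covered in the next section). Inside screen-scraper the session object is already available to the script; the StandInSession class here is purely hypothetical and exists only so the snippet can run outside screen-scraper. The variable name and value are made up for the example.

```python
class StandInSession(object):
    """Hypothetical stand-in for screen-scraper's built-in session object."""

    def __init__(self):
        self._variables = {}

    def setVariable(self, identifier, value):
        # Store the value under the given identifier.
        self._variables[identifier] = value

    def getVariable(self, identifier):
        # Returning None for an unknown identifier is an assumption.
        return self._variables.get(identifier)


# Inside screen-scraper, skip this line and use the provided session object.
session = StandInSession()

# Save a value so that later scripts and scrapeable files can use it.
session.setVariable("QUERY_PARAM", "used cars")

# Read it back.
query = session.getVariable("QUERY_PARAM")
```

Because the value parameter is an Object, the same call can hold more than strings, which is how the XmlWriter examples later on keep a writer object around between scripts.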

Retrieving values from session variables

There are two ways to retrieve the value of a session variable. The first is within a script, using the session.getVariable( String identifier ) method. The second is to embed the identifier for the session variable, surrounded by ~# and #~ delimiters. For example, if you have a session variable identified by QUERY_PARAM you might embed it into the URL field of a scrapeable file like this:

http://www.mydomain.com/myscript.php?query=~#QUERY_PARAM#~

screen-scraper will automatically replace the ~#QUERY_PARAM#~ text with the actual value of the corresponding session variable.
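To make the replacement concrete, here is a minimal Python sketch of the same kind of substitution. It is not screen-scraper's implementation: the expand_tokens function, the regular expression, and the choice to leave unknown tokens untouched are all assumptions made for illustration.

```python
import re

# Matches ~#NAME#~ and captures NAME (letters, digits and underscores).
TOKEN = re.compile(r"~#(\w+)#~")


def expand_tokens(text, variables):
    """Replace each ~#NAME#~ token with the matching value from variables."""

    def substitute(match):
        name = match.group(1)
        # Leave unknown tokens untouched; this choice is an assumption.
        return str(variables.get(name, match.group(0)))

    return TOKEN.sub(substitute, text)


url = expand_tokens(
    "http://www.mydomain.com/myscript.php?query=~#QUERY_PARAM#~",
    {"QUERY_PARAM": "shoes"},
)
print(url)  # prints http://www.mydomain.com/myscript.php?query=shoes
```

The sketch does no URL encoding of the value; how screen-scraper treats special characters in a substituted value is not covered here.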


Scripting in Interpreted Java

screen-scraper uses the BeanShell library to allow for scripting in Java. If you've done some programming in C or JavaScript you'll probably find BeanShell's syntax familiar. Documentation for BeanShell is excellent, and we'd recommend referring to it as you program.

See the using scripts page for details on objects and methods that you can make use of in a script.

Remember that you can access external Java libraries by placing .jar files inside the "ext" directory found in the "lib" folder of your screen-scraper installation. You will need to use at least Java version 1.5.


Scripting in VBScript

If you've programmed in Visual Basic or Active Server Pages, you should find scripting in screen-scraper to be similar. Using VBScript within screen-scraper can only be done on a Windows platform, and requires that the VBScript runtime be installed. Chances are good that you already have the VBScript runtime on your system, but if not, you can download it from Microsoft's Script Downloads page. screen-scraper will automatically detect whether the VBScript runtime is installed; you can check by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "VBScript" in the list, the runtime needs to be installed.

Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple VBScript scripts are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as a scripting language.

Because screen-scraper uses the native VBScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. All of the objects mentioned on the Using scripts page are available as well.

Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:

' Generate a new "Weather" scraping session.
Set runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" )

' Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.SetVariable "ZIP_CODE", "90001"

' Tell the scraping session to scrape.
runnableScrapingSession.Scrape


Scripting in JavaScript

screen-scraper uses Mozilla's Rhino scripting engine to allow scripts to be written in JavaScript. Documentation for Rhino is sparse, but the interpreter does adhere strictly to the established ECMAScript standard, so just about any JavaScript reference will apply. If you try writing scripts in JavaScript and run into difficulties because of the lack of documentation, you may want to consider using Interpreted Java instead, which has very similar syntax and significantly better documentation.

If you've worked with client-side JavaScript in web programming, you'll probably be comfortable using JavaScript in screen-scraper. One "gotcha" to be aware of is the method for using external classes. If you'd like to reference a class in the standard Java library, you'd do it like this:

// Declare an ArrayList.
var myArrayList = new java.util.ArrayList();

// Add two elements.
myArrayList.add( "one" );
myArrayList.add( "two" );

// Log the size.
session.log( "Size: " + myArrayList.size() );

However, packages outside of the standard Java library must be prefaced with the "Packages" keyword. Here's an example of creating and using a DataRecord object:

// Declare a new DataRecord object.
var myDR = new Packages.com.screenscraper.common.DataRecord();

// Give it a key/value pair.
myDR.put( "foo", "bar" );

// Log the value of the key.
session.log( "foo: " + myDR.get( "foo" ) );


Scripting in JScript

Writing scripts in JScript gives you the familiarity of a widely used language, while still providing access to commonly used Windows libraries. Using JScript within screen-scraper can only be done on a Windows platform, and requires that the JScript runtime be installed. Chances are good that you already have the JScript runtime on your system, but if not, you can download it from Microsoft's Script Downloads page. screen-scraper will automatically detect whether the JScript runtime is installed; you can check by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "JScript" in the list, the runtime needs to be installed.

Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple JScript scripts are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as a scripting language.

Because screen-scraper uses the native JScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. All of the objects mentioned on the Using scripts page are available as well.

Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:

// Generate a new "Weather" scraping session.
var runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" );

// Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.setVariable( "ZIP_CODE", "90001" );

// Tell the scraping session to scrape.
runnableScrapingSession.scrape();


Scripting in Perl

screen-scraper uses ActiveState's ActivePerl library to allow for scripts to be written in Perl. Using Perl within screen-scraper can only be done on a Windows platform, and requires that the ActivePerl runtime be installed, which can be downloaded from ActiveState's download page for free. screen-scraper will automatically detect if the ActivePerl runtime is installed, which you can see by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "Perl" in the list then the runtime needs to be installed.

Java classes can be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:

# Generate a new "Weather" scraping session.
$runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" );

# Put the zip code in a session variable so we can reference it later.
$runnableScrapingSession->setVariable( "ZIP_CODE", "90001" );

# Tell the scraping session to scrape.
$runnableScrapingSession->scrape();


Scripting in Python

The Jython interpreter is used by screen-scraper to allow for scripting in Python. Jython is a very fast interpreter, and we'd recommend using it if you're familiar with the Python programming language.

When scripting in Python, all of the standard Java classes can be used. A class must be imported with a from ... import statement before it is used; this also applies to screen-scraper's own classes, such as RunnableScrapingSession. Here's an example that will run the "Weather" scraping session (which is found in the default screen-scraper installation):

# Import the RunnableScrapingSession class.
from com.screenscraper.scraper import RunnableScrapingSession

# Generate a new "Weather" scraping session.
runnableScrapingSession = RunnableScrapingSession( "Weather" )

# Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.setVariable( "ZIP_CODE", "90001" )

# Tell the scraping session to scrape.
runnableScrapingSession.scrape()

Notice that before the RunnableScrapingSession class can be used it first must be imported.


Writing extracted data to XML (enterprise edition only)

Overview

Oftentimes once you've extracted data from a page you'll want to write it out to an XML file. screen-scraper contains a special XmlWriter class that makes this a snap.

To use the XmlWriter class you'll generally follow these steps:

  1. Create an instance of XmlWriter in a script, storing it in a session variable.
  2. Extract data.
  3. In a script, get a reference to the XmlWriter object stored in step one, then call addElement or addElements to write out XML nodes.
  4. Repeat steps 2 and 3 as many times as you'd like.
  5. In a script, get a reference to the XmlWriter object, then call the close method on it.

The trickiest part is understanding which of the various addElement and addElements methods to call.
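The five steps above can be sketched in Python as three small scripts that share one writer through a session variable. Everything here is a stand-in so the flow can be run outside screen-scraper: StandInXmlWriter is a hypothetical class that only records calls (the real XmlWriter writes a file), a plain dict plays the role of the session variables, and the file name, element names, and data are made up.

```python
class StandInXmlWriter(object):
    """Hypothetical stand-in that records calls instead of writing a file."""

    def __init__(self, path, root_name):
        self.path = path
        self.root_name = root_name
        self.elements = []
        self.closed = False

    def addElement(self, name, text):
        if self.closed:
            raise ValueError("writer is already closed")
        self.elements.append((name, text))

    def close(self):
        self.closed = True


# A plain dict stands in for screen-scraper's session variables.
session_variables = {}


def init_script():
    # Step 1: create the writer once and keep it for later scripts.
    session_variables["XML_WRITER"] = StandInXmlWriter("products.xml", "products")


def write_script(name, price):
    # Step 3: fetch the same writer and add the data extracted in step 2.
    writer = session_variables["XML_WRITER"]
    writer.addElement("name", name)
    writer.addElement("price", price)


def close_script():
    # Step 5: close the writer once, after all data has been written.
    session_variables["XML_WRITER"].close()


init_script()
# Step 4: repeat the extract-and-write cycle for each record.
for name, price in [("widget", "9.99"), ("gadget", "19.50")]:
    write_script(name, price)
close_script()
```

The point of the pattern is that the writer is created and closed exactly once, while the writing script can run as many times as there are records.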

Examples

If you're scripting in Interpreted Java, the script in step 1 might look something like this:

// Create an instance of the XmlWriter class.
// Note the forward slash (as opposed to a backslash) after
// the "C:". This is a more Java-friendly way of handling the
// directory delimiter.
xmlWriter = new com.screenscraper.xml.XmlWriter( "C:/my_xml_file.xml", "root_element", "This is the root element" );

// Save the XmlWriter object in a session variable.
session.setVariable( "XML_WRITER", xmlWriter );

In subsequent scripts, you can get a reference to that same XmlWriter object like this:

xmlWriter = session.getVariable( "XML_WRITER" );

You could then add elements and such to the XML file. The following three examples demonstrate the various ways to go about that. Each script is self-contained in that it creates, adds to, then closes the XmlWriter object. Bear in mind that this process could be spread across multiple scripts, as described above.

Example 1

// Import the class we'll need.
import com.screenscraper.xml.XmlWriter;

// Instantiate a writer with a root node named "simple-root".
XmlWriter xmlWriter = new XmlWriter("./simple.xml", "simple-root");

// Create four identical tags with different inner text.
for (int i = 0; i < 4; i++) {
    // Appends to root element.  No attributes.
    xmlWriter.addElement( "one child", Integer.toString(i) );
}

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<simple-root>
  <one_child>0</one_child>
  <one_child>1</one_child>
  <one_child>2</one_child>
  <one_child>3</one_child>
</simple-root>
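Note that the element name passed to addElement was "one child", while the tag in the file is one_child. The short Python sketch below mimics that renaming. The to_tag_name function is hypothetical and not part of XmlWriter, and it assumes that replacing spaces with underscores is the only change made.

```python
def to_tag_name(name):
    """Replace spaces with underscores, as seen in the output above."""
    return name.replace(" ", "_")


print(to_tag_name("one child"))  # prints one_child
```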

Example 2

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.xml.XmlWriter;

// First set up the various attributes.
Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// These are the children we'll be adding.
Hashtable children = new Hashtable();
children.put("child1", "1");
children.put("child2", "2");
children.put("child3", "3");
children.put("child4", "4");
children.put("child5", "5");

// Instantiate a writer with a root node named "difficult-root".
XmlWriter xmlWriter = new XmlWriter("./difficult.xml", "difficult-root");

firstElement = xmlWriter.addElement("first child", "first child text", attributes);

// Add more info to the first element.
secondElement = xmlWriter.addElements(firstElement, "second child", "second child text", children);

// Add more elements to root.  This time add text, attributes, and children.
thirdElement = xmlWriter.addElements("third child", "third child text", attributes, children);

// Illegal Example: Cannot add elements to the second Element
// since it was closed when thirdElement was added to the root.
// fourth = xmlWriter.addElement(secondElement, "wrong");

// Adds hashtable to attributes.  Appends to root element.
fifth = xmlWriter.addElement("another", "test", attributes );

// Adds hashtable to children elements, appends to the fifth element.
sixth = xmlWriter.addElements(fifth, "other", "test2", children );

// Adds attributes and children.  Appends to the sixth element.
seventh = xmlWriter.addElements(sixth, "complex", "example", attributes, children);

// Adds hashtable to attributes with children.  Appends to root element.
eighth = xmlWriter.addElements("eight", "ocho", attributes, children );

// Close up the XML file.
xmlWriter.close();

This script would produce the following XML file:

<difficult-root>
  <first_child attrib3="3" attrib2="2" attrib1="1">
    first child text
    <second_child>second child text
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
    </second_child>
  </first_child>
  <third_child attrib3="3" attrib2="2" attrib1="1">
    third child text
    <child5>5</child5>
    <child4>4</child4>
    <child3>3</child3>
    <child2>2</child2>
    <child1>1</child1>
  </third_child>
  <another attrib3="3" attrib2="2" attrib1="1">
    test
    <other>
      test2
      <child5>5</child5>
      <child4>4</child4>
      <child3>3</child3>
      <child2>2</child2>
      <child1>1</child1>
      <complex attrib3="3" attrib2="2" attrib1="1">
        example
        <child5>5</child5>
        <child4>4</child4>
        <child3>3</child3>
        <child2>2</child2>
        <child1>1</child1>
      </complex>
    </other>
  </another>
  <eight attrib3="3" attrib2="2" attrib1="1">
    ocho
    <child5>5</child5>
    <child4>4</child4>
    <child3>3</child3>
    <child2>2</child2>
    <child1>1</child1>
  </eight>
</difficult-root>

Example 3

// Import the classes we'll need.
import java.util.Hashtable;
import com.screenscraper.common.DataRecord;
import com.screenscraper.common.DataSet;
import com.screenscraper.xml.XmlWriter;

Hashtable attributes = new Hashtable();
attributes.put("attrib1", "1");
attributes.put("attrib2", "2");
attributes.put("attrib3", "3");

// Create a new file (complex.xml) with a root element
// of 'complex-root' and text 'complex text'.
XmlWriter xmlWriter = new XmlWriter("./complex.xml", "complex-root", "complex text", attributes);

DataSet dataSet = new DataSet();

DataRecord dataRecord = null;

// Create 5 datarecords with different data.
for (int i = 0; i < 5; i++) {
    dataRecord = new DataRecord();

    for (int j = 0; j < 5; j++) {
        dataRecord.put("tag" + Integer.toString(j), Integer.toString(i * j));
    }

    dataSet.addDataRecord(dataRecord);
}

// Writes the data set to XML.  The data records are surrounded by the tag
// defined by 'data set container'.  Notice that the tag is automatically
// reformatted to data_set_container, since XML tag names cannot have spaces.
xmlWriter.addElements("data set container", dataSet);

// Must be called after all writing is done.  Will close the file and any
// open tags in the XML.
xmlWriter.close();

This script would produce the following XML file:

<complex-root attrib3="3" attrib2="2" attrib1="1">
  complex text
  <data_set_container>
    <tag4>0</tag4>
    <tag3>0</tag3>
    <tag2>0</tag2>
    <tag1>0</tag1>
    <tag0>0</tag0>
  </data_set_container>
  <data_set_container>
    <tag4>4</tag4>
    <tag3>3</tag3>
    <tag2>2</tag2>
    <tag1>1</tag1>
    <tag0>0</tag0>
  </data_set_container>
  <data_set_container>
    <tag4>8</tag4>
    <tag3>6</tag3>
    <tag2>4</tag2>
    <tag1>2</tag1>
    <tag0>0</tag0>
  </data_set_container>
  <data_set_container>
    <tag4>12</tag4>
    <tag3>9</tag3>
    <tag2>6</tag2>
    <tag1>3</tag1>
    <tag0>0</tag0>
  </data_set_container>
  <data_set_container>
    <tag4>16</tag4>
    <tag3>12</tag3>
    <tag2>8</tag2>
    <tag1>4</tag1>
    <tag0>0</tag0>
  </data_set_container>
</complex-root>
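As a quick check on the numbers in this output, each data_set_container block corresponds to one value of i in the outer loop, and each tag holds the product i * j. The Python sketch below rebuilds those values with plain dictionaries, which here are only stand-ins for DataRecord objects. As the output above shows, the order in which tags appear in the file can differ from the order in which they were added.

```python
# Rebuild the records from Example 3: tag<j> holds the product i * j.
records = []
for i in range(5):
    record = {}
    for j in range(5):
        record["tag%d" % j] = str(i * j)
    records.append(record)

# The fifth record (i = 4) matches the last data_set_container block.
for name in sorted(records[4], reverse=True):
    print("%s = %s" % (name, records[4][name]))
```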

