Scripting in screen-scraper
Using Session Variables
Overview
Session variables allow you to persist values across the life of a scraping session.
Setting session variables
There are a few different ways to set session variables. The first is within a script using the session.setVariable( String identifier, Object value ) method. A second is to designate that the value extracted by a specific token in an extractor pattern should be saved in a session variable (see using extractor patterns for more on this). Third, session variables can be set when using RemoteScrapingSession objects from external sources (such as a PHP or ASP script) via their setVariable methods.
Retrieving values from session variables
There are two ways to retrieve the value of a session variable. The first is within a script using the session.getVariable( String identifier ) method. The second is to embed the identifier for the session variable, surrounded by ~# and #~ delimiters. For example, if you have a session variable identified by QUERY_PARAM you might embed it into the URL field of a scrapeable file like this:
http://www.mydomain.com/myscript.php?query=~#QUERY_PARAM#~
screen-scraper will automatically replace the ~#QUERY_PARAM#~ text with the actual value of the corresponding session variable.
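The substitution described above can be illustrated with a small Python sketch. This is not screen-scraper's own code: StubSession is a hypothetical stand-in for the session object so the example runs outside the application, and expand_tokens is an illustrative helper. Only the setVariable and getVariable method names and the ~# #~ delimiters come from the documentation; returning None for an unset variable and leaving unknown tokens untouched are assumptions.

```python
import re


class StubSession:
    """Hypothetical stand-in for screen-scraper's `session` object."""

    def __init__(self):
        self._variables = {}

    def setVariable(self, identifier, value):
        # Store the value under its identifier for the life of this object.
        self._variables[identifier] = value

    def getVariable(self, identifier):
        # Assumption: an unset variable yields None.
        return self._variables.get(identifier)


# Matches a token such as ~#QUERY_PARAM#~ and captures the identifier.
TOKEN = re.compile(r"~#(.+?)#~")


def expand_tokens(text, session):
    """Replace each ~#NAME#~ token with the matching session variable."""

    def lookup(match):
        value = session.getVariable(match.group(1))
        # Assumption: tokens with no matching variable are left as-is.
        return match.group(0) if value is None else str(value)

    return TOKEN.sub(lookup, text)


session = StubSession()
session.setVariable("QUERY_PARAM", "widgets")

url = expand_tokens(
    "http://www.mydomain.com/myscript.php?query=~#QUERY_PARAM#~", session
)
print(url)
```

Inside a real screen-scraper script you would call session.setVariable and session.getVariable directly and let the application perform the token replacement for you.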
Scripting in Interpreted Java
screen-scraper uses the BeanShell library to allow for scripting in Java. If you've done some programming in C or JavaScript you'll probably find BeanShell's syntax familiar. Documentation for BeanShell is excellent, and we'd recommend referring to it as you program.
See the using scripts page for details on objects and methods that you can make use of in a script.
Remember that you can access external Java libraries by placing .jar files inside the "ext" directory found in the "lib" folder of your screen-scraper installation. You will need to use at least Java version 1.5.
Scripting in VBScript
If you've programmed in Visual Basic or Active Server Pages you should find scripting in screen-scraper to be similar. Using VBScript within screen-scraper can only be done on a Windows platform, and requires that the VBScript runtime be installed. The chances are good that you've already got the VBScript runtime on your system, but if not you can download it from Microsoft's Script Downloads page. screen-scraper will automatically detect if the VBScript runtime is installed, which you can see by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "VBScript" in the list then the runtime needs to be installed.
Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple VBScript scripts are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as a scripting language.
Because screen-scraper uses the native VBScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. Additionally, all of the objects mentioned on the Using scripts page are available.
Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:
' Generate a new "Weather" scraping session.
Set runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" )
' Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.SetVariable "ZIP_CODE", "90001"
' Tell the scraping session to scrape.
runnableScrapingSession.Scrape
Scripting in JavaScript
screen-scraper uses Mozilla's Rhino scripting engine to run scripts written in JavaScript. Documentation for Rhino is sparse, but the interpreter adheres strictly to the established ECMAScript standard, so just about any JavaScript reference will apply. If you run into difficulties writing scripts in JavaScript because of the lack of documentation, you may want to consider using Interpreted Java instead, which has very similar syntax and significantly better documentation.
If you've worked with client-side JavaScript in web programming, you'll probably be comfortable using JavaScript in screen-scraper. One "gotcha" to be aware of is the method for using external classes. If you'd like to reference a class in the standard Java library, you'd do it like this:
// Declare an ArrayList.
var myArrayList = new java.util.ArrayList();
// Add two elements.
myArrayList.add( "one" );
myArrayList.add( "two" );
// Log the size.
session.log( "Size: " + myArrayList.size() );
However, packages outside of the standard Java library must be prefaced with the "Packages" keyword. Here's an example of creating and using a DataRecord object:
// Declare a new DataRecord object.
var myDR = new Packages.com.screenscraper.common.DataRecord();
// Give it a key/value pair.
myDR.put( "foo", "bar" );
// Log the value of the key.
session.log( "foo: " + myDR.get( "foo" ) );
Scripting in JScript
Writing scripts in JScript gives you the familiarity of a widely used language, while still providing access to commonly used Windows libraries. Using JScript within screen-scraper can only be done on a Windows platform, and requires that the JScript runtime be installed. The chances are good that you've already got the JScript runtime on your system, but if not you can download it from Microsoft's Script Downloads page. screen-scraper will automatically detect if the JScript runtime is installed, which you can see by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "JScript" in the list then the runtime needs to be installed.
Please be aware that, because of a bug in the third-party library that allows screen-scraper to integrate with the Microsoft Scripting Engine, problems can occur if multiple scripts that rely on that engine (such as VBScript or JScript scripts) are run simultaneously. If you're using the professional edition of screen-scraper and plan on running multiple scraping sessions simultaneously, you should use Interpreted Java, JavaScript, or Python as a scripting language.
Because screen-scraper uses the native JScript engine, all ActiveX objects installed on the computer (such as ADO or the FileSystemObject) can be accessed. Additionally, all of the objects mentioned on the Using scripts page are available.
Java classes can also be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:
// Generate a new "Weather" scraping session.
var runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" );
// Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.setVariable( "ZIP_CODE", "90001" );
// Tell the scraping session to scrape.
runnableScrapingSession.scrape();
Scripting in Perl
screen-scraper uses ActiveState's ActivePerl library to allow for scripts to be written in Perl. Using Perl within screen-scraper can only be done on a Windows platform, and requires that the ActivePerl runtime be installed, which can be downloaded from ActiveState's download page for free. screen-scraper will automatically detect if the ActivePerl runtime is installed, which you can see by selecting a script within screen-scraper (from the tree on the left of the application) and clicking on the "Language" drop-down list. If you don't see "Perl" in the list then the runtime needs to be installed.
Java classes can be instantiated within a script using the CreateBean function. For example, the following script will instantiate a RunnableScrapingSession for the "Weather" scraping session (which is found in the default screen-scraper installation) and run it:
# Generate a new "Weather" scraping session.
$runnableScrapingSession = CreateBean( "com.screenscraper.scraper.RunnableScrapingSession", "Weather" );
# Put the zip code in a session variable so we can reference it later.
$runnableScrapingSession->setVariable( "ZIP_CODE", "90001" );
# Tell the scraping session to scrape.
$runnableScrapingSession->scrape();
Scripting in Python
The Jython interpreter is used by screen-scraper to allow for scripting in Python. Jython is a very fast interpreter, and we'd recommend using it if you're familiar with the Python programming language.
When scripting in Python all of the standard Java classes can be used. Classes must be imported using a special directive, which is also required if you'd like to create one of screen-scraper's RunnableScrapingSession objects. Here's an example that will run the "Weather" scraping session (which is found in the default screen-scraper installation):
# Import the RunnableScrapingSession class.
from com.screenscraper.scraper import RunnableScrapingSession
# Generate a new "Weather" scraping session.
runnableScrapingSession = RunnableScrapingSession( "Weather" )
# Put the zip code in a session variable so we can reference it later.
runnableScrapingSession.setVariable( "ZIP_CODE", "90001" )
# Tell the scraping session to scrape.
runnableScrapingSession.scrape()
Notice that before the RunnableScrapingSession class can be used it first must be imported.
Writing extracted data to XML (enterprise edition only)
Overview
Oftentimes once you've extracted data from a page you'll want to write it out to an XML file. screen-scraper contains a special XmlWriter class that makes this a snap.
To use the XmlWriter class you'll generally follow these steps:
The trickiest part is understanding which of the various addElement and addElements methods to call.
Examples
If you're scripting in Interpreted Java, the script in step 1 might look something like this:
// Create an instance of the XmlWriter class.
In subsequent scripts, you can get a reference to that same XmlWriter object like this:
You could then add elements and attributes to the XML file. The following three examples demonstrate the various ways to go about that. Each of the scripts is self-contained in that it creates, adds to, then closes the XmlWriter object. Bear in mind that this process could be spread across multiple scripts, as described above.
Example 1
// Import the class we'll need.
This script would produce the following XML file:
<simple-root>
Example 2
// Import the classes we'll need.
This script would produce the following XML file:
<difficult-root>
Example 3
// Import the classes we'll need.
This script would produce the following XML file:
<complex-root attrib3="3" attrib2="2" attrib1="1">