Invoking screen-scraper from Python

Overview

A Python script interacts with screen-scraper through a Python class called RemoteScrapingSession. To use this class, import the module remote_scraping_session.py (found in the misc/python directory of your screen-scraper installation) in your Python script.

screen-scraper must be running as a server before you invoke it from a Python script.
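Because remote_scraping_session.py lives inside the screen-scraper installation rather than on Python's default module path, one way to make it importable is to add the misc/python directory to sys.path. The following is a minimal sketch; the helper name add_driver_to_path and the installation path /opt/screen-scraper are hypothetical, so substitute the location of your own installation.

```python
import os
import sys


def add_driver_to_path(install_dir):
    """Append <install_dir>/misc/python to sys.path and return that directory.

    add_driver_to_path is a hypothetical helper, not part of screen-scraper.
    """
    driver_dir = os.path.join(install_dir, "misc", "python")
    if driver_dir not in sys.path:
        sys.path.append(driver_dir)
    return driver_dir


# Hypothetical installation directory; adjust for your system.
driver_dir = add_driver_to_path("/opt/screen-scraper")

# With the directory on sys.path, the driver module can be imported:
# from remote_scraping_session import RemoteScrapingSession
```

Copying remote_scraping_session.py next to your own script is an equally valid alternative.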

Methods

The following is a reference for all of the methods found in the RemoteScrapingSession class.

  • initialize( name ). Initializes a RemoteScrapingSession identified by name. When this form is called, the default host (localhost) and port (8778) are used.

    session.initialize( "Shopping Site" )

  • initialize( name, host, port ). Initializes a RemoteScrapingSession identified by name and connects to the server found at host, listening on port.

    session.initialize( "Shopping Site", "192.168.0.5", 8778 )

  • setVariable( var_name, value ). Sets a session variable using the given var_name and value.

    session.setVariable( "SEARCH", search_term )

  • scrape(). Causes the session to scrape. This is equivalent to clicking the Run Scraping Session button from within screen-scraper on the General tab of a scraping session.

    session.scrape()

  • getVariable( var_name ). Gets the value of a session variable that was set during the course of the scraping session. If the object identified by var_name is a data record, an associative array will be returned. If the object identified by var_name is a data set, a two-dimensional ordinal array of associative arrays will be returned (see our fourth tutorial for an illustration of this).

    Currently only Strings, DataRecords, and DataSets can be accessed by this method.

    data_set = session.getVariable( "PRODUCTS" )

  • setBufferSize( buffer_size ). Explicitly sets the size of the buffer (in bytes) that will be used when reading data from screen-scraper. The default buffer size is 1024 bytes, so if you're anticipating a large amount of data (such as when receiving a full data set) you'll want to increase this value.

    session.setBufferSize( 64000 )

  • resetBufferSize(). Resets the size of the buffer back to its default size of 1024 bytes.

    session.resetBufferSize( )

  • isError(). Indicates whether or not an error has occurred in the scraping process.

    session.isError()

  • getErrorMessage(). Returns the last error message returned from the server, if one was returned.

    session.getErrorMessage()

  • disconnect(). Disconnects from the remote server. This should be called once a scraping session is complete so that system resources can be freed up.

    session.disconnect()

  • getNumDataRecordsInDataSet( data_set_name ). Returns the number of data records found in the data set named by data_set_name.

    session.getNumDataRecordsInDataSet( "PRODUCTS" )

  • getDataRecordFromDataSet( data_set_name, index ). Returns a single data record (a hash array) from the data set named by data_set_name at the given index.

    session.getDataRecordFromDataSet( "PRODUCTS", 2 )

  • setDoLazyScrape( doLazyScrape ). Indicates whether or not a scraping session should be run in a separate thread. By default this value is false.

    Calling this method will only have an effect if it's done before calling the scrape method. If this value is set to true, after the scrape method is called, program flow will return immediately, but the scraping session will still be run by screen-scraper.

    session.setDoLazyScrape( True )
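The methods above are typically called in a fixed order: initialize, set variables, scrape, check for errors, read results, disconnect. The sketch below strings them together in a function named run_search (a hypothetical name). So that it runs without a screen-scraper server, it uses StandInSession, a hypothetical stand-in that only mimics the method names documented here. With the real driver you would pass a RemoteScrapingSession object instead, assuming it is created with no constructor arguments before initialize is called.

```python
def run_search(session, search_term):
    """Run the "Shopping Site" scraping session and return its PRODUCTS data set."""
    session.initialize("Shopping Site")
    try:
        session.setVariable("SEARCH", search_term)
        session.setBufferSize(64000)  # a full data set may exceed 1024 bytes
        session.scrape()
        if session.isError():
            raise RuntimeError(session.getErrorMessage())
        return session.getVariable("PRODUCTS")
    finally:
        # Always free server resources, even if the scrape failed.
        session.disconnect()


class StandInSession:
    """Hypothetical stand-in for RemoteScrapingSession; not part of screen-scraper."""

    def __init__(self):
        self.variables = {}
        self.connected = False
        self.buffer_size = 1024

    def initialize(self, name, host="localhost", port=8778):
        self.connected = True

    def setVariable(self, var_name, value):
        self.variables[var_name] = value

    def setBufferSize(self, buffer_size):
        self.buffer_size = buffer_size

    def scrape(self):
        # Pretend the scraping session extracted a single product.
        title = "result for " + self.variables["SEARCH"]
        self.variables["PRODUCTS"] = [{"TITLE": title}]

    def isError(self):
        return False

    def getErrorMessage(self):
        return ""

    def getVariable(self, var_name):
        return self.variables.get(var_name)

    def disconnect(self):
        self.connected = False


session = StandInSession()
products = run_search(session, "dvd")
for record in products:
    print(record["TITLE"])
```

Placing disconnect in a finally clause is a design choice of this sketch: it ensures the connection is released whether or not the scrape raises an error.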

Examples

For an example of using the Python driver please see Tutorial 4: Scraping a Shopping Site from External Programs.
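As a small supplement, the sketch below shows how getNumDataRecordsInDataSet and getDataRecordFromDataSet might be combined to read a data set one record at a time, which keeps each response small. It assumes record indexes start at 0, which this reference does not state. The names iter_data_records and StandInSession are hypothetical; the stand-in holds canned data so the example runs without a screen-scraper server.

```python
def iter_data_records(session, data_set_name):
    """Yield each data record of a data set, fetching one record per request."""
    count = session.getNumDataRecordsInDataSet(data_set_name)
    for index in range(count):  # assumption: indexes start at 0
        yield session.getDataRecordFromDataSet(data_set_name, index)


class StandInSession:
    """Hypothetical stand-in holding canned data; not part of screen-scraper."""

    def __init__(self, data_sets):
        self.data_sets = data_sets

    def getNumDataRecordsInDataSet(self, data_set_name):
        return len(self.data_sets[data_set_name])

    def getDataRecordFromDataSet(self, data_set_name, index):
        return self.data_sets[data_set_name][index]


session = StandInSession({"PRODUCTS": [{"TITLE": "A"}, {"TITLE": "B"}]})
titles = [record["TITLE"] for record in iter_data_records(session, "PRODUCTS")]
print(titles)
```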