Scraping Engine API

Overview

The scraping engine is the backbone of screen-scraper. It provides four built-in objects: session, scrapeableFile, dataSet, and dataRecord. The RunnableScrapingSession class is also covered here because it pertains most closely to the engine, and the list below additionally includes the log and sutil objects.

For details on which objects are available to scripts during a scrape, see the variable scope section of the documentation.

Objects

  • dataRecord: Gives access to the most recently extracted data record. It will most likely be used only in scripts that run after each application of an extractor pattern. The object extends Hashtable, so its methods are described in Java's API documentation for java.util.Hashtable.

    The dataRecord object is populated using the names of the tokens in the extractor pattern; those token names are its keys.

  • dataSet: The dataSet object holds all data records extracted by an extractor pattern after it has been applied as many times as possible to the HTML retrieved by a scrapeable file. A data set is analogous to a result or record set that would be returned from a database query. A data set contains any number of data records, which are analogous to rows in a database.
  • log: Methods used for logging information.
  • RunnableScrapingSession (com.screenscraper.scraper.RunnableScrapingSession): A class that can be instantiated within a script to run a scraping session. The "Maximum number of concurrent running scraping sessions" setting in the settings dialog box controls how many scraping sessions can run simultaneously.
  • scrapeableFile: This refers to the scrapeable file that is currently being requested and analyzed.
  • session: This variable refers to the currently running scraping session.
  • sutil: General methods for checking and manipulating data.
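The relationship between a data set and its data records can be sketched in plain Java. This is a minimal sketch, not screen-scraper's implementation: the DataRecord and DataSet classes, their method names, and the PRODUCT_NAME and PRICE tokens are hypothetical stand-ins. The only details taken from the descriptions above are that a data record extends Hashtable and is keyed by token names, and that a data set collects every record produced by applying one extractor pattern. In a real scrape you would use the built-in dataRecord and dataSet objects instead.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Hypothetical stand-ins (NOT the screen-scraper classes) that model how
// the engine's dataRecord and dataSet objects relate to each other.
public class DataModelSketch {

    // A data record is a Hashtable keyed by extractor token names,
    // analogous to one row of a database result.
    static class DataRecord extends Hashtable<String, Object> {}

    // A data set holds every record produced by applying one extractor
    // pattern as many times as possible, analogous to a result set.
    static class DataSet {
        private final List<DataRecord> records = new ArrayList<>();

        void addDataRecord(DataRecord record) { records.add(record); }

        int getNumDataRecords() { return records.size(); }

        DataRecord getDataRecord(int index) { return records.get(index); }
    }

    public static void main(String[] args) {
        // Pretend a pattern with hypothetical tokens PRODUCT_NAME and PRICE
        // matched twice in the retrieved HTML.
        String[][] matches = { {"Widget", "9.99"}, {"Gadget", "19.99"} };

        DataSet dataSet = new DataSet();
        for (String[] match : matches) {
            DataRecord dataRecord = new DataRecord();
            dataRecord.put("PRODUCT_NAME", match[0]); // key = token name
            dataRecord.put("PRICE", match[1]);
            dataSet.addDataRecord(dataRecord);
        }

        // Walk the "rows" of the data set, reading "columns" by token name.
        for (int i = 0; i < dataSet.getNumDataRecords(); i++) {
            DataRecord row = dataSet.getDataRecord(i);
            System.out.println(row.get("PRODUCT_NAME") + " costs " + row.get("PRICE"));
        }
    }
}
```

Running main prints one line per record, in the order the matches were added. Keeping the records in an ordered list mirrors the database analogy: each record is a row and each token name acts as a column.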
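The effect of the concurrent-session limit on RunnableScrapingSession instances can be pictured with a counting semaphore. This is an analogy only and assumes nothing about how screen-scraper enforces the setting internally: MAX_CONCURRENT_SESSIONS, runSession, and the simulated work are hypothetical, and java.util.concurrent.Semaphore is swapped in to show what "a fixed number of slots" means. Sessions started beyond the limit wait until a running one finishes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SessionLimitSketch {

    // Hypothetical: plays the role of the "Maximum number of concurrent
    // running scraping sessions" setting.
    static final int MAX_CONCURRENT_SESSIONS = 2;

    static final Semaphore permits = new Semaphore(MAX_CONCURRENT_SESSIONS);
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for the work a scraping session performs.
    static void runSession(String name) throws InterruptedException {
        permits.acquire(); // block until a session slot is free
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record highest concurrency seen
            System.out.println(name + " started (" + now + " running)");
            Thread.sleep(100); // simulate scraping work
        } finally {
            running.decrementAndGet();
            permits.release(); // hand the slot to a waiting session
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        for (int i = 1; i <= 5; i++) {
            String name = "Session " + i;
            pool.submit(() -> { runSession(name); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("Peak concurrent sessions: " + peak.get());
    }
}
```

With five sessions and a limit of two, the reported peak should never exceed two; the order of the "started" lines varies between runs.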