Scraping Engine

Description

The scraping engine requests files, then parses, manipulates, and/or stores their contents according to user-defined processes. It is the heart of screen-scraper and has been optimized for efficiency throughout its development. It is made up of several parts, each of which can be configured using the workbench:

  • Scraping Sessions: A series of scrapeable files that screen-scraper requests in a designated sequence. A session may also contain calls to scripts.
  • Scrapeable Files: A file interaction (usually a web page) that can be called from a scraping session or a script. Scrapeable files may contain variables for the file location (URL) or for URL parameters (GET or POST).
  • Extractor Patterns: Snippets of text used to gather information from scrapeable files. Each pattern can contain any number of extractor tokens.
  • Extractor Tokens: Named regular-expression matches that indicate which information from the extractor pattern is made available to subsequent scripts and scrapeable files.
  • Scripts: User-defined custom scripts that process the information gathered as the scrape progresses.
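To make the relationship between extractor patterns and extractor tokens more concrete, here is a minimal Python sketch of the idea, not screen-scraper's actual implementation. It assumes a token notation of ~@NAME@~ (modeled on screen-scraper's token syntax, but treated here as an illustrative assumption) and turns a pattern into a regular expression with one named group per token. The helper names compile_pattern and extract are hypothetical.

```python
import re

# Assumed token notation ~@NAME@~ (illustrative; modeled on screen-scraper's syntax).
TOKEN = re.compile(r"~@([A-Za-z_][A-Za-z0-9_]*)@~")


def compile_pattern(pattern: str, token_regex: str = r".*?") -> re.Pattern:
    """Hypothetical helper: convert an extractor pattern into a regex.

    Literal text is escaped so it must match exactly; each token becomes a
    named group using a non-greedy regex by default. Token names must be
    unique within a pattern, since Python rejects duplicate group names.
    """
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>{token_regex})")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def extract(pattern: str, page: str) -> list:
    """Hypothetical helper: return one dict of token values per pattern match."""
    regex = compile_pattern(pattern)
    return [m.groupdict() for m in regex.finditer(page)]


if __name__ == "__main__":
    # A stand-in for the contents of a scrapeable file (no network access needed).
    page = '<li><a href="/p/1">Widget</a></li><li><a href="/p/2">Gadget</a></li>'
    pattern = '<li><a href="~@URL@~">~@NAME@~</a></li>'
    for record in extract(pattern, page):
        # In screen-scraper, values like these would be passed on to scripts
        # or used as parameters for subsequent scrapeable files.
        print(record)
```

Each match of the pattern yields one record keyed by token name, which is roughly the role extractor tokens play when they hand values on to scripts and later scrapeable files.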

The rest of this section explains how to use screen-scraper, through the workbench, to achieve different goals. These topics can be difficult to follow without some exposure to the software, so we encourage you to work through the first few tutorials before continuing.