Extractor Patterns

Overview

Extractor patterns let you pinpoint the snippets of data you want to extract from a web page. A pattern is made up of text (usually HTML), extractor tokens, and, optionally, session variables. The tokens mark the data you want to extract; the surrounding text and session variables give those tokens context, telling screen-scraper where on the page each value begins and ends.
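
To make the roles of text, tokens, and session variables concrete, here is a minimal Python sketch of a simplified matching model. It assumes screen-scraper's ~@NAME@~ delimiters for extractor tokens and ~#NAME#~ for session variables. The functions compile_extractor_pattern and extract are hypothetical illustrations, not screen-scraper's API or its internal implementation. Literal text is matched exactly, each session variable is replaced by its current value, and each token becomes a capture group that matches as little text as possible.

```python
import re

# Assumed delimiters: ~@NAME@~ for extractor tokens, ~#NAME#~ for session variables.
TOKEN_RE = re.compile(r"~@([A-Za-z_]\w*)@~|~#([A-Za-z_]\w*)#~")


def compile_extractor_pattern(pattern, session):
    """Hypothetical sketch: convert an extractor pattern into a regular expression."""
    parts = []
    pos = 0
    for m in TOKEN_RE.finditer(pattern):
        # Literal text between tokens must match exactly.
        parts.append(re.escape(pattern[pos:m.start()]))
        if m.group(1):
            # Extractor token: a lazy named group (token names must be unique here).
            parts.append(f"(?P<{m.group(1)}>.*?)")
        else:
            # Session variable: substituted with its current value, matched literally.
            parts.append(re.escape(str(session[m.group(2)])))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def extract(pattern, html, session=None):
    """Return one dict of token values per match found in the HTML."""
    regex = compile_extractor_pattern(pattern, session or {})
    return [m.groupdict() for m in regex.finditer(html)]


html = ('<tr><td class="name">Widget</td><td class="price">$9.99</td></tr>\n'
        '<tr><td class="name">Gadget</td><td class="price">$4.50</td></tr>')
for row in extract('<td class="name">~@NAME@~</td><td class="price">~@PRICE@~</td>', html):
    print(row["NAME"], row["PRICE"])
```

Because each token matches as little text as possible, the literal text that follows a token decides where its value ends. This is why the context around a token matters as much as the token itself.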

Extractor patterns can be difficult to understand at first. We recommend that you read about using extractor patterns or go through our first tutorial before continuing.

Managing Extractor Patterns

When creating extractor patterns, use the HTML shown in the scrapeable file's Last Response tab. By default, screen-scraper tidies the HTML after it has been scraped, formatting it consistently so that it is easier to work with. As a result, the HTML you see when viewing a page's source in your web browser will likely differ from the HTML screen-scraper generates, and a pattern copied from the browser's source may not match.

Adding

  • Click the Add Extractor Pattern button on the scrapeable file's Extractor Patterns tab.
  • Alternatively, select the desired text in the scrapeable file's Last Response tab, right-click, and choose Generate extractor pattern from selected text.

Removing

  • Click the Delete button on the extractor pattern you want to remove.