1: Overview
How it works
In many ways, working with screen-scraper is like working with a database such as MySQL or SQL Server. With a database, you'll generally use an interface (often a graphical one) to create objects such as tables, columns, and indexes. Once the database is set up, you'll often write programming code to populate it with data and to pull information back out of it. Likewise, with screen-scraper you'll use its graphical user interface to create the objects needed to extract information from web sites. Once those objects are set up, you'll write programming code to interact with screen-scraper and make use of the data it extracts.
Extracting information from web sites using screen-scraper typically involves four main steps:
- Use the proxy server to determine the exact files that need to be requested in order to get the information you're after.
- Create a scraping session with scrapeable files that define the sequence of pages screen-scraper will request.
- Generate extractor patterns to define the exact information you need screen-scraper to grab from each page.
- Write small scripts or programming code to invoke screen-scraper and/or work with the data it extracts.
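To make the relationship between these objects more concrete, here is a minimal conceptual sketch in Python. It is not screen-scraper's actual API or file format: the class names `ScrapingSession` and `ScrapeableFile`, the `fake_fetch` helper, the example URL, and the use of regular expressions as extractor patterns are all assumptions made purely for illustration. It also skips the proxy step and assumes you already know which page to request.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ScrapeableFile:
    """Hypothetical stand-in for one page to request, plus its patterns."""
    url: str
    # Pattern name -> regex with exactly one capture group (an assumption;
    # screen-scraper's real extractor patterns use their own syntax).
    extractor_patterns: Dict[str, str] = field(default_factory=dict)


@dataclass
class ScrapingSession:
    """Hypothetical stand-in for an ordered sequence of pages to request."""
    name: str
    files: List[ScrapeableFile]

    def run(self, fetch: Callable[[str], str]) -> List[Dict[str, List[str]]]:
        """Request each file in order and apply its patterns to the response."""
        results = []
        for scrapeable_file in self.files:
            html = fetch(scrapeable_file.url)
            extracted = {
                name: re.findall(pattern, html)
                for name, pattern in scrapeable_file.extractor_patterns.items()
            }
            results.append(extracted)
        return results


# Canned page content so the sketch runs without network access.
PAGES = {
    "http://example.com/products": (
        '<li class="product">Widget</li><li class="product">Gadget</li>'
    ),
}


def fake_fetch(url: str) -> str:
    """Return canned HTML in place of a real HTTP request."""
    return PAGES[url]


# Steps 2 and 3: define the session, its file, and an extractor pattern.
session = ScrapingSession(
    name="products",
    files=[
        ScrapeableFile(
            url="http://example.com/products",
            extractor_patterns={"PRODUCT": r'<li class="product">(.*?)</li>'},
        )
    ],
)

# Step 4: invoke the session from code and work with the extracted data.
records = session.run(fake_fetch)
print(records)
```

In screen-scraper itself you would build the session, scrapeable files, and extractor patterns in the graphical interface rather than in code; only the last step, invoking the session and consuming its data, would normally live in your own scripts.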
scraper on 07/16/2010 at 4:23 pm
