Tutorial 1: Page 5: Generating a Scrapeable File

At this point we're ready to start creating the objects that screen-scraper will use to extract data from the page. We start by creating a scraping session, which is simply a container for all of the files and other objects that allow us to extract data from a given web site. Either click the "New Scraping Session" button (it looks like a gear), or open the "File" menu and select "New Scraping Session". After the scraping session appears, rename it to "Hello World". (If you imported the scraping session at the beginning of the tutorial, give this one a different name, such as "My Hello World".) Your window should now look like this:



Now return to our "Hello World" proxy session by clicking on it in the tree on the left (the one with the globe icon), then click the "Progress" tab. Click the second, or last, row in the "HTTP Transactions" table. In the lower pane, make sure "Hello World" is selected in the drop-down list labeled "Generate scrapeable file in:", then click the "Go" button. This creates a scrapeable file, which is a web page containing information we're interested in extracting. First, rename the new scrapeable file "Form submission". Your screen should now look like this:



To make sure everything is working so far, let's run a quick test. Run the "Hello World" scraping session by selecting it in the tree on the left and clicking the "Run Scraping Session" button, then click the "Log" tab. The session should take only a moment to run, after which the log should show the following:

Starting scraper.
Running scraping session: Hello World
Processing scripts before scraping session begins.
Scraping file: "Form Submission"
Form Submission: Preliminary URL: http://www.screen-scraper.com/tutorial/basic_form.php
Form Submission: Using strict mode.
Form Submission: Resolved URL: http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+...
Form Submission: Sending request.
Processing scripts after scraping session has ended.
Scraping session "Hello World" finished.
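The "Resolved URL" line shows that screen-scraper appended the form's `text_string` field to basic_form.php as a URL-encoded query string, with the space turned into a `+`. Because the log above is truncated, the sketch below assumes, for illustration only, that the submitted value was "Hello World". It uses Python's standard `urllib.parse` module to reproduce that kind of encoding; the `resolve_url` helper is hypothetical and is not how screen-scraper itself builds the URL.

```python
from urllib.parse import urlencode

# Base URL taken from the "Preliminary URL" line of the log.
BASE_URL = "http://www.screen-scraper.com/tutorial/basic_form.php"


def resolve_url(base_url: str, params: dict) -> str:
    """Hypothetical helper: append form fields to a URL as a GET query string.

    urlencode() uses quote_plus by default, so spaces become '+',
    which matches the "Hello+" seen in the log's Resolved URL line.
    """
    return f"{base_url}?{urlencode(params)}"


# Assumed form value; the real value is cut off in the log above.
url = resolve_url(BASE_URL, {"text_string": "Hello World"})
print(url)  # prints http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World
```

Characters with special meaning in a URL are escaped as well; for example, an `&` inside the value becomes `%26`, so it cannot be mistaken for a separator between fields.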

The log is an invaluable debugging tool that you'll want to use often. In this case it shows that screen-scraper requested the only scrapeable file in our scraping session ("Form submission"). To view the text of the page that was scraped, click "Form submission" in the tree on the left, then click the "Last Response" tab. Click the "Display Response in Browser" button to check that the page looks like the one in your browser (it may not match exactly, but it should closely resemble it). Viewing a scrapeable file's last response after running a scraping session is a good way to confirm that screen-scraper requested the right page.

QUICK TIP!
A good principle of software design is to run code often as you make changes. Likewise, with screen-scraper it is a good idea to run your scraping session frequently and watch the log and last responses to ensure that things are working as you intend them to.
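If you copy the contents of the "Log" tab into a text file after each run, a small script can help you spot problems quickly. The sketch below is a hypothetical helper, not part of screen-scraper. It assumes the log format shown earlier in this tutorial: a `Scraping file: "..."` line for each scrapeable file, messages prefixed with that file's name, and a closing `Scraping session "..." finished.` line. It reports which scrapeable files were scraped, which ones sent a request, and whether the session finished.

```python
import re

# Abridged copy of the log shown earlier (the Resolved URL is truncated there too).
SAMPLE_LOG = """\
Starting scraper.
Running scraping session: Hello World
Processing scripts before scraping session begins.
Scraping file: "Form Submission"
Form Submission: Preliminary URL: http://www.screen-scraper.com/tutorial/basic_form.php
Form Submission: Using strict mode.
Form Submission: Resolved URL: http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+...
Form Submission: Sending request.
Processing scripts after scraping session has ended.
Scraping session "Hello World" finished.
"""


def check_log(log_text: str, session_name: str) -> dict:
    """Summarize a copied screen-scraper log (line format assumed from the tutorial)."""
    lines = log_text.splitlines()
    # Names of the scrapeable files the session started scraping.
    scraped = re.findall(r'^Scraping file: "(.+)"$', log_text, flags=re.MULTILINE)
    # Of those, the files whose request was actually sent.
    requested = [name for name in scraped if f"{name}: Sending request." in lines]
    # Whether the session reached its closing log line.
    finished = f'Scraping session "{session_name}" finished.' in lines
    return {"scraped": scraped, "requested": requested, "finished": finished}


print(check_log(SAMPLE_LOG, "Hello World"))
```

A scrapeable file that appears under "scraped" but not under "requested", or a session that never reaches its "finished" line, is a sign that you should look more closely at the full log and the last response.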

Now would be a good time to save your work. Click the "Save" button (looks like a disk) or select the "Save" option from the "File" menu.