Generate Scrapeable File
Creating the Scraping Session
Up to this point we have gathered information on how the pages we will be scraping work. Now we're ready to start creating the scrape. For all scrapes, we start by creating a scraping session. A scraping session is simply a container for all of the files and other objects that will allow us to extract data from a given web site.
To create a scraping session, either click the Add a new scraping session button or choose New Scraping Session from the menu.
When the scraping session appears, rename it to Hello World using the Name textbox.
If you imported the scraping session at the beginning of the tutorial you'll want to name it something else, perhaps My Hello World.

Generating Scrapeable Files from Proxy Transactions
Now return to our Hello World proxy session by clicking on Hello World in the objects tree on the left. Click on the Progress tab to view our HTTP transactions from earlier.
Any of the transactions in the table can be made into scrapeable files in our scrape. In this case, we are interested in the Form Submission transaction. Click on it so that its information loads in the Request tab.
To create a scrapeable file from this transaction, you just need to select the scraping session that you want the file to be created in. In the Generate scrapeable file in drop-down, select Hello World, then click the Go button. If you click back to your Hello World scraping session, you will now see a Form Submission scrapeable file under it in the objects tree.
If you do not see the scrapeable file in the tree, you might need to expand the scraping session by clicking on the arrow just to the left of it.
The new scrapeable file contains all the information in the HTTP transaction.

Test Run
To make sure everything is set up correctly, let's run a quick test by running the scraping session.
To start the scrape, click on the Hello World scraping session in the objects tree, then click the Run Scraping Session button. This will start the scraping session and transition you to the Log tab. It should just take a moment to run, after which the log should show the following:
Running scraping session: Hello World
Processing scripts before scraping session begins.
Scraping file: "Form Submission"
Form Submission: Preliminary URL:http://www.screen-scraper.com/tutorial/basic_form.php
Form Submission: Using strict mode.
Form Submission: Resolved URL:http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World%21
Form Submission: Sending request.
Processing scripts after scraping session has ended.
Scraping session "Hello World" finished.
The log is an invaluable tool for debugging scraping sessions. We encourage you to use it often. In this case it shows that screen-scraper requested the only scrapeable file in our scraping session (Form Submission).
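The step from the Preliminary URL to the Resolved URL in the log can be reproduced outside screen-scraper. The following is a minimal Python sketch, assuming (as the log suggests) that the form sends a single GET parameter named text_string with the value "Hello World!". The helper name resolve_url is hypothetical and is not part of screen-scraper.

```python
from urllib.parse import urlencode


def resolve_url(preliminary_url, params):
    """Append URL-encoded GET parameters to a base URL.

    Hypothetical helper for illustration; not a screen-scraper API.
    """
    # urlencode turns spaces into '+' and '!' into '%21'
    query = urlencode(params)
    return f"{preliminary_url}?{query}" if query else preliminary_url


preliminary = "http://www.screen-scraper.com/tutorial/basic_form.php"
resolved = resolve_url(preliminary, {"text_string": "Hello World!"})
print(resolved)
```

Comparing the printed value with the Resolved URL line in the log is a quick way to confirm that the parameters were encoded as expected.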
Viewing the Scrapeable File Response
You can view the text of the file that was scraped by clicking on the Form Submission scrapeable file in the objects tree, then on the Last Response tab. This will show the whole of the HTTP response that the server sent back to screen-scraper. You can view what the page looks like when it is rendered by clicking the Display Response in Browser button. It's often helpful to view the last response for a scrapeable file after running a scraping session so that you can ensure that screen-scraper requested the right page.
A good principle of software design is to run code often as you make changes. Likewise, with screen-scraper it is a good idea to run your scraping session frequently and watch the log and last responses to ensure that things are working as you intend them to.
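If you copy the log text out of the Log tab after each run, a small script can confirm the essentials at a glance. This is only a sketch: it assumes the log lines follow the format of the sample log shown earlier, and the function name summarize_log is hypothetical rather than anything screen-scraper provides.

```python
import re


def summarize_log(log_text):
    """Collect scraped file names, resolved URLs, and the finished flag.

    Hypothetical helper; relies only on the line formats in the sample log.
    """
    files = re.findall(r'^Scraping file: "(.+)"$', log_text, flags=re.MULTILINE)
    urls = re.findall(r"Resolved URL:(\S+)", log_text)
    finished = re.search(
        r'^Scraping session ".+" finished\.$', log_text, flags=re.MULTILINE
    ) is not None
    return {"files": files, "resolved_urls": urls, "finished": finished}


# Log text copied from the test run above
sample_log = """Running scraping session: Hello World
Processing scripts before scraping session begins.
Scraping file: "Form Submission"
Form Submission: Preliminary URL:http://www.screen-scraper.com/tutorial/basic_form.php
Form Submission: Using strict mode.
Form Submission: Resolved URL:http://www.screen-scraper.com/tutorial/basic_form.php?text_string=Hello+World%21
Form Submission: Sending request.
Processing scripts after scraping session has ended.
Scraping session "Hello World" finished."""

summary = summarize_log(sample_log)
print(summary)
```

A run that stopped early would leave the finished flag false, which is a cue to read the full log more closely.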
Saving Your Scrapes
Now would be a good time to save your work. Click the Save icon or select Save from the menu.
