Tutorial 5: Page 3: Setting Up the Scraping Session

We'll start by modifying our existing scraping session so that it's ready to save the scraped data to our database. Click on the "Details page" scrapeable file in the tree on the left, then on the "Extractor Patterns" tab, then on the "Sub-Extractor Patterns" tab for our "DETAILS" extractor pattern. We're going to update each of our extractor pattern tokens so that it saves its extracted value in a session variable. To do this, double-click a token (e.g., ~@TITLE@~), or right-click it (control-click on Mac OS X) and select "Edit token". In the "Edit Token" box, check the "Save in session variable?" check box, then close the "Edit Token" window. Repeat this for each extractor pattern token (~@TITLE@~, ~@PRICE@~, etc.).

We need to save the values in session variables so that we can use them as POST parameters in the scrapeable file that POSTs to our PHP file.

Let's create that scrapeable file now. Click on the "Shopping Site" scraping session in the tree on the left, then click the "Add Scrapeable File" button, found on the "General" tab. Once the scrapeable file appears, give it the name "Save product". In the URL field enter:

http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php

Check the box labeled "This scrapeable file will be invoked manually from a script".

Click on the "Parameters" tab for the new scrapeable file, and give it five POST parameters, as shown in the screen-shot below:


You might remember that the ~# #~ delimiters indicate that the value of the corresponding session variable should be substituted in. For example, in our case the value of the TITLE session variable (e.g., "A Bug's Life") will be substituted in for the ~#TITLE#~ token. This value will be the one that gets submitted to the PHP file so that it can be inserted into the database.
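
The substitution described above can be sketched outside screen-scraper in a few lines of Python. This is only a toy model and not screen-scraper's own implementation: the `substitute_tokens` helper, the lower-case parameter keys, and the sample values are assumptions made for illustration (the keys are guessed from the element names in the response shown later on this page). Leaving unknown tokens untouched is also a choice made for this sketch.

```python
import re
from urllib.parse import urlencode

# Matches tokens of the form ~#NAME#~ and captures NAME.
TOKEN = re.compile(r"~#([A-Za-z0-9_]+)#~")


def substitute_tokens(template, session_variables):
    """Replace each ~#NAME#~ token with the value of session variable NAME.

    Tokens with no matching session variable are left untouched
    (an assumption of this sketch, not documented behavior).
    """
    def lookup(match):
        name = match.group(1)
        return str(session_variables.get(name, match.group(0)))

    return TOKEN.sub(lookup, template)


# Hypothetical session variables, as if saved by the extractor tokens.
session_variables = {"TITLE": "A Bug's Life", "PRICE": "35.99"}

# Hypothetical POST parameters: key -> value containing a token.
post_parameters = {"title": "~#TITLE#~", "price": "~#PRICE#~"}

resolved = {
    key: substitute_tokens(value, session_variables)
    for key, value in post_parameters.items()
}

# URL-encode the resolved values the way an HTML form POST body is encoded.
body = urlencode(resolved)
print(body)
```

The point of the sketch is the order of operations: the tokens are resolved against the session variables first, and only the resolved values are sent to the PHP file.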

Finally, we need to create a simple script that will invoke our new scrapeable file. Click on the "New Script" button (looks like a pencil and paper) in the button bar. Give the script the name "Save product", and enter the following as the "Script Text":

session.scrapeFile( "Save product" );

The script simply tells screen-scraper to invoke the "Save product" scrapeable file.

Now we need to tell screen-scraper when to invoke the scrapeable file. We need it invoked for each product, so that they all get saved to the database. As such, we'll invoke the script after the "Details page" is requested. Do this by clicking on the "Details page" scrapeable file in the tree on the left, then on the "Scripts" tab. Click the "Add Script" button, and in the "Script Name" column select "Save product". Under the "When to Run" column select "After file is scraped".
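
To see why the script has to run after every details page rather than once at the end, here is a minimal Python sketch of the run order. It is a toy model under stated assumptions: none of the names below (`scrape_details_page`, `save_product`, the sample products) are screen-scraper APIs, and the only behavior it models is that session variables with the same name are overwritten each time a new details page is scraped.

```python
# Toy model of the run order; none of these names are screen-scraper APIs.
products = [
    {"TITLE": "Product one", "PRICE": "10.00"},
    {"TITLE": "Product two", "PRICE": "20.00"},
]

session_variables = {}   # one shared store, overwritten for each product
saved = []               # stands in for the rows the PHP file would insert


def save_product():
    """Stand-in for the "Save product" script: snapshot the current values."""
    saved.append(dict(session_variables))


def scrape_details_page(product, after_file_is_scraped):
    """Pretend to scrape one details page, then run the attached script."""
    session_variables.update(product)   # tokens saved in session variables
    after_file_is_scraped()             # the "After file is scraped" hook


for product in products:
    scrape_details_page(product, save_product)

print([row["TITLE"] for row in saved])
```

Because the hook fires once per details page, each product is captured before the next page overwrites the TITLE and PRICE values.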

Okay, we're done setting up screen-scraper, so we're ready to give our scraping session a run. Before we invoke it, let's make one minor tweak so that the session doesn't take quite so long to run. In the "Shopping Site--initialize session" script, change the value for the "SEARCH" session variable from "dvd" to "bug". That way we'll get just the two "Bug's Life" DVDs rather than every DVD in the system. Once you've done that, click on the "Shopping Site" scraping session in the tree on the left, then on the "Run Scraping Session" button.

Once the scraping session has run its course, click on the "Save product" scrapeable file, then on the "Last Response" tab. You should see something like this for the response:

<?xml version="1.0" encoding="UTF-8"?>
<result>
<status>Success</status>
<product>
<title>A Bug\'s Life \"Multi Pak\"</title>
<price>35.99</price>
<manufactured_by>Warner</manufactured_by>
<model>DVD-ABUG</model>
<shipping_weight>7.00 lbs.</shipping_weight>
</product>
</result>

This indicates that the last product was successfully inserted.
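
If you would rather check a response like the one above programmatically than read it by eye, the following Python sketch shows one way to do it with the standard library. It is an illustration only: the `summarize` helper is a hypothetical name, the response text is copied from above (backslashes included and left as-is), and nothing here is part of screen-scraper or the PHP file.

```python
import xml.etree.ElementTree as ET

# The sample response from above, copied verbatim (raw string keeps the backslashes).
RESPONSE = r"""
<?xml version="1.0" encoding="UTF-8"?>
<result>
<status>Success</status>
<product>
<title>A Bug\'s Life \"Multi Pak\"</title>
<price>35.99</price>
<manufactured_by>Warner</manufactured_by>
<model>DVD-ABUG</model>
<shipping_weight>7.00 lbs.</shipping_weight>
</product>
</result>
""".strip().encode("utf-8")


def summarize(response_bytes):
    """Return (status, fields) where fields maps each product element to its text."""
    root = ET.fromstring(response_bytes)
    status = root.findtext("status")
    product = root.find("product")
    fields = {}
    if product is not None:
        fields = {child.tag: child.text for child in product}
    return status, fields


status, fields = summarize(RESPONSE)
print(status)
print(fields["model"], fields["price"])
```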

Now it's time to take a closer look at the PHP file...