Tutorial 2: Page 5: Creating the Script to Initialize the Scraping Session

We're now going to create a small script to initialize our scraping session. It's common practice to run a script at the very beginning of a scraping session to set up session variables and any other starting values the session needs. That's what we'll be doing here.

Create the script either by clicking the "New Script" button (it looks like a pencil and paper) or by selecting "New Script" from the "File" menu. In the "Name" field type "Shopping Site--initialize session". You'll remember from the first tutorial that screen-scraper scripts get invoked when certain events occur. We'll be invoking this script before the scraping session begins.

If you prefer to code in Java (or JavaScript), select "Interpreted Java" from the "Language" drop-down, then copy and paste the following text into the "Script Text" box:

// Set the session variables.
session.setVariable( "SEARCH", "dvd" );
session.setVariable( "PAGE", "1" );


If you prefer to code in VBScript, select "VBScript" from the "Language" drop-down, then copy and paste the following text into the "Script Text" box:

' Set the session variables.
Call session.SetVariable( "SEARCH", "dvd" )
Call session.SetVariable( "PAGE", "1" )


The script sets two session variables on the current scraping session. Note the "PAGE" session variable in particular: it starts at 1 so that the first page of search results is the first one requested.

Before trying out this script let's modify the parameters for our scrapeable file so that they make use of the session variables. Click on the "Search results" scrapeable file, then on the "Parameters" tab. Change the value of the "keyword" parameter from "dvd" to "~#SEARCH#~" (without the quotes), and change the value of the "page" parameter from "2" to "~#PAGE#~" (again, omit the quotes).

The ~#SEARCH#~ and ~#PAGE#~ tokens will be replaced at runtime with the values of the corresponding session variables. As such, the first URL will be as follows:

http://www.screen-scraper.com/shop/index.php?main_page=advanced_search_result&keyword=dvd&sort=2a&page=1

That is, screen-scraper will take all of our "GET" parameters, append them to the end of the URL, then replace any embedded session variables (surrounded by the ~# #~ markers) with their corresponding values.

Note that we could achieve the same effect by deleting all of the parameters from the "Parameters" tab, and replacing our URL with this:

http://www.screen-scraper.com/shop/index.php?main_page=advanced_search_result&keyword=~#SEARCH#~&sort=2a&page=~#PAGE#~

Breaking out the parameters under the "Parameters" tab simply makes them easier to manage, which is why we take that approach.
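To make the substitution described above concrete, here is a minimal, stand-alone Java sketch of the idea. It is only an illustration and not screen-scraper's actual implementation: the class name TokenDemo, the methods replaceTokens and buildUrl, and the choice to leave unknown tokens untouched are all assumptions made for this example. The parameter list is reconstructed from the URL shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: hypothetical names, NOT screen-scraper's internal code.
class TokenDemo {
    // Matches tokens such as ~#SEARCH#~ and captures the variable name.
    private static final Pattern TOKEN = Pattern.compile("~#(\\w+)#~");

    // Replace every ~#NAME#~ token with the value of the matching variable.
    // Tokens with no matching variable are left as-is (an assumption of this sketch).
    static String replaceTokens(String text, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Append the GET parameters to the base URL, then substitute the tokens.
    // Values are not URL-encoded here, to keep the sketch short.
    static String buildUrl(String baseUrl, Map<String, String> params,
                           Map<String, String> vars) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(p.getKey()).append('=').append(p.getValue());
            separator = '&';
        }
        return replaceTokens(url.toString(), vars);
    }

    public static void main(String[] args) {
        // Stand-ins for the session variables set by the initialization script.
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("SEARCH", "dvd");
        vars.put("PAGE", "1");

        // Stand-ins for the rows on the "Parameters" tab.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("main_page", "advanced_search_result");
        params.put("keyword", "~#SEARCH#~");
        params.put("sort", "2a");
        params.put("page", "~#PAGE#~");

        System.out.println(
            buildUrl("http://www.screen-scraper.com/shop/index.php", params, vars));
    }
}
```

Running the sketch prints the first URL given above. Passing the token-embedded URL straight to replaceTokens gives the same string, which mirrors the point that the two approaches are equivalent.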

We'll now need to associate our script with our scraping session so that it gets invoked before the scraping session begins. To do that, click on the scraping session in the tree on the left, then on the "Scripts" tab. Click the "Add Script" button to add a script. In the "Script Name" column select "Shopping Site--initialize session". The "When to Run" column should show "Before scraping session begins", and the "Enabled" checkbox should be checked. This will cause our script to get executed at the very beginning of the scraping session so that the two session variables can get set.

All right, we're ready to try it all out. This scraping session will generate a larger log than the one we worked on earlier, so it may be a good idea to increase the number of lines screen-scraper will display in its log. To do that, click on the scraping session in the tree on the left, then on the "Log" tab. In the text box labeled "Show only the following number of lines" enter the number 1000.

Run the scraping session by selecting it in the tree on the left, then clicking the "Run Scraping Session" button. View the progress of the scraping session by clicking on it in the tree on the left, then clicking on the "Log" tab. You'll notice that the URL of the requested file is the one given above. You can also verify that the correct URL was requested by clicking on the "Search results" scrapeable file, then on the "Last Response" tab, then on the "Render HTML" or "Display Response in Browser" button. The page should resemble the one you saw in your web browser.

Remember that it's a good idea to run scraping sessions often as you make changes, and watch the log and last responses to ensure that things are working as you expect them to. You'll also want to save your work frequently. Do that now by hitting the "Save" button (the one with the disk icon).