Tutorial 2: Page 9: Saving the Data

Once screen-scraper has extracted data, you can do a number of things with it. For example, you might invoke screen-scraper from an ASP script that tells screen-scraper to extract data and then displays it to the user. In our case we'll simply write the data out to a text file. To do this, we'll once again write a script. Create a new script, call it "Write data to a file", and use either the following Interpreted Java:

import java.io.FileWriter;

FileWriter out = null;

try
{
    session.log( "Writing data to a file." );

    // Open the file in append mode (the "true" argument).
    out = new FileWriter( "dvds.txt", true );

    // Write the fields out as a single tab-delimited line.
    out.write( dataRecord.get( "TITLE" ) + "\t" );
    out.write( dataRecord.get( "PRICE" ) + "\t" );
    out.write( dataRecord.get( "MODEL" ) + "\t" );
    out.write( dataRecord.get( "SHIPPING_WEIGHT" ) + "\t" );
    out.write( dataRecord.get( "MANUFACTURED_BY" ) );
    out.write( "\n" );
}
catch( Exception e )
{
    session.log( "An error occurred while writing the data to a file: " + e.getMessage() );
}
finally
{
    // Close the file even if one of the writes failed.
    if( out != null )
    {
        try
        {
            out.close();
        }
        catch( Exception e )
        {
            session.log( "An error occurred while closing the file: " + e.getMessage() );
        }
    }
}

Or the following VBScript (remember to select "VBScript" from the "Language" drop-down box):

' Create the objects needed to write data to a file.
Set objFSO = CreateObject( "Scripting.FileSystemObject" )
' The "8" indicates that we want to append data to the file.
' The "True" creates the file if it doesn't exist yet.
Set objDVDFile = objFSO.OpenTextFile( "dvds.txt", 8, True )

' Write the fields out as a single tab-delimited line.
objDVDFile.Write dataRecord.Get( "TITLE" ) & vbTab
objDVDFile.Write dataRecord.Get( "PRICE" ) & vbTab
objDVDFile.Write dataRecord.Get( "MODEL" ) & vbTab
objDVDFile.Write dataRecord.Get( "SHIPPING_WEIGHT" ) & vbTab
' No tab after the last field, to match the Interpreted Java version.
objDVDFile.Write dataRecord.Get( "MANUFACTURED_BY" )
objDVDFile.Write vbCrLf

' Close the file and clean up.
objDVDFile.Close
Set objDVDFile = Nothing
Set objFSO = Nothing

Our script simply takes the contents of the current data record (which for us will be the data record that constitutes a single DVD) and appends it as one tab-delimited line to a "dvds.txt" text file.
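To see the line format outside of screen-scraper, here is a minimal standalone Java sketch. It assumes a plain java.util.Map as a stand-in for the data record; the class name DvdLineDemo, its helper methods, and the sample values are made up for illustration and are not part of screen-scraper.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; a Map stands in for screen-scraper's data record.
class DvdLineDemo {
    // Column order used by the tutorial script.
    static final String[] KEYS = {
        "TITLE", "PRICE", "MODEL", "SHIPPING_WEIGHT", "MANUFACTURED_BY"
    };

    // Joins the five values with tabs and terminates the line with a newline.
    static String toLine(Map<String, Object> record) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < KEYS.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(record.get(KEYS[i]));
        }
        return sb.append('\n').toString();
    }

    // Appends one line to the file; the "true" argument selects append mode.
    static void append(String path, Map<String, Object> record) throws IOException {
        try (FileWriter out = new FileWriter(path, true)) {
            out.write(toLine(record));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> dvd = new LinkedHashMap<>();
        dvd.put("TITLE", "Example Title"); // made-up sample values
        dvd.put("PRICE", "$19.99");
        dvd.put("MODEL", "DVD-EX");
        dvd.put("SHIPPING_WEIGHT", "7.00 lbs.");
        dvd.put("MANUFACTURED_BY", "Example Studio");

        // Write to a temporary file so the demo never touches a real dvds.txt.
        Path tmp = Files.createTempFile("dvds", ".txt");
        append(tmp.toString(), dvd);
        append(tmp.toString(), dvd); // a second call adds a second line
        System.out.print(new String(Files.readAllBytes(tmp)));
    }
}
```

Running it prints the same line twice, which mirrors how the file grows by one line each time the script runs.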

If you're familiar with VBScript or Java, hopefully the scripts make sense. There is one important point worth noting, though. You'll notice that each script makes use of a "DataRecord" object (referenced as the "dataRecord" variable in the scripts). This object refers to the current DataRecord as the script is executed. Again, think of the spreadsheet. When the script gets invoked, a specific DataRecord (or row in the spreadsheet) will be current. This DataRecord automatically becomes a variable you can use in your script. The DataRecord object has a "get" method, which allows you to retrieve the value for a key it contains (i.e., you're referencing a specific cell in the spreadsheet). Again, you can read more about objects available in scripts and their scope in our documentation, at the Using Scripts and API Documentation pages.
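The spreadsheet analogy can be sketched in plain Java. This is only an illustration under stated assumptions: a List of Maps stands in for the rows, and the class SpreadsheetAnalogy with its row and cell helpers is hypothetical, not a screen-scraper API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each Map is one "row", each key one "column".
class SpreadsheetAnalogy {
    // Builds one row from made-up values.
    static Map<String, Object> row(String title, String price) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("TITLE", title);
        r.put("PRICE", price);
        return r;
    }

    // Looks up one "cell": the value stored under a key in the given row.
    static Object cell(List<Map<String, Object>> sheet, int rowIndex, String key) {
        return sheet.get(rowIndex).get(key);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> sheet = new ArrayList<>();
        sheet.add(row("First DVD", "$10.00"));
        sheet.add(row("Second DVD", "$12.50"));

        // Pretend the second row is the current one when the script runs.
        Map<String, Object> dataRecord = sheet.get(1);
        System.out.println(dataRecord.get("TITLE"));       // prints "Second DVD"
        System.out.println(dataRecord.get("NO_SUCH_KEY")); // prints "null" for a plain Map
    }
}
```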

Click on the "Details page" scrapeable file, then on the "Extractor Patterns" tab. Below the extractor pattern text click the "Add Script" button. In the "Script Name" column, select "Write data to a file" and in the "When to Run" column select "After each pattern application" (even though there will only be one match per page). For each DVD we'll execute the script that will write the information out to a file.

To clarify a bit further, because we're invoking the script "After each pattern application", the "dataRecord" variable will be in scope. In other words, for each row in the spreadsheet (which happens to be a single row in this case) screen-scraper will execute the "Write data to a file" script. Each time it gets invoked a DataRecord will be current (again, think of it walking through each row in the spreadsheet). As such, we have access to the current row in the spreadsheet by way of the "dataRecord" variable. Had we indicated that the script was to be invoked "After pattern is applied", the "dataRecord" variable would not be in scope. Again using the spreadsheet analogy, scripts that get invoked "After pattern is applied" run only after screen-scraper has walked through all of the rows in the spreadsheet, so no DataRecord is in scope (i.e., it's at the end of the spreadsheet--after the very last row). See the Variable scope section in our documentation for more detail on which variables are in scope depending on when a given script is run.
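The difference between the two timings can be modeled with a small Java sketch. It is a simplified mental model, not how screen-scraper is implemented: the Script interface, applyPattern, and trace are hypothetical names, and a null argument is used here to represent "no data record in scope".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of when a script runs relative to the matched rows.
class ScopeSketch {
    // Stand-in for a script; dataRecord is null when no row is current.
    interface Script {
        void run(Map<String, Object> dataRecord);
    }

    static void applyPattern(List<Map<String, Object>> matches,
                             Script afterEach, Script afterAll) {
        for (Map<String, Object> match : matches) {
            afterEach.run(match); // "After each pattern application": one row is current
        }
        afterAll.run(null);       // "After pattern is applied": past the last row
    }

    // Records what each kind of script would be able to see.
    static List<String> trace(List<Map<String, Object>> matches) {
        List<String> log = new ArrayList<>();
        applyPattern(matches,
            rec -> log.add("each: " + rec.get("TITLE")),
            rec -> log.add("after: dataRecord in scope? " + (rec != null)));
        return log;
    }

    public static void main(String[] args) {
        Map<String, Object> dvd = new HashMap<>();
        dvd.put("TITLE", "Example DVD"); // made-up sample value
        for (String line : trace(Collections.singletonList(dvd))) {
            System.out.println(line);
        }
        // prints:
        // each: Example DVD
        // after: dataRecord in scope? false
    }
}
```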

Once again, run the scraping session. This time, if you check the directory where screen-scraper is installed, you'll find a dvds.txt file that grows as the DVD details pages get scraped.
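If you'd like to check the file from code rather than in a text editor, the following standalone Java sketch reads it back and splits each line on tabs. The class name DvdFileCheck is hypothetical, and the default path is an assumption; pass the actual location of dvds.txt as the first argument.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for inspecting the tab-delimited output file.
class DvdFileCheck {
    // Splits one line on tabs; the -1 limit keeps empty trailing fields.
    static String[] fields(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; the real file sits wherever screen-scraper wrote it.
        String path = args.length > 0 ? args[0] : "dvds.txt";

        // FileWriter used the platform default encoding, so read with the same one.
        List<String> lines = Files.readAllLines(Paths.get(path), Charset.defaultCharset());
        for (String line : lines) {
            String[] f = fields(line);
            System.out.println(f.length + " fields, title = " + f[0]);
        }
    }
}
```

Each line written by the Interpreted Java script should report five fields.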

Note that as an alternative to the above scripts you could do the following in Interpreted Java (professional and enterprise editions only):

dataSet.writeToFile( "dvds.txt" );

Or in VBScript:

Call dataSet.WriteToFile( "dvds.txt" )

We included the longer scripts at the top of this page to demonstrate how to reference data records in scripts.

If you would like more information on saving extracted data to a database, please consult our FAQ on the topic.