Scripts

Overview

screen-scraper has a built-in scripting engine to facilitate dynamically scraping sites and working with data once it's been extracted. Scripts can be helpful for such things as interacting with databases and dynamically determining which files get scraped and when.

As in many programming environments, scripts in screen-scraper are invoked in response to events. Just as you might designate a block of code to be run when a button is clicked in Visual Basic, in screen-scraper you might run a script after an HTML file has been downloaded or data has been extracted from a page. For more information see our documentation on scripting triggers.

Scripts can be written in any of several languages, depending on your preference. You can learn more in the scripting in screen-scraper section of the documentation.

If you haven't done so already, we'd highly recommend taking some time to go through our tutorials in order to get more familiar with how scripts are used.

Managing Scripts

Adding

  • Select New Script from the File menu.
  • Click on the pencil and paper icon in the button bar.
  • Right-click on a folder in the objects tree and select New Script.
  • Use the keyboard shortcut Ctrl-L.

Removing

  • Press the Delete key when the script is selected in the objects tree.
  • Right-click on the script in the objects tree and select Delete.
  • Click the Delete button in the main pane of the script.

Importing

  • Right-click on the folder in the objects tree that you want to import the script into (other than the root folder) and select Import Into. In the window that opens, navigate to and select the script you want to import.
  • Select Import from the File menu. In the window that opens, navigate to and select the script you want to import.
  • Add the script to the import folder in screen-scraper's install directory.

    If screen-scraper is running when you copy files into the import folder, they will be imported and hot-swapped in the next time a scraping session is invoked. They will also be imported if you start or stop screen-scraper.

Exporting

  • Right-click on the script in the objects tree and select Export.
  • Click the Export button in the main pane of the script.

Scripts: Main Pane

  • Export: Export the script to a file so that it can be backed up or transferred to other instances of screen-scraper.
  • Delete: Delete the script.
  • Show Script Instances: Display any locations where this script is invoked in the format scraping session: scrapeable file: extractor pattern (opens in a new window).
  • Name: A unique name so that you can easily indicate when it should be invoked.
  • Language: Select the language in which the script is written.
  • Overwrite this script on import (professional and enterprise editions only): Determines whether or not the current script can be overwritten by another that gets imported.

    For example, scripts attached to a scraping session are exported along with it. When you subsequently import that scraping session into another instance of screen-scraper it might overwrite existing scripts in that instance. For more information read our documentation on script overwriting.

  • Script Text: A text box in which to write your script.
  • Find: Opens a search window to help locate text in your script.
  • Wrap text: Determines whether single lines of code should be displayed on multiple lines when they are wider than the Script Text area.

Script Triggers

Overview

You designate a script to be executed by associating it with some event. For example, if you click on a scraping session, you'll notice that you can designate scripts to be invoked either before a scraping session begins or after it completes. Other events that can be used to invoke scripts relate to scrapeable files and extractor patterns.

Available associations (based on object location) are listed with a brief description of how they can be useful.

  • Scraping Session
    • Before scraping session begins - Scripts that initialize or debug work well here.
    • After scraping session ends - This association is good for closing any open processes or finishing data processes.
    • Always at the end - Forces scripts to run at the end of a scraping session, even if the scraping session is stopped prematurely.
  • Scrapeable File
    • Before file is scraped - Helpful for files used with iterators to get product lists and such.
    • After file is scraped - Good for processing the information scraped in the file.
  • Extractor Pattern
    • Before pattern is applied - Good for giving default values to variables, in case they don't match.
    • After pattern is applied - Good if you want to work with the data set as a whole and its methods.
    • Once if pattern matches - Simplifies the issue of matching the same link multiple times but only wanting to follow it once.
    • Once if no matches - Helpful in catching and reporting possible errors.
    • After each pattern match - Gives access to data records and their associated methods.

Managing Associations

Adding

Every object that can have scripts associated with it, with the exception of scripts themselves, has a button for adding a script association. To invoke one script from another, use the executeScript method of the session object.
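As a rough illustration of one script invoking another, the sketch below runs outside screen-scraper. The StubSession class, its register method, its log list, and the script names are all hypothetical stand-ins so the example is self-contained; only the executeScript call comes from the text above. Inside screen-scraper, the session object is supplied to your script by the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExecuteScriptSketch {

    /** Hypothetical stand-in for the session object that screen-scraper provides. */
    static class StubSession {
        private final Map<String, Runnable> scripts = new HashMap<>();
        final List<String> log = new ArrayList<>();

        // Hypothetical helper: makes a script body available under a name.
        void register(String name, Runnable body) {
            scripts.put(name, body);
        }

        // Mirrors the call named in the text: run another script by its name.
        void executeScript(String name) {
            Runnable body = scripts.get(name);
            if (body == null) {
                throw new IllegalArgumentException("No script named: " + name);
            }
            log.add("executing " + name);
            body.run();
        }
    }

    /** Runs "Process record", which in turn invokes "Write to database". */
    static List<String> demo() {
        StubSession session = new StubSession();
        session.register("Write to database", () -> session.log.add("row written"));
        session.register("Process record", () -> {
            session.log.add("record processed");
            // A script has no association button, so it calls the other script directly.
            session.executeScript("Write to database");
        });
        session.executeScript("Process record");
        return session.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In a real script the body would typically be just the single session.executeScript call with the name of the script to run; everything else here exists only to make the sketch runnable on its own.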

Locations to specify script associations are listed below.

Removing

  • Press the Delete key when the association is selected.
  • Right-click the association and select Delete.

Ordering

Script associations are ordered automatically in a natural order based on the event they are tied to: for example, scripts called after the file is scraped cannot be ordered before associations that are called before the file is scraped. Beyond this natural ordering, you can specify the order of the scripts using the Sequence number.
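The ordering rule can be modeled as a two-level sort: first by trigger, then by Sequence number. The sketch below is only a model of that rule, not screen-scraper's implementation; the Association class, the Trigger enum, and the script names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AssociationOrderSketch {

    // Declared in the order the scrapeable-file triggers fire (the "natural order").
    enum Trigger { BEFORE_FILE_IS_SCRAPED, AFTER_FILE_IS_SCRAPED }

    /** Hypothetical model of one row in a list of script associations. */
    static class Association {
        final String script;
        final Trigger trigger;
        final int sequence;

        Association(String script, Trigger trigger, int sequence) {
            this.script = script;
            this.trigger = trigger;
            this.sequence = sequence;
        }
    }

    /** Returns script names in run order: trigger first, then Sequence number. */
    static List<String> runOrder(List<Association> associations) {
        List<Association> sorted = new ArrayList<>(associations);
        sorted.sort(Comparator
                .comparing((Association a) -> a.trigger)   // natural order of the events
                .thenComparingInt(a -> a.sequence));       // user-specified order within an event
        List<String> names = new ArrayList<>();
        for (Association a : sorted) {
            names.add(a.script);
        }
        return names;
    }

    static List<String> demo() {
        return runOrder(Arrays.asList(
                new Association("Save results", Trigger.AFTER_FILE_IS_SCRAPED, 2),
                new Association("Set defaults", Trigger.BEFORE_FILE_IS_SCRAPED, 2),
                new Association("Log results", Trigger.AFTER_FILE_IS_SCRAPED, 1),
                new Association("Init iterator", Trigger.BEFORE_FILE_IS_SCRAPED, 1)));
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Note that a low Sequence number on an "after" association never moves it ahead of a "before" association; the Sequence number only breaks ties among associations on the same trigger.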

Enable/Disable

You can selectively enable and disable scripts using the Enabled checkbox in the rightmost column. It's often a good practice to create scripts used for debugging that you'll disable once you run scraping sessions in a production environment.