Sample Scraping Sessions

Some of the best scraping session examples are available from our main site. We keep these scraping sessions up to date, so they should work if you download and import them into your own screen-scraper instance. You can get the scraping sessions by visiting each of these pages and clicking the Download Scrape button:

Tutorial 1: Hello World!

Used with Tutorial 1: Hello World!

Attachment Size
Hello World (Scraping Session).sss 2.27 KB

Tutorial 2: Shopping Site

Used with Tutorial 2: Shopping Site

Attachment Size
dvds.txt 897 bytes
Shopping Site (Scraping Session).sss 11.36 KB

Tutorial 3: Extending Hello World

Used with Tutorial 3: Extending Hello World

Attachment Size
dvds.txt 897 bytes
Shopping Site (Scraping Session).sss 11.36 KB

Tutorial 4: Scraping a Shopping Site from an External Program

Used with Tutorial 4: Scraping a Shopping Site from an External Program

Attachment Size
Shopping Site (Scraping Session).sss 11.63 KB

Tutorial 5: Saving Scraped Data to a Database

Used with Tutorial 5: Saving Scraped Data to a Database

Attachment Size
Shopping Site (Scraping Session).sss 13.18 KB

Tutorial 6: Generating an RSS/Atom Feed from a Product Search

Used with Tutorial 6: Generating an RSS/Atom Feed from a Product Search

Attachment Size
Shopping Site (Scraping Session).sss 12.37 KB

Tutorial 7: Scraping a Site Multiple Times Based on Search Terms

Used with Tutorial 7: Scraping a Site Multiple Times Based on Search Terms

Attachment Size
Shopping Site (Scraping Session).sss 13.06 KB

Using the RunnableScrapingSession Class

Example implementation of the RunnableScrapingSession Class.

Import both scraping sessions.

Run the "RunnableScrapingSession Example Starter" scraping session. It sets a variable named "Var1" and then spawns the "RunnableScrapingSession Example" scraping session, which references the value of "Var1".

CAPTCHA User Input

This scraping session takes the session variable CAPTCHA_URL, opens a user input window, and saves what the user enters to the session variable CAPTCHA_TEXT.
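The data flow described here can be sketched outside screen-scraper in a few lines of Python. This is only an illustration under stated assumptions, not the code in the downloadable session: a plain dictionary stands in for screen-scraper's session variables, and the `resolve_captcha` function and its `prompt` parameter are hypothetical names, with a console prompt standing in for the input window.

```python
def resolve_captcha(session_vars, prompt=input):
    """Ask a person to read the CAPTCHA and store the answer.

    session_vars is a plain dict standing in for session variables
    (an assumption for this sketch). prompt is any callable that shows
    a message and returns the typed text; it defaults to console input.
    """
    url = session_vars.get("CAPTCHA_URL")
    if not url:
        # Without the image URL there is nothing to show the user.
        raise KeyError("CAPTCHA_URL must be set before asking for input")

    answer = prompt(f"Enter the text shown in the image at {url}: ")

    # Save the trimmed answer where later steps expect to find it.
    session_vars["CAPTCHA_TEXT"] = answer.strip()
    return session_vars["CAPTCHA_TEXT"]
```

Passing the prompt in as a parameter keeps the human step replaceable, so the same flow can be exercised with a canned answer or swapped for an automated solving service.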

CAPTCHA: Automated Response Using decaptcher.com

This scraping session downloads a CAPTCHA image from Google's recaptcha.com, passes the image to the decaptcher.com service, and receives the response as text.

Using OCR with screen-scraper

Within screen-scraper you can call outside programs directly from your scripts. The following example scraping session uses Tesseract OCR and ImageMagick to take an image from the internet and attempt to read the text it contains.

As is, the scraping session is intended to run on Linux. However, it is possible to run both dependent programs under Windows either directly or using Cygwin.
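The general pattern of calling the two programs in sequence can be sketched in Python as follows. This is a rough illustration, not the script inside the downloadable session: the function names, file names, and the particular ImageMagick options (grayscale conversion and a 200% resize) are assumptions chosen for the example. It also assumes the `convert` and `tesseract` commands are on the PATH; newer ImageMagick releases may expose the command as `magick` instead.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_commands(image_path, work_dir):
    """Return the ImageMagick command, the Tesseract command,
    and the path of the text file Tesseract will write."""
    work_dir = Path(work_dir)
    cleaned = work_dir / "cleaned.tif"      # hypothetical intermediate file
    output_base = work_dir / "ocr_result"   # Tesseract appends ".txt"

    # Preprocess: grayscale and enlarge, which often helps OCR accuracy.
    convert_cmd = ["convert", str(image_path),
                   "-colorspace", "Gray", "-resize", "200%", str(cleaned)]
    # Basic Tesseract usage: tesseract <image> <output base name>
    tesseract_cmd = ["tesseract", str(cleaned), str(output_base)]

    return convert_cmd, tesseract_cmd, Path(str(output_base) + ".txt")


def ocr_image(image_path, work_dir):
    """Run both external programs and return the recognized text."""
    convert_cmd, tesseract_cmd, text_file = build_ocr_commands(image_path, work_dir)
    for cmd in (convert_cmd, tesseract_cmd):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} is not installed or not on PATH")
        # check=True raises CalledProcessError if the program fails.
        subprocess.run(cmd, check=True)
    return text_file.read_text().strip()
```

Calling `ocr_image("captcha.png", "/tmp")` would run both steps against an already downloaded image. Building the command lists separately from running them makes it easier to adjust the paths and options for Windows or Cygwin.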

To use:

Download and import the following scraping session.

Attachment Size
ocr (Scraping Session).sss 5.96 KB