Invoking screen-scraper from the Command Line

Overview

Scraping sessions created within screen-scraper can be invoked by running screen-scraper from a Unix terminal or a DOS command prompt. This makes it possible, for example, to scrape information at regular intervals using cron or a scheduled task. The basic syntax is as follows:

java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

If you installed a version of screen-scraper that includes a Java Virtual Machine (currently the Windows and Linux versions), run the bundled java executable by prefacing the command with "jre\bin\" on Windows or "jre/bin/" on Linux.

Windows Version

{screen-scraper-install-folder}\jre\bin\java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

You could also do it in two steps, using the two commands below.

cd {screen-scraper-install-folder}

jre\bin\java -jar screen-scraper.jar -s "scraping_session_name" [-p "url-encoded_variable_string"]

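{screen-scraper-install-folder} is the location where you installed screen-scraper, such as "C:\Program Files\screen-scraper professional edition\". Because screen-scraper.jar is given as a relative path, the one-line form has to be run from within that folder.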

Examples

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper"

This invokes the "Google search" scraping session and passes in a parameter named search_string with the value "screen scraper" (the + in the encoded string decodes to a space). As a result, a session variable named search_string is created, holding that value.

Passed-in parameters need to be URL-encoded strings, just like the query string in a URL.
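As a rough illustration of the encoding step, the parameter string can be produced with a few lines of Python using the standard library's urlencode. The helper name encode_params is made up for this sketch and is not part of screen-scraper.

```python
from urllib.parse import urlencode


def encode_params(params):
    """Return a URL-encoded string suitable for the -p argument.

    encode_params is a hypothetical helper for this example only.
    Spaces become "+" and reserved characters become %XX escapes,
    just as in the query string of a URL.
    """
    return urlencode(params)


# The two parameter strings used in the examples on this page.
print(encode_params({"search_string": "screen scraper"}))
print(encode_params({"user_name": "uname", "password": "mypass"}))
```

The first call prints search_string=screen+scraper and the second prints user_name=uname&password=mypass, matching the strings passed to -p and --params in the examples.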

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass"

This one invokes the "Hotmail mail retrieval" scraping session, using --params in place of -p, and passes in two parameters: user_name with the value uname, and password with the value mypass. Both parameters become session variables.

Piping Log to a File

When running screen-scraper from the command line, you can have the log written to a file by redirecting the output. To do that, append a redirect to the commands from the examples above.

For the first example, let's say you want to write the log to a file named google_search.log. The command changes to:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper" > "log\google_search.log"

The only difference is at the end of the command: > "log\google_search.log". This redirects the output of the scrape, which is its log, to the file log\google_search.log.
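If the scrape is launched from a script rather than typed by hand, the same redirect can be sketched in Python with the standard subprocess module. The function names build_command and run_with_log are invented for this example, and the sketch assumes the bundled JVM lives under jre/bin inside the install folder, as described above.

```python
import subprocess
from pathlib import Path


def build_command(install_dir, session, params=None):
    """Build the argument list for one scraping session.

    Hypothetical helper: assumes the bundled java executable is at
    <install_dir>/jre/bin/java, as in the examples on this page.
    """
    java = str(Path(install_dir) / "jre" / "bin" / "java")
    cmd = [java, "-jar", "screen-scraper.jar", "-s", session]
    if params:
        cmd += ["-p", params]
    return cmd


def run_with_log(install_dir, cmd, log_file):
    """Run cmd from install_dir and write its standard output to log_file.

    This mirrors `> "log\\google_search.log"`: only standard output is
    redirected, and the log path is relative to the install folder.
    """
    log_path = Path(install_dir) / log_file
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w") as log:
        # cwd makes the relative screen-scraper.jar path resolve correctly.
        return subprocess.run(cmd, cwd=install_dir, stdout=log).returncode


# Example usage (not executed here, since it needs a real installation):
# install = r"C:\Program Files\screen-scraper professional edition"
# cmd = build_command(install, "Google search", "search_string=screen+scraper")
# run_with_log(install, cmd, "log/google_search.log")
```

Passing the arguments as a list avoids the quoting that the command prompt needs for paths and session names containing spaces.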

A bat file containing the above command needs to be inside the folder where screen-scraper is installed. If you want to keep your bat file somewhere other than the screen-scraper install directory, the bat file first has to cd to the directory where screen-scraper is installed. The code will look like this:

cd "C:\Program Files\screen-scraper professional edition"

"jre\bin\java" -jar screen-scraper.jar -s "Google search" -p "search_string=screen+scraper" > "log\google_search.log"

Similarly, for the second example, the command that writes a log file is:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log\hotmail_mail_retrieval.log"

The above command writes the log to a file named hotmail_mail_retrieval.log inside the log directory.

If your bat file is not inside the screen-scraper install directory, the code should look like this:

cd "C:\Program Files\screen-scraper professional edition"

"jre\bin\java" -jar screen-scraper.jar -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log\hotmail_mail_retrieval.log"

When running from a bat file on Windows, any % character needs to be doubled, because DOS treats % as the start of a variable or argument reference. For example, the parameter "string=hello%21world" would need to be passed in as "string=hello%%21world".
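When the command line is generated by a script, the doubling can be applied mechanically. A minimal Python sketch follows; the name escape_for_batch is an assumption made for this example.

```python
def escape_for_batch(param_string):
    """Double every % so the string survives being placed in a bat file.

    escape_for_batch is a hypothetical helper for this example only.
    """
    return param_string.replace("%", "%%")


# The example from the paragraph above.
print(escape_for_batch("string=hello%21world"))
```

This prints string=hello%%21world. The doubling should be applied after URL-encoding, since encoding is what introduces the % characters in the first place.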

The -Xmx Flag

When running screen-scraper from the command line, there is one thing to consider: memory size. Java runs with a fixed maximum amount of heap memory, which happens to be 64 MB by default. If you get an error message saying that Java is out of memory, screen-scraper has consumed all of the heap memory and needs more in order to continue its job.

You can increase the heap memory with the -Xmx flag. To set the heap memory size to 1024 megabytes, use the flag below.

-Xmx1024M

Let's say we got an out-of-memory error while running the Hotmail mail retrieval scraping session (from the examples above). The command that increases the heap memory size is:

"C:\Program Files\screen-scraper professional edition\jre\bin\java" -Xmx1024M -jar "screen-scraper.jar" -s "Hotmail mail retrieval" --params "user_name=uname&password=mypass" > "log\hotmail_mail_retrieval.log"

This command raises the maximum heap memory size of Java to 1024 megabytes.

Remember not to set the heap memory size larger than the physical memory of the machine you are running on.
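The flag and the physical-memory caution can be combined in a small Python sketch. The names heap_flag and with_heap are made up for this example, and the amount of physical memory is supplied by the caller rather than detected automatically.

```python
def heap_flag(heap_mb, physical_mb=None):
    """Return the -Xmx flag for a heap of heap_mb megabytes.

    Hypothetical helper. If physical_mb (the machine's physical memory
    in megabytes, supplied by the caller) is given, a heap larger than
    that is rejected.
    """
    if heap_mb <= 0:
        raise ValueError("heap size must be positive")
    if physical_mb is not None and heap_mb > physical_mb:
        raise ValueError("heap size exceeds physical memory")
    return f"-Xmx{heap_mb}M"


def with_heap(cmd, heap_mb, physical_mb=None):
    """Insert the -Xmx flag right after the java executable in cmd."""
    return [cmd[0], heap_flag(heap_mb, physical_mb)] + cmd[1:]


print(heap_flag(1024))
print(with_heap(["java", "-jar", "screen-scraper.jar"], 1024))
```

The flag is placed before -jar because options that appear after the jar file name are passed to the application rather than to the Java launcher.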