API Documentation
Overview
When writing scripts within screen-scraper, there are a number of objects and methods available to you. The Using Scripts page provides an overview of working with scripts, whereas this page provides details on specific objects and methods you'll use when scripting within screen-scraper.
The examples given here assume you're using "Interpreted Java" as the scripting language, but there should be very little difference in syntax if you decide to use another language. For example, if you're scripting in VBScript, you would simply omit the semicolon at the end of each line, and for methods that don't return a value you would either precede them with the VBScript keyword "Call" or omit the parentheses around the method parameters.
Built-in objects
screen-scraper provides four built-in objects, which will be available in scripts depending on when they're run. These objects are: session, scrapeableFile, dataSet, and dataRecord. See the "Variable scope" section on the Using Scripts page for details on which objects are available based on when scripts are run. For example, if you're running a script associated with an extractor pattern "After pattern is applied", you won't have access to the "dataRecord" object, but you will have access to the "dataSet" object.
The built-in objects are detailed below, followed by two classes that can be instantiated within scripts:
* session: This variable refers to the currently running scraping session.
* scrapeableFile: This refers to the scrapeable file that is currently being requested and analyzed.
* dataSet: A data set is analogous to a result or record set that would be returned from a database query. A data set contains any number of data records, which are analogous to rows in a database. The dataSet object holds all data records extracted by an extractor pattern after it has been applied as many times as possible to the HTML retrieved by a scrapeable file.
* dataRecord: This gives access to the most recently extracted data record. It will most likely be used only in scripts that run each time an extractor pattern is applied. This object simply extends java.util.Hashtable, so its methods are those described for that class in the standard Java API documentation. Note that this object is populated using the names of tokens from extractor patterns. So, for example, if your extractor pattern uses a token named "CITY_CODE", the data extracted by that extractor pattern would be retrieved like so: cityCode = dataRecord.get( "CITY_CODE" );.
* com.screenscraper.scraper.RunnableScrapingSession: This is a class that can be instantiated within a script in order to run a scraping session. The "Maximum number of concurrent running scraping sessions" in the "Settings" dialog box will control how many scraping sessions can be run simultaneously.
* com.screenscraper.xml.XmlWriter: Oftentimes you want to write extracted data directly to an XML file. This class facilitates doing that.
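Because dataRecord extends Hashtable and a dataSet is described as a collection of data records, the relationship between the two can be pictured with plain Java collections. The following is only a sketch: the class name DataRecordSketch and the method valuesFor are hypothetical, and a List of Hashtables stands in for the real dataSet object, whose own methods are not used here.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Stand-in types only: a Hashtable plays the role of dataRecord and a List of
// Hashtables plays the role of dataSet. These are not screen-scraper classes.
class DataRecordSketch {

    // Collects the value stored under one token name from every record.
    static List<Object> valuesFor(List<Hashtable<String, Object>> dataSet, String token) {
        List<Object> values = new ArrayList<>();
        for (Hashtable<String, Object> record : dataSet) {
            Object value = record.get(token); // same call shape as dataRecord.get( "CITY_CODE" )
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Hashtable<String, Object> first = new Hashtable<>();
        first.put("CITY_CODE", "SLC");
        Hashtable<String, Object> second = new Hashtable<>();
        second.put("CITY_CODE", "PVU");

        List<Hashtable<String, Object>> dataSet = new ArrayList<>();
        dataSet.add(first);
        dataSet.add(second);

        System.out.println(valuesFor(dataSet, "CITY_CODE"));
    }
}
```

Token names act as the keys, so a record that lacks a token simply returns null from get, which the sketch skips.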
API documentation
runnableScrapingSession
 |
runnableScrapingSession Methods |
RunnableScrapingSession
RunnableScrapingSession( String name ) (professional and enterprise editions only)
RunnableScrapingSession( String name, ScrapingSession inheritedScrapingSession ) (professional and enterprise editions only)
| Description |
| The RunnableScrapingSession class allows you to invoke a scraping session from within another scraping session. This documentation is for the constructors of the class. They take the name of an existing scraping session to be run. If the inheritedScrapingSession parameter is passed, the scraping session that will be run will inherit the session variables and logging from the inherited scraping session. For example, within a running scraping session, if you were to create a new RunnableScrapingSession and pass it the current session (via the "session" object in a script), the new scraping session would, when run, have access to all of the session variables from the inherited scraping session, and would also log to the same location. |
 |
| Example |
// Creates a new runnable session for the scraping session "My Session".
myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session" );
// Creates a new runnable session for the scraping session "My Session"
// and passes it the current scraping session from which it will inherit
// session variables and logging.
myScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "My Session", session );
|
getName
|
runnableScrapingSession.getName()
|
| Description |
| Returns the name of the scraping session. |
 |
| Example |
// Stores the name of the scraping session in the variable sessionName.
sessionName = myScrapingSession.getName();
|
|
getTimeout
| runnableScrapingSession.getTimeout() |
| Description |
| Gets the timeout, in minutes, of the session. |
 |
| Example |
// Outputs the value of the timeout of the runnable scraping session
// to the log.
session.log( "Session timeout: " + runnableScrapingSession.getTimeout() );
|
getVariable
|
runnableScrapingSession.getVariable( String variableName )
|
| Description |
Gets the value of the session variable named by variableName. This method should be called after the RunnableScrapingSession.scrape() method has returned. |
 |
| Example |
|
// Outputs the value of the variable "FOO" to the log.
session.log( "FOO: " + runnableScrapingSession.getVariable( "FOO" ) );
|
|
scrape
| runnableScrapingSession.scrape() |
| Description |
| Starts the session scraping. This is equivalent to clicking the "Run Scraping Session" button on the scraping session "General" panel. When this method is called it will return immediately, unless setDoLazyScrape has been set to "false". That is, the line just following the one executing this method will be run without waiting for the scraping session to finish scraping. Internally, screen-scraper spawns a separate thread to handle the scraping session so that the script can continue executing (and so that multiple scraping sessions can be run simultaneously). |
 |
| Example |
// Tells the session to start scraping.
myScrapingSession.scrape();
|
setDoLazyScrape
| runnableScrapingSession.setDoLazyScrape( boolean doLazyScrape ) |
| Description |
| Indicates whether or not the scraping session should run concurrently (at the same time as) other scraping sessions. Note that we recommend not setting this value to "false" when running scraping sessions in the workbench as it will cause the interface to freeze up until sessions have completed. If you'd like to run multiple scraping sessions serially (one after another), the best option is to set the "Maximum number of concurrent running scraping sessions" to "1" in the "Settings" window. |
 |
| Example |
// Indicates that the runnable scraping session should not be run
// in a separate thread.
runnableScrapingSession.setDoLazyScrape( false );
|
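The difference between a lazy scrape (the call returns immediately) and a non-lazy one (the call waits) can be pictured with plain Java threads. This is a sketch of the general idea only, not screen-scraper's implementation: the class LazyScrapeSketch and its scrape method are hypothetical, and for simplicity the blocking case joins a worker thread rather than running the work inline.

```java
class LazyScrapeSketch {

    // Runs the work on its own thread. When doLazy is false the caller waits
    // for that thread to finish before this method returns.
    static Thread scrape(Runnable work, boolean doLazy) {
        Thread worker = new Thread(work, "scraping-session-sketch");
        worker.start();
        if (!doLazy) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
        }
        return worker;
    }

    public static void main(String[] args) {
        scrape(() -> System.out.println("blocking run finished"), false);
        scrape(() -> System.out.println("lazy run finished"), true);
        System.out.println("main continues without waiting for the lazy run");
    }
}
```

In the lazy case the order of the last two messages is not guaranteed, which mirrors the note above that the line following scrape() runs without waiting for the scraping session to finish.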
setTimeout
| runnableScrapingSession.setTimeout( int timeout ) |
| Description |
| Sets the timeout, in minutes, of the session. That is, after the given number of minutes have passed the session will automatically terminate. This can be useful in cases where an infinite loop might occur (e.g., the same pages get scraped over and over). This method must be called before RunnableScrapingSession.scrape(). |
 |
| Example |
// Sets the timeout of the session to 60 minutes.
runnableScrapingSession.setTimeout( 60 );
|
setVariable
|
runnableScrapingSession.setVariable( String identifier, Object value )
|
| Description |
Designates that value should be saved for the duration of the session, and can be accessed through the session object (described above) using the getVariable method. |
 |
| Example |
// Sets the session variable "LOGIN_USERNAME" with the value
// "my_username".
myScrapingSession.setVariable( "LOGIN_USERNAME", "my_username" );
|
|
scrapeableFile
 |
scrapeableFile Methods |
addHTTPParameter
| scrapeableFile.addHTTPParameter( HTTPParameter parameter ) |
| Description |
| Dynamically adds an HTTPParameter to the current scrapeable file. The HTTPParameter constructor is as follows: HTTPParameter( String key, String value, int sequence, int type ). Valid types for the constructor are TYPE_GET, TYPE_POST, and TYPE_FILE. Calling this method will have no effect unless it's invoked before the file is scraped. |
 |
| Example |
// Adds a new POST HTTP parameter to the current file.
scrapeableFile.addHTTPParameter( new com.screenscraper.common.HTTPParameter( "key", "value", 1, com.screenscraper.common.HTTPParameter.TYPE_POST ) );
|
extractData
| scrapeableFile.extractData( String text, String name ) (professional and enterprise editions only) |
| Description |
| Manually invokes an extractor pattern, returning the extracted data in a DataSet object. The text parameter should be a string containing the HTML you'd like to extract information from. The name parameter should be the name of an extractor pattern of the form [scraping session]:[scrapeable file]:extractor pattern where the scraping session and scrapeable file portions of the name are optional. For example, if you passed in "My Scraping Session:My Scrapeable File:My Extractor Pattern" screen-scraper would find the extractor pattern named "My Extractor Pattern" inside the scrapeable file "My Scrapeable File", which it would look for inside the scraping session called "My Scraping Session". You could also pass in "My Scrapeable File:My Extractor Pattern", which would cause screen-scraper to look in the current running scraping session for the scrapeable file "My Scrapeable File" where it would look for the extractor pattern "My Extractor Pattern". If the extractor pattern you want to use is associated with the current scrapeable file you can simply pass in its name (e.g., "My Extractor Pattern"). |
 |
| Example |
// Applies the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable productData.
import com.screenscraper.common.*;
DataSet productData = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
| Example |
// Expanded example that applies the "PRODUCT" extractor pattern to the text found in the
// productDescriptionText variable. The resulting DataSet from
// extractData is stored in the variable myDataSet, which has multiple dataRecords.
// Each myDataRecord has a PRICE and a PRODUCT_ID.
import com.screenscraper.common.*;
myDataSet = scrapeableFile.extractData( productDescriptionText, "PRODUCT" );
for (i = 0; i < myDataSet.getNumDataRecords(); i++) {
    myDataRecord = myDataSet.getDataRecord(i);
    // Note: each iteration replaces the PRICE and PRODUCT_ID session variables.
    session.setVariable("PRICE", myDataRecord.get("PRICE"));
    session.setVariable("PRODUCT_ID", myDataRecord.get("PRODUCT_ID"));
}
|
|
See also, How to manually extract data using the session.extractData method
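The lookup rule for the name parameter of extractData (optional scraping session, optional scrapeable file, then the extractor pattern) can be sketched as a small parsing routine. This is an illustration under assumptions, not screen-scraper's code: the class PatternNameSketch and its resolve method are hypothetical, and the sketch assumes that none of the names contains a colon.

```java
import java.util.Arrays;

class PatternNameSketch {

    // Returns { scraping session, scrapeable file, extractor pattern }, filling
    // in the current session and file when the name leaves them out.
    static String[] resolve(String name, String currentSession, String currentFile) {
        String[] parts = name.split(":");
        switch (parts.length) {
            case 1:
                return new String[] { currentSession, currentFile, parts[0] };
            case 2:
                return new String[] { currentSession, parts[0], parts[1] };
            case 3:
                return parts;
            default:
                throw new IllegalArgumentException("Unexpected extractor pattern name: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                resolve("My Scrapeable File:My Extractor Pattern", "Current Session", "Current File")));
    }
}
```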
extractOneValue
|
scrapeableFile.extractOneValue( String text, String name ) (professional and enterprise editions only)
scrapeableFile.extractOneValue( String text, String name, String token ) (professional and enterprise edition version 4.0.20a and above only)
|
| Description |
| This method is similar to extractData except that it assumes only a single string will be returned. When the first method is invoked the first column in the first row of the resulting DataSet object will be returned and when the second method is invoked the column named token in the first row of the resulting DataSet object will be returned. The text parameter should be a string containing the HTML you'd like to extract information from. The name parameter should be the name of an extractor pattern associated with the current scrapeable file. The token parameter should be the name of the token in the extractor pattern from name. |
 |
| Example |
// Applies the extractor pattern "PRODUCT_NAME" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the first token found in the extractor pattern
// or null if no token is found.
productName = scrapeableFile.extractOneValue( productDescriptionText, "PRODUCT_NAME" );
 |
| Example |
// Applies the extractor pattern "PRODUCT_NAME" to the data found in
// the variable productDescriptionText. The extracted string is
// stored in the productName variable.
// Returns the value found in the token "NAME" found in the extractor pattern
// or null if no token is found.
productName = scrapeableFile.extractOneValue( productDescriptionText, "PRODUCT_NAME", "NAME" );
|
getContentAsString
| scrapeableFile.getContentAsString() |
| Description |
| Gets the content that was retrieved when the scrapeable file was requested. |
 |
| Example |
// Sends the HTML of the current file to the log.
session.log( scrapeableFile.getContentAsString() );
|
getCurrentPOSTData
| scrapeableFile.getCurrentPOSTData() |
| Description |
| Returns the POST data for the scrapeable file. Note that if this method is invoked after the scrapeable file is requested it will contain the POST data with all of the session variable tokens resolved. |
 |
| Example |
// Stores the POST data from the scrapeable file in the
// currentPOSTData variable.
currentPOSTData = scrapeableFile.getCurrentPOSTData();
|
getCurrentURL
| scrapeableFile.getCurrentURL() |
| Description |
| Returns the URL of the scrapeable file. Note that if this method is invoked after the scrapeable file is requested it will contain the URL with all of the session variable tokens resolved. |
 |
| Example |
// Stores the current URL in the variable currentURL.
currentURL = scrapeableFile.getCurrentURL();
|
getName
|
scrapeableFile.getName()
|
| Description |
| Gets the name of the current scrapeable file. |
 |
| Example |
// Outputs the name of the scrapeable file to the log.
session.log( "Current scrapeable file: " + scrapeableFile.getName() );
|
|
getNonTidiedHTML
| scrapeableFile.getNonTidiedHTML() (enterprise edition only) |
| Description |
| If screen-scraper has been configured to retain non-tidied HTML, this method will return the original HTML sent from the web server before it was tidied by screen-scraper. This can be useful in debugging in cases where sometimes tidying succeeds and sometimes it doesn't. |
 |
| Example |
// Outputs the non-tidied HTML from the scrapeable file
// to the log.
session.log( "Non-tidied HTML: " + scrapeableFile.getNonTidiedHTML() );
|
getRetainNonTidiedHTML
|
scrapeableFile.getRetainNonTidiedHTML() (enterprise edition only) |
| Description |
| Indicates whether or not non-tidied HTML is to be retained for this scrapeable file. See scrapeableFile.getNonTidiedHTML for more details. |
 |
| Example |
// Outputs to the log whether or not non-tidied HTML is
// being retained.
session.log( "Retaining non-tidied HTML: " + scrapeableFile.getRetainNonTidiedHTML() );
|
|
getStatusCode
| scrapeableFile.getStatusCode() (professional and enterprise editions only) |
| Description |
| If this method is invoked after the HTTP request has been made for a scrapeable file, it will return the HTTP status code sent by the server (e.g., 200, 403, 404, 500). |
 |
| Example |
// Check for a 404 response (file not found).
if( scrapeableFile.getStatusCode() == 404 )
{
    session.log( "Warning! The server returned a 404 response." );
}
|
noExtractorPatternsMatched
| scrapeableFile.noExtractorPatternsMatched() |
| Description |
| Will return true if no extractor patterns associated with the scrapeable file found a match. This can be a useful error-handling mechanism. |
 |
| Example |
// If no patterns matched, outputs a message indicating such
// to the session log.
if( scrapeableFile.noExtractorPatternsMatched() )
{
    session.log( "Warning! No extractor patterns matched." );
}
|
removeAllHTTPParameters
| scrapeableFile.removeAllHTTPParameters() (professional and enterprise editions only) |
| Description |
| Removes all of the HTTP parameters from the current scrapeable file. This can be useful in cases where scrapeable files are requested multiple times and parameters are added dynamically using the addHTTPParameter method. |
 |
| Example |
// Removes all of the HTTP parameters from the current scrapeable file.
scrapeableFile.removeAllHTTPParameters();
|
removeHTTPParameter
| scrapeableFile.removeHTTPParameter( int sequence ) |
| Description |
| Dynamically removes the HTTPParameter indicated by the parameter's sequence from the current scrapeable file. The order of the remaining parameters is automatically adjusted immediately upon calling the method. (NOTE: If you call this method more than once in the same script, or use it in conjunction with the addHTTPParameter method, it is important to keep track of how the list is reordered before calling either method again.) This method can be used for both GET and POST parameters. Calling this method will have no effect unless it's invoked before the file is scraped. |
 |
| Example |
// Removes the eighth HTTP parameter from the current file.
scrapeableFile.removeHTTPParameter( 8 );
|
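The reordering caveat for removeHTTPParameter can be pictured with an ordinary list. This is a sketch under assumptions, not screen-scraper's implementation: the class ParameterListSketch is hypothetical, and it assumes that sequences are 1-based and contiguous, so that removing one parameter shifts every later parameter down by one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scrapeable file's parameter list.
class ParameterListSketch {
    private final List<String> keys = new ArrayList<>();

    // Appends a parameter at the next free sequence.
    void add(String key) {
        keys.add(key);
    }

    // Removes the parameter at the given 1-based sequence; later ones shift down.
    void remove(int sequence) {
        keys.remove(sequence - 1);
    }

    // Returns the current 1-based sequence of a key, or -1 if it is absent.
    int sequenceOf(String key) {
        int index = keys.indexOf(key);
        return index < 0 ? -1 : index + 1;
    }

    public static void main(String[] args) {
        ParameterListSketch parameters = new ParameterListSketch();
        parameters.add("a");
        parameters.add("b");
        parameters.add("c");
        parameters.remove(2); // removes "b"
        System.out.println("c is now at sequence " + parameters.sequenceOf("c"));
    }
}
```

Under these assumptions, removing several parameters in one script is simplest when done from the highest sequence to the lowest, since earlier sequences are then unaffected by each removal.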
saveFileBeforeTidying
| scrapeableFile.saveFileBeforeTidying( String filePath ) (professional and enterprise editions only) |
| Description |
| Calling this method will cause screen-scraper to output to filePath the original HTML sent from the web server before it was tidied by screen-scraper. This can be useful in debugging in cases where sometimes tidying succeeds and sometimes it doesn't. This method must be called before the file is scraped. |
 |
| Example |
// Causes the non-tidied HTML from the scrapeable file
// to be output to the file path.
scrapeableFile.saveFileBeforeTidying( "C:/non-tidied.html" );
|
saveFileOnRequest
| scrapeableFile.saveFileOnRequest( String pathToSaveTo ) (enterprise edition only) |
| Description |
| Causes the file to be saved to the local file system after being requested by screen-scraper. This method must be called before the file is scraped. That is, the script calling this method should be associated with the scrapeable file, and should be invoked "Before file is scraped". Note that the preferred method for downloading files to the file system is session.downloadFile, but this method is useful in cases where a POST request is required to request the file. For example, if you'd like to download and save a PDF that is accessible only through a POST request it would be appropriate to use this method. |
 |
| Example |
// When the current file is requested it will be saved to the
// local file system as "sample.pdf".
scrapeableFile.saveFileOnRequest( "C:/downloaded_files/sample.pdf" );
|
setContentType
| scrapeableFile.setContentType( String contentType ) (professional and enterprise editions only) |
| Description |
| In certain rare cases it may be necessary to explicitly set the content type of the POST data of an HTTP request. This may be required in cases where a site is using AJAX, and the POST payload of a request is sent as XML (e.g., using the setRequestEntity method). This method must be invoked before the HTTP request is made (e.g., "Before file is scraped" for a scrapeable file). |
 |
| Example |
// Sets the type of the POST entity to XML.
scrapeableFile.setContentType( "text/xml" );
|
setReferer
| scrapeableFile.setReferer( String url ) (professional and enterprise editions only) |
| Description |
| Dynamically sets the HTTP header referer for the current scrapeable file. This method must be called before the file is scraped. That is, the script calling this method should be associated with the scrapeable file, and should be invoked "Before file is scraped". |
 |
| Example |
// Sets the given URL as the HTTP referer header
// for the current scrapeable file. The URL is passed as a
// String, matching the method signature above.
scrapeableFile.setReferer( "http://www.foo.com/" );
|
setRequestEntity
| scrapeableFile.setRequestEntity( String requestEntity ) (professional and enterprise editions only) |
| Description |
| Sets the complete value that will be sent in the POST payload portion of the request. This method allows you to set the entity portion of a POST request that would otherwise be set by designating parameters under the "Parameters" tab for a scrapeable file. This is rarely necessary, but can be useful in cases where an entire string of XML must be sent (e.g., in many AJAX applications). |
 |
| Example |
// Sets the request entity to an XML document.
scrapeableFile.setRequestEntity( "<outerNode><innerNode>my data</innerNode></outerNode>" );
|
setRetainNonTidiedHTML
|
scrapeableFile.setRetainNonTidiedHTML( boolean retainNonTidiedHTML ) (enterprise edition only) |
| Description |
| Sets whether or not non-tidied HTML is to be retained for the current scrapeable file. This defaults to false. See scrapeableFile.getNonTidiedHTML for more details. |
 |
| Example |
// Tells screen-scraper to retain non-tidied HTML for the current
// scrapeable file.
scrapeableFile.setRetainNonTidiedHTML( true );
|
|
setUserAgent
| scrapeableFile.setUserAgent( String userAgent ) (professional and enterprise editions only) |
| Description |
| In certain rare cases it may be desirable to explicitly set the "User-Agent" header screen-scraper will send for a given HTTP request. That is, screen-scraper will identify itself as if it were a specific web browser. If unspecified, the user agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" will be used. Note that this method must be invoked before the file is scraped. |
 |
| Example |
// Causes screen-scraper to identify itself as Firefox
// running on Linux.
scrapeableFile.setUserAgent( "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826" );
|
wasErrorOnRequest
| scrapeableFile.wasErrorOnRequest() |
| Description |
| Indicates whether the server responded with a status code outside the 200 or 300 range, or whether the connection to the server timed out. Each time a server responds to a request made by screen-scraper it sends back a three-digit code indicating the status of the response. Responses in the 200 or 300 range indicate that there was no error in the transaction. Responses in the 400 or 500 range indicate some kind of error. This method reports such an occurrence. |
 |
| Example |
// If an error occurred when the file was requested, an error
// message indicating such gets output to the log.
if( scrapeableFile.wasErrorOnRequest() )
{
    session.log( "Connection error occurred." );
}
|
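The rule described for wasErrorOnRequest (anything outside the 200 and 300 ranges, or a timed-out connection, counts as an error) can be written out as a small predicate. This is a sketch of the described rule only: the class RequestErrorSketch and its wasError method are hypothetical names, not part of the screen-scraper API.

```java
class RequestErrorSketch {

    // True when the connection timed out or the status code falls outside
    // the 200-399 range.
    static boolean wasError(int statusCode, boolean timedOut) {
        boolean successOrRedirect = statusCode >= 200 && statusCode < 400;
        return timedOut || !successOrRedirect;
    }

    public static void main(String[] args) {
        int[] codes = { 200, 302, 403, 404, 500 };
        for (int code : codes) {
            System.out.println(code + " -> error: " + wasError(code, false));
        }
    }
}
```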
session
 |
session Methods |
addToNumRecordsScraped
| session.addToNumRecordsScraped( Object value ) (enterprise edition only) |
| Description |
| Adds to the number of records that have currently been scraped. If the number of records scraped value has not been set, calling this method will set it to value. This method is used in conjunction with record thresholds in the Enterprise edition. The value parameter can be either an int, an Integer or a String. See also the setNumRecordsScraped and getNumRecordsScraped methods. |
 |
| Example |
// Adds 10 to the value of the number of records scraped.
session.addToNumRecordsScraped( 10 );
|
addToVariable
| session.addToVariable( String variable, int value ) (professional and enterprise editions only) |
| Description |
| Adds a value to a session variable. Session variables are generally stored as Strings, so it's normally more difficult than it should be to simply add a number to one. This method takes the name of the variable, which can either hold a String or Integer, and adds a number to it. The number added to it can be positive or negative. |
 |
| Example |
// Increments the session variable "PAGE_NUM" by one.
session.addToVariable( "PAGE_NUM", 1 );
|
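To show why adding a number to a session variable takes extra work when the value is stored as a String, here is a sketch of the conversion that a method like addToVariable saves you from writing by hand. It is not screen-scraper's implementation: the class AddToVariableSketch is hypothetical, a plain Map stands in for the session, and treating an unset variable as zero and storing the result as an Integer are both assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class AddToVariableSketch {

    // Adds amount to the named variable, which may hold a String or an Integer.
    static int addToVariable(Map<String, Object> variables, String name, int amount) {
        Object current = variables.get(name);
        int base;
        if (current == null) {
            base = 0; // assumption: an unset variable counts as zero
        } else if (current instanceof Integer) {
            base = (Integer) current;
        } else {
            base = Integer.parseInt(current.toString().trim());
        }
        int updated = base + amount;
        variables.put(name, updated); // assumption: the result is stored as an Integer
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> variables = new HashMap<>();
        variables.put("PAGE_NUM", "3");
        System.out.println(addToVariable(variables, "PAGE_NUM", 1));
    }
}
```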
appendErrorMessage
| session.appendErrorMessage( String errorMessage ) (enterprise edition only) |
| Description |
| Appends an error message to any existing error messages. The error message will be displayed in the web interface. This should be used in conjunction with the setFatalErrorOccurred method. |
 |
| Example |
// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );
// Append an error message.
session.appendErrorMessage( "An error occurred in the scraping session." );
|
breakpoint
| session.breakpoint() (professional and enterprise editions only) |
When the session.breakpoint() method is called at any point in a script, the scraping session will pause and screen-scraper will display the "Breakpoint" window.
| Description |
| Displays the "Breakpoint" frame. See the "Debugging scripts" section on this page for more details. |
 |
| Example |
// Causes the breakpoint window to be displayed.
session.breakpoint();
|
clearCookies
| session.clearCookies() (enterprise edition only) |
| Description |
| This will clear any cookies that have been stored by screen-scraper in the course of running a scraping session. |
 |
| Example |
// Clears any current cookies.
session.clearCookies();
|
currentProxyServerIsBad
| session.currentProxyServerIsBad() (professional and enterprise editions only) |
| Description |
| If screen-scraper is currently using a pool of proxies (either set explicitly or via the automated anonymization feature), calling this method will cause the current proxy server to be removed from the pool. If the automated anonymization feature is being used, this will also cause a new proxy server to be spawned in place of the bad one. This method would be called in situations where a web site starts blocking requests not through some kind of HTTP error (e.g., a status code of 403), but instead by something manifested in the HTML response (e.g., it might contain a message saying, "This IP address has made too many requests."). See the help file on Anonymization for more detail. |
 |
| Example |
// Indicates that the current proxy server is bad.
session.currentProxyServerIsBad();
|
downloadFile
session.downloadFile( String url, String fileName ) (professional and enterprise editions only)
session.downloadFile( String url, String fileName, int maxNumAttempts ) (professional and enterprise editions only)
session.downloadFile( String url, String fileName, int maxNumAttempts, boolean doLazy ) (enterprise edition only)
| Description |
| Downloads the file found at the url and saves it to the local file system at the path designated by fileName. If the second version of the method is called, you can designate a maximum number of attempts, in the event that a single attempt fails. The third version allows the file to be downloaded in a separate thread. If "true" is passed as the value of the doLazy parameter, the method call will return immediately, and the file will be downloaded while the scraping session continues on. |
 |
| Example |
// Downloads the image pointed to by the URL to the local C: drive.
// A maximum number of 5 attempts will be made to download the file,
// and the file will be downloaded in its own thread.
session.downloadFile( "http://www.foo.com/imgs/puppy_image.gif", "C:/images/puppy.gif", 5, true );
|
executeScript
| session.executeScript( String scriptName ) (professional and enterprise editions only) |
| Description |
| Executes the script named by scriptName. |
 |
| Example |
// Executes the script "My Script".
session.executeScript( "My Script" );
|
getElapsedRunningTime
| session.getElapsedRunningTime() (professional and enterprise editions only) |
| Description |
| Gets the number of milliseconds the current session has been running. |
 |
| Example |
// Output the elapsed running time to the log.
session.log( "Elapsed running time: " + session.getElapsedRunningTime() );
|
getErrorMessage
| session.getErrorMessage() (enterprise edition only) |
| Description |
| Returns the error message set via appendErrorMessage and setErrorMessage. |
 |
| Example |
// Output the current error message to the log.
session.log( "Error message: " + session.getErrorMessage() );
|
getFatalErrorOccurred
| session.getFatalErrorOccurred() (enterprise edition only) |
| Description |
| Indicates whether or not the "fatal error" flag was set via the setFatalErrorOccurred method. |
 |
| Example |
// Output the "fatal error" state to the log.
session.log( "Fatal error occurred: " + session.getFatalErrorOccurred() );
|
getImageHeight
| session.getImageHeight( String imagePath ) (enterprise edition only) |
| Description |
| This will return the height of an image found at imagePath. |
 |
| Example |
// Output the height of the image to the log.
session.log( "Image height: " + session.getImageHeight( "C:/my_image.jpg" ) );
|
getImageWidth
| session.getImageWidth( String imagePath ) (enterprise edition only) |
| Description |
| This will return the width of an image found at imagePath. |
 |
| Example |
// Output the width of the image to the log.
session.log( "Image width: " + session.getImageWidth( "C:/my_image.jpg" ) );
|
getLogFileName
| session.getLogFileName() (professional and enterprise editions only) |
| Description |
| If screen-scraper is running in server mode, this will return the name of the file to which screen-scraper is logging. This can be useful in determining which log file corresponds to the current session. |
 |
| Example |
// Output the name of the log file to the session log.
session.log( "Current log file: " + session.getLogFileName() );
|
getName
| session.getName() |
| Description |
| Gets the name of the current scraping session. |
 |
| Example |
// Outputs the name of the scraping session to the log.
session.log( "Current scraping session: " + session.getName() );
|
getNumRecordsScraped
| session.getNumRecordsScraped() (enterprise edition only) |
| Description |
| Gets the number of records that have currently been scraped. This method is used in conjunction with record thresholds in the Enterprise edition. See also the setNumRecordsScraped and addToNumRecordsScraped methods. |
 |
| Example |
// Outputs the number of records that have been scraped to the log.
session.log( "Num records scraped so far: " + session.getNumRecordsScraped() );
|
getRetainNonTidiedHTML
| session.getRetainNonTidiedHTML() (enterprise edition only) |
| Description |
| Indicates whether or not non-tidied HTML is to be retained for all scrapeable files in this scraping session. See scrapeableFile.getNonTidiedHTML for more details. |
 |
| Example |
// Outputs to the log whether or not non-tidied HTML is
// being retained.
session.log( "Retaining non-tidied HTML: " + session.getRetainNonTidiedHTML() );
|
getVariable
| session.getVariable( String identifier ) |
| Description |
| Retrieves the value of a saved session variable designated by identifier. |
 |
| Example |
// Places the session variable "CITY_CODE" in the local
// variable "cityCode".
cityCode = session.getVariable( "CITY_CODE" );
|
loadVariables
| session.loadVariables( String fileToReadFrom ) (enterprise edition only) |
| Description |
| Loads in session variables from the file path indicated by fileToReadFrom. This method will most often be invoked after variables have been saved via the saveVariables method. The format of the file should be a hard return-delimited list of key/value pairs, where the key and value are both URL-encoded. |
 |
| Example |
// Reads in variables from the file located at C:\myvars.txt.
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.
session.loadVariables( "C:/myvars.txt" );
|
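The file format expected by loadVariables (one key/value pair per line, with key and value both URL-encoded) can be illustrated with the standard java.net.URLEncoder and URLDecoder classes. This is a sketch under a loud assumption: the text above does not say which character separates the key from the value, so the "=" used here is a guess for illustration only. The class VariableFileSketch and its methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class VariableFileSketch {

    // Builds one line of the file. The "=" separator is an assumption.
    static String encodeLine(String key, String value) {
        try {
            return URLEncoder.encode(key, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    // Splits one line back into { key, value } and URL-decodes both parts.
    static String[] decodeLine(String line) {
        int separator = line.indexOf('=');
        if (separator < 0) {
            throw new IllegalArgumentException("No separator in line: " + line);
        }
        try {
            return new String[] {
                URLDecoder.decode(line.substring(0, separator), "UTF-8"),
                URLDecoder.decode(line.substring(separator + 1), "UTF-8")
            };
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args) {
        String line = encodeLine("CITY NAME", "New York");
        System.out.println(line);
        System.out.println(decodeLine(line)[1]);
    }
}
```

URL-encoding both sides means that a separator character or a line break inside a value cannot be mistaken for structure, which is presumably why the format calls for it.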
log
| session.log( String message ) |
| Description |
| Causes message to be sent to the log. When the workbench is running, this will be found under the "Log" tab for the scraping session. When screen-scraper is running in server mode, the message will get sent to the corresponding ".log" file found in screen-scraper's "log" folder. When screen-scraper is invoked from the command line, the message will get sent to standard out. |
 |
| Example |
// Sends the message to the log.
session.log( "Inserting extracted data into the database." ); |
|
logCurrentDateAndTime
| session.logCurrentDateAndTime() (professional and enterprise editions only) |
| Description |
| Outputs to the log the current date and time in a human-friendly format. |
 |
| Example |
// Output the current date and time to the log.
session.logCurrentDateAndTime(); |
|
logCurrentTime
| session.logCurrentTime() (professional and enterprise editions only) |
| Description |
| Outputs to the log the current time in a human-friendly format. |
 |
| Example |
// Output the current time to the log.
session.logCurrentTime(); |
|
logElapsedRunningTime
| session.logElapsedRunningTime() (professional and enterprise editions only) |
| Description |
| Outputs to the log the amount of time the scraping session has been running in a human-friendly format. |
 |
| Example |
// Output the running time to the log.
session.logElapsedRunningTime(); |
|
pause
| session.pause( long time ) (professional and enterprise editions only) |
| Description |
| Causes the scraping session to pause for the given number of milliseconds. |
 |
| Example |
// Pauses the scraping session for 5 seconds.
session.pause( 5000 ); |
|
reformatDate
session.reformatDate( String date, String dateFormatFrom, String dateFormatTo ) (professional and enterprise editions only)
session.reformatDate( String date, String dateFormatTo ) (enterprise edition only) |
| Description |
| These two methods change the format of a date. The first allows you to specify the existing format of the date, whereas the second attempts to guess the format. |
| In the first method, the date parameter would be a scraped date as text (e.g., 01/01/2010). The dateFormatFrom and dateFormatTo parameters are special strings containing tokens that reflect the format of the dates. Internally we make use of Sun's SimpleDateFormat class to handle the formatting, and you should look at their documentation to determine how the strings should be formatted (see especially the "Examples" section). |
| In the second method, the date parameter would be a scraped date as text (e.g., 01/01/2010). The dateFormatTo parameter indicates the resulting format of the date. Internally we use PHP's date method to handle the formatting. You'll want to use the tokens on the PHP documentation page to indicate the format you'd like the date to be. You can also indicate "timestamp" for the "dateFormatTo" parameter, which will return a time stamp corresponding to the original date. |
 |
| Example |
// Reformats the date shown to the format "2010-01-01".
session.reformatDate( "01/01/2010", "dd/MM/yyyy", "yyyy-MM-dd" );
session.reformatDate( "01/01/2010", "Y-m-d" ); |
|
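Because the three-parameter form relies on SimpleDateFormat pattern strings, it can help to try patterns in plain Java first. The following standalone sketch only approximates what that form might do internally: the class ReformatDateSketch and its reformat method are hypothetical, and the use of Locale.US and default leniency are assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical stand-in for a parse-then-format date conversion.
class ReformatDateSketch {

    // Parses the date using formatFrom, then writes it out using formatTo.
    static String reformat(String date, String formatFrom, String formatTo) {
        try {
            Date parsed = new SimpleDateFormat(formatFrom, Locale.US).parse(date);
            return new SimpleDateFormat(formatTo, Locale.US).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + date, e);
        }
    }

    public static void main(String[] args) {
        // The same text yields different dates depending on the source pattern.
        System.out.println(reformat("03/04/2010", "dd/MM/yyyy", "yyyy-MM-dd")); // 2010-04-03
        System.out.println(reformat("03/04/2010", "MM/dd/yyyy", "yyyy-MM-dd")); // 2010-03-04
    }
}
```

Note that in SimpleDateFormat patterns an uppercase "MM" means month while a lowercase "mm" means minutes, so the case of each token matters.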
resizeImage
session.resizeImageFixWidth( String originalFile, String newFile, int newWidthSize, boolean deleteOriginalFile ) (enterprise edition only)
session.resizeImageFixHeight( String originalFile, String newFile, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only)
session.resizeImageFixWidthAndHeight( String originalFile, String newFile, int newWidthSize, int newHeightSize, boolean deleteOriginalFile ) (enterprise edition only) |
| Description |
| Causes an image file to be resized. Depending on the method that gets invoked the height, width, or both can be fixed. If one of the dimensions is fixed, the aspect ratio of the resulting image will still be retained. The originalFile refers to the existing path on the file system to the image and the newFile is the path to the resized image. The deleteOriginalFile indicates whether or not the original file should be deleted once the image has been resized. |
 |
| Example |
// Resizes a JPG to 100 pixels wide, maintaining the
// aspect ratio. After the image is resized, the original
// will be deleted.
session.resizeImageFixWidth( "C:/my_image.jpg", "C:/my_image_thumbnail.jpg", 100, true ); |
|
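When only one dimension is fixed, the other follows from the original aspect ratio. The standalone sketch below shows that arithmetic only. It does not resize any image, the class and method names are hypothetical, and rounding to the nearest pixel is an assumption about how the result might be computed.

```java
// Hypothetical helper showing aspect-ratio arithmetic for a fixed dimension.
class ResizeDimensionsSketch {

    // Height that preserves the aspect ratio when the width is fixed.
    static int heightForFixedWidth(int originalWidth, int originalHeight, int newWidth) {
        return (int) Math.round((double) originalHeight * newWidth / originalWidth);
    }

    // Width that preserves the aspect ratio when the height is fixed.
    static int widthForFixedHeight(int originalWidth, int originalHeight, int newHeight) {
        return (int) Math.round((double) originalWidth * newHeight / originalHeight);
    }

    public static void main(String[] args) {
        // A 400x300 image fixed to 100 pixels wide becomes 100x75.
        System.out.println(heightForFixedWidth(400, 300, 100)); // 75
        // A 640x480 image fixed to 120 pixels high becomes 160x120.
        System.out.println(widthForFixedHeight(640, 480, 120)); // 160
    }
}
```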
saveVariables
| session.saveVariables( String fileToSaveTo ) (enterprise edition only) |
| Description |
| Saves all current String and Integer variables to the file indicated by fileToSaveTo. This is useful if session state is to be saved across multiple scraping sessions. |
 |
| Example |
// Saves the current session variables out to C:\myvars.txt.
// Note that a forward slash is used instead of a back slash
// as a folder delimiter. If back slashes were used, they
// would need to be doubled so that they're properly escaped
// out for the script interpreter.
session.saveVariables( "C:/myvars.txt" ); |
|
scrapeFile
| session.scrapeFile( String scrapeableFileIdentifier ) |
| Description |
| Causes the scrapeable file identified by scrapeableFileIdentifier to be scraped. |
 |
| Example |
// Causes the scrapeable file "Login" to be requested.
session.scrapeFile( "Login" ); |
|
sendDataToClient
| session.sendDataToClient( String key, Object value ) (enterprise edition only) |
| Description |
| When screen-scraper is invoked from an external application, this causes data to be sent back to the client while the scraping session is still in progress. This isn't currently supported in all drivers (i.e., remote scraping sessions), so check the documentation page for your particular language to see if it is. This method can be especially useful in cases where a relatively large set of data is to be extracted, where otherwise the data would need to be stored in memory or cached so that it could be accessed after the scraping session completed. The types of objects that can be sent include Strings, Integers, DataRecords, and DataSets. |
 |
| Example |
// Causes the current DataRecord object to be sent to the client
// for processing.
session.sendDataToClient( "MyDataRecord", dataRecord ); |
|
sendMail
| session.sendMail( String subject, String body, String recipients, String attachments, String headers ) (enterprise edition only) |
| Description |
| Sends an email using the given subject and body to the given comma-separated recipients. The attachments (optional) parameter should be a comma-separated list of paths designating files that should be sent as attachments with the email. The headers (optional) parameter allows you to designate arbitrary SMTP headers to be used when sending the email. Note that in order for this to work properly, a valid mail server must have been previously designated in the settings dialog box. |
 |
| Example |
// Sends an email message with the parameters shown.
session.sendMail( "Test message", "This is the body of the email", "my_friend@mydomain.com", null, null ); |
|
setCookie
| session.setCookie( String domain, String key, String value ) (professional and enterprise editions only) |
| Description |
| Causes a cookie to be manually set on the current session state. Note that this method should rarely be used, given that screen-scraper automatically manages cookies. It might be necessary in cases where a site sets cookies via JavaScript. |
 |
| Example |
// Sets a cookie associated with "mydomain.com", using the
// key "cookie_key" and the value "cookie_value".
session.setCookie( "mydomain.com", "cookie_key", "cookie_value" ); |
|
setErrorMessage
| session.setErrorMessage( String errorMessage ) (enterprise edition only) |
| Description |
| Sets an error message. The error message will be displayed in the web interface. This should be used in conjunction with the setFatalErrorOccurred method. |
 |
| Example |
// First set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true );
// Append an error message.
session.setErrorMessage( "An error occurred in the scraping session." ); |
|
setFatalErrorOccurred
| session.setFatalErrorOccurred( boolean fatalErrorOccurred ) (enterprise edition only) |
| Description |
| Indicates that a fatal error occurred while the scraping session was running. If this flag is set, in the web interface the scraping session will be displayed as having experienced an error. |
 |
| Example |
// Set the flag indicating that an error occurred.
session.setFatalErrorOccurred( true ); |
|
setNumRecordsScraped
| session.setNumRecordsScraped( Object value ) (enterprise edition only) |
| Description |
| Sets the number of records that have currently been scraped. This method is used in conjunction with record thresholds in the Enterprise edition. The value parameter can be either an int, an Integer or a String. See also the addToNumRecordsScraped and getNumRecordsScraped methods. |
 |
| Example |
// Sets the value of the number of records scraped to 10.
session.setNumRecordsScraped( 10 ); |
|
setRetainNonTidiedHTML
| session.setRetainNonTidiedHTML( boolean retainNonTidiedHTML ) (enterprise edition only) |
| Description |
| Sets whether or not non-tidied HTML is to be retained for all scrapeable files in this scraping session. This defaults to false. See scrapeableFile.getNonTidiedHTML for more details. |
 |
| Example |
// Tells screen-scraper to retain non-tidied HTML for its
// constituent scrapeable files.
session.setRetainNonTidiedHTML( true ); |
|
setVariable
| session.setVariable( String identifier, Object value ) |
| Description |
| Saves value for the duration of the session under the name given by identifier. The value can later be retrieved by passing the same identifier to the getVariable method. Note that the dataSet and dataRecord objects can be stored in session variables, and later accessed using a RemoteScrapingSession (see the links at the bottom of the Running screen-scraper as a server page for more details on this). |
 |
| Example |
// Sets the session variable "CITY_CODE" with the value found
// in the first dataRecord (at index 0) pointed to by the
// identifier "CITY_CODE".
session.setVariable( "CITY_CODE", dataSet.get( 0, "CITY_CODE" ) ); |
|
stopScraping
| session.stopScraping() |
| Description |
| When invoked, screen-scraper will halt the current scraping session. |
 |
| Example |
// Stops scraping if an error response was received
// from the server.
if( scrapeableFile.wasErrorOnRequest() )
{
    session.stopScraping();
} |
|
xmlWriter
 |
XmlWriter Methods |
Before working with the methods below, you may wish to read our documentation about Writing extracted data to XML, which contains examples of scripts that utilize these methods.
addElement
(enterprise edition only)
Element XmlWriter.addElement( String name )
Element XmlWriter.addElement( String name, String text )
Element XmlWriter.addElement( String name, String text, Hashtable attributes )
Element XmlWriter.addElement( Element elementToAppendTo, String name )
Element XmlWriter.addElement( Element elementToAppendTo, String name, String text )
Element XmlWriter.addElement( Element elementToAppendTo, String name, String text, Hashtable attributes ) |
| Description |
| Writes out a single element to the XML file. Each of these method calls returns an Element object, which can subsequently be passed to other calls in order to add sub-nodes to an element. |
 |
| Example |
| The best way to see this in action is to refer to the examples in the main XmlWriter documentation page. |
|
addElements
(enterprise edition only)
Element XmlWriter.addElements( Element elementToAppendTo, String name, Hashtable subElements )
Element XmlWriter.addElements( Element elementToAppendTo, String name, String text, Hashtable subElements )
Element XmlWriter.addElements( Element elementToAppendTo, String name, String text, Hashtable attributes, Hashtable subElements )
Element XmlWriter.addElements( String name, Hashtable subElements )
Element XmlWriter.addElements( String name, String text, Hashtable subElements )
Element XmlWriter.addElements( String name, String text, Hashtable attributes, Hashtable subElements )
Element XmlWriter.addElements( String containingTagName, DataSet dataSet )
Element XmlWriter.addElements( String containingTagName, String containingTagText, DataSet dataSet )
Element XmlWriter.addElements( String containingTagName, String containingTagText, Hashtable attributes, DataSet dataSet ) |
| Description |
| Writes out multiple elements to the XML file. Each of these method calls returns an Element object, which can subsequently be passed to other calls in order to add sub-nodes to an element. |
 |
| Example |
| The best way to see this in action is to refer to the examples in the main XmlWriter documentation page. |
|
close
|
XmlWriter.close() (enterprise edition only) |
| Description |
|
Writes out the closing root element tag and closes up the XML file.
|
 |
| Example |
// Assuming the variable "xmlWriter" points to an existing
// instance of XmlWriter, this will close up the file.
xmlWriter.close();
|
|
xmlWriter
(enterprise edition only)
XmlWriter( String fileName, String rootElementName )
XmlWriter( String fileName, String rootElementName, String rootElementText )
XmlWriter( String fileName, String rootElementName, String rootElementText, Hashtable attributes ) |
| Description |
| Creates a new XmlWriter. The fileName is the path to the XML file that should be written. The root element can be given a name and text as well as attributes by using the various constructors. |
 |
| Example |
// Creates a new writer that will write its XML file to
// C:\myfile.xml. The root element will be named "root_element"
// and the text of the root element will be "This is the root."
// Note that a forward slash is used after the "C:" instead of a
// back slash. A back slash would need to be doubled so that it's
// properly escaped out for the script interpreter.
xmlWriter = new com.screenscraper.xml.XmlWriter( "C:/myfile.xml", "root_element", "This is the root." ); |
|