screen-scraper FAQ
General Non-Technical
General Technical
Tips & Suggestions
Troubleshooting
There are three editions of screen-scraper: Basic, Professional, and Enterprise. The Basic Edition is completely free. It costs nothing and never will. The Professional and Enterprise Editions carry a licensing cost, which can currently be found here.

Our screen-scraper Professional and Enterprise Editions are licensed on a per-machine basis. A single instance of screen-scraper can scrape as many sites as the underlying hardware allows (i.e., we don't charge per site scraped). For the Enterprise Edition, we do allow a single license to be used for two machines under certain circumstances. If you have one machine you use for development, and another used for production/deployment, it is acceptable to use the same license for both machines. For a copy of the screen-scraper Professional and Enterprise Editions license, please see our online copy here.

We offer discounts to students and academic institutions. Please contact us directly if this is of interest. We also offer volume discounts, as follows:
When you purchased your license you would have entered an email address. This address is used to unlock your local copy of screen-scraper. To do that, simply select "Enter registration information" from the "Options" menu, then enter that email address.

We have a full offline version of our site in a zip file (which may be somewhat dated) here. Simply download it, decompress it, and open the index.html file.

See our comparison matrix for this.

In order to upgrade from version 3.x to version 4, you'll need to install version 4 fresh using one of our installers, accessible from this page. This is necessary because we're now using an upgraded version of the Java Runtime Environment, which can't be upgraded using the normal "Check for updates..." method of upgrading within screen-scraper. Those who licensed screen-scraper Professional Edition prior to the release of version 4.0 are entitled to a free upgrade. Please note that version 4.0 of the Professional Edition lacks some features that were available in version 3.0 (and subsequent alphas). If you licensed screen-scraper Professional Edition before version 4.0 was released, just send us a support request indicating such. As part of this upgrade you'll be entitled to a license for screen-scraper Enterprise Edition, but will not be entitled to the phone and email support that current licensees of the Enterprise Edition get.

Steps for upgrading to version 4.0 from any version 3.x or below:
1. Make a backup of your existing installation and export your scraping sessions and scripts by following the instructions here.
2. For Windows users, uninstall the old version using Add/Remove Programs in the Control Panel.
3. Download and install the latest version from this page.
4. Import your work into your new installation following the instructions on the lower half of this page.
If you license the professional or enterprise edition of screen-scraper today you are entitled to major and minor upgrades for that particular edition forever at no cost. It's possible that we'll change this policy down the road, but that's how it stands today. Yes. To do so, though, you'll need to add and modify a few settings to screen-scraper's "resource\conf\screen-scraper.properties" file so that the port bindings for the various instances don't conflict with each other. You'll also want to be sure that screen-scraper is completely closed before copying any files or editing any properties files. Here are the properties you'll need to add and change in the screen-scraper.properties file, along with sample values: InstallDirectory=C:\Program Files\screen-scraper professional edition 2\ The ServerPort and ProxyPort settings should already be in your properties file. The rest will need to be added. Just be sure that you select different numbers for each of the ports across the various instances you have installed. Also note that you'll want to alter the properties file only when screen-scraper is not running. Regarding licensing issues for this type of setup, you're free to use the same license on each of the instances running on the same machine. However, you would need separate licenses for instances running across multiple machines (one per machine). If you're running screen-scraper on Windows, you'll follow the instructions above, but the screen-scraper server is run in a slightly different way than for normal installations. When screen-scraper is installed via the installer, it creates an NT service that can be used to start and stop the server. To install further instances on the same machine, you'll simply copy the originally installed version. For each of the new instances you'll also want to change the install location and port numbers, as described above. 
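As a rough illustration of the "different numbers for each of the ports" requirement, the sketch below compares the port settings of several instances and reports any value used more than once. It is not part of screen-scraper: the function names are hypothetical, the sample values are made up, and it simply assumes that every port setting in screen-scraper.properties has a key ending in "Port" (as ServerPort and ProxyPort do).

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_port_conflicts(instances):
    """Return {port: [(instance, key), ...]} for ports used more than once.

    `instances` maps an instance label to the text of its properties file.
    Assumption: any key whose name ends in "Port" is a port setting.
    """
    seen = {}
    for name, text in instances.items():
        for key, value in parse_properties(text).items():
            if key.endswith("Port") and value.isdigit():
                seen.setdefault(int(value), []).append((name, key))
    return {port: users for port, users in seen.items() if len(users) > 1}


# Hypothetical sample values for two instances on the same machine.
first = "ServerPort=8777\nProxyPort=8778\n"
second = "ServerPort=8777\nProxyPort=8788\n"
conflicts = find_port_conflicts({"first": first, "second": second})
print(conflicts)
```

In practice you would read each instance's properties file from disk (with screen-scraper closed) and pass the contents in; an empty result means no two instances share a port.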
Running the server on the new instances will need to be done from a DOS window as the NT service will only control the first instance. To do that, open a DOS window, cd to the directory where the instance of screen-scraper is located that you'd like to run, then issue this command: jre\bin\java -Xmx128M -jar screen-scraper.jar --start-server --interactive The "-Xmx128M" indicates the maximum amount of RAM that will be allocated to screen-scraper, and can be changed to whatever value you'd like. After you issue this command screen-scraper will be running in "interactive" mode. To stop the server simply type "quit" at the prompt. No. screen-scraper is designed only to scrape data from web sites. If you're looking for a solution that can extract data from older mainframe-type applications, we'd recommend looking at Jagacy. In order to install screen-scraper on a machine, you'll likely need administrative or root access. Generally this is not the case with virtual hosting, so you likely will not be able to run screen-scraper on your server. Oftentimes this won't preclude you from using it, however. A common scenario is to scrape data on a local machine, write the data to a CSV file, then upload it to a server to be imported. If you have a database running on the server, you may also still be able to run screen-scraper from a local machine, then insert the scraped data into your database using the technique we describe in our fifth tutorial. The easiest way to install on a Unix-based operating system that isn't Linux, such as Solaris or BSD, is to first install screen-scraper on a Linux machine. Once installed, you'll want to zip or tarball up the directory where it's installed, then copy it over to the target machine (running something like Solaris or BSD). On the target machine, decompress the archive into the directory where you'd like screen-scraper to live. 
Once that's done, edit the "resource/conf/screen-scraper.properties" file so that the "InstallDirectory" contains the path to the directory. At this point you'll need to install a Java Virtual Machine whose version is at least 1.5. Once installed, edit both the "server" and "screen-scraper" shell scripts found in screen-scraper's install folder. In both files un-comment the "INSTALL4J_JAVA_HOME_OVERRIDE" parameter at the top, and give it the path to your JRE install location. From here you can launch screen-scraper as you normally do with the "server" and "screen-scraper" scripts.

Sort of, yes. See this blog posting.

The short answer to this one is, "Sometimes." Most widgets (applets, etc.) that communicate with their server via HTTP can be scraped by screen-scraper. Oftentimes, however, they'll use a proprietary protocol. Most of the time Adobe Flash movies use HTTP when they need to communicate with a server, but Java applets and ActiveX controls don't always. The easiest way to find out is to use screen-scraper's proxy server when interacting with a page containing one of these elements. Take a close look at the HTTP requests and responses passing between the web browser and the server. If you see text in there (often XML or URL-encoded lists of parameters) then the chances are good that screen-scraper can extract the information being passed between the client and server. Note, however, that there may be text that the widget is displaying that doesn't get passed between the client and server. Unfortunately, in such cases, screen-scraper is unable to extract that information. The only utilities we're aware of that may allow for scraping that type of information would be IBM's Rational Robot and OpenSpan.

If you're using the Enterprise Edition of screen-scraper, this can be done via the web interface.
For the Basic and Professional editions, the best way to go about this is to use an external scheduler, such as the Windows Task Scheduler or the Unix cron daemon. You'll typically set up one of these schedulers to either invoke screen-scraper from the command line or to invoke a separate application, which in turn invokes screen-scraper while it's running as a server. Unfortunately, the short answer to this question is, "it depends." If you're doing only very simple things with screen-scraper (e.g., scraping a few files once in a while) it could run comfortably in 64MB of RAM with a 500MHz processor. On the other end of the spectrum, if you're running multiple lengthy scraping sessions in parallel the memory and CPU requirements could climb quite a bit. Allocating the right amount of memory to screen-scraper invariably involves some experimentation. For example, you might run your scraping sessions in as realistic a scenario as possible, then use tools such as the Windows Task Manager or top to monitor CPU and memory usage. Remember that you can adjust the amount of memory screen-scraper is allocated by opening the "Settings" dialog box (click on the wrench icon), then altering the value labeled "Maximum memory allocation in megabytes". It might also be helpful to look over the question below on optimizing scraping sessions. screen-scraper will automatically follow certain redirects, so it just depends on what type the web site is making use of. There are three types of redirects that are typically used on the web: 1. 3xx HTTP responses. These are probably the most common, and are the ones screen-scraper will automatically follow. For example, instead of responding with a 200: OK HTTP response, the server will respond with 302: Moved Temporarily, then supply the URL the browser is to redirect to in a "Location" HTTP header. In these cases you shouldn't need to do anything at all; screen-scraper will simply follow them as a browser would. 2. META refresh tags. 
These are special HTML tags that are often embedded in a web page which contain the URL the browser is to redirect to. screen-scraper will not automatically follow these, so you'd need to create a separate scrapeable file to send screen-scraper to them. This might also involve extracting certain parameters from the URL before going to the redirected page. 3. JavaScript redirects. Occasionally sites will utilize client-side JavaScript to send the browser to a new location. As it pertains to screen-scraper, the technique for dealing with these is basically the same as that described in #2. Yes. This is a common situation, and generally just requires that you create a scrapeable file to handle logging in. This scrapeable file should be run first in the scraping session, allowing the web site to set cookies, which screen-scraper will then track for you. For example, if you wanted to scrape a list of all auctions you're watching from the ebay web site, you would create a scrapeable file that would first log you in (issue a POST request with your username and password), then you would create subsequent scrapeable files that would scrape the information you're interested in. There is also a special type of authentication known as "BASIC" or "WWW-Authenticate". You'll know a web site is using this when, upon attempting to access a particular URL, you are presented with a small dialog box requesting a username and password. When setting up screen-scraper to scrape a page using this type of authentication you simply need to enter in the username and password in the "Properties" tab under "BASIC Authentication Parameters" for the scrapeable file you set up to scrape the page. Note that you generally only need to enter the username and password once for a given site on a single scrapeable file, as screen-scraper will retain the username and password for you. We give an example of configuring screen-scraper to log in to a site in our third tutorial. 
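For readers curious about what BASIC authentication looks like on the wire, the sketch below builds the "Authorization" header that a client sends once it has the username and password. screen-scraper does this for you when you fill in the "BASIC Authentication Parameters"; the function name and the sample credentials here are illustrative only and are not part of screen-scraper.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP BASIC authentication."""
    # The credentials are joined with a colon and base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Sample credentials for illustration only.
header = basic_auth_header("Aladdin", "open sesame")
print(header)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Because the header is only encoded, not encrypted, sites that use BASIC authentication usually pair it with HTTPS.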
screen-scraper supports HTTPS on all supported platforms except certain early versions of Mac OS X. If you're using the screen-scraper proxy server to access a site that uses HTTPS, follow the directions found under "Viewing encrypted transactions" on this documentation page: Using the Proxy Server. In setting up scrapeable files to access pages that use HTTPS you don't need to treat them any differently than those that use HTTP.

Absolutely. screen-scraper handles cookies (and BASIC authentication tokens) transparently behind the scenes. When setting up screen-scraper to scrape information from your site you rarely need to give any thought to cookies. In certain cases, sites will set cookies in JavaScript. In such cases, you can set them within a screen-scraper script via the session.setCookie method.

Yes. To do so, though, you'll need to have separate copies of screen-scraper installed (copying an already installed instance works, too). You'll also need to add and modify a few settings to screen-scraper's "resource\conf\screen-scraper.properties" file so that the port bindings for the various instances don't conflict with each other. You'll also want to be sure that screen-scraper is completely closed before copying any files or editing any properties files. Here are the properties you'll need to add and change in the screen-scraper.properties file, along with sample values: #Change to match the new install directory #Change default values #Add these ports The ServerPort and ProxyPort settings should already be in your properties file. The rest will need to be added. Just be sure that you select different numbers for each of the ports across the various instances you have installed. Also note that you'll want to alter the properties file only when screen-scraper is not running. If you're on a Unix-based system, such as Linux or Mac OS X, you'll want to modify the following three files: start_server.sh, stop_server.sh, resource/conf/wrapper.conf.
Each of these files contains the path to where screen-scraper is installed. Edit them so that they reflect the correct path. Regarding licensing issues for this type of setup, you're free to use the same license on each of the instances running on the same machine. However, you would need separate licenses for instances running across multiple machines (one per machine).

If you're having trouble starting screen-scraper in server mode or running scraping sessions in server mode, run the following command in a batch or shell script as an alternate way to start the server.
You also have two commands you can use within the console/command window:
Unfortunately, yes. This is a bug that slipped past our testing prior to the release of version 4.0. Because we do not offer alpha release ("unstable") upgrades in the Basic Edition, we are unable to resolve this issue until the next public release, version 5.0. We do not have set schedules for our public releases and cannot say when the next release will be.

Though we have not done extensive testing on Microsoft® Windows Server 2003, we have had reports of unusually slow performance. We attribute this to the implementation of Sun's Java™ code with extra security restrictions. One possible solution would be to install screen-scraper to run under Windows 2000 compatibility mode. Instructions on how to set the compatibility mode during installation can be found here: http://support.microsoft.com/kb/324265.

In cases where you're dealing with large numbers of scraping sessions, it becomes too cumbersome to retain them all in the workbench. Even if you organize them neatly into folders, there will likely still be too many to viably work with. Rather than keep all scraping sessions in the workbench at once, we generally find it useful to export and save them all to a central directory, which, ideally, is under version control using something like Subversion or CVS. When you need to work with a particular scraping session, you simply import it from the repository. Every once in a while, you export the scraping session back to the central directory. Ideally the directory also gets backed up once in a while so that you don't lose any work. When working with a project where there are a large number of scraping sessions, you'll also often have a series of "general" scripts that get used by most, if not all, of your scraping sessions. For example, you might have one script that gets invoked by every scraping session, which is in charge of opening a database connection or initializing a file to which extracted data will be written.
We typically handle these "general" scripts by storing them in a separate folder, alongside where all of the scraping sessions are stored. This directory should get versioned and backed up as well. The difference with the "general" scripts is that it's typically a good idea to keep them all in the workbench in their own folder. Usually there aren't very many of them, and they get used often enough that you'll typically want to just retain them in the screen-scraper workbench.

If you've gone through our first few tutorials, you know that session variables can be embedded in URLs by using a token like this: ~#FOO#~ (see this page for a detailed example of this). The very same technique can be used with POST variables. When you create a scrapeable file that uses POST parameters, they'll be displayed under the "Parameters" tab for that scrapeable file. In any of those POST parameters you can use the same type of token mentioned before. For example, if you're logging in to a web site (as described here), instead of hard-coding the username and password, you might instead substitute the tokens ~#USERNAME#~ and ~#PASSWORD#~ in the "Value" column for the respective parameters. Prior to invoking that scrapeable file, you would then set two session variables corresponding to the username and password; their values would be substituted for the ~#USERNAME#~ and ~#PASSWORD#~ tokens.

This is known as a CAPTCHA mechanism, and is intended to discourage automated form submissions. There are essentially two ways of working around these:
This obviously isn't ideal, but, unfortunately, there may not be another way. The CAPTCHA images are designed such that they can't be read by a machine. As such, human intervention is required.

This isn't a scenario you'll run into too often, but it's common enough that we decided to include it in the FAQ. At times you may run into a page containing various tables of data. All of the tables are essentially identical in structure, but when you extract the data you want to be able to tell which rows of data came from which tables. For example, consider this page. If you view the HTML from the page you'll notice that the structure of the two tables is basically the same. If you use a normal extractor pattern that matches a row of data, though, you're going to get all four rows of data, and won't be able to tell which row came from which table. That is, your first inclination might be to use an extractor pattern like this: It matches the data just fine, but you don't know which table each row came from.

Once you've set up screen-scraper to extract data from a web site there's a good chance the web site will change at some point. Oftentimes cosmetic changes such as the addition of a font tag or changing text from bold to italic won't affect anything, but if the site makes more dramatic changes, such as altering their navigation system, then your scraping session will break. This generally causes screen-scraper to either fail to extract records from the site entirely, or scrape significantly fewer records than it had previously. It also usually means that you'll need to update your scraping session to account for the changes in the web site. There are two approaches we generally take to addressing this issue. The first (and best) approach is to track the number of records screen-scraper extracts each time the scraping session is run. Let's suppose you're extracting records from a site that, on average, will yield about 100 records.
If you run the scrape one day and it suddenly only extracts 10 records then something has likely changed with the site, so you'll probably need to adjust your scraping session to account for it. The second approach is to have a special extractor pattern or two that checks for a specific piece of text that you know should be present every time you scrape. This approach is most useful in cases where a site doesn't yield a consistent number of records. If your special extractor pattern doesn't match the text it's looking for then something has likely changed on the site. Along with all of this you'll likely want some kind of notification system so that you can be made aware when the site changes. To do this you might consider something like screen-scraper's sendMail function. Even better would be to set up an external application that monitors the number of records scraped each time, then logs an error in a database or log file if something comes up.

As with any work you do on your computer, it's good to back it up once in a while. The preferred method for doing this in screen-scraper is to export your scraping sessions and scripts as XML files (note that you only need to back up the scripts that aren't referenced in scraping sessions; any scripts called from within scraping sessions will be automatically exported along with the scraping session). Once the files have been exported you might also consider storing them in a versioning system such as CVS or Subversion. screen-scraper will automatically back up your database periodically to ensure that you don't lose any work. You can also manually invoke this backup process by selecting "Backup Database" from the "File" menu. The database backups are stored in the "resource\db\backup" folder. The directories within that folder contain previous versions of your database. If your database has somehow become corrupted, you may be able to simply revert back to a previous version. Help on that can be found here.
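Returning to the record-count approach for detecting site changes described above, the check an external monitoring application might perform can be sketched as follows. This is only an illustration: the function name is hypothetical, and the 50% threshold is an arbitrary assumption you would tune for each site.

```python
def looks_broken(record_count, expected_average, tolerance=0.5):
    """Return True when a run extracted far fewer records than usual.

    `tolerance` is the fraction of the expected average below which a run
    is considered suspicious (0.5 is an assumed default, not a rule).
    """
    return record_count < expected_average * tolerance


# A site that normally yields about 100 records.
print(looks_broken(10, 100))  # True: only 10 records, something likely changed
print(looks_broken(95, 100))  # False: within the normal range
```

When the check returns True, the monitoring application could log an error or send a notification so that the scraping session gets reviewed.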
Follow these steps:
Note that in doing this you'll be copying the entire screen-scraper database from one machine to another, so along with the licensing information it will also copy any scraping sessions, proxy sessions, and scripts. This will also mean overwriting any of those objects found on the unlicensed instance. Before copying the database over, care should be taken to export any objects from the unlicensed instance that you'd like to retain.

When screen-scraper normally updates itself it downloads a zip file from our server, decompresses it, copies the files it contains on top of the existing files, then updates its version number. You'll instead need to do this manually. To do so follow these steps:
The next time you launch screen-scraper you'll have the updated version. We're in the process of creating a browser-based interface for screen-scraper that will allow you to update screen-scraper without having to go through this manual process.

If you're using the Enterprise Edition of screen-scraper, this can be done via the web interface.

In the Professional or Enterprise Editions of screen-scraper, create a text file in screen-scraper's folder named register.txt that contains a single line with the email address under which you registered screen-scraper. Then either start up the screen-scraper server or invoke screen-scraper from the command line. screen-scraper will read in that file, validate the license, then write the result of the validation to a file called register_result.txt. Once the license has been validated, the register_result.txt file can be deleted.

Here are some tips:
When screen-scraper applies extractor patterns to HTML it first strips out "unnecessary" white space. This makes the extraction process significantly faster; however, on rare occasions the white space may not be quite so "unnecessary". Circumventing this requires a bit of a workaround that involves replacing white space characters (such as hard returns) with temporary markers, applying the extractor pattern, then replacing the temporary markers with the white space characters. This is best illustrated by an example scraping session, which you can download here. Note that this should be considered a temporary solution. We'll address this issue more elegantly in an upcoming version of screen-scraper. There are several ways this can be done:
As a side note, it is by design that screen-scraper doesn't insert information automatically into a database for you. The approach we've taken to the design of screen-scraper is to ensure that it does one thing very well: extract information from web sites. Generally related to that process, however, are subsequent steps that involve manipulating and cleaning up the information, as well as storing it in some persistent mechanism (such as a database or text file). All of those things can be done by screen-scraper, but we've designed screen-scraper primarily to handle data extraction. The problem lies in the "Install4j" installer that we use. For some installations, you may have to make an alteration to the startup script. An example of the error from the command line is as follows: The usual solution is to follow these steps:
If problems persist, post a help request to our forum, or email us at support@screen-scraper.com.

This is a feature of screen-scraper that is slated for removal in future editions. Until then, follow these steps to correct it if you have unchecked this box and imported the script to a machine running without a GUI.
Oftentimes you'll do your work on a development machine, then need to transfer objects up to a production machine. This generally includes scraping sessions and scripts. To do so you have two options:
1. Export your scraping session(s) and script(s) from one machine, then import them into the other. Instructions on doing this can be found here.
2. The second (and easier) possibility would be to simply copy your database from one machine to the other. The database consists of everything inside of the "resource\db" directory of your screen-scraper installation.

Due to certain changes introduced in Microsoft Windows Vista you will need to follow the steps below to ensure that screen-scraper works properly.
As an alternative to executing the screen-scraper binary in Linux you may need to execute a shell script containing the following code. This shell script works only in launching the screen-scraper workbench. To work with screen-scraper in server mode use start_server.sh and stop_server.sh. Execute this shell script from the same location where screen-scraper was installed.
This is most likely because the character set you're currently using is set to something screen-scraper's file exporter can't deal with. We're working on a fix for this, but in the meantime try changing the "Default character set" in the "Settings" dialog box to "UTF-8". This is an issue related to the installer software we use (InstallAnywhere). To remedy the problem, try the following:
First check to ensure that the screen-scraper server is running. Details on doing that can be found here. This may also be occurring because the IP address of the machine that is connecting to screen-scraper isn't listed in screen-scraper's list of allowed hosts. You can correct this in one of two ways:
After making either of the changes mentioned above, you'll need to restart screen-scraper. If that still doesn't help, check to ensure that you're trying to connect to screen-scraper using the port on which screen-scraper is listening. The default for the screen-scraper server is 8777, and the default for the SOAP server is 8779. These port numbers can both be altered via the "Settings" dialog box in the workbench (click the wrench icon), under the "Servers" section.

This error has two possible causes.

Cause: Port Blocked. In order for screen-scraper to function properly it will need to open a series of local ports on your computer. There are occasions when these ports may be blocked by other software running on your machine, such as firewalls. If screen-scraper is telling you it can't bind to specific ports, you'll either need to free those particular ports up on your machine, or select different ports for screen-scraper to use. To free up the ports you may need to configure a firewall so that it allows for the ports to be bound. You may also need to quit another application that's using the same port (which could even be another instance of screen-scraper running on the same machine). If you'd like to configure screen-scraper to use different ports, see this FAQ.

Cause: Crash. You might also get this error message if the screen-scraper workbench or server crashed, but the database process remains alive. If the message shows "(for the database)" after the port number, this may be the cause. To remedy this, you'll need to kill the database process manually, then start screen-scraper again. The process to kill will be called "java" on Linux and Mac OS X, and "java.exe" on Windows. If you're running Linux, you likely already know how to kill a process. To kill a process in Windows open the "Windows Task Manager" (hit Ctrl-Shift-Escape), click on the "Processes" tab, then kill any "java.exe" processes you know you don't need.
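If you'd like a quick way to see whether a port is already taken before starting screen-scraper, a small script along these lines can help. It is a generic sketch, not a screen-scraper utility: the function name is hypothetical, and it only tests whether the local machine will let a program bind to the port, which will not reveal every kind of firewall restriction.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if `port` on `host` can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Another process (or a crashed instance) is holding the port.
            return False
        return True


# 8777 and 8779 are the default server and SOAP server ports mentioned above.
for port in (8777, 8779):
    print(port, "free" if port_is_free(port) else "in use")
```

A port reported as "in use" while screen-scraper is supposedly closed is a hint that a leftover "java" process may still be running.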
On rare occasions the main screen-scraper database can become corrupted. This might happen if your computer crashes while screen-scraper is running, for example. Fortunately, as of version 2.8 screen-scraper will automatically back up your database periodically. Even if your database has become corrupted, it's likely you haven't lost much work. In the directory where screen-scraper is installed (e.g., "C:\Program Files\screen-scraper professional edition"), you'll find the following directory path: "resource\db\backup". This "backup" folder should contain a series of folders with dates and times, each of which will contain a backup of your database. You'll use these to restore your database, by following the steps below:
We use InstallAnywhere for our installer, and it seems to have trouble with more recent versions of the Linux kernel. You may experience errors indicating an "error while loading shared libraries". A user of screen-scraper reported that this will resolve the issue: cp setup_ss_pro.bin setup_ss_pro.bin.bak (so we have a working copy) That is not a typo for export above (#xport). You must use the same number of characters or else the installer thinks that the file is corrupt. Now just make sure that /tmp/setup_ss_pro.bin is executable and then run it. You may also need to perform the same trick with the "screen-scraper" binary file used to launch screen-scraper. If none of that helps, you might try installing via our Linux tarball, which can be downloaded here.

There are a few possible causes for this issue:
While we're still refining screen-scraper's ability to handle international character sets, based on our testing, it should handle most situations just fine. When scraping sites with international character sets, though, there are a few extra steps you'll need to take:
If you're having trouble with a particular site, please feel free to contact us so that we can look into it for you.

By default screen-scraper will only allow updates to stable versions (e.g., 2.6 as opposed to 2.6.0.5a). In order to upgrade to unstable versions you need to open the "Settings" dialog box (click on the wrench icon in the button bar), then check the box labeled "Allow upgrading to unstable versions". After closing the Settings window, go again under Options and choose "Check for updates." If you're interested in upgrading or downgrading to a specific version (including alpha releases) please see the following instructions.

Unfortunately, screen-scraper's proxy server isn't perfect, and, on occasion, you'll encounter sites that it has difficulty with. Frequently the issue can be resolved by using a different web browser, such as Firefox or Opera. Depending on your operating system, instead of designating "localhost" in your web browser, you may need to enter "127.0.0.1" or the IP address of your computer. If you normally connect to the Internet through a proxy server (outside of screen-scraper), you'll need to configure screen-scraper to use that proxy server. This can be done in the "Settings" window (click on the wrench icon), under the "External Proxy Server" section. If changing your browser doesn't help, it's still possible that you can proxy the site enough that you can create scrapeable files from the requests. It simply needs to be done in a more piecemeal fashion. If you need to resort to this, try the following for each page you need to scrape:
Note also that you typically only need to proxy forms that use POST requests. Scrapeable files corresponding to normal links and forms that use the GET method can be created by simply copying the URL from your web browser.

3rd Party Options

Alternatively, if the screen-scraper proxy freezes entirely and does not record any of the transaction, you can access the HTTP header information within your browser by utilizing one of the following.
For additional instructions please see our page on Using Scrapeable Files.

The most likely cause of this problem is that you don't have the latest Microsoft Virtual Machine installed on that computer. This is especially a problem with Windows Server 2003, as it does not ship with the Microsoft Virtual Machine. Please note that this is Microsoft's Virtual Machine, and not Sun's Java Virtual Machine. Microsoft's Virtual Machine can be downloaded from these locations: http://java-virtual-machine.net/download.html

Certain versions of the Microsoft Windows Script environment don't contain all of the objects you might want to refer to in scripts (such as the FileSystemObject), which might cause screen-scraper to crash. If this occurs we would recommend installing the latest version of the Microsoft Windows Script environment, which can be downloaded here. This can also happen if you're running multiple scraping sessions in parallel that all use scripts written in VBScript or JScript. Unfortunately, this is an issue outside of our control. The Microsoft Scripting engine has a limitation: if multiple instances of external scripts run within it simultaneously, unpredictable results can occur. If you need to run multiple scraping sessions in parallel we recommend that you script in Interpreted Java, JavaScript, or Python.

This is most likely happening because screen-scraper is running out of memory. For example, if you're scraping large amounts of information from a web site with multiple concurrently running scraping sessions, screen-scraper may need to keep a lot of information in memory while it does so. Here are possible ways to remedy this problem:
If those suggestions don't seem to help, don't hesitate to email us.

Certain scripts are meant to be invoked only in the context of a running scraping session. For example, a script might be set to run after data is extracted by an extractor pattern (by selecting "After pattern is applied" when associating the script with the extractor pattern). In other words, only certain objects (e.g. dataSet and session) are in scope, depending on when the script is run. For more details on this see the "Variable scope" section of this documentation page: Using Scripts.

screen-scraper supports any character set supported by the 1.5 Java Virtual Machine. A complete list can be found here: http://java.sun.com/j2se/1.5/docs/guide/intl/encoding.doc.html.
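If you want to check whether a particular character set is available before scraping a site that uses it, a small standalone Java program along these lines may help. This is only a sketch: the class name CharsetCheck and the sample charset names are made up for illustration, and the result reflects whichever Java Virtual Machine you run the program with, which may not be the same one your copy of screen-scraper uses.

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical helper class, not part of screen-scraper.
public class CharsetCheck {

    /** Returns true if the running JVM supports the named character set. */
    public static boolean isSupported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown when the name is null or is not a legal charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        // All character sets known to this JVM, keyed by canonical name.
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.size() + " character sets available");

        // Sample names only; replace them with the encoding your site uses.
        for (String name : new String[] {"UTF-8", "ISO-8859-1", "Shift_JIS"}) {
            System.out.println(name + ": " + isSupported(name));
        }
    }
}
```

Compile it with javac CharsetCheck.java and run it with java CharsetCheck. UTF-8 and ISO-8859-1 are required on every Java platform, so those should always report true; other encodings depend on the installed JVM.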