This is known as a CAPTCHA mechanism, and it is intended to discourage automated form submissions. There are essentially two ways of working around one, depending on how well it is implemented:
Sites often use a poorly implemented CAPTCHA, such that the text can be determined up front. For example, the site may have only four or five images that it simply cycles through. By looking at the name of each image, one could determine what the corresponding text will be. That text could then be used to populate the appropriate HTML form.
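As a rough illustration of the first approach, the sketch below maps image file names to the text each image shows. The class name, the file names, and the text values are all hypothetical; a real table would be built by viewing each of the site's images once by hand.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: looks up the CAPTCHA text from the image's file name.
class KnownCaptchaLookup {
    // Assumed table of file name -> text; fill in after inspecting the site's images.
    private static final Map<String, String> KNOWN_TEXT = new HashMap<>();
    static {
        KNOWN_TEXT.put("captcha1.gif", "x7pq2");
        KNOWN_TEXT.put("captcha2.gif", "m4kd9");
        KNOWN_TEXT.put("captcha3.gif", "t8wz5");
    }

    // Returns the part of the URL after the last '/', ignoring any query string.
    static String fileNameOf(String imageUrl) {
        String path = imageUrl;
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns the known text for the image, or null if the image is not in the table.
    static String textFor(String imageUrl) {
        return KNOWN_TEXT.get(fileNameOf(imageUrl));
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/images/captcha2.gif?r=123";
        System.out.println(fileNameOf(url) + " -> " + textFor(url));
    }
}
```

A null result means the site served an image that is not in the table, in which case the lookup approach no longer applies and a person has to read the image.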
Assuming the CAPTCHA mechanism works as it should (i.e., that a human being would have to type in the text shown in the image), it gets a bit trickier to deal with. The best route would probably be to run a scraping session as you normally would, then, once you arrive at the page containing the CAPTCHA, follow these steps:
- Download the CAPTCHA image to the local hard drive (e.g., using the session.downloadFile method).
- In a screen-scraper script, use Java code to pop up a dialog box that displays the image and contains a text box that will accept user input. Within a script you have full access to the Java API, so you could pop up something like a custom JDialog containing the image and text box.
- Have a person type into the text box the characters displayed in the image.
- Accept the text entered by the user, then drop it into a screen-scraper session variable.
- Use the value in the session variable to populate the HTML form element.
This obviously isn't ideal, but, unfortunately, there may not be another way. A properly implemented CAPTCHA image is designed so that it can't be read reliably by a machine. As such, human intervention is required.
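The dialog step described above could be sketched roughly as follows. This is a minimal example under stated assumptions, not the definitive approach: it uses a standard `JOptionPane` input dialog in place of a custom JDialog, the class and method names are hypothetical, and it assumes the CAPTCHA image has already been downloaded to a local path. Inside a screen-scraper script, the returned text would then be stored in a session variable rather than printed.

```java
import java.awt.GraphicsEnvironment;
import javax.swing.ImageIcon;
import javax.swing.JOptionPane;

// Hypothetical helper: shows a downloaded CAPTCHA image and asks a person to type its text.
class CaptchaPrompt {
    // Trims the typed text; returns null if nothing usable was entered.
    static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    // Pops up a modal dialog containing the image and a text box.
    // Returns the cleaned text, or null if the user cancelled or left it blank.
    static String askUser(String imagePath) {
        ImageIcon image = new ImageIcon(imagePath);
        Object answer = JOptionPane.showInputDialog(
                null,                                    // no parent window
                "Type the characters shown in the image:",
                "CAPTCHA",
                JOptionPane.QUESTION_MESSAGE,
                image,                                   // displayed as the dialog icon
                null,                                    // null choices -> free-text box
                "");
        return clean(answer == null ? null : answer.toString());
    }

    public static void main(String[] args) {
        if (args.length == 0 || GraphicsEnvironment.isHeadless()) {
            System.out.println("Usage: java CaptchaPrompt <path-to-captcha-image> (needs a display)");
            return;
        }
        String text = askUser(args[0]);
        // In a screen-scraper script, this value would go into a session variable instead.
        System.out.println(text == null ? "No text entered" : "Entered: " + text);
    }
}
```

Because the dialog is modal, the scraping session simply waits at this point until someone types the text or cancels, which is why this approach only suits scrapes that a person can attend to.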