Site with Frames, Login, Servlets and JavaScript

The site I am scraping has a main page with several frames. I've found the one I'm after, which POSTs to a servlet (I've provided the correct parameters), but the response I get (in the scrapeable file) is just another page with frames in it, and it has no significant content (just a text box).

Perhaps this means the login failed, but how can you really tell? Do I need a script to pass the session/parameters around, rather than just running each page as part of a sequence, or might it be a configuration issue?

In a browser, each page contains several frames, one of which holds the login form that POSTs the userID and password to a servlet. If the login is accepted, the response is a similar-looking page with similar frames. This second page is a search page, but this time it calls the function findCustomer() in an attached JavaScript file. How can I call that function? Should it happen automatically, or does it need a script?

I configured my browser to use the proxy and viewed the logs. It makes a huge number of calls to different pages (one per frame, I guess) and many more to JavaScript files and servlets. I've tried several different approaches, but I think the hardest part is telling whether the login has worked. Is there any easy way? Could you also suggest an approach suited to this kind of scrape?
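
One rough way to tell whether a login took is to check whether the page that comes back still contains a password box: if it does, the login form was most likely served again. Below is a minimal Python sketch, assuming the login form uses an `<input type="password">` field; the function name `login_looks_failed` is hypothetical. When the response is a frameset, run the check on the frame that originally held the login form, not on the frameset page itself, since a frameset contains no input fields at all.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether any <input type="password"> appears in the page."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        # attribute values can be None (e.g. bare "disabled"), so normalise them
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("type", "").lower() == "password":
            self.found = True


def login_looks_failed(html):
    """Heuristic: a password box in the response suggests the login form came back."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

This is only a heuristic; a session cookie appearing in the proxy log, or the search page's text box showing up, is stronger evidence that the login worked.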

Site with Frames, Login, Servlets and JavaScript

mubbers,

screen-scraper uses its own SSL certificate, whose expiration date has been updated in releases later than 3.0. However, in our experience the error message caused by an out-of-date certificate does not make the connection fail altogether.
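
Outside screen-scraper, a strict client behaves like Firefox: it refuses to complete the handshake with a server whose certificate has expired. As a stand-alone illustration (Python's standard `ssl` module, not anything screen-scraper does internally), the sketch below builds a verifying context by default and an unverified one only when asked, which can help confirm whether the certificate is what blocks the login. The function name `make_ssl_context` is hypothetical, and switching verification off should be limited to diagnosis.

```python
import ssl


def make_ssl_context(allow_bad_cert=False):
    """Return a client-side TLS context.

    By default certificates are verified, so an expired certificate makes the
    handshake fail. With allow_bad_cert=True verification is switched off.
    """
    context = ssl.create_default_context()
    if allow_bad_cert:
        # check_hostname must be turned off before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

A context like this can be passed to `urllib.request.urlopen(url, context=...)`; if the login succeeds only with verification disabled, the expired certificate is the likely culprit.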

Could you provide the URL where this is happening? If you prefer, you can send it to me in a private message, or email it to me using my first name followed by my last initial, w, @ our domain.

Thank you,
Scott

Site with Frames, Login, Servlets and JavaScript

Thanks for that, I will look into it.

One thing I have noticed is that the site has an out-of-date certificate, which makes the login fail completely in Firefox. Does screen-scraper perform a similar check and perhaps fail without reporting it as an error?

Site with Frames, Login, Servlets and JavaScript

mubbers,

HTTPS is sometimes an issue when proxying the site but should have no bearing on the session itself. If it makes proxying impossible I'd recommend the following Firefox add-ons [url=https://addons.mozilla.org/en-US/firefox/addon/125]SwitchProxy[/url] and/or [url=https://addons.mozilla.org/en-US/firefox/addon/3829]Live HTTP Headers[/url]. [url=http://www.opera.com/]Opera[/url] is always worth having on hand, as well.

-Scott

Site with Frames, Login, Servlets and JavaScript

mubbers,

A couple of different things could be going on here. If you're certain you're passing the login parameters to the correct file, then you may also need to first include, in sequence, the page that occurs before you post your login parameters. This is usually the page with the blank username and password fields.

The reason could be twofold. First, that page may set a cookie (often a session cookie) that the login request needs. Second, the login page sometimes requires that previous page as the referrer.
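
The two-step sequence described above (request the blank login page, then POST the login parameters with the session cookie and the Referer set) can be sketched with Python's standard library. This is a stand-alone illustration, not how screen-scraper works internally. The URLs and the `password` parameter name are placeholders; only `userID` comes from the original question, and the function names are hypothetical.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_session(login_page_url, login_servlet_url, params):
    """Return (opener, cookie_jar, login_request) without sending anything yet."""
    jar = CookieJar()
    # Every request made through this opener stores and replays cookies from
    # `jar`, so a session cookie set by the login page goes out with the POST.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    body = urllib.parse.urlencode(params).encode("ascii")
    login_request = urllib.request.Request(
        login_servlet_url,
        data=body,  # supplying a body makes urllib send a POST
        headers={
            "Referer": login_page_url,  # some servers reject a login POST without it
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    return opener, jar, login_request


def log_in(login_page_url, login_servlet_url, params):
    """Fetch the blank login page first, then POST the credentials."""
    opener, jar, login_request = build_login_session(
        login_page_url, login_servlet_url, params
    )
    opener.open(login_page_url).read()  # step 1: pick up any session cookie
    response = opener.open(login_request)  # step 2: POST with cookie and Referer
    return response.read().decode("utf-8", errors="replace"), jar
```

Keeping both requests on one opener is the point of the design: a separate opener per request would start with an empty cookie jar, which looks to the server exactly like a visitor who skipped the login page.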

Be sure to include any other post parameters along with your username and password. Generate your scrapeable files from transactions in the proxy server.
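
Hidden form fields are the most common "other post parameters". Here is a minimal Python sketch that copies every hidden `<input>` from the login page and then fills in the credentials. The function name is hypothetical, and a browser may send further fields (such as a named submit button), so compare the result with what the proxy recorded.

```python
from html.parser import HTMLParser


class _HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs of every hidden <input> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("type", "").lower() == "hidden" and attr_map.get("name"):
            self.fields[attr_map["name"]] = attr_map.get("value", "")


def build_login_params(login_page_html, credentials):
    """Hidden fields from the login page, overlaid with the user's credentials."""
    collector = _HiddenFieldCollector()
    collector.feed(login_page_html)
    collector.close()
    params = dict(collector.fields)
    params.update(credentials)  # the real userID/password replace any blank values
    return params
```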

If it doesn't add to the confusion you can see even more of what is going on by using an add-on to Firefox called Live HTTP Headers.

https://addons.mozilla.org/en-US/firefox/addon/3829

Try not to get distracted by all the frames. screen-scraper should handle the back-and-forth between them without any intervention from you.
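
When checking a response by hand, though, it can help to see which frames a frameset page loads, so you know which frame actually carries the content. A minimal Python sketch that lists each frame's name and absolute URL; the function name and the sample URLs in the test are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _FrameLister(HTMLParser):
    """Collects (name, absolute src) for every <frame> and <iframe>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("frame", "iframe"):
            return
        attr_map = {name: (value or "") for name, value in attrs}
        src = attr_map.get("src")
        if src:
            # resolve relative src values against the frameset page's URL
            self.frames.append((attr_map.get("name", ""), urljoin(self.base_url, src)))


def list_frames(frameset_html, base_url):
    lister = _FrameLister(base_url)
    lister.feed(frameset_html)
    lister.close()
    return lister.frames
```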

Please let us know what other issues you may be having.

Thanks,
Scott

Site with Frames, Login, Servlets and JavaScript

One other thing I forgot to mention, the site uses HTTPS.

Thanks