Can't login to site using s-s proxy

I'm having trouble logging in to a secure site. When I go through the s-s proxy, the login seems to fail (the site doesn't explicitly say so) and I'm redirected back to the blank login page. I have tried both IE and Firefox.

[I have of course tested that I can log in fine without the proxy.]

I am happy to provide data files/responses etc if it helps.

Any help appreciated.

David

Can't login to site using s-s proxy

David,

Well, the reason it won't proceed past the page you submit your login credentials to is that the page uses JavaScript to redirect the client. Currently, the default HTTP client that screen-scraper uses does not follow JavaScript redirects. Using IE as the HTTP client is not recommended either, because of a memory leak that we're still working out.

The solution is to add the page it is trying to redirect to as a subsequent scrapeable file. In my test this worked and maintained the login session.
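To find which page to add as the subsequent scrapeable file, you can look for the redirect target in the login response. Below is a minimal Python sketch of that idea, written outside screen-scraper purely as an illustration. The function name `find_js_redirect` and the example URLs are hypothetical, and the pattern only covers simple `location = "..."`, `location.href = "..."`, and `location.replace("...")` forms; a site that builds the URL dynamically would need a closer look at its script.

```python
import re
from urllib.parse import urljoin

# Matches simple assignments such as window.location = "...",
# location.href = '...', and calls such as location.replace("...").
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def find_js_redirect(html, base_url):
    """Return the absolute URL of the first simple JavaScript redirect
    found in html, or None if no such redirect is present."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Exactly one of the two capture groups is filled, depending on
    # which form of redirect was matched.
    target = match.group(1) or match.group(2)
    # Resolve relative targets against the URL of the page that
    # contained the script.
    return urljoin(base_url, target)


if __name__ == "__main__":
    page = '<script>window.location.href = "/members/home.asp";</script>'
    print(find_js_redirect(page, "https://example.com/login.asp"))
```

The URL this returns is the one you would request next, in the same session, so that the login cookies are sent along with it.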

However, once you're logged in, you may find it hard to scrape the data that you can see in your browser. I am unable to load the main content in Firefox or Opera: both sit loading data and eventually redirect to the homepage. IE7 prompts me to install MSXML 5.0 in order to display the content, and that is as far as I went.

I personally have yet to encounter a site whose client-side code is this unique.

If you would like our company to look further into scraping the data from this site, please let me know. To be up front, this may not be the easiest site to scrape, and therefore not the cheapest. But we'll still offer you a free quote.

Thanks,
Scott

Can't login to site using s-s proxy

Hi Scott

Thanks for your advice.

What you suggest is pretty much what I've tried already, but I went through it again anyway, with the same result.

So I have sent you the log files by email so you can have a look at what happens.

I hope I'm not being stupid!

David

Can't login to site using s-s proxy

David,

Before you send it along, let me just tell you that for the majority of sites out there, screen-scraper should handle any additional HTTP transactions after you submit your credentials. You may see the server redirecting your client here and there, but screen-scraper is programmed to respond just like your browser does.

So, in your session, be sure to include the login page as a scrapeable file even if you're not scraping anything from it. This is the page with the blank username and password fields. The reason is that sometimes the page you submit your credentials to needs that page as a referrer. Then screen-scraper should follow any redirects until your Last Response tab contains the HTML of the page you're expecting.

Try just sending the correct POST data to the page that Live HTTP Headers shows your credentials being POSTed to, and see if screen-scraper follows the server's directions properly and drops you where you want to be.
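For illustration only, here is roughly what that request looks like when built by hand in Python rather than in screen-scraper. This is a sketch under assumptions: the function name `build_login_request`, the URLs, and the field names `username` and `password` are all made up, and the real names have to come from your Live HTTP Headers capture.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(post_url, login_page_url, credentials):
    """Build (but do not send) a login POST that names the blank login
    page as its referrer."""
    # Form fields are sent URL-encoded in the request body.
    body = urlencode(credentials).encode("ascii")
    return Request(
        post_url,
        data=body,
        headers={
            # Some sites reject the POST unless it appears to come
            # from the login page.
            "Referer": login_page_url,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URLs and field names, for illustration only.
    request = build_login_request(
        "https://example.com/login/submit",
        "https://example.com/login",
        {"username": "david", "password": "secret"},
    )
    print(request.get_method(), request.full_url)
    print(request.data)
```

To actually send it, you could pass the request to an opener created with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`, which keeps the session cookies and follows ordinary HTTP redirects. It will not run any JavaScript on the page.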

If it does not, it would be helpful to see what the log shows. You can either PM it to me or send it to my email: my name, last initial w @ our domain.

Thanks,
Scott

Can't login to site using s-s proxy

Thanks for the reply Scott.

I have tried other browsers, including Opera, with the same result.

I have also loaded the Live HTTP Headers extension for Firefox and recorded the headers for both a successful login (without the proxy) and a failed one (with the proxy). The trouble is that the login process is a bit complicated, and I'm not sure I fully understand what it is doing, although I think I know where it is failing.

Could I PM or email you the header files so you can take a look at how the site works? Of course you'll also see the URL/site concerned in there.

Support much appreciated. Thanks, David

Can't login to site using s-s proxy

David,

Occasionally we come across a site that just won't proxy using the screen-scraper proxy. As an alternative to the screen-scraper proxy, I would recommend installing the Firefox add-on "[url=https://addons.mozilla.org/en-US/firefox/addon/3829]Live HTTP Headers[/url]". Start a capture using that add-on and it will reveal all of the HTTP header data during your transaction. It works well. The one drawback is that, if you have a number of POST parameters, you'll need to manually enter the key/value pairs under the parameters tab of your scrapeable file(s).
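If the captured POST body is long, splitting it into key/value pairs by hand is error-prone. A small Python sketch like the one below can do the splitting and URL-decoding for you. The function name `post_body_to_pairs` and the sample body are invented for illustration, and the sketch assumes the form was submitted as ordinary URL-encoded data.

```python
from urllib.parse import parse_qsl


def post_body_to_pairs(raw_body):
    """Split a URL-encoded POST body, as copied from a header capture,
    into decoded (key, value) pairs in their original order."""
    # keep_blank_values=True preserves parameters that were sent empty,
    # which some login forms still expect to receive.
    return parse_qsl(raw_body.strip(), keep_blank_values=True)


if __name__ == "__main__":
    # Hypothetical captured body, for illustration only.
    captured = "user=david&pass=s%40cret&remember="
    for key, value in post_body_to_pairs(captured):
        print(f"{key} = {value!r}")
```

Each printed pair corresponds to one row you would enter under the parameters tab.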

Additionally, you could try using Opera. Some sites have worked only using Opera (a nice browser, too).

http://www.opera.com/

Also, David, for our own records could you provide the URL that you're having issues with? We're always working to correct these issues.

Please let us know if you have any further issues.

Thanks,
Scott