Why will my 1st scraped file not show request/response ?

Hi all

I have a scraping session consisting of 5 files. When I run the scraping session in the workbench, there are no details in the last request/response tabs for the 1st file. The subsequent 4 are fine.

I took the URL from the first file, and created a fresh scraping session for that, ran it, and there is data listed in the "last request/response" tabs.

What am I doing wrong ?

I tried playing with the advanced settings on the scraping session, but it didn't seem to make any difference.

Incidentally, I can invoke the scraped session from COM no problem ...

Why will my 1st scraped file not show request/response ?

Hi Jason,

This sounds like it could be a result of a bug we fixed in version 3.0.1a ([url]http://blog.screen-scraper.com/2007/01/31/version-301a-of-screen-scraper-available/[/url]). You might try upgrading so that you don't have to do a workaround.

Kind regards,

Todd

Why will my 1st scraped file not show request/response ?

Hi Todd,

Thanks for your time on this.

Digging a bit deeper, it turned out that the 1st page had two variables which were being POSTed (username/password). In my scraping session they are referred to as session variables ("~#Username#~"). Because they weren't assigned a value, the log showed a Java error. I added a script to my scraping session to set them, and the error went away.
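For anyone hitting the same thing, here is a rough Java sketch of why an unset session variable breaks the POST data. It only mimics the idea of ~#NAME#~ token substitution and is not screen-scraper's actual implementation; the class name, the method name, and the "Password" token name are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: mimics ~#NAME#~ substitution, not screen-scraper's real code. */
final class SessionTokenSketch {
    // Matches tokens of the form ~#NAME#~ and captures NAME.
    private static final Pattern TOKEN = Pattern.compile("~#(\\w+)#~");

    /** Replaces each ~#NAME#~ token with its value; fails if a variable was never set. */
    static String resolve(String template, Map<String, String> sessionVariables) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = sessionVariables.get(name);
            if (value == null) {
                // This is the situation described above: the token has no value yet.
                throw new IllegalStateException("Session variable not set: " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("Username", "sa");      // set before the first file is requested
        vars.put("Password", "wrong");   // "Password" is an assumed token name
        System.out.println(resolve("username=~#Username#~&password=~#Password#~", vars));
    }
}
```

With both variables set, the template resolves to the same POST data that appears in the scraping log; with an empty map, the sketch throws instead of sending a half-filled request.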

However, I still can't see the last response/request BUT I can run extractor patterns over them, and they return matches, which is what I wanted to do. I have now advanced my project so that I can catch the "invalid username" text, and respond appropriately.

Unfortunately the app I'm scraping is very specialised - it's designed for intranets and IE exclusively. However, I have managed to get about 25% of what I need done, and it wouldn't have been possible without screen-scraper.

thanks again

Why will my 1st scraped file not show request/response ?

Hi,

This one is a bit of a puzzle. We tried replicating what your log shows as closely as we could, and screen-scraper seems to handle the URLs and redirects just fine (we were initially thinking it could have been something related to encoding characters in the URL).

Given your previous postings on this particular web app, is there any chance it could be sending back non-standard HTTP responses? If you proxy the site, do all of the requests and responses come through without any problems? Unfortunately this is obviously a web app running on your local intranet; otherwise, we might be able to hit it directly. As things stand, we're flying a bit blind in trying to help troubleshoot.

Kind regards,

Todd Wilson

Why will my 1st scraped file not show request/response ?

Hi Scott,

No, there is definitely a line saying "Scraping file initial request". No errors on the error log, and the site I'm scraping is internal, and very quick - all 5 files get scraped in a couple of seconds. There are no extractor patterns either. Yet I still don't see anything in the Last request/response tabs.

Here's an extract from the scraping log:

Starting scraper.
Running scraping session: LoginScreenToSearch
Processing scripts before scraping session begins.
Processing script: "Set Login"
Scraping file: "InitialRequest"
InitialRequest: Preliminary URL: http://cgtpony2/onyxemployeeportal_onyx/common/onyxAuthenticate.asp?referrer=/onyxemployeeportal_onyx/powerpage/main_frame.asp
InitialRequest: Using strict mode.
InitialRequest: POST data: username=sa&password=wrong
InitialRequest: Resolved URL: http://cgtpony2/onyxemployeeportal_onyx/common/onyxAuthenticate.asp?referrer=/onyxemployeeportal_onyx/powerpage/main_frame.asp
InitialRequest: Sending request.
InitialRequest: Redirecting to: http://cgtpony2/onyxemployeeportal_onyx/common/onyxlogin.asp?referrer=/onyxemployeeportal_onyx/powerpage/main_frame.asp&errorMessage=User+log+in+failed%2E+Invalid+username+and%2For+password%2E
Scraping file: "File2"

Any ideas?
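The "Redirecting to" line in the log above carries the server's error text in its errorMessage query parameter, URL-encoded. As a rough sketch (plain Java, nothing to do with screen-scraper's own API; the class and method names are invented for illustration), this is one way to pull that text out of the redirect URL:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

/** Illustrative helper: extracts the decoded errorMessage parameter from a redirect URL. */
final class RedirectErrorSketch {
    /** Returns the decoded errorMessage value, or null if the URL has none. */
    static String errorMessage(String redirectUrl) {
        String query = URI.create(redirectUrl).getRawQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("errorMessage")) {
                try {
                    // '+' becomes a space, %2E becomes '.', %2F becomes '/'.
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException("UTF-8 is always supported", e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://cgtpony2/onyxemployeeportal_onyx/common/onyxlogin.asp"
                + "?referrer=/onyxemployeeportal_onyx/powerpage/main_frame.asp"
                + "&errorMessage=User+log+in+failed%2E+Invalid+username+and%2For+password%2E";
        System.out.println(errorMessage(url));
    }
}
```

For the URL in the log this yields "User log in failed. Invalid username and/or password.", which shows that the login POST itself was answered with a redirect rather than a page body.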

Re: Why will my 1st scraped file not show request/response ?

Jason,

The first thing I would suggest is to verify that your first scrapeable file is actually being called during your scraping session. To do this, first either clear the field above the log labeled "Show only the following number of lines..." or set it to a large number, say, 2000. Then consult your log and look for an entry similar to the following:

Scraping file: "Scrapeable File 1"

Where "Scrapeable File 1" is your first scrapeable file and the one in question.

If there is an entry in your log indicating that this file is being called as expected, then the next step is to check for any errors following the file being called.

A slow-responding or non-responsive site may throw an error indicating that the connection timed out.

Alternatively, if an extractor pattern in your scrapeable file takes too long to process, you will get an error that looks like this:

Scrapeable File 1: An input/output error occurred while connecting to 'http://www.that-one-domain-i-like-so-much.com'. The message was Read timed out.

The potential fix for both of these scenarios can be found in the program settings. In the workbench go to Options > Settings > General Settings tab. Increase the Connection and Data extractor timeouts by a few seconds. You may end up increasing these to 15 or 30 seconds depending on the responsiveness of the server and the efficiency of your extractor patterns. It should not hamper your scraping session to have these set a little high, because screen-scraper will not use all the time provided unless it needs to.

http://www.screen-scraper.com/support/docs/settings.php
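To illustrate what a connection timeout of this kind means in general, here is a minimal plain-Java sketch. It is not how screen-scraper applies its settings internally; the class name, method name, and the 15-second value are assumptions picked to match the numbers above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Illustrative only: a generic HTTP connection with explicit connect and read timeouts. */
final class TimeoutSketch {
    /** Prepares (but does not open) a connection with both timeouts set, in seconds. */
    static HttpURLConnection prepare(String url, int timeoutSeconds) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Maximum time to wait for the TCP connection to be established.
            conn.setConnectTimeout(timeoutSeconds * 1000);
            // Maximum time to wait for data once connected; exceeding it raises
            // a SocketTimeoutException when the response is read.
            conn.setReadTimeout(timeoutSeconds * 1000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `TimeoutSketch.prepare("http://cgtpony2/", 15)` only configures the connection; no request goes out until the response is actually read. A generous value costs nothing on a fast server, since the timeout is an upper bound and not a fixed delay.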

Please let me know what you find.

Keep on scrapin',

-Scott

[quote="JasonLoCascio"]Hi all

I have a scraping session consisting of 5 files. When I run the scraping session in the workbench, there are no details in the last request/response tabs for the 1st file. The subsequent 4 are fine.

I took the URL from the first file, and created a fresh scraping session for that, ran it, and there is data listed in the "last request/response" tabs.

What am I doing wrong ?

I tried playing with the advanced settings on the scraping session, but it didn't seem to make any difference.

Incidentally, I can invoke the scraped session from COM no problem ...[/quote]