Scraping a page that relies on session variables instead of a query string

I have a home page and a search page. I gave the search page URL to the scraper tool, but when the tool requests that page it is redirected to a "page expired" page, because the search page depends on session variables and must be reached through the home page.
Can pages like this be scraped, where we need to go to the home page first, click the search button to reach the search page, and only then start scraping?
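One general way to handle this, independent of any particular tool, is to request the pages in the same order a browser would, through a single client that keeps its cookies between requests. The minimal Python sketch below uses only the standard library; the helper names are hypothetical, and the URLs are the ones given later in this thread. It only covers sites that rely on a session cookie; ASP.NET sites like this one may also require hidden form tokens.

```python
import http.cookiejar
import urllib.request

# URLs quoted later in this thread; adjust for the site being scraped.
HOME_URL = "http://www.espc.com/EspcPublic/UniversalPages/HomePage.aspx"
SEARCH_URL = "http://www.espc.com/EspcPublic/UniversalPages/DetailedSearchResults.aspx"


def build_session_opener():
    """Return (opener, jar): cookies received through the opener are stored
    in jar and sent back automatically on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_in_order(opener, urls):
    """GET each URL in sequence through the same opener (so with the same
    cookies) and return the decoded response bodies."""
    bodies = []
    for url in urls:
        with opener.open(url, timeout=30) as resp:
            bodies.append(resp.read().decode("utf-8", errors="replace"))
    return bodies
```

For example, `opener, jar = build_session_opener()` followed by `fetch_in_order(opener, [HOME_URL, SEARCH_URL])` would visit the home page first, so that any session cookie it sets accompanies the search page request.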

pervez,

Please have a look at a blog entry I recently completed on the topic. Hopefully, it will give you some tools to work with.

http://blog.screen-scraper.com/2008/06/04/scraping-aspnet-sites/

-Scott

Pervez,

Please see my private message regarding your service inquiry.

If anyone else on the forum would like to offer Pervez the service he needs, please feel free to do so.

Thanks,
Scott

Hello Scott,
I still cannot scrape http://espc.com/.
Could you please give me the script with the paging done as well?

Would it be possible for someone to scrape it for payment?
If you or anyone else is ready, we can discuss the cost of scraping the data from this website.
Please let me know.

Regards,
Pervez Azam

pervez,

Yes. I believe espc.com can be scraped. Here is the link to my revised sample session.

http://www.screen-scraper.com/xfer/espc.com-2_Scraping-Session.zip

It goes as far as the first page of search results. Making the pagination work is now up to you.

Good scraping,

-Scott

Thanks for the reply, Scott. Could you confirm whether the data from http://espc.com/ can be scraped? I hope the answer is yes, and if so, please provide an example script that scrapes the data from all pages of this website. I really need to scrape this website so that I can recommend this tool to my client for scraping data from other websites too.

pervez,

It's a little different but nothing you can't handle. Just replicate what you did for the first page of search results and apply any knowledge gained from implementing the process.

If you have specific questions, someone on the forum may feel inclined to share what they know. ;)

Thanks,
Scott

Thanks for your suggestion, Scott, but I still have a problem scraping the second page and the remaining pages. Please advise how we can do paging with the tool. I want to scrape almost 5000 records from http://espc.com/.
Please help me with a script to scrape all the data from each page of http://espc.com/.
I hope you agree that this website is different from other websites and that scraping its data is a little more difficult than on other sites.

pervez,

This is only possible if the site offers that feature. The espc.com site can display up to 50 results per page, but its interface does not offer anything larger.

If you can identify the request parameter that tells the server how many results to return, you could try sending it a request for 500 results. Who knows, you may find a kind of "back door".
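If the proxy log shows such a parameter in the search request, one way to experiment is to replay the captured POST body with only that field changed. The sketch below assumes a URL-encoded body; the field name is not known from this thread and must be taken from the proxy log (the `pageSize` in the usage line is purely hypothetical), and the server may well ignore or cap the value.

```python
from urllib.parse import parse_qsl, urlencode


def override_page_size(post_body, field_name, size):
    """Return a copy of a captured URL-encoded POST body with field_name set to size.

    field_name is site-specific and must be found in the proxy log.
    If the field is absent from the body, it is appended.
    """
    pairs = parse_qsl(post_body, keep_blank_values=True)
    out = []
    replaced = False
    for key, value in pairs:
        if key == field_name:
            out.append((key, str(size)))  # swap in the requested page size
            replaced = True
        else:
            out.append((key, value))  # keep every other field untouched
    if not replaced:
        out.append((field_name, str(size)))
    return urlencode(out)
```

For example, `override_page_size("a=1&pageSize=50", "pageSize", 500)` returns `"a=1&pageSize=500"`.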

Thanks,
Scott

Hello Scott,
I have a small query regarding the site http://espc.com/.
Here the products are displayed 10 records per page. Can't we write a script to display 5000 records per page? If that is possible, we don't need to bother about paging.
Please advise.

Hello Scott,
Thanks for the script. I am now successfully getting the response from the page, but I still have one problem: I need to navigate through the result pages, and the paging is done on the server side. How do I follow the paging in this case? I have handled paging with a "Next" link before, but this numbered paging is different.
Please advise, because there are almost 500 pages.
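ASP.NET numbered pagers are usually links of the form `javascript:__doPostBack('<control>','<argument>')` (a GridView, for instance, uses arguments like `Page$2`). Following one means POSTing the page back with `__EVENTTARGET` and `__EVENTARGUMENT` set to those two values, together with the current page's `__VIEWSTATE` and `__EVENTVALIDATION`. Below is a hedged Python sketch that assumes this site's pager follows that pattern; the helper names are hypothetical.

```python
import html
import re
from urllib.parse import urlencode

# Matches javascript:__doPostBack('target','argument') once entities are decoded.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


def parse_postback(href):
    """Return (event_target, event_argument) from a pager link, or None."""
    # Pager hrefs are often rendered with &#39; in place of the quotes.
    match = POSTBACK_RE.search(html.unescape(href))
    return (match.group(1), match.group(2)) if match else None


def build_page_request(event_target, event_argument, viewstate, eventvalidation):
    """Build the POST body that simulates clicking one numbered pager link.

    viewstate and eventvalidation must come from the page containing the link.
    """
    return urlencode({
        "__EVENTTARGET": event_target,
        "__EVENTARGUMENT": event_argument,
        "__VIEWSTATE": viewstate,
        "__EVENTVALIDATION": eventvalidation,
    })
```

Each results page typically returns fresh `__VIEWSTATE` and `__EVENTVALIDATION` values, so a loop over several hundred pages should re-scrape them from every response before requesting the next page.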
url: http://espc.com/

pervez,

I'm sorry the session I posted didn't work for you. I have updated it with a few corrections. There are two key POST parameters that need to be scraped and then passed on to the next page.

They are VIEWSTATE and EVENTVALIDATION, and both need to be set as session variables so that they can be passed to the subsequent scrapeable file.
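A minimal sketch of the scraping half, in Python with only the standard library: it collects the hidden `<input>` fields of a page and returns the two tokens. ASP.NET renders them as hidden fields named `__VIEWSTATE` and `__EVENTVALIDATION`; the class and function names are hypothetical, and in screen-scraper the same job would normally be done with an extractor pattern.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)  # attribute names arrive lower-cased
        if (attributes.get("type") or "").lower() == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_aspnet_tokens(page_html):
    """Return the __VIEWSTATE and __EVENTVALIDATION values found in page_html."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    parser.close()
    wanted = ("__VIEWSTATE", "__EVENTVALIDATION")
    return {name: parser.fields[name] for name in wanted if name in parser.fields}
```

The returned values are what would be stored as session variables and sent in the POST to the next scrapeable file.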

Please take a look again and let me know if you have any issues.

http://www.screen-scraper.com/xfer/espc.com-2_Scraping-Session.zip

Thank you,
Scott

Hello Scott,
I imported the example XML you provided, but it still goes to the "page expired" page.
Is there a solution for this type of website?
I need your help.
Can you provide me the XML for this site?
Home page URL: http://www.espc.com/EspcPublic/UniversalPages/HomePage.aspx
Search page URL: http://www.espc.com/EspcPublic/UniversalPages/DetailedSearchResults.aspx

pervez,

This site runs ASP.NET, which requires some special attention (and patience on your side). Did you capture the pages using the screen-scraper proxy? If not, I suggest you do, since it will show more clearly what is going on before you attempt to build scrapeable files.

In your proxy transactions log, notice how the home page is called twice.

http://www.espc.com/EspcPublic/UniversalPages/HomePage.aspx

The server takes your request to display the page, returns the HTML, sets a few cookies and then redirects you to the same URL. The second hit on that page includes POST data whose values appear in, and can be scraped from, the response to the first hit.

To avoid getting the "page expired" page, you'll need to scrape the values of VIEWSTATE and EVENTVALIDATION from the first hit and include them in the POST to the home page on the second hit.
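The two-hit sequence can be sketched in Python as follows. `fetch` is a hypothetical stand-in for whatever HTTP client is in use (it must reuse one cookie jar across calls), and the regular expression assumes ASP.NET's usual markup, in which `name` appears before `value` in each hidden input tag.

```python
import re
from urllib.parse import urlencode

# Assumes ASP.NET's usual markup, where name="..." precedes value="..." in the tag.
TOKEN_RE = re.compile(
    r'<input[^>]*name="(__VIEWSTATE|__EVENTVALIDATION)"[^>]*value="([^"]*)"',
    re.IGNORECASE,
)


def home_page_postback(fetch, home_url):
    """First hit: GET the home page. Second hit: POST it back with its tokens.

    fetch(url, data=None) is a hypothetical client callable that sends a GET
    when data is None, otherwise a POST with data as the URL-encoded body, and
    keeps cookies between calls. Returns the body of the second response.
    """
    first_page = fetch(home_url)
    tokens = dict(TOKEN_RE.findall(first_page))
    missing = {"__VIEWSTATE", "__EVENTVALIDATION"} - set(tokens)
    if missing:
        raise ValueError(f"not found on first hit: {sorted(missing)}")
    body = urlencode({"__EVENTTARGET": "", "__EVENTARGUMENT": "", **tokens})
    return fetch(home_url, data=body)
```

Raising an error when a token is missing makes a changed page layout fail loudly instead of silently producing another "page expired" response.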

This approach is common in ASP.NET, so you can expect to run into it again on other sites. VIEWSTATE and EVENTVALIDATION are standard hidden form fields used by ASP.NET (they appear in the page as __VIEWSTATE and __EVENTVALIDATION).

You can download the test scraping session for this site below.

http://screen-scraper.com/xfer/espc.com_Scraping-Session.zip

Please let us know if you have any questions.

Thanks,
Scott

Hello Scott,
I tried scraping the two files in sequence, first the home page and then the search page, but the search page is still redirected to the expired page.
Please advise.

Hello Scott,
Thanks for your reply.
I created a separate scrapeable file for the home page and another for the search page. I get the correct response from the home page, but when I run the second scrapeable file, the page is redirected to the "page expired" page.
Home page URL: http://www.espc.com/EspcPublic/UniversalPages/HomePage.aspx
Search page URL: http://www.espc.com/EspcPublic/UniversalPages/DetailedSearchResults.aspx

From the home page I have to click the SearchNow button.
Please advise how I can do this.
Could you please provide me the script for it?
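When an ASP.NET form is posted back by an ordinary submit button, the server typically works out which button was clicked from that button's `name=value` pair in the POST body, so a scraper has to send that pair along with the page tokens. In the Python sketch below, the button name and value are hypothetical placeholders to be copied from the proxy log; if the button is rendered as a link rather than a submit input, its control name goes in `__EVENTTARGET` instead.

```python
from urllib.parse import urlencode


def search_button_body(tokens, button_name, button_value, form_fields=None):
    """Build the POST body a browser sends when an ASP.NET submit button is clicked.

    tokens: {"__VIEWSTATE": ..., "__EVENTVALIDATION": ...} scraped from the page.
    button_name / button_value: hypothetical; copy them from the proxy log.
    form_fields: any other inputs on the form (search criteria, etc.).
    """
    fields = {"__EVENTTARGET": "", "__EVENTARGUMENT": ""}
    fields.update(tokens)
    fields.update(form_fields or {})
    fields[button_name] = button_value  # this pair tells the server which button fired
    return urlencode(fields)
```

The body would be POSTed to the form's action (here presumably the home page URL); the server may then redirect to DetailedSearchResults.aspx, which the same cookie-keeping client should follow.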

pervez,

You got it. Most of the time that does the trick: just create a scrapeable file that runs in sequence before the page that was causing the error, and you should be good to go.

Occasionally, a site may do some odd redirecting or set cookies via JavaScript. If it doesn't work, let us know and we'll have additional suggestions.
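If a site sets a required cookie from JavaScript, a non-browser client never runs that script, so one workaround is to add the cookie to the client's jar by hand. The sketch below uses Python's `http.cookiejar`; the function name is hypothetical, and the cookie name, value and domain would have to be read from the site's script or the proxy log.

```python
import http.cookiejar


def add_js_cookie(jar, name, value, domain, path="/"):
    """Add by hand a cookie the site would normally set with JavaScript
    (document.cookie), so later requests made with this jar carry it."""
    cookie = http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,  # session cookie, no expiry
        comment=None, comment_url=None, rest={},
    )
    jar.set_cookie(cookie)
    return cookie
```

The jar passed in should be the same one the scraper's opener uses, so the added cookie is sent on subsequent requests to that domain.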

Thanks,
Scott