Are JavaScript-encrypted pages a problem?

Hi,

The site I'm scraping, http://portal.uspto.gov/external/portal/pair, uses JavaScript. I'm not sure how to scrape the data from the transaction histories, since the URLs do not follow a pattern.

For example, these two searches:

06/599,702
06/565,333

give me these URLs:

http://portal.uspto.gov/external/portal/!ut/p/kcxml/04_Sj...
http://portal.uspto.gov/external/portal/!ut/p/kcxml/04_Sj...

How do I go about this?

Are JavaScript-encrypted pages a problem?

Razzlyrelic,

It sounds like you could use a hand with your project. If you would like us to do the scrape for you, please let us know.

Thank you,
Scott

But what if I want to repeat searches?

I need to search for 8,000 patent numbers, so I need a generic URL. However, the site seems to use cookies and a session ID, and I'm not sure how to deal with that.

screen-scraper proxy server

Razzlyrelic,

Setting up your scraping session with the screen-scraper proxy server will make the large amounts of cryptic code seem less daunting. Most of the time it does not matter whether the site uses JavaScript. However a page builds its requests, each one reaches the server as an ordinary HTTP request with structured headers, and that request is what the proxy server records.

Try not to read the page code in your browser. Instead, start the screen-scraper proxy server, set your browser to send its traffic through the proxy, and navigate through the three or four web pages a user visits to reach the data you're interested in. Then copy the files you need from the proxy transactions log into your scraping session, and work with those files and the data in their last request and last response tabs.

http://www.screen-scraper.com/support/docs/proxy_server_overview.php
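
For anyone who wants to see the same idea outside screen-scraper, here is a rough Python sketch of replaying a recorded search request for a list of numbers while keeping the session cookie. It is only an illustration under assumptions: the entry URL, the search URL, and the form field name ("number" here) are hypothetical placeholders, and the real values have to be read from the requests the proxy server records. The site may also require extra parameters or headers that this sketch does not send.

```python
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_search_request(search_url, number, field_name="number"):
    """Build a POST request for one search.

    field_name is a hypothetical placeholder; use the parameter name
    shown in the recorded request.
    """
    body = urllib.parse.urlencode({field_name: number}).encode("ascii")
    return urllib.request.Request(search_url, data=body, method="POST")


def fetch_histories(opener, entry_url, search_url, numbers, delay_seconds=1.0):
    """Visit the entry page once to pick up a session cookie, then run each search."""
    opener.open(entry_url).close()  # the cookie jar keeps whatever session cookie is set
    pages = {}
    for number in numbers:
        request = build_search_request(search_url, number)
        with opener.open(request) as response:
            pages[number] = response.read()
        time.sleep(delay_seconds)  # pause between searches to avoid hammering the site
    return pages
```

With this shape, the list of 8,000 numbers would be passed to fetch_histories once, and the session ID is handled by the cookie jar rather than by building URLs by hand.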

-Scott