Can SS extract this 'BrowsePDFServlet' Object?

Hello, and thanks in advance for your help.

I'm trying to scrape specific files from the US Patent site. They are presented openly one by one, but of course I would like to search through them programmatically.

At this page:
http://portal.uspto.gov/external/portal/!ut/p/_s.7_0_A/7_0_CH/.cmd/ad/.ar/sa.getBib/.c/6_0_69/.ce/7_0_1ET/.p/5_0_18L/.d/1?selectedTab=ifwtab&isSubmitted=isSubmitted&dosnum=09893809

You can find this code:

11-29-2005

onclick="NewWindow('/external/PA_1_0_18L/view/BrowsePdfServlet?objectId=EGNK4OSSPP1GUI4&lang=DINO', 12);return false;"
onMouseOver="window.status='CTFR'; return true"
onMouseOut="window.status=''; return true">Final Rejection

10

type="checkbox" id="cb12" alt='checkBoxLabel12'
label='checkBoxLabel12'
onClick="javascript:updateTotalDownloadSize(this.checked,10);setNPL(this,'CTFR')"
value="CTFR">

On click, the referenced PDF file is displayed in a PDF viewer provided by the USPTO. You can save the file to disk manually with Acrobat Reader's 'Save Copy to Disk' menu option. However, I would like to use SS to save this puppy to disk.

But it looks pretty tricky. I thought I'd first check to see if it's even possible. What do y'all think?

Thanks!


Hi,

It's certainly possible. You just need to extract the URL, then use session.downloadFile to download it. I'm guessing your extractor pattern would look something like this:


"NewWindow('~@PATH@~', 12);

and your script might look like this (assuming you're saving the value ~@PATH@~ extracts into a session variable):


fullURL = "http://portal.uspto.gov" + session.getVariable( "PATH" );

session.downloadFile( fullURL, "C:/my_dir/my_doc.pdf" );
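
In case it helps to see the same idea outside of screen-scraper, here is a rough standalone Java sketch of the URL-building step. The class name PdfLinkSketch, the regular expression standing in for the extractor pattern, and the "&amp;" clean-up are assumptions made for illustration and are not part of screen-scraper's API. The host is taken from the page URL in the question. The sketch only builds the URL; it does not download anything.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfLinkSketch {
    // Host taken from the page URL quoted in the question.
    static final String HOST = "http://portal.uspto.gov";

    // Plain-regex stand-in (an assumption) for the extractor pattern:
    // capture whatever sits between NewWindow(' and the closing quote.
    static final Pattern NEW_WINDOW = Pattern.compile("NewWindow\\('([^']+)'");

    // Returns the absolute PDF URL, or null if the snippet has no NewWindow call.
    static String fullUrl(String html) {
        Matcher m = NEW_WINDOW.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Decode entity-encoded ampersands, in case the raw page source uses &amp;.
        String path = m.group(1).replace("&amp;", "&");
        return HOST + path;
    }

    public static void main(String[] args) {
        String html = "onclick=\"NewWindow('/external/PA_1_0_18L/view/"
                + "BrowsePdfServlet?objectId=EGNK4OSSPP1GUI4&lang=DINO', 12);return false;\"";
        System.out.println(fullUrl(html));
    }
}
```

Inside screen-scraper itself, the extractor pattern and session variable do this job, so the regular expression is only needed if you test the logic elsewhere.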

Just let me know if I can clarify anything on that.

Kind regards,

Todd Wilson