scraping a PDF?

Is it possible to scrape data from a PDF? Any suggestions?

I have a PDF-to-HTML utility, but converting the file first seems like a roundabout way to do it.

Thanks for any advice.

Jim

Jimbo on 01/20/2007 at 5:48 am

Thanks, Todd, for the reply and link to the blog. You always seem to take the time to help out the little guy, and I will forever be a grateful customer. You guys rock!

To clarify my earlier comment about 'Save As' HTML: I am using Acrobat v6.0 Pro. I just looked at v6.0 Reader, and it does not appear to have a 'Save As' HTML option, only 'Save As Text' or 'Save A Copy.'

I haven't looked at newer versions of Reader. (I think they're up to v8.0?)

I'll try the local scrape in the next few days and report back here.

Jim

Jimbo on 01/22/2007 at 1:55 pm

Hi Jim,

You can scrape a local file. Create a scraping session, add a scrapeable file, then, in the "URL" field for your scrapeable file, enter something like this:

c:\myfile.htm

That is, just give it the local path to your file.
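If it helps to sanity-check what the saved file actually contains before setting up the scraping session, here is a minimal sketch in Python. It is not part of screen-scraper and does not use its API; it only relies on Python's standard library. The names `TextExtractor` and `extract_text` are made up for this illustration, and the sketch assumes the file is HTML saved as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the visible text chunks of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(path):
    """Read a locally saved HTML file and return its text chunks in document order."""
    # Assumption: the file is UTF-8; bad bytes are replaced instead of failing.
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Calling `extract_text(r"c:\myfile.htm")` would return a list of the text fragments found in the file, which gives a rough idea of what there is to build extractor patterns against.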

Also, you've essentially already discovered it, but, if it helps, here's a blog posting on scraping data from PDF files:

http://blog.screen-scraper.com/2006/08/02/extracting-data-from-pdf-files/

Kind regards,

Todd Wilson

todd on 01/22/2007 at 10:06 am

Okay, I've saved it as one long .htm file on my local drive. I created a Proxy Session, but can't get an HTTP Transaction when I use File Open to access it in Internet Explorer.

I can't seem to get screen-scraper to look at it. Can I scrape a locally saved file?

Jim

Jimbo on 01/20/2007 at 7:16 am

Okay, sorry for the premature post. I see that I can Save As HTML from within Acrobat. Now I need to figure out how to save a 61-page document as one long page. Or do I? If anyone has done this, please advise.

I'll update as I make headway. Thanks.

Jim

Jimbo on 01/20/2007 at 6:02 am
