Scraping through Advanced Search and Java

I am looking at a gold mine of Canadian business information, and I would love to build a scrape that lets me submit a set of variables to invoke a Detailed Search on the following page:

http://strategis.ic.gc.ca/app/ccc/search/cccSearch.do?language=eng&porta...

The result of the search is a URL, example:

http://strategis.ic.gc.ca/app/ccc/search/search.do;jsessionid=0000V1kqcV...

This URL does not allow for the tidy methods described in tutorials 3 or 7. I am not a programmer, but I think I smell Java.

Once the search is complete I need to scrape the data. To get to the detailed data the user is required to click a "Consolidated Report" link and then select a type of report (short, complete or custom). Finally the detail presents itself. I can go through all of the steps manually until I reach a scrapeable page and then load that into SS, although I haven't even tried that yet. I was just wondering whether it is possible to build this from start to finish.
By the way, I'm having loads of fun with this product, and it may well be useful in helping my wife with some of her research. I'm creating contact lists for her from various business sites, and doing it manually can be quite tedious. My first real scrape, however, took me about 3 hours to build and troubleshoot, and the end result was only 12 records that she could use. My daughter can type those 12 records in a few minutes, so I have a ways to go. If I can get the site above to work I'll blow her doors off!

Scraping through Advanced Search and Java

Not a problem. Good luck with your project.

Best,

Todd

Scraping through Advanced Search and Java

Thanks, Todd. I think that you should call this forum "ask Todd" because you seem to be answering everyone's posts without much help from the community. I hope to teach myself enough to be helpful as well but I've got a long way to go.
Anyway, I can't justify paying for this work. I also believe that I can get to the point where I can get most of what I want from SS. It will take time. I will study the documentation for HTTP because I don't know what a POST request is. I also want to understand more about Java but I don't even know the difference between Interpreted Java, JavaScript, or any other flavours. You don't have to explain this stuff to me, I'll figure out what I need as I go along.
I find the tutorials helpful as an introduction, but there are so many different ways that information is presented on the web, I can see that there is much learning to do. I applaud your efforts to spread the awareness of the power harnessed in this program. Thanks very much for your support.

Scraping through Advanced Search and Java

Hi,

I took a fairly close look at the site, and it looks very scrapeable. It's definitely on the tricky side, but here are some tips that will hopefully help:

- It looks like you can get all records by simply submitting the form without specifying any criteria (I think it came up with over 50K).
- Make extensive use of screen-scraper's proxy server to record the pages. The initial form is a POST request, so you'll definitely want to record it using the proxy server, then make a scrapeable file from the proxied page.
- They're using a session ID (jsessionid), which you'll need to extract from the initial form, then propagate on subsequent requests.
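For anyone curious what the POST and session ID steps amount to outside of screen-scraper, here is a minimal Python sketch. It is only an illustration under assumptions, not how screen-scraper does it internally: the helper names (`extract_jsessionid`, `with_jsessionid`, `build_search_body`) are made up for this example, and the real form field names have to come from the request you record with the proxy server.

```python
import re
from urllib.parse import urlencode, urlsplit, urlunsplit

# Matches the ";jsessionid=<value>" path parameter that Java servlet
# containers can append to URLs to track a session.
JSESSIONID_RE = re.compile(r";jsessionid=([^?#;\"'\s>]+)", re.IGNORECASE)


def extract_jsessionid(text):
    """Return the first jsessionid found in a URL or HTML page, or None."""
    match = JSESSIONID_RE.search(text)
    return match.group(1) if match else None


def with_jsessionid(url, session_id):
    """Return url with ;jsessionid=<session_id> set on its path."""
    parts = urlsplit(url)
    # Drop any stale session ID before appending the current one.
    path = JSESSIONID_RE.sub("", parts.path)
    path = f"{path};jsessionid={session_id}"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def build_search_body(criteria):
    """Encode form fields as an application/x-www-form-urlencoded POST body.

    The field names passed in are assumptions; copy the real ones from
    the proxied request.
    """
    return urlencode(criteria).encode("ascii")
```

The idea is to pull the session ID out of the page that contains the initial form, send the search as a POST with the encoded body, and then pass every later URL (the "Consolidated Report" link, the report type page) through `with_jsessionid` before requesting it. If the site also hands out the session as a cookie, a cookie-aware HTTP client may make the URL rewriting unnecessary.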

Hopefully that's enough to get the ball rolling. If you have trouble with it, you might consider having us do it for you. We'd be happy to provide a free quote. If that's of interest, feel free to drop us a services request:

http://www.screen-scraper.com/services/services.php

Kind regards,

Todd Wilson