only follow links that go deeper into a website

Hi,

We have a database of hotel websites. Each entry is a specific page rather than a home page, so not e.g. www.marriott.com but something like http://marriott.com/property/propertypage/AUAAR. Is there a way to automatically follow only links that go deeper into the website, instead of links to external sites or links that go back up the site hierarchy?

At the moment I collect all links on every page and then use a script to decide whether each link should be followed. My rule is that a link may only be followed if it starts with the same prefix as the page I'm currently on. But of course that way I'll visit all hotels of the Marriott group instead of only the hotel from the given link.
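One way to tighten that rule is to compare every link against the path of the *starting* URL instead of the current page, and to match whole path segments so that `.../AUAAR` does not also match `.../AUAARX`. The sketch below is plain Python (not screen-scraper's scripting API) and assumes that "deeper" means "same host, and a path at or below the starting path". The function name `is_deeper` and the `www.`-stripping rule are illustrative assumptions:

```python
from urllib.parse import urljoin, urlsplit


def _segments(path):
    # Split a URL path into non-empty segments: "/a/b/" -> ["a", "b"]
    return [s for s in path.split("/") if s]


def _host(netloc):
    # Compare hosts case-insensitively; assumption: treat "www.x.com" like "x.com"
    host = netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_deeper(start_url, href, current_url=None):
    """Return True if href (possibly relative to current_url) stays on the
    start URL's host and its path lies at or below the start URL's path."""
    target = urlsplit(urljoin(current_url or start_url, href))
    start = urlsplit(start_url)
    if target.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, etc.
    if _host(target.netloc) != _host(start.netloc):
        return False  # link to an external site
    start_segs = _segments(start.path)
    # Whole-segment prefix match anchored on the START url, not the current page
    return _segments(target.path)[:len(start_segs)] == start_segs
```

With `start = "http://marriott.com/property/propertypage/AUAAR"`, a link to `/property/propertypage/AUAAR/rooms` is accepted, while `/property/propertypage/BOSMA` (another hotel) and `../` (going up) are rejected. A caveat: many sites place a hotel's sub-pages as siblings (e.g. a hypothetical `/property/rooms/AUAAR`), and a pure path-prefix rule will not find those.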

If there is no automatic way, can I do this check in a way that works for all kinds of websites? That is, for the websites of hotel groups as well as those of independent hotels.

Thanks in advance,
Tamara Vos


Tamara,

It sounds like you're using the same approach we would use. screen-scraper is not an automated engine but requires customization for each site. As you've described, however, it is possible to write scripts that are generic enough to work for multiple sites.
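As an illustration of what "generic enough to work for multiple sites" can look like, here is a minimal breadth-first crawl sketch in plain Python (not screen-scraper code). The site-specific part, fetching a page, is passed in as a function, and only links at or below the starting URL are queued. The names `crawl`, `in_scope`, `fetch` and `max_pages` are hypothetical, and host matching here is exact (no `www.` handling):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope(start_url, url):
    # Same host and a path at or below the start path (whole-segment match)
    s, t = urlsplit(start_url), urlsplit(url)
    if t.scheme not in ("http", "https") or t.netloc.lower() != s.netloc.lower():
        return False
    s_segs = [p for p in s.path.split("/") if p]
    t_segs = [p for p in t.path.split("/") if p]
    return t_segs[:len(s_segs)] == s_segs


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that only follows in-scope links.
    fetch(url) must return the page's HTML as a string."""
    seen, queue, visited = {start_url}, deque([start_url]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkCollector()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links against the current page, drop any #fragment
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(start_url, link):
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in keeps the crawl logic independent of how pages are retrieved, so the same function can be reused across sites or tested against an in-memory dictionary of pages.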

As for your last question, I don't believe we have any script of real complexity that would work for any and all possible sites out there. There are too many variables involved, and in any case screen-scraper was not designed with built-in learning capabilities.

I'd say keep going on the path you've taken. And good luck scrapin'.

Thanks,
Scott