iframe removed by tidying up the HTML code?

Hi,

What kind of changes does screen-scraper make when tidying up the HTML code?

I want to scrape the following site: "http://www.hotelsalobrena.com/ing/celebraciones.php". Everything we see on this site, except the header, footer and menu, is placed in an iframe. In the source of one of the pages there is this piece of code "" inside a "td" tag. In a script I capture the HTML body of every page with an extractor pattern. In the scraping log, the HTML body is written out for every page of the site, but no iframe tag appears there. That particular td tag is empty. Is the iframe removed when the code is tidied up?

Thanks,
Tamara Vos

Tamara,

The quickest way to find out what changes HTML Tidy makes is to run the scrape once with it on and once with it off. Sorry if that seems like a blunt approach, but I don't have access to the details of which Tidy options are on and which are off (there are many possibilities).
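
If it helps with comparing the two runs, here is a rough sketch in plain Python (not screen-scraper's own scripting API) that counts the tags in each version of the page and reports which ones went missing. The two sample strings and the `inner.php` address are made up for illustration; in practice you would paste in the HTML from the two log files.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag occurs in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def tag_counts(html):
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def dropped_tags(raw_html, tidied_html):
    """Return tags that occur more often in the raw HTML than in the tidied HTML."""
    # Counter subtraction keeps only positive differences.
    return dict(tag_counts(raw_html) - tag_counts(tidied_html))


# Hypothetical sample input, standing in for the two logged page bodies.
raw = '<table><tr><td><iframe src="inner.php"></iframe></td></tr></table>'
tidied = '<table><tr><td></td></tr></table>'
print(dropped_tags(raw, tidied))
```

If `iframe` shows up in the result, the tag was present in the raw response and was lost during tidying. If it does not, the tag was already missing before Tidy ran.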

In any case, you would be better off loading the contents of the iframe as a separate scrapeable file since the contents are, after all, a separate page as far as screen-scraper (and raw HTTP) is concerned.
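
To work out which address to load, the iframe's "src" attribute has to be resolved against the address of the page that contains it. The following is a minimal sketch in plain Python, not screen-scraper's API, and the "src" value `inner.php` is a placeholder because the real one is not shown in the post above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collects the absolute URL of every iframe in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Relative addresses are resolved against the containing page.
                self.urls.append(urljoin(self.base_url, src))


def iframe_urls(html, base_url):
    finder = IframeFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls


page_url = "http://www.hotelsalobrena.com/ing/celebraciones.php"
# Hypothetical page fragment; the real "src" value is not known here.
page_html = '<td><iframe src="inner.php"></iframe></td>'
print(iframe_urls(page_html, page_url))
```

Each address this returns can then be requested as its own page, with the extractor patterns applied to that response rather than to the outer page.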

Thanks,
Scott