Tutorial 6: Page 4: Generating the XML Feed

Let's run a quick test to make sure the scraping session works. After that, we'll add a few more bells and whistles. Start up screen-scraper as a server. If you need help with that, try this page. Once it's running, and assuming you haven't altered the default "SOAP Server" port (which is also the web server port) and that you're running screen-scraper on your local machine, try entering this URL into your browser:

http://localhost:8779/ss/xmlfeed?scraping_session=Shopping+Site&SEARCH=bug
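If you'd rather not type the encoded URL by hand, the same address can be assembled in a script. The following is a minimal Python sketch, not part of screen-scraper itself: the function name `build_feed_url` is made up for illustration, and the host, port, and `/ss/xmlfeed` path are simply the ones used in the URL above.

```python
from urllib.parse import urlencode


def build_feed_url(session_name, host="localhost", port=8779, **params):
    """Build an xmlfeed URL like the one shown in this tutorial.

    build_feed_url is a hypothetical helper; the defaults assume
    screen-scraper is running as a server on this machine on port 8779.
    """
    # scraping_session names the session; every other keyword argument
    # becomes an extra GET parameter (for example SEARCH).
    query = {"scraping_session": session_name}
    query.update(params)
    # urlencode handles the encoding, turning spaces into "+".
    return f"http://{host}:{port}/ss/xmlfeed?{urlencode(query)}"


print(build_feed_url("Shopping Site", SEARCH="bug"))
```

Calling it with a different search term, such as `build_feed_url("Shopping Site", SEARCH="dvd")`, gives a URL you can paste into the browser in the same way.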

If all goes well, the browser should take a moment to load, and then you should see an XML document containing the extracted information. If you get an error message, or the document doesn't look the way you expected, check screen-scraper's log. Just as with scraping sessions run remotely, screen-scraper creates a log file in its "log" folder for each RSS/Atom scraping session.
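The same check can be made outside the browser. The sketch below uses only Python's standard library to request a feed URL and confirm that the response is well-formed XML. The names `fetch_feed` and `check_feed` are hypothetical, and the long timeout is an assumption based on the feed taking a while to generate. It tells you nothing about why a session failed; for that, the log file is still the place to look.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch_feed(url, timeout=120):
    """Request the feed and return the raw response body as bytes.

    The generous timeout is an assumption: the scraping session runs
    while the request is open, so the response can be slow to arrive.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def check_feed(xml_bytes):
    """Return (True, root_tag) for well-formed XML, else (False, reason)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Anything that is not well-formed XML (such as an HTML error
        # page that fails to parse) ends up here.
        return False, f"not well-formed XML: {exc}"
    return True, root.tag
```

With the server running, `check_feed(fetch_feed(url))` on the URL above should return `True` along with the name of the document's root element.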

Dealing with the URL directly can be a bit cryptic, what with the encoding and all. To make things easier, let's use a small HTML file that generates feeds with different search parameters and formats. You can access it here. Note that this HTML file assumes that you're running screen-scraper as a server on your local machine on port 8779. If any of that isn't the case, download the HTML file to your local machine, alter it to match your settings, then open it in your browser.

Try experimenting with the form a bit. It gives you control over almost all of the available features, including the format of the feed. Also take a close look at the URL. screen-scraper simply converts the GET parameters in the URL to session variables in the scraping session. If you'd like, you can even open the feed in your favorite RSS/Atom reader to ensure that the format is valid.
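To see which name/value pairs a given feed URL carries, you can decode its query string. The Python sketch below only illustrates the mapping described above and is not how screen-scraper implements it. `feed_parameters` is a hypothetical helper, and treating `scraping_session` separately from the other parameters is an assumption made here for readability.

```python
from urllib.parse import parse_qsl, urlsplit


def feed_parameters(url):
    """Split a feed URL into (session name, other GET parameters).

    Hypothetical helper for inspection only; it decodes the URL the
    same way a web server would, turning "+" back into spaces.
    """
    pairs = dict(parse_qsl(urlsplit(url).query))
    # scraping_session selects which session to run; the remaining
    # pairs are the values passed along with the request.
    session = pairs.pop("scraping_session", None)
    return session, pairs


session, variables = feed_parameters(
    "http://localhost:8779/ss/xmlfeed?scraping_session=Shopping+Site&SEARCH=bug"
)
print(session, variables)
```

Running this against a URL produced by the form is a quick way to check what each form field contributes to the request.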