capturing URL when parameters are POST

I'm trying to scrape company details from the following website:

http://www.tennesseeanytime.org/soscorp/sosprog?action=corp&input_box=me...

What I need to log is the URL of each details page (the page I would reach by clicking the 'Details' button for each company). The problem is that the site uses POST parameters to identify the company/page. How can I get a URL in such a case?

The website doesn't seem to set any cookies either. Also, I found that if I visited the page above and then went straight to the following URL, I was able to access the first details page:

http://www.tennesseeanytime.org/soscorp/sosprog?Details=Details&action=d...

But if I change the value of index in the second URL to any other value, it still shows the same page (the details page of the first record), even though the HTML of the search-results page shows that the index value is what determines which details page is displayed.

My apologies for the confusing question. If I need to elaborate on the problem, please let me know.


Hi,

Thanks for the post. This site appears to use both a cookie and, at times, a URL parameter to maintain session state. You'll notice that if you go to this URL

http://www.tennesseeanytime.org/soscorp/sosprog?action=corp&input_box=metlife&selection=begin&Submit=Submit+Search

it will redirect you to something like this:

http://www.tennesseeanytime.org/soscorp/results.jsp
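If you'd rather reproduce this first step outside screen-scraper, here is a minimal Python sketch (standard library only; my own illustration, not something screen-scraper provides). It builds the same search URL and an opener that keeps any cookies the server sets and follows the redirect to results.jsp. The function names are hypothetical.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
import urllib.request

# Endpoint taken from the search URL above.
BASE_URL = "http://www.tennesseeanytime.org/soscorp/sosprog"


def build_search_url(term):
    """Build the GET URL that starts a 'begins with' company search."""
    params = {
        "action": "corp",
        "input_box": term,
        "selection": "begin",
        "Submit": "Submit Search",
    }
    # urlencode turns spaces into '+', matching "Submit+Search".
    return BASE_URL + "?" + urlencode(params)


def make_session_opener():
    """Return (opener, jar). The opener stores cookies the server sets in
    jar and, like any build_opener() result, follows HTTP redirects."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Using one opener for every request keeps the session cookie, which is what lets later requests see the stored results.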

When that redirect occurs, the site apparently stores all of the search results on the server and then reads them back on subsequent requests. This generally isn't considered a very efficient way to build web applications, but that's how this site works. Because the results are stored on the server, the site can reach the individual details pages by index. If you use screen-scraper's proxy server while clicking one of the "Details" buttons, you'll see something like this in the POST data:

Details=Details&action=detail&index=13

The "index" parameter is the number of the record you want to view, which appears to correspond to the list of records the server is holding in memory. In other words, the server uses the cookie it sets to tie the index in the POST parameters back to the specific record you want.
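The "Details" click can be sketched the same way. This assumes the form posts back to the same sosprog endpoint, which I haven't confirmed; `build_details_request` is a hypothetical helper that only builds the request and doesn't send it.

```python
from urllib.parse import urlencode
import urllib.request

# Assumption: the "Details" form posts back to the same endpoint as the search.
BASE_URL = "http://www.tennesseeanytime.org/soscorp/sosprog"


def build_details_request(index):
    """Build (but do not send) the POST request a 'Details' click makes."""
    body = urlencode({"Details": "Details", "action": "detail", "index": index})
    # Supplying data makes urllib send a POST with a form-encoded body.
    return urllib.request.Request(
        BASE_URL,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sent on its own, without a preceding search in the same session, this request probably won't return the record you expect.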

If you want to set this up so that a user could easily return to any of the details pages later, that unfortunately probably won't work. Because of the way the site is structured, you first have to perform an actual search; only then can you reach the details pages, and only for as long as the results are retained in memory on the server.
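Given that, the closest thing to a bookmarkable URL is probably to log the search term and index, then replay both requests in one session. A rough sketch, assuming the endpoint and parameters shown in this thread and an opener that keeps cookies between calls (for example one built with urllib.request.HTTPCookieProcessor); `fetch_details` is a hypothetical name:

```python
from urllib.parse import urlencode

# Endpoint and parameter names taken from the requests shown in this thread.
BASE_URL = "http://www.tennesseeanytime.org/soscorp/sosprog"


def fetch_details(opener, term, index):
    """Re-run the search, then request one details page in the same session.

    `opener` must keep cookies between calls; otherwise the server has no
    way to tie the index back to the stored search results.
    """
    search_url = BASE_URL + "?" + urlencode({
        "action": "corp",
        "input_box": term,
        "selection": "begin",
        "Submit": "Submit Search",
    })
    # Step 1: the search makes the server store the result list.
    with opener.open(search_url) as resp:
        resp.read()
    # Step 2: POST the index of the record we want from that stored list.
    body = urlencode({"Details": "Details", "action": "detail", "index": index})
    with opener.open(BASE_URL, data=body.encode("ascii")) as resp:
        return resp.read()  # raw bytes; the page's encoding isn't known here
```

For example, `fetch_details(opener, "metlife", 13)` would re-run the "metlife" search and then request record 13. If the search results change between runs, the same index may point at a different company, so it's worth checking the returned page.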

Since I don't know how familiar you are with web technologies, what I just wrote may seem pretty confusing. If I can clarify anything, please feel free to post a reply.

Kind regards,

Todd Wilson