Tutorial 2: Page 4: Creating the Scraping Session
Create a scraping session either by clicking the "New Scraping Session" button (looks like a gear) or by selecting "New Scraping Session" from the "File" menu. In the "Name" field enter "Shopping Site" (if you already downloaded and imported the scraping session at the beginning of this tutorial, you'll want to name your scraping session something else, perhaps "My Shopping Site"). This is the scraping session that will hold all of the files we'll be extracting data from. Remember that a scraping session is simply a container for all of the files and other objects that will allow us to extract data from a given web site.

We'll now be adding scrapeable files to our scraping session. You'll remember from the first tutorial that a scrapeable file represents a web page you'd like screen-scraper to request. Add the first scrapeable file to the scraping session by clicking the "Shopping Site" proxy session in the tree on the left (the first of the two "Shopping Site" nodes), then on the "Progress" tab. Find the row in the "HTTP Transactions" table with the following URL (probably the second in the table):

http://www.screen-scraper.com/shop/index.php?main_page=advanced_search_result&keyword=dvd&sort=2a&page=2

This URL corresponds to the second page in the search results. We'll use this file because it should contain all of the parameters in the URL we need to request any of the search results pages (including the first).

After clicking on this row in the table, information corresponding to the file will appear in the lower pane. Add the file to the "Shopping Site" scraping session by selecting the session in the "Generate scrapeable file in" drop-down list and clicking the "Go" button next to it. After the scrapeable file appears under the scraping session, rename it to "Search results". Next, click on the "Parameters" tab.
Remember that when we generate a scrapeable file in this way, screen-scraper pulls out the parameters from the URL and puts them under the "Parameters" tab for us. Because these are "GET" parameters (as opposed to "POST" parameters), when the scrapeable file is invoked by screen-scraper in a running scraping session, the parameters will get appended again to the URL. Let's take a closer look at each of the parameters that were embedded in the URL:

* main_page: advanced_search_result
* keyword: dvd
* sort: 2a
* page: 2

The only two that we're likely interested in are "keyword" and "page". We can guess that "keyword" refers to the text we typed into the search box initially. The "page" parameter refers to what page we're on in the search results. We can guess that if we were to replace the "2" in the "page" parameter of the URL with a "1", it would bring up the first page in the search results. Try this by bringing up the following page in your web browser:

http://www.screen-scraper.com/shop/index.php?main_page=advanced_search_result&keyword=dvd&sort=2a&page=1

Looks like our theory was correct. You should see the first page of search results. It's also important to note that the "keyword" and "page" parameters are those that will need to be dynamic. We'll get to that in a minute.
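To make the idea of GET parameters concrete, here is a minimal sketch in Python, run outside of screen-scraper, that splits the search URL into its parameters and then rebuilds it with a different "keyword" and "page". It is only an illustration of what happens to the URL, not how screen-scraper does it internally; the function names split_get_parameters and build_search_url are hypothetical and chosen for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The URL of the second search results page, as found in the proxy session.
SEARCH_URL = (
    "http://www.screen-scraper.com/shop/index.php"
    "?main_page=advanced_search_result&keyword=dvd&sort=2a&page=2"
)


def split_get_parameters(url):
    """Return (base_url, params): the URL without its query string,
    and a dict of the GET parameters in their original order."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base_url, params


def build_search_url(url, keyword, page):
    """Rebuild the URL with the two dynamic parameters replaced.
    (Hypothetical helper for illustration only.)"""
    base_url, params = split_get_parameters(url)
    params["keyword"] = keyword
    params["page"] = str(page)
    # GET parameters are appended to the base URL after a "?".
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base_url, params = split_get_parameters(SEARCH_URL)
    print(base_url)
    for name, value in params.items():
        print(f"{name}: {value}")
    print(build_search_url(SEARCH_URL, "dvd", 1))
```

Calling build_search_url(SEARCH_URL, "dvd", 1) produces the first-page URL we just tried in the browser, while "main_page" and "sort" pass through unchanged. Only "keyword" and "page" vary from one request to the next.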