Invoking screen-scraper via .NET
screen-scraper needs to be running as a server before you invoke it from a .NET class. If you haven't already, please read the documentation on running screen-scraper as a server before continuing. For an example of using the .NET driver, please see Tutorial 4: Scraping an E-commerce Site from External Programs.
If you're using Visual Studio 2008 or later, the project's 'Target Framework' must be set to .NET 3.5 or later. Do not select any of the .NET client frameworks (the "Client Profile" targets), since they lack libraries your project needs in order to compile.
A C# application interacts with screen-scraper via the class Screenscraper.RemoteScrapingSession. You can utilize the Screenscraper.RemoteScrapingSession class by compiling with a reference to the "RemoteScrapingSession.dll", which is found in the "misc/dotNET" folder of your screen-scraper distribution.
There are only a handful of methods in the RemoteScrapingSession class, which are documented below:
- RemoteScrapingSession( string identifier ). Instantiates a RemoteScrapingSession identified by identifier. If this constructor is called the default host (localhost) and port (8778) will be used.
- RemoteScrapingSession( string identifier, string host, int port ). Instantiates a RemoteScrapingSession identified by identifier, and connecting to the server found at host listening on port.
- RemoteScrapingSession( string host, int port ). Instantiates a RemoteScrapingSession which is connected to the server found at host listening on port. A RemoteScrapingSession object instantiated with this constructor is mainly used for stopping a running scraping session.
- Disconnect(). Should be called once you're done interacting with the RemoteScrapingSession object so that screen-scraper can clean up.
- SetVariable( string varName, string value ). Sets a session variable in the session that will be accessible from within a screen-scraper script.
- Scrape(). Causes the session to scrape. This is equivalent to clicking the "Run Scraping Session" button from within screen-scraper on the "General" tab for a scraping session.
- GetVariable( string varName ). Gets the value of a session variable that was set during the course of the scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.
- StopServer(). Causes the server connected to by this instance of RemoteScrapingSession to stop scraping, if a scrape is currently running.
- Timeout. A property which sets the timeout of this scraping session. The value is an int representing the number of minutes before timing out.
- SessionTimedOut. A property which returns a bool indicating whether the scraping session timed out when it ended.
- LazyScrape. A property which can be set to true or false. The default value is false. When this property is set to true, requests for pages are made in separate threads, so they run simultaneously. If one page does not rely on the scraping of another page, this could significantly increase the speed of the scrape.
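The members listed above fit together in a fairly fixed order: construct, configure, scrape, read results, disconnect. The following is a minimal sketch of that flow rather than a definitive implementation. The scraping session name "Shopping Site" and the variable names "SEARCH" and "RESULT" are hypothetical, and it is an assumption that the value returned by GetVariable can be handled as a plain object. The project must reference RemoteScrapingSession.dll, and screen-scraper must already be running as a server on localhost:8778.

```csharp
using System;
using Screenscraper; // provided by RemoteScrapingSession.dll (misc/dotNET)

class RunScrape
{
    static void Main()
    {
        // Hypothetical session name; uses the default host and port.
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");
        try
        {
            // Give up after 10 minutes.
            session.Timeout = 10;

            // Hypothetical session variable read by a screen-scraper script.
            session.SetVariable("SEARCH", "dvd");

            // Run the scraping session on the server.
            session.Scrape();

            if (session.SessionTimedOut)
            {
                Console.WriteLine("The scraping session timed out.");
            }
            else
            {
                // Hypothetical variable set by a script during the scrape.
                // Assumption: the return value can be treated as an object.
                object result = session.GetVariable("RESULT");
                Console.WriteLine(result);
            }
        }
        finally
        {
            // Always disconnect so screen-scraper can clean up.
            session.Disconnect();
        }
    }
}
```

Placing Disconnect() in a finally block is a design choice of this sketch, not something the driver requires; it simply ensures the clean-up call runs even if the scrape throws.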
It is also possible to store data sets and data records in session variables, which can then be accessed via the RemoteScrapingSession class. Data set objects are analogous to database result sets and data records are analogous to individual records within a result set. When an extractor pattern is applied a data set of data record objects is generated. Storing the resulting data set in a session variable (within a screen-scraper script) will allow for it to be accessed via a RemoteScrapingSession.GetVariable call.
The data record class (Screenscraper.DataRecord) simply extends Microsoft's Hashtable. Documentation on methods in the data set class (Screenscraper.DataSet) can be found below:
- AllDataRecords. A property which returns all of the DataRecord objects as an ArrayList of DataRecords.
- DataRecord this[ int dataRecordNumber ]. An indexer which returns the DataRecord at position dataRecordNumber, containing data extracted from a single application of an ExtractorPattern.
- int NumDataRecords. A property which gets the number of DataRecords held by this object.
- string this[ int dataRecordNumber, string identifier ]. Another indexer which returns a single item of data identified by identifier from the DataRecord at dataRecordNumber.
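As an illustration of the DataSet members above, here is a hedged sketch of reading a data set after a scrape. It assumes a screen-scraper script stored the data set in a session variable named "PRODUCTS", that each record has a "TITLE" field (both names are hypothetical), that the value returned by GetVariable can be cast to Screenscraper.DataSet, and that record positions start at 0. None of these assumptions is confirmed by the reference above.

```csharp
using System;
using System.Collections;
using Screenscraper; // provided by RemoteScrapingSession.dll (misc/dotNET)

class ReadProducts
{
    static void Main()
    {
        // Hypothetical session name; uses the default host and port.
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");
        try
        {
            session.Scrape();

            // Assumption: a script saved the data set as "PRODUCTS" and the
            // returned value can be cast to Screenscraper.DataSet.
            DataSet products = (DataSet)session.GetVariable("PRODUCTS");

            // Assumption: record positions are zero-based.
            for (int i = 0; i < products.NumDataRecords; i++)
            {
                // Two-argument indexer: one field from one record.
                string title = products[i, "TITLE"];
                Console.WriteLine("Record {0}: {1}", i, title);

                // One-argument indexer: the whole record. DataRecord extends
                // Hashtable, so its fields enumerate as DictionaryEntry items.
                DataRecord record = products[i];
                foreach (DictionaryEntry field in record)
                {
                    Console.WriteLine("  {0} = {1}", field.Key, field.Value);
                }
            }
        }
        finally
        {
            // Let screen-scraper clean up.
            session.Disconnect();
        }
    }
}
```

If the project also imports System.Data, the name DataSet becomes ambiguous; writing Screenscraper.DataSet in full avoids the clash.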

