Invoking screen-scraper via .NET

screen-scraper needs to be running as a server before you invoke it from a .NET class. If you haven't already, please read the documentation on running screen-scraper as a server now. For an example of using the .NET driver, please see Tutorial 4: Scraping an E-commerce Site from External Programs.

If you're using Visual Studio 2008 or later, the project's 'Target Framework' will need to be set to .NET 3.5 or later. However, do not select any of the .NET Client Profile frameworks, since they lack libraries that your project needs in order to compile.

A C# application interacts with screen-scraper via the class Screenscraper.RemoteScrapingSession. To use this class, compile with a reference to "RemoteScrapingSession.dll", which is found in the "misc/dotNET" folder of your screen-scraper distribution.

The RemoteScrapingSession class has only a handful of constructors, methods, and properties, which are documented below:

  • RemoteScrapingSession( string identifier ). Instantiates a RemoteScrapingSession identified by identifier. If this constructor is called, the default host (localhost) and port (8778) will be used.
  • RemoteScrapingSession( string identifier, string host, int port ). Instantiates a RemoteScrapingSession identified by identifier and connects it to the server found at host, listening on port.
  • RemoteScrapingSession( string host, int port ). Instantiates a RemoteScrapingSession which is connected to the server found at host listening on port. A RemoteScrapingSession object instantiated with this constructor is mainly used for stopping a running scraping session.
  • Disconnect(). Should be called once you're done interacting with the RemoteScrapingSession object so that screen-scraper can clean up.
  • SetVariable( string varName, string value ). Sets a session variable in the session that will be accessible from within a screen-scraper script.
  • Scrape(). Causes the session to scrape. This is equivalent to clicking the "Run Scraping Session" button from within screen-scraper on the "General" tab for a scraping session.
  • GetVariable( string varName ). Gets the value of a session variable that was set during the course of the scraping session. Note that currently only Strings, DataRecords, and DataSets can be accessed by this method.
  • StopServer(). Causes the server that this instance of RemoteScrapingSession is connected to to stop scraping, if a scrape is currently running.
  • Timeout. A property that sets the timeout of this scraping session. The value is an int representing the number of minutes before timing out.
  • SessionTimedOut. A property that returns a bool indicating whether the scraping session timed out when it ended.
  • LazyScrape. A property that can be set to true or false. The default value is false. When this property is set to true, requests for pages are made simultaneously, in separate threads. If one page does not rely on the scraping of another page, this could significantly increase the speed of the scrape.
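
As a rough illustration of how the members above fit together, here is a minimal sketch of a console program that runs a scraping session and reads back a string variable. It assumes the project references RemoteScrapingSession.dll and that screen-scraper is running as a server on the default host and port. The scraping session name "Shopping Site" and the variable names "SEARCH" and "TOTAL" are hypothetical placeholders, and the sketch assumes that GetVariable returns a reference type that can be converted with "as string".

```csharp
using System;
using Screenscraper; // provided by RemoteScrapingSession.dll

class ScrapeExample
{
    static void Main()
    {
        // "Shopping Site" is a hypothetical scraping session name.
        // This constructor uses the default host (localhost) and port (8778).
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");

        try
        {
            // Give up if the scrape runs longer than 10 minutes.
            session.Timeout = 10;

            // "SEARCH" is a hypothetical session variable read by the scraping session.
            session.SetVariable("SEARCH", "dvd");

            // Run the scraping session on the server.
            session.Scrape();

            if (session.SessionTimedOut)
            {
                Console.WriteLine("The scraping session timed out.");
            }
            else
            {
                // "TOTAL" is a hypothetical variable set by a screen-scraper script.
                // Assumption: GetVariable returns a reference type; "as" yields null
                // if the variable is missing or is not a string.
                string total = session.GetVariable("TOTAL") as string;
                Console.WriteLine("TOTAL = " + (total ?? "(not set)"));
            }
        }
        finally
        {
            // Always disconnect so that screen-scraper can clean up.
            session.Disconnect();
        }
    }
}
```

Placing Disconnect() in a finally block is a design choice in this sketch, not a requirement stated by the API; it simply ensures the call is made even if the scrape throws an exception.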

It is also possible to store data sets and data records in session variables, which can then be accessed via the RemoteScrapingSession class. Data set objects are analogous to database result sets, and data records are analogous to individual records within a result set. When an extractor pattern is applied, a data set of data record objects is generated. Storing the resulting data set in a session variable (within a screen-scraper script) allows it to be accessed via a RemoteScrapingSession.GetVariable call.

The data record class (Screenscraper.DataRecord) simply extends the .NET Hashtable class (System.Collections.Hashtable). The members of the data set class (Screenscraper.DataSet) are documented below:

  • AllDataRecords. A property that returns all of the DataRecord objects as an ArrayList of DataRecords.
  • DataRecord this[ int dataRecordNumber ]. An indexer that returns the DataRecord at position dataRecordNumber, containing the data extracted from a single application of an extractor pattern.
  • int NumDataRecords. A property that gets the number of DataRecords held by this object.
  • string this[ int dataRecordNumber, string identifier ]. Another indexer, which returns the single item of data identified by identifier from the DataRecord at position dataRecordNumber.
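
The following sketch shows one way the DataSet members above might be used to walk through extracted records. It assumes a screen-scraper script has stored a data set in a session variable; the names "Shopping Site", "PRODUCTS", "TITLE", and "PRICE" are hypothetical. It also assumes that record positions start at zero, which the list above does not state.

```csharp
using System;
using System.Collections;
using Screenscraper; // provided by RemoteScrapingSession.dll

class DataSetExample
{
    static void Main()
    {
        // "Shopping Site" is a hypothetical scraping session name.
        RemoteScrapingSession session = new RemoteScrapingSession("Shopping Site");

        try
        {
            session.Scrape();

            // "PRODUCTS" is a hypothetical session variable holding a data set.
            // "as" yields null if the variable is missing or is not a DataSet.
            DataSet products = session.GetVariable("PRODUCTS") as DataSet;
            if (products == null)
            {
                Console.WriteLine("No PRODUCTS data set was returned.");
                return;
            }

            // Assumption: data record positions are zero-based.
            for (int i = 0; i < products.NumDataRecords; i++)
            {
                // The single-argument indexer returns the whole record (a Hashtable).
                DataRecord product = products[i];
                object title = product["TITLE"];

                // The two-argument indexer returns one field of one record.
                string price = products[i, "PRICE"];

                Console.WriteLine("{0}: {1}", title, price);
            }
        }
        finally
        {
            // Always disconnect so that screen-scraper can clean up.
            session.Disconnect();
        }
    }
}
```

Because DataRecord extends Hashtable, its indexer returns object, so individual values may need a cast or a ToString() call before use. If the project also imports System.Data, the name DataSet becomes ambiguous and should be written as Screenscraper.DataSet.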