Regular Expression Help

Introduction

Regular expressions, often abbreviated to "regex", provide much of the power and flexibility behind a scraping session. This page does not explain how they work (that information is readily available on the web, for instance at regular-expressions.info); instead, it offers some pointers on using them in screen-scraper.

There are two places where you will use regular expressions in screen-scraper: on extractor tokens and in scripts. The two differ slightly, so each is discussed separately below.

Extractor Tokens

On extractor tokens, regular expressions restrict what a token will match, so that you gather only the information you want. screen-scraper ships with the most common regular expressions for screen scraping already defined. They can be selected in the general tab of the extractor token editor.
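To illustrate why a restrictive expression matters, here is a minimal sketch in plain Java. It does not use screen-scraper's own API or token syntax; a capture group stands in for an extractor token, and the class name, method name, and sample HTML are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenRegexSketch {
    // Returns the first capture group of the first match, or null if nothing matches.
    // The capture group plays the role of an extractor token (an analogy only).
    static String extract(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment: one cell holds text, the next holds a number.
        String html = "<td>N/A</td><td>1499</td>";

        // Permissive expression: accepts any text between the tags.
        System.out.println(extract("<td>(.*?)</td>", html));

        // Restrictive expression: accepts digits only, so the first cell is skipped.
        System.out.println(extract("<td>(\\d+)</td>", html));
    }
}
```

With the permissive expression the first cell is captured whatever it contains, while the digits-only expression passes over it and captures the numeric cell instead.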

You may edit screen-scraper's regular expressions at any time by clicking Edit regular expressions in the Options menu.

For a detailed list and explanation of the built-in regular expressions for extractor tokens, as well as some other helpful expressions, see our page on helpful regular expressions.

The regular expression parser that screen-scraper uses internally is Perl-compatible. This can be important to those writing their own expressions.

Scripts

Script source code is itself parsed by the scripting language before any regular expression reaches the regex engine, which can affect how an expression has to be written. The details depend on the language you are using in screen-scraper. Examples of the particular changes that are necessary in Java are available in our Java regular expression help.
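One adjustment commonly needed in Java, assumed here to be typical of the changes meant above, is doubling backslashes: the Java compiler processes string literals first, so the regex \d+\.\d{2} must be written as "\\d+\\.\\d{2}". The sketch below uses only the standard java.util.regex package; the class, constant, and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class JavaEscapingSketch {
    // The regex \d+\.\d{2} as a Java string literal: every backslash is doubled,
    // because the compiler consumes one level of escaping before the regex engine runs.
    static final String PRICE_REGEX = "\\d+\\.\\d{2}";

    // True when the whole string is digits, a literal dot, then exactly two digits.
    static boolean isPrice(String s) {
        return Pattern.matches(PRICE_REGEX, s);
    }

    public static void main(String[] args) {
        // Prints the expression as the regex engine receives it (single backslashes).
        System.out.println(PRICE_REGEX);
        System.out.println(isPrice("19.99"));
        System.out.println(isPrice("19x99"));
    }
}
```

Writing a single backslash before d in the literal would not compile, since \d is not a valid Java string escape.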