Ignoring HTML codes

I'm using an evaluation copy of screen-scraper professional (Version 2.7.2).

I would like to set up an extraction pattern that pulls just the URL out of the sample below (without the extra HTML code). I reviewed other entries in this forum and saw a reference to a "Strip HTML" checkbox under the Advanced tab for a given extraction pattern, but I do not see that checkbox listed.

Is there also a way to do this with a regular expression? Please describe.

Appreciatively,
Peter

_____________

First try accessing the PHP directly here: http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php.

PeterWest on 10/23/2006 at 11:36 am

screen-scraper public support


Hi Todd,

Wonderful explanation! Thanks for presenting different options.

I would never have figured out the token editing procedure without your help.

Best regards,
Peter

PeterWest on 10/23/2006 at 6:17 pm


Hi Peter,

The simplest way to do this would be to create a targeted extractor pattern so that it pulls only the URL from the HTML. Something like this:

" target="_new">~@URL@~</a>

or perhaps this:

If you use the "Strip HTML" check box on that entire string of text, you'd get something like this:

First try accessing the PHP directly here: http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php
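To illustrate what tag stripping does to that text, here is a rough sketch in Python. It is not screen-scraper's own implementation, and the SAMPLE string is an assumption: the original HTML isn't shown in this thread, so it is reconstructed from the extractor pattern above. The strip_html helper is a hypothetical name.

```python
import re

# Assumed sample HTML (not shown in the thread; reconstructed for illustration).
SAMPLE = (
    'First try accessing the PHP directly here: '
    '<a href="http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php" '
    'target="_new">http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php</a>.'
)


def strip_html(text):
    """Remove anything that looks like an HTML tag, keeping the text between tags."""
    return re.sub(r"<[^>]*>", "", text)


print(strip_html(SAMPLE))
```

Note that the sentence around the link is kept along with the URL, which is why a targeted pattern is usually the better fit when you only want the URL.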

To address your question on the "Strip HTML" option, if you edit an extractor pattern token (double-click it; or select it, right-click and select "Edit token"), under the "Advanced" tab you'll see the option.
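As for doing it with a regular expression, the same idea as the targeted extractor pattern can be sketched outside screen-scraper. This is only an illustration in Python under stated assumptions: the SAMPLE HTML is a guess at what the page contains (it isn't shown in this thread), and extract_url is a hypothetical helper. The capture group plays the role of the ~@URL@~ token and is limited to characters that aren't angle brackets, so it can't run past the closing tag.

```python
import re

# Assumed sample HTML (not shown in the thread; reconstructed for illustration).
SAMPLE = (
    'First try accessing the PHP directly here: '
    '<a href="http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php" '
    'target="_new">http://www.screen-scraper.com/support/tutorials/tutorial5/db/save_product.php</a>.'
)

# Mirrors the extractor pattern `" target="_new">~@URL@~</a>`;
# the group ([^<>]*) stands in for the URL token.
URL_PATTERN = re.compile(r'" target="_new">([^<>]*)</a>')


def extract_url(html):
    """Return the link text of the first matching anchor, or None if there is no match."""
    match = URL_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_url(SAMPLE))
```

If the real page formats its links differently (other attributes, different quoting), the literal text around the group would need to be adjusted to match.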

Kind regards,

Todd Wilson

todd on 10/23/2006 at 2:48 pm
