How do I ignore information that appears only occasionally?

Hello! I'm trying to scrape a [url=http://www.seetickets.com/see/event.asp?startno=120&e%7Cartist=&re%7Ceventtype=1&RE%7Ceventtype%7C1=2&RE%7Ceventtype%7C2=5&RE%7Ceventtype%7C3=6&RE%7Ceventtype%7C4=14&RE%7Ceventtype%7C5=17&RE%7Ceventtype%7C6=18&RE%7Ceventtype%7C7=19&RE%7Ceventtype%7C8=20&RE%7Ceventtype%7C9=21&RE%7Ceventtype%7C10=22&RE%7Ceventtype%7C11=23&RE%7Ceventtype%7C12=24&RE%7Ceventtype%7C13=25&RE%7Ceventtype%7C14=26&RE%7Ceventtype%7C15=27&RE%7Ceventtype%7C16=3&filler1=see&resultsperpage=20]music event listing site[/url] and I'm having an issue: the site sometimes inserts extra info in front of the artist's name.

My extraction code is:

~@ARTIST@~

~@IGNORE@~

~@VENUE@~

~@IGNORE@~>~@TOWN@~

~@DAY@~

~@DATE@~

~@TIME@~

~@DATARECORD@~

Normally, the html I'm interested in for the artist name would look like this:

AKALA

But occasionally it has extra data before the artist name:

 Event Information AL STEWART

How do I ignore this extra data when it appears, so that the ARTIST field is still correct? I'm guessing that a regular expression should sort this, but I've not had any luck so far. Any ideas?

Thanks in advance for any help
- Sandy

Hi,

There are two ways you could handle this. The first would be to use sub-extractor patterns so that you match just the individual fields in each row. Based on the HTML you included, though, it looks as though this may not be a good approach, simply because the fields are so similar and would be hard to tell apart. The other approach would be to use two separate extractor patterns: one for the case where the extra text doesn't appear before the artist's name, and one for the case where it does.
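To make the two-pattern idea concrete, here is a minimal sketch in plain Python rather than in extractor-pattern syntax. It assumes the cell text has already been pulled out of the HTML and that the extra text is always the literal string "Event Information", as in the example above. The names `PREFIX`, `WITH_PREFIX`, `PLAIN` and `extract_artist` are made up for this illustration.

```python
import re

# Assumption: the occasional extra text is exactly this string,
# as in the " Event Information AL STEWART" example from the question.
PREFIX = "Event Information"

# Pattern 1: the case where the extra text precedes the artist name.
WITH_PREFIX = re.compile(
    r"^\s*" + re.escape(PREFIX) + r"\s+(?P<artist>\S.*?)\s*$"
)
# Pattern 2: the plain case, where the text is only the artist name.
PLAIN = re.compile(r"^\s*(?P<artist>\S.*?)\s*$")


def extract_artist(cell_text):
    """Return the artist name, trying the more specific pattern first."""
    for pattern in (WITH_PREFIX, PLAIN):
        match = pattern.match(cell_text)
        if match:
            return match.group("artist")
    return None  # nothing but whitespace in the cell


print(extract_artist("AKALA"))                          # AKALA
print(extract_artist(" Event Information AL STEWART"))  # AL STEWART
```

The order matters: the pattern with the prefix is tried first, because the plain pattern would otherwise match the whole text and leave "Event Information" inside the ARTIST value.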

Kind regards,

Todd Wilson