How do I ignore information that appears only occasionally?

Hello! I'm trying to scrape a [url=http://www.seetickets.com/see/event.asp?startno=120&e%7Cartist=&re%7Ceventtype=1&RE%7Ceventtype%7C1=2&RE%7Ceventtype%7C2=5&RE%7Ceventtype%7C3=6&RE%7Ceventtype%7C4=14&RE%7Ceventtype%7C5=17&RE%7Ceventtype%7C6=18&RE%7Ceventtype%7C7=19&RE%7Ceventtype%7C8=20&RE%7Ceventtype%7C9=21&RE%7Ceventtype%7C10=22&RE%7Ceventtype%7C11=23&RE%7Ceventtype%7C12=24&RE%7Ceventtype%7C13=25&RE%7Ceventtype%7C14=26&RE%7Ceventtype%7C15=27&RE%7Ceventtype%7C16=3&filler1=see&resultsperpage=20] music event listing site[/url] and seem to be having an issue when the site sometimes inserts some extra info in front of the artist's name:

My extraction code is:

~@ARTIST@~

~@IGNORE@~

~@VENUE@~

~@IGNORE@~>~@TOWN@~

~@DAY@~

~@DATE@~

~@TIME@~

~@DATARECORD@~

Normally, the HTML I'm interested in for the artist name looks like this:

AKALA

But occasionally it has extra data before the artist name:

AL STEWART

How do I ignore this extra data when it appears, so that the ARTIST field is still correct? I'm guessing a regular expression should sort this out, but I've had no luck so far... any ideas?
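For illustration, here is a minimal sketch of the regular-expression idea in plain Python, not screen-scraper's own syntax. The real HTML from the listing page did not survive in this post, so the two sample rows below (a `td` with class `artist`, and the extra data assumed to be a `<span>` before the link) are hypothetical. The point is the technique: an optional non-capturing group `(?:...)?` consumes the occasional prefix when it is there and matches nothing when it isn't, so the captured artist name is the same either way.

```python
import re
from typing import Optional

# Hypothetical markup: the actual HTML was lost from the post,
# so these rows are assumptions made for illustration only.
NORMAL_ROW = '<td class="artist"><a href="/event/1">AKALA</a></td>'
PREFIXED_ROW = ('<td class="artist"><span class="note">Rescheduled</span> '
                '<a href="/event/2">AL STEWART</a></td>')

# (?:...)? is an optional, non-capturing group: it swallows the assumed
# <span>...</span> prefix (plus trailing whitespace) when present and is
# skipped when absent. [^<]+ stops the captured name at the next tag.
ARTIST_RE = re.compile(
    r'<td class="artist">'
    r'(?:<span[^>]*>.*?</span>\s*)?'   # optional extra data, ignored
    r'<a [^>]*>(?P<artist>[^<]+)</a>'
)


def extract_artist(row: str) -> Optional[str]:
    """Return the artist name from one row, or None if the row doesn't match."""
    m = ARTIST_RE.search(row)
    return m.group("artist").strip() if m else None


if __name__ == "__main__":
    for row in (NORMAL_ROW, PREFIXED_ROW):
        print(extract_artist(row))
```

Both rows yield just the artist name. Carrying this into an extraction pattern would mean wrapping the variable prefix in an optional group in the same way, assuming the tool accepts standard regex syntax at that point.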

Thanks in advance for any help
- Sandy

rhubarb on 09/09/2006 at 10:14 am

screen-scraper public support