Joining tables - DATARECORD shows up

Ok, so I have an extraction pattern like this:

~@DATARECORD@~

The identifier is TABLE1, and it's set to be saved as a dataset. If a dataset by the same name already exists, the dropdown is set to "Join" the two tables.

Subextractor pattern like this:

src="~@SRS@~"

So I get a table with some image locations in it. That's cool.

======

If I add another extraction pattern that's identical to the one outlined above, also marked as 'join' to the previous table, not only do I get the data I'm looking for, but the DATARECORD field is now showing up in TABLE1 as well. Which I don't really want.

So is this a bug, and if not, what am I not understanding?

In either case, what would be the most efficient way to remove the DATARECORD column from TABLE1?

(These are not the columns or patterns or values I'm actually using, I just simplified it for the sake of discussion.)

Export of sample scrape:


Scraping session: Datarecordoddness
Scrapeable file: New Scrapeable File (http://www.cnn.com)
Extractor pattern: ~@DATARECORD@~ (identifier TABLE1, token DATARECORD)
Sub-extractor pattern: src="~@SRS@~" (token SRS)


Joining tables - DATARECORD shows up

Thanks! Since I posted I threw a workaround in a script that did just that.

Just happy to know I'm not going insane. :)

Joining tables - DATARECORD shows up

Hi,

This is, indeed, a bug, and I've logged it as such. Watch for a fix in a future version. The "join" feature is rarely used, so this one apparently slipped by us.

In the meantime, you could strip out the DATARECORD value, but it's actually harmless. To remove it for a given data record, you would do something like this:

dataRecord.put( "DATARECORD", null );
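If you'd rather clean up the whole joined table in one pass, the idea can be sketched roughly as below. This is only an illustration under stated assumptions: each row is modeled as a plain java.util.Map, not as screen-scraper's own record type, and the class name JoinedTableCleanup, the method stripColumn, and the sample values are all made up for the example. It also removes the key outright instead of storing null, which is a different choice from the one-liner above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a joined dataset: each row is a plain Map.
// This is NOT screen-scraper's DataSet/DataRecord API.
class JoinedTableCleanup {

    // Removes the named column from every row and returns
    // the number of rows that actually contained it.
    static int stripColumn(List<Map<String, String>> rows, String column) {
        int removed = 0;
        for (Map<String, String> row : rows) {
            if (row.containsKey(column)) {
                row.remove(column);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table1 = new ArrayList<>();

        Map<String, String> row = new LinkedHashMap<>();
        row.put("SRS", "/images/logo.gif");                  // made-up sample value
        row.put("DATARECORD", "src=\"/images/logo.gif\"");   // the unwanted column
        table1.add(row);

        stripColumn(table1, "DATARECORD");
        System.out.println(table1);                          // only SRS should remain
    }
}
```

In a real script you would loop over the records of TABLE1 and apply the same removal to each one, using whatever accessors your version provides.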

Just let me know if I can clarify further on that.

Kind regards,

Todd Wilson