Couple quick questions

I know I am in here a lot. You guys have been very helpful, and I appreciate it.

I am catching on quite well to this program. It would help me a lot if I knew more about Java, but I don't.

There are 2 more issues that I would like to fix, and then my scraping should be perfect.

1. If any of the sub extractors comes back null, the whole entry is not entered into the text file.

2. If any of the ~@NAME@~ sub extractors comes back with an apostrophe in it, I would like the apostrophe to be converted to its HTML equivalent.

I would appreciate anyone's help with this matter.

Thanks in advance!

Couple quick questions

You're correct in your assumption. If you use dataRecord.get, no flushing will be necessary.

Todd

Couple quick questions

I put the code in and everything "seems" to be working properly.

My question is this though:
If I use dataRecord.get instead of session.getVariable, will clearing out the session variables still be necessary?

If I am not mistaken, the dataRecord is flushed each time it retrieves a record, but I am not completely sure about this.

Thanks so much for the help!

Couple quick questions

Hi,

The code I gave you would simply insert a blank (or, more accurately, an empty string) instead of the word "null". To do what you're describing, you'd need to check each of the values to see if they extracted any data before writing it out to a file. Perhaps something like this:

if(
    session.getVariable( "FOO" )!=null
    &&
    session.getVariable( "BLAH" )!=null
    &&
    session.getVariable( "BAP" )!=null
  )
{
  // Write the data to the file.
}

The one caveat to be aware of is that you'll want to clear the session variables after each iteration; otherwise, one of them could contain a value from the previous extraction. In this same script, I would do something like this:

session.setVariable( "FOO", null );
session.setVariable( "BLAH", null );
session.setVariable( "BAP", null );
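
To see the check and the reset working together outside of screen-scraper, here is a rough standalone sketch in plain Java. It is only an illustration: a HashMap stands in for the session object, and the class and method names (RequiredFields, allPresent, clear) are made up for this example rather than part of any screen-scraper API.

```java
import java.util.HashMap;
import java.util.Map;

class RequiredFields {
    // True only when every named variable holds a non-null value.
    static boolean allPresent(Map<String, Object> vars, String... names) {
        for (String name : names) {
            if (vars.get(name) == null) {
                return false;
            }
        }
        return true;
    }

    // Reset the named variables so a stale value cannot leak into the next record.
    static void clear(Map<String, Object> vars, String... names) {
        for (String name : names) {
            vars.put(name, null);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the session variables of one record.
        Map<String, Object> vars = new HashMap<>();
        vars.put("FOO", "a");
        vars.put("BLAH", "b");
        // BAP was never extracted, so this record is skipped.
        if (allPresent(vars, "FOO", "BLAH", "BAP")) {
            System.out.println("write record");
        } else {
            System.out.println("skip record");
        }
        clear(vars, "FOO", "BLAH", "BAP");
    }
}
```

In the real script, the write to the file would go where "write record" is printed, and the clearing step would run after every record whether or not it was written.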

Hopefully that helps.

Todd

Couple quick questions

Yes, exactly.

Will the above code you posted do this? Or is it just putting a blank where the word "null" usually goes in the record?

Thanks in advance!

Couple quick questions

Hi,

I think I'm still unclear on what you're after when you say that you want your extractors to be "required". Do you mean that, given a set of sub-extractor patterns, if any one of them doesn't match you don't want any data to be written for that particular record?

Todd

Couple quick questions

Thank you very much, Todd, for your very helpful reply.

I probably should have rephrased my first issue in the OP.

The code you mentioned returns a blank in place of the null entry, correct? So rather than saying "null" in my results, it will show a blank?

Basically, I want all my extractors to be required. If any of the extractors comes back null, then I don't want that entire entry to be written to the file.

I could be reading your sample code wrong. If so, ignore my ignorance =P

Thank you so much for your help.

Couple quick questions

Hi,

On the first issue, you probably just want to check for a null value, and have it use an empty string instead if one is found. To do that, you'll want to add a method to your script that looks something like this:

String nullToEmptyString( String value )
{
  if( value==null )
  {
    return "";
  }

  return value;
}

Then in your code where you're writing out the values to a file, you'd do something like this:

out.write( nullToEmptyString( session.getVariable( "MY_VAL" ) ) + "\t" );

On the second issue, it sounds like you're wanting to replace the apostrophe with its corresponding HTML entity. You'd probably do something like this:

session.setVariable( "MY_VAL", session.getVariable( "MY_VAL" ).replaceAll( "'", "&#39;" ) );

That presupposes, of course, that the MY_VAL session variable won't be null. If it might be null, you'd use the technique I described above along with it, like this:

session.setVariable( "MY_VAL", nullToEmptyString( session.getVariable( "MY_VAL" ) ).replaceAll( "'", "&#39;" ) );
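
If you want to try the null handling and the apostrophe replacement on their own, here is a small plain-Java sketch. The class name ApostropheEscaper and the escapeApostrophes helper are hypothetical names chosen for this example, and it assumes the numeric entity &#39; is the one you want. It uses String.replace rather than replaceAll, since no regular expression is needed for a literal apostrophe.

```java
class ApostropheEscaper {
    // Same idea as the helper above: treat null as an empty string.
    static String nullToEmptyString(String value) {
        return value == null ? "" : value;
    }

    // Replace every apostrophe with its numeric HTML entity.
    static String escapeApostrophes(String value) {
        return nullToEmptyString(value).replace("'", "&#39;");
    }

    public static void main(String[] args) {
        System.out.println(escapeApostrophes("O'Brien's"));      // prints O&#39;Brien&#39;s
        System.out.println("[" + escapeApostrophes(null) + "]"); // prints []
    }
}
```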

See this page for more on HTML entities: http://www.w3schools.com/html/html_entities.asp

Kind regards,

Todd Wilson