New line character should be replaced by single space character

While scraping a site I noticed that a "new line" character is being removed entirely instead of being replaced by a space.

Example:
The last response:

We love the chill attitude these jeans have. Roll 'em up or roll 'em
down for the ultimate laissez-faire look.

The extracted data:
We love the chill attitude these jeans have. Roll 'em up or roll 'emdown for the ultimate laissez-faire look.

The "Trim white spaces" setting is turned off, so it can't be what removed the space between "'em" and "down" (had it been on, that might have explained it). Is this a bug?

I can't think of many situations, if any, where I wouldn't want the "new line" character to be replaced by a single space character.

Regards,

Edgar

That does imply that the site has either a \n or a \r\n there, but I don't know which. What I do most of the time I see this is to make a script that will run before pattern is applied that will replace all the new line characters with a tag or a space.

What kind of script?

"What I do most of the time I see this is to make a script that will run before pattern is applied"

How do you do this without first getting the data using an extractor pattern? The only way I could think of was to call .getContentAsString, then .extractData, but that seems kind of clunky. How are you doing this?

Hey Chris,

Here is a script I use when I want to scrape stuff from ugly JavaScript, and would rather work with a readable format:

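import org.apache.commons.lang.StringEscapeUtils;

// Unescape JavaScript sequences (\n, \", \uXXXX, ...) in the raw response.
test = StringEscapeUtils.unescapeJavaScript(scrapeableFile.getContentAsString());
// Store the cleaned text so the extractor pattern is applied to it.
scrapeableFile.setLastScrapedData(test);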

I set that on the first extractor, to run before the pattern is applied, and my "last response" is now filled with less user hostility.
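The same trick should work for the original new line problem. Here is a rough sketch of the clean-up step in plain Java; the class name NewlineNormalizer and the method collapseNewlines are made up for illustration, and the wiring shown in the comment simply reuses the two scrapeableFile calls from the script above, so treat it as untested.

```java
import java.util.regex.Pattern;

public class NewlineNormalizer {
    // One or more line breaks (\r\n, \n or \r), together with any spaces or
    // tabs around them. The whole run is replaced by a single space.
    private static final Pattern LINE_BREAKS =
            Pattern.compile("[ \\t]*[\\r\\n]+[ \\t\\r\\n]*");

    // Hypothetical helper: returns the content with every run of line breaks
    // collapsed into one space. Paragraph breaks are collapsed as well.
    public static String collapseNewlines(String content) {
        if (content == null) {
            return null;
        }
        return LINE_BREAKS.matcher(content).replaceAll(" ");
    }

    public static void main(String[] args) {
        String raw = "Roll 'em up or roll 'em\r\ndown for the ultimate laissez-faire look.";
        // prints: Roll 'em up or roll 'em down for the ultimate laissez-faire look.
        System.out.println(collapseNewlines(raw));

        // Assumed wiring inside a script that runs before the pattern is applied:
        // scrapeableFile.setLastScrapedData(
        //         NewlineNormalizer.collapseNewlines(scrapeableFile.getContentAsString()));
    }
}
```

Because the pattern accepts both \n and \r\n, it doesn't matter which of the two the site is actually sending.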

Ah, I didn't know about the .setLastScrapedData method; in fact, I don't see it in the documentation either. That could be useful in many situations. Are there other methods we don't know about? What are you hiding from us?? ;)

Thanks.