Might be an SS bug? Who knows

Hi guys,

I have a problem with SS (screen-scraper).
When I try to scrape some content, for example:

They're just as arbitrary, and just as invisible to most people. I've already said at least one thing that would have gotten me in big trouble in most of Europe in the seventeenth century, and did get Galileo in big trouble when he said it-- that the earth moves.Let's start with a test.

it all comes out like this in the "Last Response" window:

They?re just as arbitrary, and just as invisible to most people. I?ve already said at least one thing that would have gotten me in big trouble in most of Europe in the seventeenth century, and did get Galileo in big trouble when he said it-- that the earth moves.Let?s start with a test.

See what I mean? Every single quote becomes a question mark. Am I the only user with this problem, or have others run into it too?

br,
Max

Max,

I opened the HTML in jEdit (http://www.jedit.org/) to see which character set it uses. jEdit is the only Windows editor I know of that consistently renders a document's character set accurately when the document contains unprintable characters, and it shows that this page uses Cp1252.

If you set your screen-scraper character set to Cp1252, it will render the otherwise unprintable characters. I would still recommend tidying the HTML: it doesn't affect how the unprintable characters render, but it will make your extractor patterns match more consistently.
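To illustrate why the quotes turn into question marks, here is a minimal Python sketch (not screen-scraper code). It assumes, as jEdit reported, that the page is Cp1252-encoded; in Cp1252 the byte 0x92 is the right single quotation mark (’). The sample bytes are made up for illustration.

```python
# Hypothetical sample: the bytes a Cp1252-encoded page would send for "They're".
# In Windows-1252 (Cp1252), byte 0x92 is the right single quotation mark U+2019.
raw = b"They\x92re just as arbitrary"

# Decoding with the page's real charset recovers the curly apostrophe.
correct = raw.decode("cp1252")

# 0x92 is not a valid start byte in UTF-8, so decoding as UTF-8 fails on it;
# errors="replace" substitutes U+FFFD (the replacement character) instead.
as_utf8 = raw.decode("utf-8", errors="replace")

# A display path that can only handle ASCII turns the apostrophe into "?".
as_ascii_display = correct.encode("ascii", errors="replace").decode("ascii")

# ascii() escapes non-ASCII characters so the output prints on any console.
print(ascii(correct))           # 'They\u2019re just as arbitrary'
print(ascii(as_utf8))           # 'They\ufffdre just as arbitrary'
print(ascii(as_ascii_display))  # 'They?re just as arbitrary'
```

The same byte yields three different results depending only on which charset the reader assumes, which is why the character set setting matters here.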

Thanks,
Scott

hi Scott,

Thanks for your reply. Unfortunately, neither of these suggestions does the trick.

Here is the log for reference:

Starting scraper.
Running scraping session: New Scraping Session
Processing scripts before scraping session begins.
Scraping file: "New Scrapeable File"
New Scrapeable File: Preliminary URL: http://www.xtremecomputing.co.uk/review.php?id=246&page=6
New Scrapeable File: Resolved URL: http://www.xtremecomputing.co.uk/review.php?id=246&page=6
New Scrapeable File: Sending request.
New Scrapeable File: Sorry, tidying HTML failed. Returning the original HTML.
Processing scripts after scraping session has ended.
Scraping session finished.

For example, "Okay, it didn’t pass the acid test, I’ll admit that, but looking at the other factors of this review I’ll still say I like the features and the ideas of this PSU." becomes "Okay, it didn?’t pass the acid test, I’?ll admit that, but looking at the other factors of this review I?’ll still say I like the features and the ideas of this PSU."

Is this happening because tidying the HTML failed? How would you resolve this kind of problem?

br,
Max

Max,

Two things can affect whether the HTML in the Last Response tab displays ?s or 's. First, I would try changing the Default character set under Options > Settings to either UTF-8 or ISO-8859-1, which are the most widely used character sets.

If changing your character set to either of those does not render the single quotes properly, make sure the "Tidy HTML after scraping?" box is checked on the scrapeable file's Advanced tab.
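Trying one character set after another can be sketched in Python as below. This is a minimal illustration assuming you have the raw response bytes; `decode_page` is a hypothetical helper, not part of screen-scraper. It also shows one caveat: ISO-8859-1 decodes byte 0x92 as the invisible control character U+0092 rather than an apostrophe, so for a page that really is Cp1252 (as jEdit reported earlier in the thread) Cp1252 is the setting most likely to restore the quotes.

```python
from typing import Optional


def decode_page(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode page bytes, trying the declared charset, then UTF-8, then Cp1252.

    Hypothetical helper for illustration only; not a screen-scraper API.
    """
    candidates = [declared] if declared else []
    candidates += ["utf-8", "cp1252"]
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Bytes invalid in this charset, or charset name unknown: try next.
            continue
    # Last resort: ISO-8859-1 maps every byte to a code point, so it never fails.
    return raw.decode("latin-1")


# Cp1252 bytes: UTF-8 fails on 0x92, so Cp1252 is used and the quote survives.
print(ascii(decode_page(b"didn\x92t")))                         # 'didn\u2019t'
# Declaring ISO-8859-1 "succeeds" but yields control character U+0092.
print(ascii(decode_page(b"didn\x92t", declared="iso-8859-1")))  # 'didn\x92t'
```

Note that declaring ISO-8859-1 never raises an error, which is exactly why a wrong ISO-8859-1 setting fails silently instead of falling through to Cp1252.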

If none of these suggestions does the trick, please either post the URL you're having trouble with, send it to me in a private message, or email it to my name and last initial at our domain.

Thanks,
Scott