Newbie: Can SS pass custom HTTP headers? Can SS send raw XML?

Hi,

Apologies if this is dumb, but I'm a newbie. I have started using SS to render our CRM package for a PDA. I've been through the tutorials, and think I know what to do...

I have managed to implement the login process as a scraping session that I can repeat, and I can get to a search screen.

I recorded myself performing a search, and then scraped the file. However, when I try to run the scrape, it appears that some HTTP headers are missing.

Here's the request from the proxy transaction log:

##############################################
POST http://cgtpony2/onyxemployeeportal_onyx/common/include/otm_helper_end_po... HTTP/1.0
Accept-Language: en-gb
Pragma: no-cache
otm-page-size: 300
otm-paging: true
otm-lbo: QuickSearchPlus
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
Host: cgtpony2
Content-Type: text/xml
otm-method: CustomerQSP
Content-Length: 1242
Referer: http://cgtpony2/onyxemployeeportal_onyx/QuickSearch/search.asp?WindowTyp...
Cookie: S=7=3378295798&6=BA69A202%2DEC0B%2D488A%2D8DD6%2D05A59C85F9DA&4=sa&3=1&2=Onyx&1=2; O%5FP=39094%2E4654861111; O%5FLD=7=TRUE&2=&9=DD%2FMM%2FYYYY&10=HH%253Amm&6=&4=%253A&5=%2F&3=DMY&8=&1=2; O%5FO%5F8=true; O%5FLN=3=2&2=%2D&1=%2E; O%5FO%5F9=%2Fonyxemployeeportal%5Fonyx%2F; O%5FCC=3=2&4=Company&1=1495055&2=1495055; O%5FSTYPE=Company
Accept: */*
Proxy-Connection: Keep-Alive

##############################################

And here's what the scraper is sending:

##############################################
POST /onyxemployeeportal_onyx/common/include/otm_helper_end_point.asp?QuickSearchPlus_CustomerQSP HTTP/1.1
Cookie: O%5FP=39094%2E4654861111; O%5FLD=7=TRUE&2=&9=DD%2FMM%2FYYYY&10=HH%253Amm&6=&4=%253A&5=%2F&3=DMY&8=&1=2; O%5FO%5F8=true; O%5FLN=3=2&2=%2D&1=%2E; O%5FO%5F9=%2Fonyxemployeeportal%5Fonyx%2F; O%5FCC=2=&1=&3=&4=Company; S=7=3378296067&6=CA3B9364%2D0966%2D431B%2D95C2%2D8C3B927E5F30&4=sa&3=1&2=Onyx&1=2
Accept-Language: en-us,en;q=0.5
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
Referer: http://cgtpony2/onyxemployeeportal_onyx/quicksearch/search.asp
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Content-Length: 1734
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate
Content-Type: application/x-www-form-urlencoded
Host: cgtpony2

key=%3Cparameters%3E%3C/parameters%3E
##############################################

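As you can see, the "otm-lbo: QuickSearchPlus" header is missing, as are the other otm-* headers (otm-page-size, otm-paging and otm-method). Is there a way (in code if need be) to send these?

Also, you'll see that the scraped version POSTs the parameters as a URL-encoded form field (Content-Type: application/x-www-form-urlencoded), whereas the original request sent them as raw XML (Content-Type: text/xml). Is there a way round this?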
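To make the difference concrete, here is a minimal Java sketch that decodes the body the scraper sent above. The class and method names are made up for illustration; only the encoded string comes from the log.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of screen-scraper: shows what the
// URL-encoded form body from the scraper's request decodes to.
class DecodeBody {

    // Decode an application/x-www-form-urlencoded body back to plain text.
    static String decodeFormBody(String body) {
        return URLDecoder.decode(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Body copied from the scraper's request above.
        System.out.println(decodeFormBody("key=%3Cparameters%3E%3C/parameters%3E"));
        // prints: key=<parameters></parameters>
    }
}
```

So the scraper is wrapping the XML in a form field named "key" and percent-encoding it, while the server apparently expects the bare XML document as the whole request body.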

Thanks in advance.

Newbie: Can SS pass custom HTTP headers? Can SS send raw XML?

I saw one of these the other day, as well. It was a GIS web application that was sending XML back to a script in a POST, but it wasn't a name/value pair situation; they were just blasting the XML back in one big blob.

We pretty much came to the decision that it's nonstandard enough not to expect screen-scraper to handle it, and we'll have to write at least part of the scrape by hand. Just the way it is, I guess.

Newbie: Can SS pass custom HTTP headers? Can SS send raw XML?

Hi Jason,

There's probably a better approach, but it may require telling you about some of the guts of screen-scraper, and would likely require a little bit of Java programming on your part. If you wouldn't mind sending me an email, I'd be happy to continue the conversation off of the forum. My email address is my first name at screen-scraper.com.

Thanks,

Todd

Newbie: Can SS pass custom HTTP headers? Can SS send raw XML?

Hi Todd,

I'm loath to give up on screen-scraper, as it gets me 90% of the way to where I want to be. Typical that I have been given the most complex site we run to scrape first! But we have a growing PDA requirement, and using screen-scraper seems the best way to go, as it avoids any recoding of the other sites.

Given what you have said, it looks like I'll have to create my own HTTP request at this point, and then capture the resulting output in screen-scraper for re-rendering. From what I have read, it should be possible to do this using the scripting feature of screen-scraper. After I have scraped the last page, I will create a WinHTTP.Request object and use that.
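For comparison, the same idea can be sketched in Java, which is the language mentioned earlier in the thread. This is only a rough outline under several assumptions: it uses the standard java.net.http client (Java 11 or later) rather than anything built into screen-scraper, the class and method names are invented, the XML body is a placeholder because the real payload is not shown in the log, and the session cookies from the login step would still have to be added as a Cookie header. The URL and the otm-* header values are taken from the two requests above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not screen-scraper's API: a hand-built POST that
// sends raw XML plus the custom otm-* headers seen in the proxy log.
class RawXmlPost {

    // Build a POST whose body is the XML itself (no form encoding),
    // with Content-Type: text/xml and any extra headers supplied.
    static HttpRequest buildRequest(String url, String xmlBody, Map<String, String> extraHeaders) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .version(HttpClient.Version.HTTP_1_1)
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody));
        extraHeaders.forEach(builder::header);
        return builder.build();
    }

    // Send the request and return the response body as a string.
    static String send(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Custom headers copied from the proxy transaction log.
        Map<String, String> otmHeaders = new LinkedHashMap<>();
        otmHeaders.put("otm-page-size", "300");
        otmHeaders.put("otm-paging", "true");
        otmHeaders.put("otm-lbo", "QuickSearchPlus");
        otmHeaders.put("otm-method", "CustomerQSP");

        HttpRequest request = buildRequest(
                "http://cgtpony2/onyxemployeeportal_onyx/common/include/otm_helper_end_point.asp?QuickSearchPlus_CustomerQSP",
                "<parameters></parameters>", // placeholder body; the real search XML is not in the log
                otmHeaders);

        // Print what would be sent; call send(request) to actually issue it.
        System.out.println(request.method() + " " + request.uri());
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

The response string returned by send() could then be handed to whatever does the re-rendering. Content-Length and Host are deliberately not set by hand, since this client fills them in itself and rejects attempts to override them.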

Does this sound like a sensible approach?

Newbie: Can SS pass custom HTTP headers? Can SS send raw XML?

Hi,

This one looks pretty tricky, unfortunately. First, we don't currently provide a way to add custom HTTP headers. Truthfully, the need has simply never come up. It would be possible to implement, but hasn't bubbled up to the top of our priority list.

Regarding the POST payload that's getting sent along, this seems to resemble a SOAP or XML-RPC request more than a standard form POST. My guess is that you're working with some type of AJAX-like application that handles requests and responses by sending XML, rather than the more usual URL-encoded name/value pairs. I'm honestly not sure how you could handle those requests with screen-scraper.

I sincerely wish I could be more help on this one, but it looks like you're simply trying to scrape a site that's very non-standard. As such, screen-scraper probably isn't the most suitable tool for working with it.

Kind regards,

Todd Wilson