Infinite Redirect Loop

Specs: Screen-Scraper v.7.0 Enterprise Edition, Windows 7 64-bit

Screen-scraper gets stuck in an infinite loop when it tries to scrape a blank or non-functioning URL. Is there any way to stop this redirect loop and move on to the next DataRecord?

A DataSet of URLs is extracted from a main page, and the details page for each URL is scraped after each pattern match of the DATARECORD.
When screen-scraper encounters a URL with a non-functioning webpage, it gets stuck looping through redirects. The same loop occurs when I open the link in a browser.
The odd thing is that the Last Response tab of the scrapeableFile does not show the URL that is looping. Instead, it shows the last URL that was scraped successfully. I believe this indicates that the redirect loop occurs before the page is ever loaded into the scrapeableFile. Is there any way to stop this from happening?
Here is the log from screen-scraper:

Details Page: Requesting URL: http://www.countyoffice.org/south-ridge-elementary-school-an-ib-world-school-staten-island-ny-f49/
Details Page: Extracting data for pattern "Untitled Extractor Pattern"
Processing script: "County Office - CSV"
The token "LATITUDE" in sub-extractor pattern #8 has no regular expression.
The token "LONGITUDE" in sub-extractor pattern #9 has no regular expression.
Processing script: "Scrape Details Page"
Scraping file: "Details Page"
Details Page: Requesting URL: http://www.countyoffice.org/ss-joseph-thomas-school-staten-island-ny-20d/
Details Page: Extracting data for pattern "Untitled Extractor Pattern"
Processing script: "County Office - CSV"
The token "LATITUDE" in sub-extractor pattern #8 has no regular expression.
The token "LONGITUDE" in sub-extractor pattern #9 has no regular expression.
Processing script: "Scrape Details Page"
Scraping file: "Details Page"
Details Page: Requesting URL: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/
County Office: Redirecting to: http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/

Google Chrome won't display the website because of infinite redirects.

The www.countyoffice.org page isn’t working
www.countyoffice.org redirected you too many times.
Try:
Reloading the page
Clearing your cookies
ERR_TOO_MANY_REDIRECTS

GenericPause on 08/05/2016 at 5:50 pm

screen-scraper support for licensed users

That site is broken. We have to deal with that sometimes. Probably the best thing to do is add a script like this one and have it run before the scraping session begins:

import com.screenscraper.events.*;
import com.screenscraper.events.scrapeablefile.*;
import java.util.concurrent.atomic.*;

EventHandler redirectHandler = new EventHandler()
{
    // Use atomic integer as I've had some issues with a regular one in a Beanshell class
    AtomicInteger numRedirectsThisPage = new AtomicInteger(0);

    public String getHandlerName() { return "Too Many Redirects Handler"; }

    public Object handleEvent(EventFireTime fireTime, ScrapeableFileEventData data)
    {
        // Use an object since the before request and on redirect return different types
        Object shouldFollowRedirect = data.getLastReturnValue();
        if(fireTime == ScrapeableFileEventFireTime.BeforeHttpRequest)
        {
            // Reset, no redirects yet
            numRedirectsThisPage.set(0);
        }
        else if(fireTime == ScrapeableFileEventFireTime.OnHttpRedirect)
        {
            int redirects = numRedirectsThisPage.incrementAndGet();

            if(redirects > 10)
            {
                shouldFollowRedirect = false;
            }
            else
            {
                shouldFollowRedirect = true;
            }
        }
        return shouldFollowRedirect;
    }
};
session.addEventCallback(ScrapeableFileEventFireTime.BeforeHttpRequest, redirectHandler);
session.addEventCallback(ScrapeableFileEventFireTime.OnHttpRedirect, redirectHandler);

The limit is set in the line if(redirects > 10). Raise that number if a site legitimately needs more than 10 redirects to load correctly.
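If it helps to see the counting logic on its own, here is a minimal sketch in plain Java, outside of screen-scraper. RedirectGuard, startRequest, and shouldFollow are hypothetical names made up for illustration and are not part of the screen-scraper API. The sketch also refuses a redirect back to a URL already seen during the same request, which is an extra assumption on top of the handler above; some sites redirect to the same URL once after setting a cookie, so that check may be too strict for them.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the screen-scraper API.
class RedirectGuard {
    private final int maxRedirects;
    private final Set<String> seenUrls = new HashSet<>();
    private int redirectCount = 0;

    RedirectGuard(int maxRedirects) {
        this.maxRedirects = maxRedirects;
    }

    // Call before each new request: resets the counter (like the
    // BeforeHttpRequest branch above) and remembers the starting URL.
    void startRequest(String requestUrl) {
        redirectCount = 0;
        seenUrls.clear();
        seenUrls.add(requestUrl);
    }

    // Call on each redirect. Returns false once the limit is exceeded,
    // or (an added assumption) when the target was already visited.
    boolean shouldFollow(String targetUrl) {
        redirectCount++;
        boolean firstVisit = seenUrls.add(targetUrl);
        return firstVisit && redirectCount <= maxRedirects;
    }

    public static void main(String[] args) {
        RedirectGuard guard = new RedirectGuard(10);
        String url = "http://www.countyoffice.org/yeshiva-of-staten-island-new-york-ny-2d7/";
        guard.startRequest(url);
        // The broken page redirects to itself, so the guard refuses to follow.
        System.out.println("Follow redirect? " + guard.shouldFollow(url));
    }
}
```

Dropping the seenUrls check leaves only the counter, which matches what the event handler does.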

jason on 08/08/2016 at 10:17 am

Worked

Thank you Jason. That worked well. Any chance this will be added as part of the Screen-Scraper API in the future?

GenericPause on 08/08/2016 at 5:55 pm

Since this situation is so rare, it's unlikely to be added to the interface.

jason on 08/09/2016 at 7:46 am
