Looping a scraping session?

Here is my script to loop my scraping session:

counter = "0";

for( int i = 0; i < 10; i++ )
{
    runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "LoginSession" );

    // Increment the counter, then convert it back to a string for the session variables.
    counter = Integer.parseInt( counter ) + 1;
    counter = Integer.toString( counter );
    runnableScrapingSession.setVariable( "COUNTER", counter );
    runnableScrapingSession.setVariable( "ACCOUNT", "user" + counter );

    runnableScrapingSession.scrape( "LoginSession" );
}

The problem is that it runs through this loop quickly and starts all 10 scraping sessions at once. Each pass of the loop has to log into a different account, so the "LoginSession" runs step all over each other.

I have thought of using a pause as a temporary fix, but it is not economical, especially since I am running on a wireless laptop and completion times vary from hour to hour. Each "LoginSession" run is also long.

Is there a better way than putting a really long, uneconomical pause inside the loop? (Ideally the next pass would start right after the "LoginSession" from the previous pass ends.)

Looping a scraping session?

Not a problem :) The two are actually syntactic cousins, though the APIs will differ pretty significantly. Good luck, and don't hesitate to post in the future if we can be of assistance with anything.

Best,

Todd

Looping a scraping session?

Hi Joe,

I think a different approach may be in order. screen-scraper really isn't designed to handle passing data between scraping sessions (as evidenced by your experience). As such, I think the best approach would be to invoke screen-scraper from an external application written in something like Visual Basic or Java. Ideally, you should do as little programming in screen-scraper as possible. If you use an external application you'll have a much easier time debugging.

To get you started, I'd recommend taking a look at our second tutorial (http://www.screen-scraper.com/support/tutorials/tutorial2/tutorial_overview.php), which gives a basic example of this.

Best,

Todd

Looping a scraping session?

Is this what you tried before?

Thread.currentThread().sleep( 5000 );

If not, see how that works.
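For anyone unsure where a pause comes from: in plain Java it is the static method Thread.sleep, which takes a duration in milliseconds and can throw InterruptedException. Below is a minimal sketch; the class name PauseExample and the helper pauseMillis are made up for illustration and are not part of screen-scraper.

```java
class PauseExample {
    // Hypothetical helper: blocks the current thread for roughly the given number of milliseconds.
    static void pauseMillis(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still notice the interruption.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        pauseMillis(1000); // one-second pause
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Paused for about " + elapsedMs + " ms");
    }
}
```

A fixed pause like this only guesses at how long the previous session takes, so it is a stopgap rather than a real fix for overlapping sessions.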

Looping a scraping session?

Hi,

I'll add this as a bug for us to investigate. In the meantime, if you insert a slight delay in between each of the sessions does it make a difference?

Best,

Todd

Looping a scraping session?

Ah, that makes sense. When you drop scraping sessions into the queue for screen-scraper to process it doesn't make any guarantees as to what order they'll get run. This is because the queue mechanism is generic and intended to handle multiple requests coming in potentially simultaneously from multiple sources.

Best,

Todd

Looping a scraping session?

Setting the maximum number of concurrent sessions to 1 should fix this. Did it not?

Todd

Looping a scraping session?

Hi,

The easiest way to do this would definitely be via an external application (e.g., a Java or Visual Basic app). If you really wanted to do it from the command line, though, you'd probably want to have some kind of initialization script that checks what value was used on the last scrape, increments the value for the current scrape, then writes that value back out to the file for the next time around. You can read more about running screen-scraper from the command line here: http://www.screen-scraper.com/support/docs/invoking_screenscraper_from_the_command_line.php.

I realize that's a pretty general description, but hopefully it's enough to get the ball rolling for you. Not knowing how much programming experience you have it's hard to know just how much specific help to give.
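As one possible starting point, the read, increment, and write-back step described above might look roughly like this in plain Java. The class name CounterFile, the method nextValue, and the idea of storing the value as a single number in a text file are all assumptions for the sake of the sketch, not anything screen-scraper prescribes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CounterFile {
    // Reads the last value from the file (treating a missing or empty file as 0),
    // adds one, writes the new value back, and returns it.
    static int nextValue(Path file) throws IOException {
        int last = 0;
        if (Files.exists(file)) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            if (!text.isEmpty()) {
                last = Integer.parseInt(text);
            }
        }
        int current = last + 1;
        Files.write(file, Integer.toString(current).getBytes(StandardCharsets.UTF_8));
        return current;
    }
}
```

Each run would call nextValue once at the start and use the result as the value for that scrape, so the file always holds the value used most recently.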

Best,

Todd

Looping a scraping session?

Hi Joe,

In answer to your questions:

1. Are you wanting to cycle through a list of names, setting each name as a session variable for each scrape? If so, I'd probably do something like this:

import java.util.ArrayList;

names = new ArrayList();
names.add( "Joe" );
names.add( "Sally" );
names.add( "Brent" );

for( int i = 0; i < names.size(); i++ )
{
    runnableScrapingSession = new com.screenscraper.scraper.RunnableScrapingSession( "LoginSession" );

    runnableScrapingSession.setVariable( "ACCOUNT", names.get( i ) );

    runnableScrapingSession.scrape( "LoginSession" );
}

You can find documentation on the ArrayList class here: http://java.sun.com/j2se/1.5.0/docs/api/java/util/ArrayList.html.

2. This code would generate the random number:

import java.util.Random;

generator = new Random();

myRandomNumber = generator.nextInt( 27 );

3. Hm. Not that I can think of. I think you'd need to use an array. I suppose you could do something tricky along the lines of converting an integer into an ASCII character by its code. You might Google around a bit. Someone's probably written code for this.

4.
a) It depends on the operating system. If you're using Windows the Task Scheduler does a good job. On a Unix variant (e.g., Linux) you'll probably want to use cron.
b) There are a number of ways you could do this. If you're invoking screen-scraper from the command line you'll probably just need to write out files with the various values. If you're writing an application to invoke screen-scraper (e.g., Java or Visual Basic), that application could handle tracking which values have been set when. That may not sound like a very good answer to the question, but it really just depends on what your preferences are and what programming languages you're comfortable with.
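On point 3, the "integer to ASCII character" trick can be sketched as follows, assuming the goal is to turn a number from 0 to 25 into an uppercase letter. The class name LetterFromIndex and the method letterFor are hypothetical names used only for this example.

```java
class LetterFromIndex {
    // Maps 0 -> 'A', 1 -> 'B', ... 25 -> 'Z' by offsetting from the character code of 'A'.
    static char letterFor(int index) {
        if (index < 0 || index > 25) {
            throw new IllegalArgumentException("index must be between 0 and 25: " + index);
        }
        return (char) ('A' + index);
    }
}
```

This works because the uppercase letters occupy consecutive character codes, so adding an offset to 'A' and casting back to char gives the matching letter without needing an array.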

And in answer to your last posting:

There are a number of ways you could do this. This is just one off the top of my head:

for( int i = 1; i < 1000; i++ )
{
    num = "00" + String.valueOf( i );
    num = num.substring( num.length() - 3 );
}
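Another way to get the same three-digit, zero-padded strings in plain Java is String.format with the "%03d" pattern. The sketch below puts both approaches side by side; the class name ZeroPad and its method names are made up for the example.

```java
class ZeroPad {
    // Same idea as the loop above: prepend zeros, then keep the last three characters.
    static String padWithSubstring(int value) {
        String num = "00" + String.valueOf(value);
        return num.substring(num.length() - 3);
    }

    // Alternative: let the formatter pad to a minimum width of three digits.
    static String padWithFormat(int value) {
        return String.format("%03d", value);
    }
}
```

For values from 1 to 999 the two methods give the same result. They differ above 999: the substring version keeps only the last three digits, while the format version keeps the whole number.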

Kind regards,

Todd

Looping a scraping session?

Hi Joe,

Sorry for the frustration. In the future, don't hesitate to drop us an email or post to the forum before digging in too deeply to something like this. If I understand the problem correctly I believe it can be solved pretty easily. By default screen-scraper will try to run scraping sessions concurrently (i.e., more than one at a time) when they're invoked within the workbench. There are two ways you could cause this not to happen. One would be to set the "Maximum number of concurrent running scraping sessions" to 1 in the "Settings" dialog box. The other would be to turn off "lazy" scraping for your scraping sessions. Try adding this to your script prior to calling the 'scrape' method:


runnableScrapingSession.setDoLazyScrape( false );

This will cause the scraping sessions to run serially (one after the other) rather than concurrently. I wouldn't recommend using this approach when running scraping sessions within the workbench, however, as it will cause the GUI to stop responding until the scraping sessions are finished. This is because it will cause screen-scraper not to spawn new threads for the sessions, but instead will make it run them all on the current thread, which disallows other events in the meantime.
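To illustrate the difference between running work on the current thread and spawning a thread per task, here is a small sketch in plain Java. It is only an analogy and not screen-scraper's actual implementation; the class name SerialVsConcurrent and both method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SerialVsConcurrent {
    // Runs each task on the calling thread, one after another.
    // The caller is blocked until every task has finished.
    static List<Integer> runSerially(int count) {
        List<Integer> finished = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            Runnable task = () -> finished.add(id);
            task.run(); // returns only once this task is done
        }
        return finished;
    }

    // Starts one thread per task, then waits for all of them.
    // The order in which tasks finish is not guaranteed.
    static List<Integer> runConcurrently(int count) throws InterruptedException {
        List<Integer> finished = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            final int id = i;
            Thread t = new Thread(() -> finished.add(id));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return finished;
    }
}
```

The serial version always finishes tasks in the order they were started, which is the behavior wanted for the login loop. The price is that nothing else can happen on that thread in the meantime, which matches the unresponsive GUI described above.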

Hopefully this does the trick. If not, feel free to write back.

Best,

Todd

Looping a scraping session?

Also, I am extremely annoyed with myself. I am not a programmer and I can't even figure out how to pause.

How do you pause while there is not a scraping session running yet?

How do you pause within a session?

Where can I find out about these commands?

I looked all over the web for an answer for hours.

For a 1 second pause I tried

Pause.Pause (1000);
Session.Pause (1000);
Pause (1000);
com.screenscraper.scraper.Pause (1000);

I am so annoyed. I wasted 2 hours trying to pause for one second!

I looked at beanshell.org... I tried googling java help. I tried all kinds of stuff. I can't find good help for easy easy easy stuff.