Scrape Only Recent Information

This script checks how recent a post or advertisement is. It is handy when you are gathering time-sensitive information and only want to reach back a few days. After the date is evaluated, there is a section for calling other scripts from inside this script.

//start with these imports
import java.util.Date;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.lang.*;
import java.util.*;
import java.io.*;

// Function to parse the passed string into a date. Returns null when there is no date to parse.
makeDate(date)
{
    // This is the format for your date. It matches dates such as April 20, 1999.
    formatter = new SimpleDateFormat("MMM d, yyyy");

    // Some other placeholders instead of BLANK could be null, N/A, etc. It depends on how the site is structured.
    if (date == null || date.equals("BLANK")){
        session.log(" ---NO ATTEMPT TO PARSE BLANK DATE");
        return null;
    }

    // If it is not blank, parse the string using the format above and print the date to the log.
    parsed = formatter.parse(date);
    session.log(" +++Parsed date " + parsed);
    return parsed;
}

// Function to get the oldest date you are willing to accept
oldestDate(){
    // Set the number of days to subtract from the current date.
    minusDays = -5;

    // Get a Calendar set to the current date and time, then add a negative
    // number of days to move it into the past.
    Calendar rightNow = Calendar.getInstance();
    rightNow.add( Calendar.DATE, minusDays );

    // Copy the result into endDate, because that name makes more sense to
    // return than rightNow for a date that is 5 days in the past.
    Date endDate = rightNow.getTime();
    session.log("The end date is: " + endDate);
    return endDate;
}

// Parse the posted date. Here the posted date comes from a dataRecord.
// If you were getting it from a session variable it would be session.getVariable("POSTED_DATE")
posted = makeDate(dataRecord.get("POSTED_DATE"));

// Get the oldest acceptable date as a Date that you can compare to the advertisement or post date.
desired = oldestDate();

// Compare the two. A blank date is not a Date, so it falls through to the else branch.
if (posted instanceof Date && (posted.after(desired) || posted.equals(desired)))
{
    session.log ("AD IS FRESH. SCRAPING DETAILS.");

    // If you are keeping track of URLs this will get it from the scrapeable file.
    session.setVariable ("SOURCE_URL", scrapeableFile.getCurrentURL() );

    // This is the place in the code where you would execute additional scripts.
    session.executeScript("Your script name here");
    session.executeScript("Your second script name here");
}
else{
    session.log("Post is too old");
}

The code above compares a post's date against a cutoff a few days before today's date. Depending on your needs, you might develop a script that moves your scraping session on once a listing reaches a certain date. For example, if you were scraping an auction website for many search terms, you might want to move on to the next term once the listings are older than a specified date. What are some other ways this script could be useful?
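
As one possible answer, here is a minimal sketch of the "move on once the listings get too old" idea, written as standalone Java rather than as a script for the scraping session. The class name RecentListings, the helpers parse and takeFresh, and the sample dates are all made up for illustration, and the sketch assumes the listing is sorted newest first.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

class RecentListings {

    // Hypothetical helper: parse text such as "Apr 20, 1999" using the
    // same pattern as the script above.
    static Date parse(String text) {
        try {
            return new SimpleDateFormat("MMM d, yyyy", Locale.ENGLISH).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    // Collect posted dates until the first one older than the cutoff.
    // Assumes the listing is sorted newest first, so nothing after the
    // first stale entry can be fresh.
    static List<String> takeFresh(List<String> postedDates, String cutoffText) {
        Date cutoff = parse(cutoffText);
        List<String> fresh = new ArrayList<String>();
        for (String text : postedDates) {
            if (parse(text).before(cutoff)) {
                break; // time to move on to the next search term
            }
            fresh.add(text);
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "Apr 20, 1999", "Apr 18, 1999", "Apr 10, 1999", "Apr 19, 1999");
        // The last entry is never examined because the loop stops at "Apr 10, 1999".
        System.out.println(takeFresh(listing, "Apr 15, 1999"));
        // prints [Apr 20, 1999, Apr 18, 1999]
    }
}
```

A post dated exactly on the cutoff is kept, which matches the after-or-equals comparison in the script. If the site does not sort its listings by date, stopping early like this would skip fresh posts, so each post would need to be checked individually instead.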