tidyDataRecord

DataRecord sutil.tidyDataRecord ( DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings ) (professional and enterprise editions only)
DataRecord sutil.tidyDataRecord ( ScrapeableFile scrapeableFile, DataRecord record, Map<String, Boolean> settings, boolean ignoreLowerCaseKeys ) (professional and enterprise editions only)

Description

Tidies a DataRecord by applying the operations enabled in the given settings map. If no settings map is given, the values returned by sutil.getDefaultTidySettings() are used. Every String value in the record is tidied; keys are not modified. The record passed in is not modified; instead, a new record containing the tidied values is returned.

Parameters

  • record The DataRecord to tidy (values in the record will not be overwritten with the tidied values)
  • scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links
  • settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

    The available tidy settings and their default values are given below. If a key is missing from the settings map, that operation will not be performed.

    Map Key                      Default Value   Description of operation performed
    trim                         true            Trims whitespace from values
    convertNullStringToLiteral   true            Converts the string null to the null literal, unless it is surrounded by quotes (such as "null")
    convertLinks                 true            Preserves links by converting <a href="link">text</a> to text (link). Tries to resolve relative URLs if scrapeableFile is not null. Does nothing unless both the opening and closing <a> tags are present
    removeTags                   true            Removes HTML tags and attempts to convert line-break tags such as <br> to new lines in the result
    removeSurroundingQuotes      true            Removes quotes from values surrounded by them, so "value" becomes value
    convertEntities              true            Converts HTML entities (professional and enterprise editions only)
    removeNewLines               false           Removes all new lines from the text, replacing each with a space
    removeMultipleSpaces         true            Converts multiple spaces to a single space, preserving new lines
    convertBlankToNull           false           Converts blank strings to the null literal

  • ignoreLowerCaseKeys (optional) True if values with keys containing lowercase characters should be ignored
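
As a rough illustration of the settings parameter, the sketch below builds a Map<String, Boolean> using keys from the table above and models the rule that an operation whose key is absent is not performed. The class TidySettingsSketch and its methods customSettings and isEnabled are hypothetical names used only for this illustration; they are not part of screen-scraper. Only the map keys are taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of the screen-scraper API.
class TidySettingsSketch {

    // Builds a settings map that trims values and collapses repeated spaces.
    // removeTags is explicitly switched off; every other key is simply omitted.
    public static Map<String, Boolean> customSettings() {
        Map<String, Boolean> settings = new HashMap<>();
        settings.put("trim", true);
        settings.put("removeMultipleSpaces", true);
        settings.put("removeTags", false);
        return settings;
    }

    // Models the documented rule: an operation runs only if its key is
    // present and mapped to true. A missing key means "do not perform".
    public static boolean isEnabled(Map<String, Boolean> settings, String key) {
        return Boolean.TRUE.equals(settings.get(key));
    }

    public static void main(String[] args) {
        Map<String, Boolean> settings = customSettings();
        System.out.println("trim enabled: " + isEnabled(settings, "trim"));
        System.out.println("removeTags enabled: " + isEnabled(settings, "removeTags"));
        System.out.println("convertLinks enabled: " + isEnabled(settings, "convertLinks"));
    }
}
```

Inside a screen-scraper script, a map built this way would presumably be passed as the settings argument, for example sutil.tidyDataRecord(dataRecord, settings). Note that convertLinks defaults to true, but because it is absent from this custom map it would not be performed.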

Return Values

A new DataRecord containing all of the tidied values, plus any values that were not Strings in the original record. String values that were not tidied, and the DATARECORD value, are not included in the returned record.
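
To make the contents of the returned record more concrete, here is a simplified model written against plain java.util.Map rather than the real DataRecord class. It is a sketch under stated assumptions, not screen-scraper's implementation: the class TidyRecordSketch and its methods are hypothetical, only the trim operation is applied, and it assumes that non-String values are carried over even when their keys contain lowercase characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the documented return-value rules; not the library's code.
class TidyRecordSketch {

    // True if the key contains at least one lowercase character.
    public static boolean hasLowerCase(String key) {
        return key.chars().anyMatch(Character::isLowerCase);
    }

    // Returns a new map and leaves the input map untouched.
    public static Map<String, Object> tidy(Map<String, Object> record, boolean ignoreLowerCaseKeys) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : record.entrySet()) {
            String key = entry.getKey();
            Object value = entry.getValue();
            if ("DATARECORD".equals(key)) {
                continue; // the DATARECORD value is never copied
            }
            if (!(value instanceof String)) {
                // Assumption: non-String values are kept regardless of key case.
                result.put(key, value);
            } else if (!(ignoreLowerCaseKeys && hasLowerCase(key))) {
                // Stand-in for the full set of tidy operations: trim only.
                result.put(key, ((String) value).trim());
            }
            // String values that were skipped are left out of the result.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("DATARECORD", "<tr><td> Widget </td></tr>");
        record.put("NAME", "  Widget  ");
        record.put("price_raw", " 9.99 ");
        record.put("COUNT", 3);

        // With ignoreLowerCaseKeys set, only NAME and COUNT remain.
        System.out.println(tidy(record, true));
        // The original map still holds the untrimmed value.
        System.out.println("[" + record.get("NAME") + "]");
    }
}
```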

Change Log

Version Description
5.5.26a Available in all editions.
5.5.28a Now uses a Map for the settings, rather than bit flags.

Examples

Tidy all values in an extracted DataRecord

 DataRecord tidied = sutil.tidyDataRecord(dataRecord);
 
 // Run code here to save the tidied record