tidyString

String sutil.tidyString ( String value ) (professional and enterprise editions only)
String sutil.tidyString ( String value, Map<String, Boolean> settings ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value ) (professional and enterprise editions only)
String sutil.tidyString ( ScrapeableFile scrapeableFile, String value, Map<String, Boolean> settings ) (professional and enterprise editions only)

Description

Tidies the string by performing actions based on the values of the settings map.

Parameters

  • value The String to tidy
  • settings (optional) The operations to perform when tidying, using a Map<String, Boolean>

    The tidy settings and their default values are given below. The defaults apply when no settings map is passed; when a map is passed, any operation whose key is missing from it is not performed.

    Map Key                     Default Value  Description of operation performed
    trim                        true           Trims whitespace from values
    convertNullStringToLiteral  true           Converts the unquoted string null to the null literal; a quoted value such as "null" is left as-is
    convertLinks                true           Preserves links by converting <a href="link">text</a> to text (link). Tries to resolve URLs if scrapeableFile isn't null. Does nothing unless both the start and end <a> tags are present
    removeTags                  true           Removes HTML tags and attempts to convert line-break tags such as <br> to a new line in the result
    removeSurroundingQuotes     true           Removes quotes from values surrounded by them: "value" becomes value
    convertEntities             true           Converts HTML entities (professional and enterprise editions only)
    removeNewLines              false          Removes all new lines from the text, replacing each with a space
    removeMultipleSpaces        true           Converts multiple spaces to a single space while preserving new lines
    convertBlankToNull          false          Converts blank strings to the null literal

  • scrapeableFile (optional) The current ScrapeableFile, used for resolving relative URLs when tidying links
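
Because an operation whose key is missing from the settings map is not performed, changing a single default means supplying the other keys as well. The following is a minimal sketch of one way to do that, written as standalone Java rather than as a screen-scraper script. The class name TidySettingsExample and the helper documentedDefaults are hypothetical names introduced for illustration; the keys and values are copied from the table above. The sutil call appears only as a comment, since sutil exists only inside screen-scraper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of screen-scraper.
class TidySettingsExample {

    // Returns a fresh map holding the documented default for every tidy operation.
    static Map<String, Boolean> documentedDefaults() {
        Map<String, Boolean> settings = new LinkedHashMap<String, Boolean>();
        settings.put("trim", true);
        settings.put("convertNullStringToLiteral", true);
        settings.put("convertLinks", true);
        settings.put("removeTags", true);
        settings.put("removeSurroundingQuotes", true);
        settings.put("convertEntities", true);
        settings.put("removeNewLines", false);
        settings.put("removeMultipleSpaces", true);
        settings.put("convertBlankToNull", false);
        return settings;
    }

    public static void main(String[] args) {
        // Start from the documented defaults, then override only what should differ.
        Map<String, Boolean> settings = documentedDefaults();
        settings.put("removeNewLines", true);

        // Inside a screen-scraper script the map would then be passed along, e.g.:
        //   String text = sutil.tidyString(value, settings);
        System.out.println(settings);
    }
}
```

Returning a new map on each call keeps one script's overrides from leaking into another use of the defaults.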

Return Values

The tidied string. This can be null when convertNullStringToLiteral or convertBlankToNull applies.

Change Log

Version Description
5.5.26a Available in all editions.
5.5.28a Now uses a Map for the settings, rather than bit flags.

Examples

Tidy a comment extracted from a website

Assuming the extracted text's HTML code was:
&nbsp;&nbsp;<a href="http://www.somelink.com">This</a> was great because of these reasons:<br />
1 - Some reason<br />
2 - Another reason<br />
3 - Final reason

 String comment = sutil.tidyString(scrapeableFile, dataRecord.get("COMMENT"));

The output text would be:

This (http://www.somelink.com) was great because of these reasons:
1 - Some reason
2 - Another reason
3 - Final reason
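
To make the link handling in this output easier to follow, here is a rough standalone Java approximation of the convertLinks behaviour described in the settings table. It is not screen-scraper's implementation: the class LinkConversionSketch, the method convertLinks, and the regular expression are all assumptions made for illustration, and the sketch does not resolve relative URLs against a ScrapeableFile.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; screen-scraper's real logic may differ.
class LinkConversionSketch {

    // Matches a complete anchor element with a double-quoted href: <a ... href="URL" ...>TEXT</a>
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Rewrites each complete anchor as "TEXT (URL)"; anything else is left untouched.
    static String convertLinks(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(2) + " (" + matcher.group(1) + ")";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.somelink.com\">This</a> was great";
        // prints: This (http://www.somelink.com) was great
        System.out.println(convertLinks(html));
    }
}
```

An anchor without a closing </a> does not match the pattern, so the input comes back unchanged, which mirrors the documented note that nothing happens unless both the start and end tags are present.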

Run only specific operations

 // Only the operations whose keys are in the map are performed
 Map settings = new HashMap();
 settings.put("convertEntities", true);
 settings.put("trim", true);
 String text = sutil.tidyString("&nbsp;A String to tidy", settings);