Module org.hsqldb

Class KMPSearchAlgorithm

java.lang.Object
org.hsqldb.lib.KMPSearchAlgorithm

public class KMPSearchAlgorithm extends Object
Implements the Knuth-Morris-Pratt string search algorithm for searching streams or arrays of octets or characters.

This algorithm is a good choice for searching large, forward-only access streams for repeated search using pre-processed small to medium sized patterns.

This is because in addition to the facts that it:

  • does not require pre-processing the searched data (only the pattern)
  • scans strictly left-to-right
  • does not need to perform back tracking
  • does not need to employ reverse scan order
  • does not need to perform effectively random access lookups against the searched data or pattern
it also has:
  • a very simple, highly predictable behavior
  • an O(n) complexity once the a search pattern is preprocessed
  • an O(m) complexity for pre-processing search patterns
  • a worst case performance characteristic of only 2n
  • an average case performance characteristic that is deemed to be 2-3 times better than the naive search algorithm employed by String.indexOf(java.lang.String,int).
Note that the Boyer-Moore algorithm is generally considered to be the better practical, all-round exact sub-string search algorithm, but due to its reverse pattern scan order, performance considerations dictate that it requires more space and that is somewhat more complex to implement efficiently for searching forward-only access streams.

In particular, its higher average performance is biased toward larger search patterns, due to its ability to skip ahead further and with fewer tests under reverse pattern scan. But when searching forward-only access streams, overall performance considerations require the use a circular buffer of the same size as the search pattern to hold data from the searched stream as it is being compared in reverse order to the search pattern. Hence, Boyer-Moore requires at minimum twice the memory required by Knuth-Morris-Pratt to search for the same pattern and that factor has the greatest impact precisely on the same class of patterns (larger) for which it is most outperforms Knuth-Morris-Pratt.

Since:
2.1
Author:
Campbell Burnet (campbell-burnet@users dot sourceforge.net)
See Also:
  • Method Summary

    Modifier and Type
    Method
    Description
    static int[]
    computeTable(byte[] pattern)
    computes the table used to optimize octet pattern search
    static int[]
    computeTable(char[] pattern)
    computes the table used to optimize octet pattern search
    static int[]
    computes the table used to optimize octet pattern search
    static int
    search(byte[] source, byte[] pattern, int[] table, int start)
    Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.
    static int
    search(char[] source, char[] pattern, int[] table, int start)
    Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
    static long
    search(InputStream inputStream, byte[] pattern, int[] table)
    Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.
    static long
    search(Reader reader, char[] pattern, int[] table)
    Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.
    static long
    search(Reader reader, String pattern, int[] table)
    Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.
    static int
    search(String source, String pattern, int[] table, int start)
    Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • computeTable

      public static int[] computeTable(byte[] pattern)
      computes the table used to optimize octet pattern search
      Parameters:
      pattern - for which to compute the table.
      Returns:
      the table computed from the octet pattern.
      Throws:
      IllegalArgumentException - if pattern == null || pattern.length < 2.
    • computeTable

      public static int[] computeTable(char[] pattern)
      computes the table used to optimize octet pattern search
      Parameters:
      pattern - for which to compute the table.
      Returns:
      the table computed from the character pattern.
      Throws:
      IllegalArgumentException - if pattern == null || pattern.length < 2.
    • computeTable

      public static int[] computeTable(String pattern)
      computes the table used to optimize octet pattern search
      Parameters:
      pattern - for which to compute the table.
      Returns:
      the table computed from the String pattern.
      Throws:
      IllegalArgumentException - if pattern == null || pattern.length < 2.
    • search

      public static long search(InputStream inputStream, byte[] pattern, int[] table) throws IOException
      Searches the given octet stream for the given octet pattern returning the zero-based offset from the initial stream position at which the first match is detected.

      Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

      Parameters:
      inputStream - in which to search
      pattern - for which to search
      table - computed from the pattern that optimizes the search. If null, automatically computed.
      Returns:
      zero-based offset of first match; -1 if inputStream == null || pattern == null or no match found, ; zero (0) if pattern.length == 0.
      Throws:
      IOException - when an error occurs accessing the input stream.
    • search

      public static long search(Reader reader, char[] pattern, int[] table) throws IOException
      Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.

      Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

      Parameters:
      reader - in which to search
      pattern - for which to search
      table - computed from the pattern that optimizes the search If null, automatically computed.
      Returns:
      zero-based offset of first match; -1 if reader == null || pattern == null or no match found, ; zero (0) if pattern.length == 0.
      Throws:
      IOException - when an error occurs accessing the input stream.
    • search

      public static long search(Reader reader, String pattern, int[] table) throws IOException
      Searches the given character stream for the given character pattern returning the zero-based offset from the initial stream position at which the first match is detected.

      Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

      Parameters:
      reader - in which to search
      pattern - for which to search
      table - computed from the pattern that optimizes the search If null, automatically computed.
      Returns:
      zero-based offset of first match; -1 if reader == null || pattern == null or no match found, ; zero (0) if pattern.length() == 0.
      Throws:
      IOException - when an error occurs accessing the input stream.
    • search

      public static int search(byte[] source, byte[] pattern, int[] table, int start)
      Searches the given octet string for the given octet pattern returning the zero-based offset from given start position at which the first match is detected.

      Note that the signature includes a slot for the table so that searches for a pattern can be performed multiple times without incurring the overhead of computing the table each time.

      Parameters:
      source - array in which to search
      pattern - to be matched
      table - computed from the pattern that optimizes the search If null, automatically computed.
      start - position in source at which to start the search
      Returns:
      zero-based offset of first match; -1 if source == null || pattern == null or no match found, ; start if pattern.length == 0 and start <= source.length.
    • search

      public static int search(char[] source, char[] pattern, int[] table, int start)
      Searches the given character array for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
      Parameters:
      source - array in which to search
      pattern - to be matched
      table - computed from the pattern that optimizes the search If null, automatically computed.
      start - position in source at which to start the search
      Returns:
      zero-based offset of first match; -1 if source == null || pattern == null or no match found, ; start if pattern.length == 0 and start <= source.length.
    • search

      public static int search(String source, String pattern, int[] table, int start)
      Searches the given String object for the given character pattern returning the zero-based offset from given start position at which the first match is detected.
      Parameters:
      source - array to be searched
      pattern - to be matched
      table - computed from the pattern that optimizes the search
      start - position in source at which to start the search
      Returns:
      zero-based offset of first match; -1 if source == null || pattern == null or no match found, ; start if pattern.length == 0 and start <= source.length.