Class RegexTokenizer

java.lang.Object
org.apache.commons.text.similarity.RegexTokenizer
All Implemented Interfaces:
Tokenizer<CharSequence>

class RegexTokenizer extends Object implements Tokenizer<CharSequence>
A simple word tokenizer that uses a regular expression to find words. It applies the regex (\w)+ to the input character sequence and extracts each match as a word.
Since:
1.0
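The matching behaviour described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: RegexTokenizer is package-private and cannot be called directly from outside its package, so the class name RegexTokenizerSketch, the method tokenize, and the List<String> return type are hypothetical and chosen only for illustration. The only detail taken from this page is the expression (\w)+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Apache Commons Text.
final class RegexTokenizerSketch {

    // The expression cited in the class description: one or more word characters.
    private static final Pattern WORD = Pattern.compile("(\\w)+");

    // Collects every maximal run of word characters, in order of appearance.
    static List<String> tokenize(final CharSequence text) {
        final List<String> tokens = new ArrayList<>();
        final Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            // group(0) is the whole match; group(1) would hold only its last character.
            tokens.add(matcher.group(0));
        }
        return tokens;
    }

    public static void main(final String[] args) {
        // prints [Hello, world, It, s, 2024]
        System.out.println(tokenize("Hello, world! It's 2024."));
    }
}
```

Because \w by default matches only [a-zA-Z_0-9], punctuation and whitespace act as separators, an apostrophe splits "It's" into two tokens, and an underscore keeps "snake_case" together as one token.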
  • Field Details

    • PATTERN

      private static final Pattern PATTERN
      The word pattern, (\w)+, which matches one or more consecutive word characters.
  • Constructor Details

    • RegexTokenizer

      RegexTokenizer()
  • Method Details