All Classes and Interfaces
Class
Description
Filters out the n-grams that were not generated by the old n-gram generator.
Some character normalization (and exclusion) functionality.
Deprecated. Cannot be used, because not inlining this code would be a big loss.
LangDetect Command Line Interface.
Contains some standard TextObjectFactory instances, ready to use for common use cases.
Holds information about a detected language: the locale (language) and the probability.
Loads Wikipedia's abstract XML as a corpus and generates its language profile in JSON format.
Generates a language profile from any given text file.
Deprecated.
Deprecated. Replaced by LanguageProfile.
Reads LangProfiles.
Writes a LangProfile to an output stream (file).
Guesses the language of an input string or text.
Builder for LanguageDetector.
This class is immutable and thus thread-safe.
This is just a utility to update the code with the existing languages.
A language profile knows the locale (language) and contains the n-grams and some statistics.
Builder for LanguageProfile.
This class is immutable.
Reads LanguageProfiles.
Writes a LanguageProfile to an output stream or file.
A language-detector implementation of a locale, similar to java.util.Locale.
This is the Messages class, generated automatically by Eclipse.
Groups multiple TextFilters into one and runs them in the given order.
TODO document.
Class for extracting n-grams out of a text.
Provides easy access to commonly used NgramExtractor configs.
Filters out some undesired n-grams.
Contains frequency information for n-grams coming from multiple LanguageProfiles.
Converts an old LangProfile to a new LanguageProfile.
Deprecated.
Removes text written in scripts that are not the dominant script of the text.
Filters what is generally not desired.
TagExtractor extracts the inner text of a specified tag.
Allows filtering out content of a text that should be ignored in the n-gram analysis.
A convenient text object implementing CharSequence and Appendable.
Factory for TextObjects.
Builder for TextObjectFactory.
Removes URLs and email addresses from the text.
A place for sharing code.
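One entry describes a class that holds a detected language and its probability. That suggests a small immutable value type.

The sketch below is hypothetical. It stores a plain language tag string instead of the library's own locale class, and `DetectedLanguageSketch` is not the library's API. Its natural ordering puts the most probable language first, which is one plausible way to rank detection results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Objects;

/**
 * Hypothetical immutable holder for one detection result; not the library's class.
 * Natural ordering is by descending probability (inconsistent with equals, which is not overridden).
 */
final class DetectedLanguageSketch implements Comparable<DetectedLanguageSketch> {
    private final String languageTag; // assumption: a plain tag such as "de" instead of a locale object
    private final double probability;

    DetectedLanguageSketch(String languageTag, double probability) {
        // Written as a negated range check so that NaN is rejected too.
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be in [0, 1]: " + probability);
        }
        this.languageTag = Objects.requireNonNull(languageTag, "languageTag");
        this.probability = probability;
    }

    String languageTag() { return languageTag; }

    double probability() { return probability; }

    @Override
    public int compareTo(DetectedLanguageSketch other) {
        // Arguments reversed on purpose: higher probability sorts first.
        return Double.compare(other.probability, this.probability);
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "%s:%.2f", languageTag, probability);
    }

    public static void main(String[] args) {
        List<DetectedLanguageSketch> results = new ArrayList<>();
        results.add(new DetectedLanguageSketch("en", 0.2));
        results.add(new DetectedLanguageSketch("de", 0.7));
        Collections.sort(results);
        System.out.println(results); // [de:0.70, en:0.20]
    }
}
```

Keeping the fields final and validating in the constructor makes each result safe to share between threads without extra synchronization.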
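One entry describes a class that groups multiple TextFilters into one and runs them in the given order. That is a composite filter.

The minimal sketch below assumes a filter is simply a function from String to String. `SimpleTextFilter` and `FilterChainSketch` are hypothetical stand-ins, not the library's TextFilter types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical composite: itself a filter, applying its members in the given order. */
final class FilterChainSketch implements SimpleTextFilter {
    private final List<SimpleTextFilter> filters;

    FilterChainSketch(SimpleTextFilter... filters) {
        // Unmodifiable copy, so the chain cannot change after construction.
        this.filters = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(filters)));
    }

    @Override
    public String filter(String text) {
        String result = text;
        for (SimpleTextFilter f : filters) {
            result = f.filter(result); // each filter sees the previous filter's output
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleTextFilter chain = new FilterChainSketch(
                String::trim,
                s -> s.toLowerCase(Locale.ROOT));
        System.out.println(chain.filter("  Hello World  ")); // hello world
    }
}

/** Hypothetical one-method filter type standing in for the library's TextFilter. */
interface SimpleTextFilter {
    String filter(String text);
}
```

Because the chain is itself a filter, chains can be nested. The order of the constructor arguments is the order of application.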
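The entry "Class for extracting n-grams out of a text." can be illustrated with a minimal sketch of character n-gram extraction. It assumes that n-grams are contiguous runs of UTF-16 chars with lengths in a given range.

`NgramSketch` and `extract` are hypothetical names, not the library's NgramExtractor API. The library may also pad or otherwise prepare the text differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of character n-gram extraction; not the library's NgramExtractor.
 * Works on UTF-16 char units, so a surrogate pair may be split across n-grams.
 */
final class NgramSketch {
    private NgramSketch() {}

    /** Returns all n-grams with minLength <= n <= maxLength, ordered by start position, then length. */
    static List<String> extract(CharSequence text, int minLength, int maxLength) {
        if (minLength < 1 || maxLength < minLength) {
            throw new IllegalArgumentException("require 1 <= minLength <= maxLength");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Stop growing the n-gram once it would run past the end of the text.
            for (int n = minLength; n <= maxLength && start + n <= text.length(); n++) {
                grams.add(text.subSequence(start, start + n).toString());
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(extract("abc", 1, 2)); // [a, ab, b, bc, c]
    }
}
```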
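The entry about removing text written in scripts other than the dominant one can be sketched with the JDK's `Character.UnicodeScript`. The sketch makes two assumptions of its own:

- "Dominant" means the script with the most letters.
- Script-neutral characters (COMMON and INHERITED, such as spaces, digits and punctuation) are always kept.

`DominantScriptSketch` is a hypothetical name, not the library's class.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical dominant-script filter: counts letters per Unicode script, then drops
 * code points of every other script. COMMON and INHERITED code points are always kept.
 */
final class DominantScriptSketch {
    private DominantScriptSketch() {}

    static String keepDominantScript(String text) {
        Map<Character.UnicodeScript, Integer> letterCounts = new EnumMap<>(Character.UnicodeScript.class);
        text.codePoints()
                .filter(Character::isLetter)
                .forEach(cp -> letterCounts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));
        if (letterCounts.isEmpty()) {
            return text; // no letters, so there is no dominant script to keep
        }
        // Pick the script with the most letters; on a tie, the earlier enum constant wins.
        Character.UnicodeScript dominant = null;
        int best = -1;
        for (Map.Entry<Character.UnicodeScript, Integer> e : letterCounts.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                dominant = e.getKey();
            }
        }
        final Character.UnicodeScript keep = dominant;
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == keep
                    || script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin has 10 letters and Cyrillic 6, so the Cyrillic word is removed.
        System.out.println(keepDominantScript("Hello \u041f\u0440\u0438\u0432\u0435\u0442 world"));
    }
}
```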
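The TagExtractor entry (extracting the inner text of a specified tag) can be approximated with the JDK's StAX parser. The same applies to the tool that loads Wikipedia's abstract XML.

This is a sketch under our own assumptions, not the library's TagExtractor:

- `TagTextSketch` is a hypothetical name.
- Text inside nested child elements is concatenated.
- Occurrences of the same tag nested inside each other are merged into the outermost one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical StAX-based inner-text extractor; not the library's TagExtractor. */
final class TagTextSketch {
    private TagTextSketch() {}

    static List<String> innerTexts(String xml, String tagName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Refuse DTDs and external entities, a sensible default for untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        List<String> texts = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder current = new StringBuilder();
            int depth = 0; // how many matching tags we are currently inside
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && tagName.equals(reader.getLocalName())) {
                    if (depth++ == 0) {
                        current.setLength(0); // start collecting a new outermost match
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && tagName.equals(reader.getLocalName())) {
                    if (--depth == 0) {
                        texts.add(current.toString());
                    }
                } else if (depth > 0 && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    current.append(reader.getText()); // text may arrive in several chunks
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<feed><doc><abstract>Hello</abstract></doc></feed>";
        System.out.println(innerTexts(xml, "abstract")); // [Hello]
    }
}
```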
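For the entry that removes URLs and email addresses from the text, here is a regex-based sketch that assumes simplified patterns are acceptable.

The patterns and the `UrlEmailRemoverSketch` name are hypothetical; this index does not document the real filter's patterns. These simplified ones will miss some valid addresses and will swallow punctuation that directly follows a URL.

```java
import java.util.regex.Pattern;

/** Hypothetical URL and email remover using simplified regular expressions. */
final class UrlEmailRemoverSketch {
    // Simplified email: local part, "@", then at least one dot-separated domain label.
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");
    // Simplified URL: an http(s) scheme or a leading "www.", up to the next whitespace.
    private static final Pattern URL = Pattern.compile("(?:https?://|www\\.)\\S+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private UrlEmailRemoverSketch() {}

    static String remove(String text) {
        String withoutEmails = EMAIL.matcher(text).replaceAll(" ");
        String withoutUrls = URL.matcher(withoutEmails).replaceAll(" ");
        // Collapse the gaps left behind (note: this also collapses original line breaks).
        return WHITESPACE.matcher(withoutUrls).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(remove("Write to a.b@example.org or see https://example.com now"));
        // prints "Write to or see now"
    }
}
```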