Package org.languagetool.tagging
Class BaseTagger
java.lang.Object
org.languagetool.tagging.BaseTagger
All Implemented Interfaces: Tagger
Base tagger using Morfologik binary dictionaries.
-
Field Summary
Fields (modifier and type, then field name):
protected final Locale conversionLocale
private final morfologik.stemming.Dictionary dictionary
private final String dictionaryPath
private final boolean tagLowercaseWithUppercase
protected final WordTagger wordTagger
-
Constructor Summary
Constructors:
BaseTagger(String filename)
BaseTagger(String filename, Locale conversionLocale)
BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)
-
Method Summary
Methods (modifier and type, then method, then description):
protected @Nullable List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
    Allows additional tagging in some language-dependent circumstances.
private void addTokens(List<AnalyzedToken> taggedTokens, List<AnalyzedToken> l)
protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
private AnalyzedToken asAnalyzedToken(String word, TaggedWord taggedWord)
protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
final AnalyzedTokenReadings createNullToken(String token, int startPos)
    Create the AnalyzedToken used for whitespace and other non-words.
AnalyzedToken createToken(String token, String posTag)
    Create a token specific to the language of the implementing class.
protected List<AnalyzedToken> getAnalyzedTokens(String word)
protected morfologik.stemming.Dictionary getDictionary()
String getDictionaryPath()
abstract @Nullable String getManualAdditionsFileName()
    Get the filename for manual additions, e.g., /en/added.txt, or null.
@Nullable String getManualRemovalsFileName()
    Get the filename for manual removals, e.g., /en/removed.txt, or null.
protected WordTagger getWordTagger()
private WordTagger initWordTagger()
boolean overwriteWithManualTagger()
    If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
List<AnalyzedTokenReadings> tag(List<String> sentenceTokens)
    Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
-
Field Details
-
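wordTagger
protected final WordTagger wordTagger
-
conversionLocale
protected final Locale conversionLocale
-
tagLowercaseWithUppercase
private final boolean tagLowercaseWithUppercase
-
dictionaryPath
private final String dictionaryPath
-
dictionary
private final morfologik.stemming.Dictionary dictionary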
-
-
Constructor Details
-
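BaseTagger
public BaseTagger(String filename)
Since: 2.9
-
BaseTagger
public BaseTagger(String filename, Locale conversionLocale)
Since: 2.9
-
BaseTagger
public BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)
Since: 2.9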
-
Method Details
-
getManualAdditionsFileName
public abstract @Nullable String getManualAdditionsFileName()
Get the filename for manual additions, e.g., /en/added.txt, or null.
Since: 2.8
-
getManualRemovalsFileName
public @Nullable String getManualRemovalsFileName()
Get the filename for manual removals, e.g., /en/removed.txt, or null.
Since: 3.2
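
How the two manual lists might interact with the binary dictionary can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not LanguageTool code: `ManualListsSketch` and `effectiveTags` are hypothetical names, plain tag strings stand in for dictionary readings, and a `null` argument models a getter that returned `null` because no file is configured.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-alone model; these names are not part of LanguageTool.
final class ManualListsSketch {

  /**
   * Combines dictionary tags with optional manual lists.
   * A null argument models a getter that returned null (no file configured).
   */
  static Set<String> effectiveTags(Set<String> dictionaryTags,
                                   Set<String> additions,
                                   Set<String> removals) {
    Set<String> result = new LinkedHashSet<>(dictionaryTags);
    if (additions != null) {
      result.addAll(additions);    // assumption: additions contribute extra readings
    }
    if (removals != null) {
      result.removeAll(removals);  // assumption: removals suppress readings
    }
    return result;
  }

  public static void main(String[] args) {
    // Dictionary knows "NN"; the additions list contributes "VB"; no removals file.
    System.out.println(effectiveTags(Set.of("NN"), Set.of("VB"), null)); // prints [NN, VB]
  }
}
```

In this model the removals are applied last, so a reading that appears in both lists ends up removed. Whether the real tagger resolves such a conflict the same way is not stated in this page.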
-
getDictionaryPath
public String getDictionaryPath()
Since: 2.9
-
overwriteWithManualTagger
public boolean overwriteWithManualTagger()
If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
Since: 2.9
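
The effect of this flag can be illustrated with a minimal stand-alone model. The sketch below is not the library's implementation: `OverwriteSketch`, `resolveTags`, and the two maps standing in for the binary and plain text dictionaries are hypothetical, and the behavior in the `false` case (both sources contribute, manual tags first) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model; none of these names come from LanguageTool.
final class OverwriteSketch {

  /**
   * Resolves the tags for a word from two sources.
   * binaryTags stands in for the *.dict dictionary, manualTags for the plain text dictionary.
   */
  static List<String> resolveTags(String word,
                                  Map<String, List<String>> binaryTags,
                                  Map<String, List<String>> manualTags,
                                  boolean overwriteWithManualTagger) {
    List<String> manual = manualTags.getOrDefault(word, List.of());
    List<String> binary = binaryTags.getOrDefault(word, List.of());
    if (overwriteWithManualTagger && !manual.isEmpty()) {
      // Manual tags replace the binary dictionary's tags for this word.
      return new ArrayList<>(manual);
    }
    // Assumption: otherwise both sources contribute, manual tags first.
    List<String> merged = new ArrayList<>(manual);
    merged.addAll(binary);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, List<String>> binary = Map.of("run", List.of("VB"));
    Map<String, List<String>> manual = Map.of("run", List.of("NN"));
    System.out.println(resolveTags("run", binary, manual, true));  // prints [NN]
    System.out.println(resolveTags("run", binary, manual, false)); // prints [NN, VB]
  }
}
```

Note that in this model a word with no manual entry keeps its binary dictionary tags even when the flag is true, since there is nothing to overwrite them with.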
-
getWordTagger
protected WordTagger getWordTagger()
-
initWordTagger
private WordTagger initWordTagger()
-
getDictionary
protected morfologik.stemming.Dictionary getDictionary()
-
tag
public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens) throws IOException
Description copied from interface: Tagger
Returns a list of AnalyzedTokens that assigns each term in the sentence some kind of part-of-speech information (not necessarily just one tag).
Note that this method takes exactly one sentence. Its implementation may implement special cases for the first word of a sentence, which is usually written with an uppercase letter.
Specified by: tag in interface Tagger
Parameters: sentenceTokens - the text as returned by a WordTokenizer
Throws: IOException
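
One way to picture the special handling of capitalized words is a lookup that also consults the lowercased form. The following is a minimal sketch, assuming that readings of the lowercase form are simply appended. `CaseFallbackSketch` and `lookup` are hypothetical names, a map of tag strings stands in for the dictionary, and the `Locale` parameter only mirrors the role of the `conversionLocale` field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-alone model of case handling; not LanguageTool code.
final class CaseFallbackSketch {

  /** Looks up a word as written and, if it is not all lowercase, also in lowercased form. */
  static List<String> lookup(String word,
                             Map<String, List<String>> dictionary,
                             Locale conversionLocale) {
    List<String> tags = new ArrayList<>(dictionary.getOrDefault(word, List.of()));
    String lower = word.toLowerCase(conversionLocale);
    if (!word.equals(lower)) {
      // A sentence-initial "The" also receives the readings of "the".
      tags.addAll(dictionary.getOrDefault(lower, List.of()));
    }
    return tags;
  }

  public static void main(String[] args) {
    Map<String, List<String>> dictionary =
        Map.of("the", List.of("DT"), "Paris", List.of("NNP"));
    System.out.println(lookup("The", dictionary, Locale.ENGLISH));   // prints [DT]
    System.out.println(lookup("Paris", dictionary, Locale.ENGLISH)); // prints [NNP]
  }
}
```

Passing an explicit locale to `toLowerCase` matters for languages whose case mapping differs from the default, for example the dotted and dotless i in Turkish.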
-
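getAnalyzedTokens
protected List<AnalyzedToken> getAnalyzedTokens(String word)
-
asAnalyzedTokenList
protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
-
asAnalyzedTokenListForTaggedWords
protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
-
asAnalyzedToken
protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
-
asAnalyzedToken
private AnalyzedToken asAnalyzedToken(String word, TaggedWord taggedWord)
-
addTokens
private void addTokens(List<AnalyzedToken> taggedTokens, List<AnalyzedToken> l)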
-
createNullToken
public final AnalyzedTokenReadings createNullToken(String token, int startPos)
Description copied from interface: Tagger
Create the AnalyzedToken used for whitespace and other non-words. Use null as the POS tag for this token.
Specified by: createNullToken in interface Tagger
-
createToken
public AnalyzedToken createToken(String token, String posTag)
Description copied from interface: Tagger
Create a token specific to the language of the implementing class.
Specified by: createToken in interface Tagger
-
additionalTags
protected @Nullable List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
Allows additional tagging in some language-dependent circumstances.
Parameters: word - The word to tag
Returns: list of analyzed tokens with additional tags, or null
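
Because this hook may return null, any caller has to treat null as "nothing to add". The sketch below shows one null-safe way to consume such a hook. It is a stand-alone model, not LanguageTool code: `AdditionalTagsSketch` and `withAdditional` are hypothetical names, a `Function` stands in for the overridable method, and appending the extra tags to the base readings is an assumption about how the result is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-alone model of a nullable tagging hook; not LanguageTool code.
final class AdditionalTagsSketch {

  /**
   * Appends the hook's result to the base tags.
   * The hook may return null, which is treated as "no additional tags".
   */
  static List<String> withAdditional(String word,
                                     List<String> baseTags,
                                     Function<String, List<String>> additionalTags) {
    List<String> result = new ArrayList<>(baseTags);
    List<String> extra = additionalTags.apply(word);
    if (extra != null) {
      result.addAll(extra);
    }
    return result;
  }

  public static void main(String[] args) {
    // A language-specific hook that only knows words ending in "ly".
    Function<String, List<String>> hook =
        w -> w.endsWith("ly") ? List.of("RB") : null;
    System.out.println(withAdditional("quickly", List.of(), hook));  // prints [RB]
    System.out.println(withAdditional("house", List.of("NN"), hook)); // prints [NN]
  }
}
```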
-