Package edu.berkeley.nlp.lm.map
Class HashNgramMap<T>
java.lang.Object
edu.berkeley.nlp.lm.map.AbstractNgramMap<T>
edu.berkeley.nlp.lm.map.HashNgramMap<T>
- Type Parameters:
T
-
- All Implemented Interfaces:
ContextEncodedNgramMap<T>, NgramMap<T>, Serializable
- Author:
- adampauls
-
Nested Class Summary
Nested classes/interfaces inherited from interface edu.berkeley.nlp.lm.map.NgramMap
NgramMap.Entry<T>
-
Field Summary
Fields inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
NUM_BITS_PER_BYTE, NUM_SUFFIX_BITS, NUM_WORD_BITS, opts, SUFFIX_BIT_MASK, values, WORD_BIT_MASK
-
Method Summary
Modifier and Type, Method, Description
void clearStorage()
boolean contains(int[] ngram, int startPos, int endPos)
static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
    Note: Explicit HashNgramMap can grow beyond maxNgramOrder
static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed)
get(int[] ngram, int startPos, int endPos)
int getFirstWordForOffset(long offset, int ngramOrder)
int getLastWordForOffset(long offset, int ngramOrder)
int getMaxNgramOrder()
long getNextContextOffset(long offset, int ngramOrder)
int getNextWord(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder)
int[] getNgramForOffset(long offset, int ngramOrder, int[] ret)
int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
getNgramOffsetsForOrder(int ngramOrder)
getNgramsForOrder(int ngramOrder)
long getNumNgrams(int ngramOrder)
long getOffset(long contextOffset, int contextOrder, int word)
ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
    Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
long getPrefixOffset(long offset, int ngramOrder)
    Gets the offset of the context for an n-gram (represented by offset)
long getTotalSize()
long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
getValueStoringArray(int ngramOrder)
void handleNgramsFinished(int justFinishedOrder)
void initWithLengths(List<Long> numNGrams)
boolean isReversed()
long put
long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
    Warning: does not rehash if load factor is exceeded, must call rehashIfNecessary explicitly.
long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
    Warning: does not rehash if load factor is exceeded, must call rehashIfNecessary explicitly.
void rehashIfNecessary(int num)
void trim()
boolean wordHasBigrams(int word)
Methods inherited from class edu.berkeley.nlp.lm.map.AbstractNgramMap
combineToKey, containsOutOfVocab, contextOffsetOf, equals, getSubArray, getValues, wordOf
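The inherited names NUM_WORD_BITS, NUM_SUFFIX_BITS, WORD_BIT_MASK, SUFFIX_BIT_MASK, combineToKey, wordOf and contextOffsetOf suggest that a word id and a context offset are packed into a single long key. The sketch below illustrates that idea only. The 26/38 bit split, the placement of the word in the high bits, and the method signatures are assumptions made for illustration; this page does not state the actual values or layout.

```java
/** Hypothetical illustration of packing a word id and a context offset into one long. */
final class KeyPackingSketch {
    // Assumed widths; the real constants are not given on this page.
    static final int NUM_WORD_BITS = 26;
    static final int NUM_SUFFIX_BITS = 64 - NUM_WORD_BITS;
    static final long WORD_BIT_MASK = (1L << NUM_WORD_BITS) - 1;
    static final long SUFFIX_BIT_MASK = (1L << NUM_SUFFIX_BITS) - 1;

    /** Places the word in the high bits and the context offset in the low bits (assumed layout). */
    static long combineToKey(int word, long contextOffset) {
        return ((word & WORD_BIT_MASK) << NUM_SUFFIX_BITS) | (contextOffset & SUFFIX_BIT_MASK);
    }

    /** Recovers the word id from the high bits of the key. */
    static int wordOf(long key) {
        return (int) (key >>> NUM_SUFFIX_BITS);
    }

    /** Recovers the context offset from the low bits of the key. */
    static long contextOffsetOf(long key) {
        return key & SUFFIX_BIT_MASK;
    }

    public static void main(String[] args) {
        long key = combineToKey(12345, 987654321L);
        System.out.println(wordOf(key) + " " + contextOffsetOf(key)); // prints "12345 987654321"
    }
}
```

Packing both fields into one primitive keeps each hash-table entry a single long, which is one plausible reason for field names like these.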
-
Method Details
-
createImplicitWordHashNgramMap
public static <T> HashNgramMap<T> createImplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, LongArray[] numNgramsForEachWord, boolean reversed)
-
createExplicitWordHashNgramMap
public static <T> HashNgramMap<T> createExplicitWordHashNgramMap(ValueContainer<T> values, ConfigOptions opts, int maxNgramOrder, boolean reversed)
Note: Explicit HashNgramMap can grow beyond maxNgramOrder.
- Type Parameters:
T
- Parameters:
values
opts
maxNgramOrder
reversed
-
put
-
putWithOffset
public long putWithOffset(int[] ngram, int startPos, int endPos, long contextOffset, T val)
Warning: does not rehash if the load factor is exceeded; rehashIfNecessary must be called explicitly. This is so that the offsets returned remain valid. Do not use this method unless you really know what you're doing.
- Parameters:
ngram
startPos
endPos
contextOffset
val
-
putWithOffsetAndSuffix
public long putWithOffsetAndSuffix(int[] ngram, int startPos, int endPos, long contextOffset, long suffixOffset, T val)
Warning: does not rehash if the load factor is exceeded; rehashIfNecessary must be called explicitly. This is so that the offsets returned remain valid. Do not use this method unless you really know what you're doing.
- Parameters:
ngram
startPos
endPos
contextOffset
val
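The warning on putWithOffset and putWithOffsetAndSuffix can be illustrated with a toy open-addressing table. This is a minimal sketch, not the library's implementation: the class ExplicitRehashSketch, its linear probing, the 0.7 load factor, and the use of the slot index as the "offset" are all assumptions chosen to show why inserting must not resize the table while earlier offsets are still in use.

```java
import java.util.Arrays;

/** Toy hash table whose put returns a slot index and never resizes on its own. */
final class ExplicitRehashSketch {
    private static final long EMPTY = -1L;
    private static final double MAX_LOAD = 0.7; // assumed load factor
    private long[] keys;
    private int size;

    ExplicitRehashSketch(int capacity) {
        keys = new long[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Inserts a non-negative key and returns its slot; offsets stay valid because no resize happens here. */
    long putWithOffset(long key) {
        int slot = indexFor(key, keys.length);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == EMPTY) { keys[slot] = key; size++; return slot; }
            if (keys[slot] == key) return slot;
            slot = (slot + 1) % keys.length; // linear probing
        }
        throw new IllegalStateException("table full; call rehashIfNecessary first");
    }

    /** Returns the slot holding key, or -1 if the key is absent. */
    long getOffset(long key) {
        int slot = indexFor(key, keys.length);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[slot] == EMPTY) return -1L;
            if (keys[slot] == key) return slot;
            slot = (slot + 1) % keys.length;
        }
        return -1L;
    }

    /** Grows the table if adding num more keys would exceed the load factor; all offsets may change. */
    void rehashIfNecessary(int num) {
        if (size + num <= MAX_LOAD * keys.length) return;
        long[] old = keys;
        int newCapacity = Math.max(2 * old.length, (int) Math.ceil((size + num) / MAX_LOAD) + 1);
        keys = new long[newCapacity];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (long k : old) {
            if (k != EMPTY) putWithOffset(k);
        }
    }

    private static int indexFor(long key, int capacity) {
        return (int) Math.floorMod(key, (long) capacity);
    }

    public static void main(String[] args) {
        ExplicitRehashSketch map = new ExplicitRehashSketch(4);
        long before = map.putWithOffset(9);
        map.putWithOffset(5);
        map.rehashIfNecessary(2); // explicit call at a point where no old offsets are held
        System.out.println("offset of 9 before rehash: " + before + ", after: " + map.getOffset(9));
    }
}
```

In this sketch, a caller that stored the offset of a key before rehashIfNecessary ran would be holding a stale index afterwards, which is the hazard the warning describes.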
-
rehashIfNecessary
public void rehashIfNecessary(int num)
-
getValueAndOffset
public long getValueAndOffset(long contextOffset, int contextOrder, int word, T outputVal)
- Specified by:
getValueAndOffset in interface NgramMap<T>
-
getOffset
public long getOffset(long contextOffset, int contextOrder, int word)
- Specified by:
getOffset in interface ContextEncodedNgramMap<T>
-
getNgramFromContextEncoding
public int[] getNgramFromContextEncoding(long contextOffset, int contextOrder, int word)
- Specified by:
getNgramFromContextEncoding in interface ContextEncodedNgramMap<T>
-
getNextWord
public int getNextWord(long offset, int ngramOrder)
-
getNextContextOffset
public long getNextContextOffset(long offset, int ngramOrder)
-
getFirstWordForOffset
public int getFirstWordForOffset(long offset, int ngramOrder)
-
getLastWordForOffset
public int getLastWordForOffset(long offset, int ngramOrder)
-
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder)
-
getNgramForOffset
public int[] getNgramForOffset(long offset, int ngramOrder, int[] ret)
-
getOffsetForNgram
public ContextEncodedNgramLanguageModel.LmContextInfo getOffsetForNgram(int[] ngram, int startPos, int endPos)
- Specified by:
getOffsetForNgram in interface ContextEncodedNgramMap<T>
-
getOffsetForNgramInModel
public long getOffsetForNgramInModel(int[] ngram, int startPos, int endPos)
Like getOffsetForNgram(int[], int, int), but assumes that the full n-gram is in the map (i.e. does not back off to the largest suffix which is in the model).
- Parameters:
ngram
startPos
endPos
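The difference between the two lookups described above can be sketched with an ordinary map. This is an illustration under stated assumptions, not the class's implementation: SuffixBackoffSketch and its method names are hypothetical, endPos is treated as exclusive, -1 is an assumed "not found" value, and the back-off variant returns only an offset where the real getOffsetForNgram returns an LmContextInfo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy contrast between an exact n-gram lookup and one that backs off to the longest stored suffix. */
final class SuffixBackoffSketch {
    private final Map<List<Integer>, Long> offsets = new HashMap<>();

    /** Stores the given n-gram under the given offset. */
    void add(long offset, int... ngram) {
        offsets.put(toKey(ngram, 0, ngram.length), offset);
    }

    /** Exact lookup: succeeds only if ngram[startPos, endPos) itself is stored. */
    long offsetForNgramInModel(int[] ngram, int startPos, int endPos) {
        Long off = offsets.get(toKey(ngram, startPos, endPos));
        return off == null ? -1L : off;
    }

    /** Back-off lookup: drops leading words until some suffix of ngram[startPos, endPos) is stored. */
    long offsetForLongestSuffix(int[] ngram, int startPos, int endPos) {
        for (int start = startPos; start < endPos; start++) {
            Long off = offsets.get(toKey(ngram, start, endPos));
            if (off != null) return off;
        }
        return -1L;
    }

    private static List<Integer> toKey(int[] ngram, int startPos, int endPos) {
        List<Integer> key = new ArrayList<>();
        for (int i = startPos; i < endPos; i++) key.add(ngram[i]);
        return key;
    }

    public static void main(String[] args) {
        SuffixBackoffSketch map = new SuffixBackoffSketch();
        map.add(10L, 2, 3);              // only the bigram (2, 3) is stored
        int[] query = {1, 2, 3};
        System.out.println(map.offsetForNgramInModel(query, 0, 3));  // exact lookup of (1, 2, 3)
        System.out.println(map.offsetForLongestSuffix(query, 0, 3)); // falls back to (2, 3)
    }
}
```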
-
handleNgramsFinished
public void handleNgramsFinished(int justFinishedOrder)
- Specified by:
handleNgramsFinished in interface NgramMap<T>
-
initWithLengths
public void initWithLengths(List<Long> numNGrams)
- Specified by:
initWithLengths in interface NgramMap<T>
-
trim
public void trim()
-
getPrefixOffset
public long getPrefixOffset(long offset, int ngramOrder)
Gets the offset of the context for an n-gram (represented by offset).
- Parameters:
offset
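One way to picture an n-gram "represented by offset" is a table per order in which each entry stores one word plus the offset of its context one order lower. The sketch below is hypothetical and only meant to show how a prefix offset lets a full n-gram be rebuilt. It assumes that ngramOrder is zero-based (0 for unigrams), that the context is the n-gram minus its last word, and that plain arrays stand in for the real storage; none of this is stated on this page.

```java
/** Toy context-encoded storage: each entry is (last word, offset of its prefix one order lower). */
final class ContextEncodingSketch {
    private final int[][] words;           // words[order][offset] = last word of that n-gram
    private final long[][] contextOffsets; // contextOffsets[order][offset] = offset of its prefix

    ContextEncodingSketch(int[][] words, long[][] contextOffsets) {
        this.words = words;
        this.contextOffsets = contextOffsets;
    }

    /** Returns the offset of the prefix (context) of the n-gram, or -1 for unigrams. */
    long getPrefixOffset(long offset, int ngramOrder) {
        return ngramOrder == 0 ? -1L : contextOffsets[ngramOrder][(int) offset];
    }

    /** Rebuilds the words of an n-gram by walking prefix offsets down to the unigram level. */
    int[] getNgramForOffset(long offset, int ngramOrder) {
        int[] ret = new int[ngramOrder + 1];
        long current = offset;
        for (int order = ngramOrder; order >= 0; order--) {
            ret[order] = words[order][(int) current];
            current = getPrefixOffset(current, order);
        }
        return ret;
    }

    public static void main(String[] args) {
        int[][] words = { {100, 200, 300}, {200, 300}, {300} };
        long[][] contexts = { {-1, -1, -1}, {0, 1}, {0} };
        ContextEncodingSketch map = new ContextEncodingSketch(words, contexts);
        System.out.println(java.util.Arrays.toString(map.getNgramForOffset(0, 2))); // prints [100, 200, 300]
    }
}
```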
-
getMaxNgramOrder
public int getMaxNgramOrder()
- Specified by:
getMaxNgramOrder in interface NgramMap<T>
-
getNumNgrams
public long getNumNgrams(int ngramOrder)
- Specified by:
getNumNgrams in interface NgramMap<T>
-
getNgramsForOrder
- Specified by:
getNgramsForOrder in interface NgramMap<T>
-
getNgramOffsetsForOrder
-
isReversed
public boolean isReversed()
-
wordHasBigrams
public boolean wordHasBigrams(int word)
- Specified by:
wordHasBigrams in interface ContextEncodedNgramMap<T>
-
contains
public boolean contains(int[] ngram, int startPos, int endPos)
-
get
-
getTotalSize
public long getTotalSize()
-
getValueStoringArray
- Specified by:
getValueStoringArray in interface NgramMap<T>
-
clearStorage
public void clearStorage()
- Specified by:
clearStorage in interface NgramMap<T>
-