Class Lucene40BlockTreeTermsReader
java.lang.Object
org.apache.lucene.index.Fields
org.apache.lucene.codecs.FieldsProducer
org.apache.lucene.backward_codecs.lucene40.blocktree.Lucene40BlockTreeTermsReader
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<String>
A block-based terms index and dictionary that assigns terms to variable length blocks according
to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The
advantage of this approach is that seekExact is often able to determine a term cannot exist
without doing any I/O, and intersection with Automata is very fast. Note that this terms
dictionary has its own fixed terms index (i.e., it does not support a pluggable terms index
implementation).
NOTE: this terms dictionary supports min/maxItemsPerBlock during indexing to control how much memory the terms index uses.
The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones.
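To make the "break up too-large blocks" idea concrete, here is a deliberately simplified sketch in plain Java with no Lucene dependency. It groups sorted terms by a fixed-length prefix and splits any group larger than a maximum into consecutive smaller blocks. This is an assumption-laden illustration, not the BlockTreeTermsWriter algorithm: the real writer picks prefixes adaptively and also honors minItemsPerBlock, which this sketch ignores. The names PrefixBlockSketch, buildBlocks and prefixLen are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: NOT Lucene's BlockTreeTermsWriter algorithm.
// Groups sorted terms by a fixed-length prefix (hypothetical prefixLen), then
// splits any group larger than maxItemsPerBlock into consecutive smaller blocks.
class PrefixBlockSketch {
  static List<List<String>> buildBlocks(List<String> sortedTerms, int prefixLen, int maxItemsPerBlock) {
    // Terms sharing a prefix are contiguous in sorted input, so insertion
    // order of the LinkedHashMap is also sorted prefix order.
    Map<String, List<String>> byPrefix = new LinkedHashMap<>();
    for (String term : sortedTerms) {
      String prefix = term.substring(0, Math.min(prefixLen, term.length()));
      byPrefix.computeIfAbsent(prefix, p -> new ArrayList<>()).add(term);
    }
    List<List<String>> blocks = new ArrayList<>();
    for (List<String> group : byPrefix.values()) {
      // Break a too-large group of terms sharing one prefix into smaller blocks.
      for (int start = 0; start < group.size(); start += maxItemsPerBlock) {
        int end = Math.min(start + maxItemsPerBlock, group.size());
        blocks.add(new ArrayList<>(group.subList(start, end)));
      }
    }
    return blocks;
  }

  public static void main(String[] args) {
    List<String> terms = List.of("apple", "apply", "apt", "banana", "band", "bandit", "bank", "cat");
    for (List<String> block : buildBlocks(terms, 2, 2)) {
      System.out.println(block);
    }
  }
}
```

With a prefix length of 2 and at most 2 terms per block, the "ap" and "ba" groups are each split, giving five blocks in total.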
Use CheckIndex with the -verbose option to see summary statistics on the blocks in the dictionary.
See BlockTreeTermsWriter.
-
Field Summary
Fields
- fieldList
- private final Map<String, FieldReader> fieldMap
- FST_OUTPUTS
- (package private) final IndexInput indexIn
- (package private) static final BytesRef NO_OUTPUT
- (package private) static final int OUTPUT_FLAG_HAS_TERMS
- (package private) static final int OUTPUT_FLAG_IS_FLOOR
- (package private) static final int OUTPUT_FLAGS_MASK
- (package private) static final int OUTPUT_FLAGS_NUM_BITS
- (package private) final PostingsReaderBase postingsReader
- (package private) final String segment
- (package private) static final String TERMS_CODEC_NAME
- (package private) static final String TERMS_EXTENSION: Extension of terms file.
- (package private) static final String TERMS_INDEX_CODEC_NAME
- (package private) static final String TERMS_INDEX_EXTENSION: Extension of terms index file.
- (package private) static final String TERMS_META_CODEC_NAME
- (package private) static final String TERMS_META_EXTENSION: Extension of terms meta file.
- (package private) final IndexInput termsIn
- (package private) final int version
- static final int VERSION_COMPRESSED_SUFFIXES: Suffixes are compressed to save space.
- static final int VERSION_CURRENT: Current terms format.
- static final int VERSION_META_FILE: Metadata is written to its own file.
- static final int VERSION_META_LONGS_REMOVED: The long[] + byte[] metadata has been replaced with a single byte[].
- static final int VERSION_START: Initial terms format.
Fields inherited from class org.apache.lucene.index.Fields
EMPTY_ARRAY
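The VERSION_* constants listed above form an ordered sequence of format revisions, and the reader keeps the version it read in its version field. A minimal sketch of how such ordered constants are commonly used to gate format features, assuming only that later revisions have larger values. The numeric values and the features() helper are hypothetical; this page does not show the real constant values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical constant values: the real ones are not shown on this page,
// and only their relative order matters for the checks below.
class VersionGateSketch {
  static final int VERSION_START = 0;
  static final int VERSION_META_LONGS_REMOVED = 1;
  static final int VERSION_COMPRESSED_SUFFIXES = 2;
  static final int VERSION_META_FILE = 3;
  static final int VERSION_CURRENT = VERSION_META_FILE;

  // Describes which format features a reader would expect for a given version.
  static List<String> features(int version) {
    if (version < VERSION_START || version > VERSION_CURRENT) {
      throw new IllegalArgumentException("unsupported terms format version: " + version);
    }
    List<String> features = new ArrayList<>();
    features.add(version >= VERSION_META_LONGS_REMOVED
        ? "single byte[] metadata" : "long[] + byte[] metadata");
    if (version >= VERSION_COMPRESSED_SUFFIXES) {
      features.add("compressed suffixes");
    }
    if (version >= VERSION_META_FILE) {
      features.add("separate meta file");
    }
    return features;
  }

  public static void main(String[] args) {
    for (int v = VERSION_START; v <= VERSION_CURRENT; v++) {
      System.out.println(v + " -> " + features(v));
    }
  }
}
```

Comparing with >= rather than == keeps each check valid when a newer revision is added on top, which is why VERSION_CURRENT can simply alias the latest constant.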
-
Constructor Summary
Constructors
- Lucene40BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state): Sole constructor.
-
Method Summary
- (package private) String brToString
- void checkIntegrity(): Checks consistency of this reader.
- void close()
- Iterator<String> iterator(): Returns an iterator that will step through all field names.
- private static BytesRef readBytesRef
- private static void seekDir(IndexInput input): Seek input to the directory offset.
- int size(): Returns the number of fields or -1 if the number of distinct field names is unknown.
- Terms terms(String field): Get the Terms for this field.
- String toString()
Methods inherited from class org.apache.lucene.codecs.FieldsProducer
getMergeInstance
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
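The iterator(), size() and terms(String) contracts above, together with the fieldMap and fieldList fields, suggest a map from field name to per-field reader plus a sorted list of names. A Lucene-free sketch of that contract, assuming a generic type T stands in for the per-field terms object; MapBackedFieldsSketch is a hypothetical name and this is not the reader's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free model of the Fields contract. T stands in for
// the per-field terms object (FieldReader in the documented class).
class MapBackedFieldsSketch<T> implements Iterable<String> {
  private final Map<String, T> fieldMap; // field name -> per-field terms
  private final List<String> fieldList;  // sorted, read-only field names

  MapBackedFieldsSketch(Map<String, T> perField) {
    this.fieldMap = new HashMap<>(perField);
    List<String> names = new ArrayList<>(perField.keySet());
    Collections.sort(names);
    this.fieldList = Collections.unmodifiableList(names);
  }

  // Never returns null; iterates field names in sorted order.
  @Override
  public Iterator<String> iterator() {
    return fieldList.iterator();
  }

  // Returns null when the field does not exist.
  T terms(String field) {
    return fieldMap.get(field);
  }

  // The field count is known exactly here, so -1 is never returned.
  int size() {
    return fieldMap.size();
  }

  public static void main(String[] args) {
    MapBackedFieldsSketch<String> fields =
        new MapBackedFieldsSketch<>(Map.of("title", "title-terms", "body", "body-terms"));
    for (String name : fields) {
      System.out.println(name + " -> " + fields.terms(name));
    }
    System.out.println("size = " + fields.size() + ", missing = " + fields.terms("missing"));
  }
}
```

Keeping a separate unmodifiable sorted list means iteration order is deterministic and callers cannot mutate the field set through the iterator.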
-
Field Details
-
FST_OUTPUTS
-
NO_OUTPUT
-
OUTPUT_FLAGS_NUM_BITS
static final int OUTPUT_FLAGS_NUM_BITS
-
OUTPUT_FLAGS_MASK
static final int OUTPUT_FLAGS_MASK
-
OUTPUT_FLAG_IS_FLOOR
static final int OUTPUT_FLAG_IS_FLOOR
-
OUTPUT_FLAG_HAS_TERMS
static final int OUTPUT_FLAG_HAS_TERMS
-
TERMS_EXTENSION
Extension of terms file.
-
TERMS_CODEC_NAME
-
VERSION_START
public static final int VERSION_START
Initial terms format.
-
VERSION_META_LONGS_REMOVED
public static final int VERSION_META_LONGS_REMOVED
The long[] + byte[] metadata has been replaced with a single byte[].
-
VERSION_COMPRESSED_SUFFIXES
public static final int VERSION_COMPRESSED_SUFFIXES
Suffixes are compressed to save space.
-
VERSION_META_FILE
public static final int VERSION_META_FILE
Metadata is written to its own file.
-
VERSION_CURRENT
public static final int VERSION_CURRENT
Current terms format.
-
TERMS_INDEX_EXTENSION
Extension of terms index file.
-
TERMS_INDEX_CODEC_NAME
-
TERMS_META_EXTENSION
Extension of terms meta file.
-
TERMS_META_CODEC_NAME
-
termsIn
-
indexIn
-
postingsReader
-
fieldMap
-
fieldList
-
segment
-
version
final int version
-
-
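The OUTPUT_FLAGS_NUM_BITS, OUTPUT_FLAGS_MASK, OUTPUT_FLAG_IS_FLOOR and OUTPUT_FLAG_HAS_TERMS fields suggest that each terms-index output packs a block file pointer together with a few flag bits. A minimal sketch of that bit-packing pattern, assuming the flags occupy the low OUTPUT_FLAGS_NUM_BITS bits and the file pointer the remaining high bits. The constant values and the helper names (encode, blockFP, hasTerms, isFloor, flags) are assumptions for illustration; this page does not show the real values or layout.

```java
// Assumed layout: low OUTPUT_FLAGS_NUM_BITS bits hold flags, the remaining
// high bits hold a block file pointer. Constant values are assumptions;
// this page does not show the real ones.
class OutputFlagsSketch {
  static final int OUTPUT_FLAGS_NUM_BITS = 2;
  static final int OUTPUT_FLAGS_MASK = 0x3;
  static final int OUTPUT_FLAG_IS_FLOOR = 0x1;
  static final int OUTPUT_FLAG_HAS_TERMS = 0x2;

  // Packs a block file pointer and two boolean flags into one long.
  static long encode(long blockFP, boolean hasTerms, boolean isFloor) {
    return (blockFP << OUTPUT_FLAGS_NUM_BITS)
        | (hasTerms ? OUTPUT_FLAG_HAS_TERMS : 0)
        | (isFloor ? OUTPUT_FLAG_IS_FLOOR : 0);
  }

  // Recovers the file pointer by discarding the flag bits.
  static long blockFP(long code) {
    return code >>> OUTPUT_FLAGS_NUM_BITS;
  }

  static boolean hasTerms(long code) {
    return (code & OUTPUT_FLAG_HAS_TERMS) != 0;
  }

  static boolean isFloor(long code) {
    return (code & OUTPUT_FLAG_IS_FLOOR) != 0;
  }

  // All flag bits at once, isolated with the mask.
  static int flags(long code) {
    return (int) (code & OUTPUT_FLAGS_MASK);
  }

  public static void main(String[] args) {
    long code = encode(1234L, true, false);
    System.out.println(blockFP(code) + " " + hasTerms(code) + " " + isFloor(code));
  }
}
```

Packing the flags into the same long as the pointer lets a single index output answer "where is the block" and "what kind of block is it" without a second lookup.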
Constructor Details
-
Lucene40BlockTreeTermsReader
public Lucene40BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state) throws IOException
Sole constructor.
- Throws:
IOException
-
-
Method Details
-
readBytesRef
- Throws:
IOException
-
seekDir
Seek input to the directory offset.
- Throws:
IOException
-
close
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
- Specified by:
close
in class FieldsProducer
- Throws:
IOException
-
iterator
Description copied from class: Fields
Returns an iterator that will step through all field names. This will not return null.
-
terms
Description copied from class: Fields
Get the Terms for this field. This will return null if the field does not exist.
- Specified by:
terms
in class Fields
- Throws:
IOException
-
size
public int size()
Description copied from class: Fields
Returns the number of fields or -1 if the number of distinct field names is unknown. If >= 0, Fields.iterator() will return as many field names.
-
brToString
-
checkIntegrity
Description copied from class: FieldsProducer
Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.
- Specified by:
checkIntegrity
in class FieldsProducer
- Throws:
IOException
-
toString
-
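seekDir ("Seek input to the directory offset") and checkIntegrity (which "may involve computing a checksum value") both work from the tail of a file. A minimal sketch of those two tail-of-file operations on an in-memory byte array, assuming a hypothetical layout of [body][8-byte directory offset][8-byte CRC32 of all preceding bytes]. The layout, the class name TailOfFileSketch and its helpers are illustrative assumptions; the real on-disk format read by this class is not described on this page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical file layout, for illustration only (not the real format):
// [body bytes][8-byte directory offset][8-byte CRC32 of all preceding bytes]
class TailOfFileSketch {
  static final int FOOTER_LENGTH = Long.BYTES;

  static byte[] write(byte[] body, long dirOffset) {
    ByteBuffer buf = ByteBuffer.allocate(body.length + Long.BYTES + FOOTER_LENGTH);
    buf.put(body).putLong(dirOffset);
    CRC32 crc = new CRC32();
    crc.update(buf.array(), 0, buf.position()); // checksum covers body + offset
    buf.putLong(crc.getValue());
    return buf.array();
  }

  // Analogue of checkIntegrity(): recompute the checksum over every byte
  // before the footer and compare it with the stored value.
  static void checkIntegrity(byte[] file) {
    int end = file.length - FOOTER_LENGTH;
    CRC32 crc = new CRC32();
    crc.update(file, 0, end);
    long stored = ByteBuffer.wrap(file).getLong(end);
    if (stored != crc.getValue()) {
      throw new IllegalStateException("checksum mismatch");
    }
  }

  // Analogue of seekDir(): read the offset stored just before the footer
  // and position the input there.
  static ByteBuffer seekDir(byte[] file) {
    ByteBuffer in = ByteBuffer.wrap(file);
    long dirOffset = in.getLong(file.length - FOOTER_LENGTH - Long.BYTES);
    in.position((int) dirOffset);
    return in;
  }

  public static void main(String[] args) {
    byte[] body = "blocks...DIR".getBytes(StandardCharsets.US_ASCII);
    byte[] file = write(body, 9); // the directory starts at "DIR"
    checkIntegrity(file);
    ByteBuffer in = seekDir(file);
    System.out.println("directory starts at " + in.position());
  }
}
```

Storing the directory offset at a fixed distance from the end lets a reader find the directory without scanning the body, while the checksum pass reads the whole file, which is why an integrity check can be costly on large files.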