java.lang.Object
org.apache.lucene.search.DocIdSetIterator
org.apache.lucene.backward_codecs.lucene70.IndexedDISI

final class IndexedDISI extends DocIdSetIterator
Disk-based implementation of a DocIdSetIterator which can return the index of the current document, i.e. the ordinal of the current document among the list of documents that this iterator can return. This is useful to implement sparse doc values by only having to encode values for documents that actually have a value.
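
As a rough illustration of why the ordinal is useful, the sketch below stores values only for documents that have one and uses the ordinal of a document to select its value. The class SparseValuesSketch, its arrays, and its method names are hypothetical and not part of Lucene; the real class reads its data from an IndexInput rather than from in-memory arrays.

```java
// Hypothetical illustration, not Lucene code: values are stored only for
// documents that have one, and the ordinal of a document selects its value.
final class SparseValuesSketch {
  private final int[] docs;    // sorted doc IDs that have a value
  private final long[] values; // values[i] belongs to docs[i]

  SparseValuesSketch(int[] docs, long[] values) {
    this.docs = docs;
    this.values = values;
  }

  /** Returns the ordinal of target among docs, or -1 if target has no value. */
  int indexOf(int target) {
    int lo = 0;
    int hi = docs.length - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else if (docs[mid] > target) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -1;
  }

  /** Returns the value stored for target, or missing if target has no value. */
  long valueOrDefault(int target, long missing) {
    int ord = indexOf(target);
    return ord < 0 ? missing : values[ord];
  }
}
```

With three documents that have values, only three longs are stored, regardless of how large the doc ID space is.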

Implementation-wise, this DocIdSetIterator is inspired by roaring bitmaps: it encodes ranges of 65536 documents independently and picks between 3 encodings depending on the density of the range:

  • ALL if the range contains 65536 documents exactly,
  • DENSE if the range contains 4096 documents or more; in that case documents are stored in a bit set,
  • SPARSE otherwise, and the lower 16 bits of the doc IDs are stored in a short.

Only ranges that contain at least one value are encoded.
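
The selection rule above can be sketched as follows. This is a minimal illustration under the thresholds stated in the list, not the class's actual code; BlockEncodingSketch and all of its members are hypothetical names.

```java
// Hypothetical sketch of the per-range encoding choice; not Lucene code.
final class BlockEncodingSketch {
  enum Encoding { ALL, DENSE, SPARSE }

  static final int BLOCK_SIZE = 1 << 16;   // 65536 documents per range
  static final int DENSE_THRESHOLD = 4096; // taken from "4096 documents or more"

  /** Number of the range that a doc ID falls into (its upper 16 bits). */
  static int blockOf(int doc) {
    return doc >>> 16;
  }

  /** Position of a doc ID inside its range (its lower 16 bits). */
  static int offsetInBlock(int doc) {
    return doc & 0xFFFF;
  }

  /** Picks an encoding for a range holding the given number of documents. */
  static Encoding choose(int cardinality) {
    if (cardinality < 1 || cardinality > BLOCK_SIZE) {
      // Empty ranges are never encoded, so a cardinality of 0 is rejected.
      throw new IllegalArgumentException("cardinality out of range: " + cardinality);
    }
    if (cardinality == BLOCK_SIZE) {
      return Encoding.ALL;
    }
    if (cardinality >= DENSE_THRESHOLD) {
      return Encoding.DENSE;
    }
    return Encoding.SPARSE;
  }
}
```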

This implementation uses 6 bytes per document in the worst case, which happens when every range contains exactly one document.
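
The 6-byte figure can be reproduced with a little arithmetic. The sketch below assumes a 4-byte header per range (a 2-byte range number plus a 2-byte cardinality); that header layout is an assumption chosen to be consistent with the stated worst case, and BlockSizeSketch is a hypothetical name.

```java
// Hypothetical size estimate per encoded range; not Lucene code.
final class BlockSizeSketch {
  // Assumption: 2-byte range number + 2-byte cardinality per encoded range.
  static final int HEADER_BYTES = 4;

  /** Estimated bytes used by one range holding the given number of documents. */
  static int bytesForRange(int cardinality) {
    if (cardinality < 1 || cardinality > 65536) {
      throw new IllegalArgumentException("cardinality out of range: " + cardinality);
    }
    if (cardinality == 65536) {
      return HEADER_BYTES;                 // ALL: nothing beyond the header
    }
    if (cardinality >= 4096) {
      return HEADER_BYTES + 65536 / 8;     // DENSE: a 65536-bit bit set
    }
    return HEADER_BYTES + 2 * cardinality; // SPARSE: one short per document
  }
}
```

Under these assumptions a range with a single document costs 4 + 2 = 6 bytes, and the per-document cost shrinks quickly as a range fills up.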

  • Field Details

    • MAX_ARRAY_LENGTH

      static final int MAX_ARRAY_LENGTH
    • slice

      private final IndexInput slice
      The slice that stores the DocIdSetIterator.
    • cost

      private final long cost
    • block

      private int block
    • blockEnd

      private long blockEnd
    • nextBlockIndex

      private int nextBlockIndex
    • method

    • doc

      private int doc
    • index

      private int index
    • exists

      boolean exists
    • word

      private long word
    • wordIndex

      private int wordIndex
    • numberOfOnes

      private int numberOfOnes
    • gap

      private int gap
  • Constructor Details

  • Method Details

    • flush

      private static void flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out) throws IOException
      Throws:
      IOException
    • writeBitSet

      static void writeBitSet(DocIdSetIterator it, IndexOutput out) throws IOException
      Throws:
      IOException
    • docID

      public int docID()
      Description copied from class: DocIdSetIterator
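      Returns the following:
        • -1 if nextDoc() or advance(int) were not called yet.
        • DocIdSetIterator.NO_MORE_DOCS if the iterator has exhausted.
        • Otherwise it should return the doc ID it is currently on.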
      Specified by:
      docID in class DocIdSetIterator
    • advance

      public int advance(int target) throws IOException
      Description copied from class: DocIdSetIterator
      Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number. Exhausts the iterator and returns DocIdSetIterator.NO_MORE_DOCS if target is greater than the highest document number in the set.

      The behavior of this method is undefined when called with target ≤ current, or after the iterator has exhausted. Both cases may result in unpredictable behavior.

      When target > current it behaves as if written:

       int advance(int target) {
         int doc;
         while ((doc = nextDoc()) < target) {
         }
         return doc;
       }
       
      Some implementations are considerably more efficient than that.

      NOTE: this method may be called with DocIdSetIterator.NO_MORE_DOCS for efficiency by some Scorers. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method.

      Specified by:
      advance in class DocIdSetIterator
      Throws:
      IOException
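
The contract described for advance can be tried out on a small in-memory iterator. The sketch below is a hypothetical stand-in backed by a sorted array, not the disk-based class documented here; it implements advance with the plain nextDoc loop and adds the recommended early check for NO_MORE_DOCS. The only detail borrowed from Lucene is that NO_MORE_DOCS equals Integer.MAX_VALUE.

```java
// Hypothetical in-memory iterator used to illustrate the advance contract.
final class ArrayDocIteratorSketch {
  // Same value as DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs; // sorted, distinct doc IDs
  private int pos = -1;     // position of the current doc in docs
  private int doc = -1;     // -1 until nextDoc() or advance(int) is called

  ArrayDocIteratorSketch(int[] docs) {
    this.docs = docs;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    pos++;
    doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    return doc;
  }

  int advance(int target) {
    if (target == NO_MORE_DOCS) {
      // Exhaust directly instead of stepping through every remaining doc.
      pos = docs.length;
      doc = NO_MORE_DOCS;
      return doc;
    }
    int d;
    while ((d = nextDoc()) < target) {
      // keep stepping until a doc ID >= target is reached
    }
    return d;
  }
}
```

A real implementation would normally skip whole ranges rather than step document by document; the linear loop is kept here only because it mirrors the reference behavior.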
    • advanceExact

      public boolean advanceExact(int target) throws IOException
      Throws:
      IOException
    • advanceBlock

      private void advanceBlock(int targetBlock) throws IOException
      Throws:
      IOException
    • readBlockHeader

      private void readBlockHeader() throws IOException
      Throws:
      IOException
    • nextDoc

      public int nextDoc() throws IOException
      Description copied from class: DocIdSetIterator
      Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set.
      NOTE: after the iterator has exhausted you should not call this method, as it may result in unpredictable behavior.
      Specified by:
      nextDoc in class DocIdSetIterator
      Throws:
      IOException
    • index

      public int index()
    • cost

      public long cost()
      Description copied from class: DocIdSetIterator
      Returns the estimated cost of this DocIdSetIterator.

      This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.

      Specified by:
      cost in class DocIdSetIterator