java.lang.Object
org.apache.lucene.util.OfflineSorter
On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:
- (two bytes) length of the following byte array,
- exactly the above count of bytes for the sequence to be sorted.
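The entry layout described above can be sketched with plain JDK streams. This is only an illustration under stated assumptions: `LengthPrefixedEntries` is a hypothetical helper, not a Lucene class, and the byte order of the two-byte length (high byte first, as `DataOutputStream.writeShort` writes it) is an assumption rather than something this page specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Lucene class): two-byte length, then the payload bytes. */
final class LengthPrefixedEntries {

  /** Writes each entry as an unsigned 16-bit length followed by exactly that many bytes. */
  static byte[] encode(List<byte[]> entries) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (byte[] entry : entries) {
        if (entry.length > 0xFFFF) {
          throw new IllegalArgumentException("entry does not fit a two-byte length: " + entry.length);
        }
        out.writeShort(entry.length); // low 16 bits, high byte first (assumed byte order)
        out.write(entry);
      }
    }
    return buffer.toByteArray();
  }

  /** Reads entries back until the input is exhausted. */
  static List<byte[]> decode(byte[] data) throws IOException {
    List<byte[]> entries = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      while (in.available() > 0) {
        byte[] entry = new byte[in.readUnsignedShort()];
        in.readFully(entry);
        entries.add(entry);
      }
    }
    return entries;
  }

  public static void main(String[] args) throws IOException {
    byte[] encoded = encode(List.of(
        "ab".getBytes(StandardCharsets.UTF_8),
        "c".getBytes(StandardCharsets.UTF_8)));
    System.out.println(encoded.length + " bytes encoded");
    for (byte[] entry : decode(encoded)) {
      System.out.println(new String(entry, StandardCharsets.UTF_8));
    }
  }
}
```

A two-byte length caps each entry at 65535 bytes, which is why the sketch rejects longer entries instead of silently truncating the length.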
-
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
- static final class OfflineSorter.BufferSize: A bit more descriptive unit for constructors.
- static class OfflineSorter.ByteSequencesReader: Utility class to read length-prefixed byte[] entries from an input.
- static class OfflineSorter.ByteSequencesWriter: Utility class to emit length-prefixed byte[] entries to an output stream for sorting.
- (package private) static class
- private class: Merges multiple file-based partitions to a single on-disk partition.
- private static class OfflineSorter.Partition: Holds one partition of items, either loaded into memory or based on a file.
- class OfflineSorter.SortInfo: Sort info (debugging mostly).
- private class: Sorts one in-memory partition, writes it to disk, and returns the resulting file-based partition.
-
Field Summary
Fields
Modifier and Type, Field: Description
- static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE: Absolute minimum required buffer size for sorting.
- private final Comparator<BytesRef> comparator
- static final Comparator<BytesRef> DEFAULT_COMPARATOR: Default comparator: sorts in binary (codepoint) order
- private final Directory dir
- private final ExecutorService exec
- static final long GB: Convenience constant for gigabytes
- static final int MAX_TEMPFILES: Maximum number of temporary files before doing an intermediate merge.
- private int maxTempFiles
- static final long MB: Convenience constant for megabytes
- static final long MIN_BUFFER_SIZE_MB: Minimum recommended buffer size for sorting.
- private static final String MIN_BUFFER_SIZE_MSG
- private final Semaphore partitionsInRAM
- private final OfflineSorter.BufferSize ramBufferSize
- (package private) OfflineSorter.SortInfo sortInfo
- private final String tempFileNamePrefix
- private final int valueLength
-
Constructor Summary
Constructors
Constructor: Description
- OfflineSorter(Directory dir, String tempFileNamePrefix): Defaults constructor.
- OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator): Defaults constructor with a custom comparator.
- OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, ExecutorService exec, int maxPartitionsInRAM): All-details constructor.
-
Method Summary
Methods
Modifier and Type, Method: Description
- getComparator: Returns the comparator in use to sort entries.
- getDirectory: Returns the Directory we use to create temp files.
- private OfflineSorter.Partition getPartition(Future<OfflineSorter.Partition> future)
- protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, String name): Subclasses can override to change how byte sequences are read from disk.
- getTempFileNamePrefix: Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.
- protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount): Subclasses can override to change how byte sequences are written to disk.
- (package private) void mergePartitions(Directory trackingDir, List<Future<OfflineSorter.Partition>> segments): Merge the most recent maxTempFiles partitions into a new partition.
- (package private) OfflineSorter.Partition readPartition(OfflineSorter.ByteSequencesReader reader): Read in a single partition of data, setting isExhausted[0] to true if there are no more items.
- sort: Sort input to a new temp file, returning its name.
- private void verifyChecksum(Throwable priorException, OfflineSorter.ByteSequencesReader reader): Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
-
Field Details
-
MB
public static final long MB
Convenience constant for megabytes.
-
GB
public static final long GB
Convenience constant for gigabytes.
-
MIN_BUFFER_SIZE_MB
public static final long MIN_BUFFER_SIZE_MB
Minimum recommended buffer size for sorting.
-
ABSOLUTE_MIN_SORT_BUFFER_SIZE
public static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE
Absolute minimum required buffer size for sorting.
-
MIN_BUFFER_SIZE_MSG
-
MAX_TEMPFILES
public static final int MAX_TEMPFILES
Maximum number of temporary files before doing an intermediate merge.
-
dir
-
valueLength
private final int valueLength -
tempFileNamePrefix
-
exec
-
partitionsInRAM
-
ramBufferSize
-
sortInfo
OfflineSorter.SortInfo sortInfo -
maxTempFiles
private int maxTempFiles -
comparator
-
DEFAULT_COMPARATOR
Default comparator: sorts in binary (codepoint) order
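As a rough illustration of binary order, the sketch below compares byte arrays lexicographically with each byte treated as unsigned. For UTF-8 encoded text this matches Unicode code point order. `BinaryOrder` is a hypothetical class working on raw `byte[]` rather than `BytesRef`, so it approximates the idea and is not the library's comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration of binary (unsigned byte) ordering. */
final class BinaryOrder {

  /** Lexicographic comparison treating every byte as a value in 0..255. */
  static final Comparator<byte[]> UNSIGNED_BYTES = Arrays::compareUnsigned;

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] z = utf8("z");       // single byte 0x7A
    byte[] eAcute = utf8("é");  // two bytes 0xC3 0xA9

    // Unsigned comparison: 0x7A < 0xC3, so "z" sorts before "é", as in code point order.
    System.out.println(UNSIGNED_BYTES.compare(z, eAcute) < 0);

    // Signed comparison would reverse the pair, because (byte) 0xC3 is negative in Java.
    System.out.println(Arrays.compare(z, eAcute) > 0);
  }
}
```

The signed comparison is shown only to make the pitfall visible: Java bytes are signed, so a naive `byte` comparison puts every non-ASCII UTF-8 sequence before ASCII.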
-
-
Constructor Details
-
OfflineSorter
Defaults constructor.
- Throws:
IOException
-
OfflineSorter
public OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator) throws IOException
Defaults constructor with a custom comparator.
- Throws:
IOException
-
OfflineSorter
public OfflineSorter(Directory dir, String tempFileNamePrefix, Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, ExecutorService exec, int maxPartitionsInRAM)
All-details constructor. If valueLength is -1 (the default), the length of each value differs; otherwise, all values have the specified length. If you pass a non-null ExecutorService then it will be used to run sorting operations that can be run concurrently, and maxPartitionsInRAM is the maximum concurrent in-memory partitions. Thus the maximum possible RAM used by this class while sorting is maxPartitionsInRAM * ramBufferSize.
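The RAM bound stated above is a simple product, and a small worked example may help when choosing values. `SortRamBudget` is a hypothetical helper, and the value used for `MB` here (1024 * 1024 bytes) is an assumption for the sake of the arithmetic.

```java
/** Hypothetical helper for estimating the sorter's peak buffer memory. */
final class SortRamBudget {

  /** Assumed size of one megabyte in bytes. */
  static final long MB = 1024L * 1024L;

  /** Upper bound on buffer memory: maxPartitionsInRAM * ramBufferSize, failing on overflow. */
  static long maxSortRamBytes(int maxPartitionsInRAM, long ramBufferSizeBytes) {
    if (maxPartitionsInRAM < 1 || ramBufferSizeBytes < 0) {
      throw new IllegalArgumentException("need at least one partition and a non-negative buffer");
    }
    return Math.multiplyExact((long) maxPartitionsInRAM, ramBufferSizeBytes);
  }

  public static void main(String[] args) {
    // Four concurrent in-memory partitions with a 64 MB buffer each: 256 MB in total.
    System.out.println(maxSortRamBytes(4, 64 * MB) / MB + " MB");
  }
}
```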
-
-
Method Details
-
getDirectory
Returns the Directory we use to create temp files.
-
getTempFileNamePrefix
Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.
-
sort
Sort input to a new temp file, returning its name.
- Throws:
IOException
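The general technique behind an offline sort (sort bounded chunks in memory, spill each to a temporary file, then merge the sorted files) can be sketched with the JDK alone. This is a simplified stand-in and not the class's implementation: `ExternalLineSort` is hypothetical, it works on text lines instead of length-prefixed byte arrays, it bounds memory by a line count instead of a byte budget, it compares with `String.compareTo` instead of binary order, and it does not close partition readers if the merge fails part-way.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an external (on-disk) merge sort over text lines. */
final class ExternalLineSort {

  /** Sorts the lines of input into a new temp file, buffering at most maxLinesInMemory lines. */
  static Path sort(Path input, int maxLinesInMemory) throws IOException {
    if (maxLinesInMemory < 1) throw new IllegalArgumentException("maxLinesInMemory must be >= 1");
    List<Path> partitions = new ArrayList<>();
    try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      List<String> buffer = new ArrayList<>();
      String line;
      while ((line = in.readLine()) != null) {
        buffer.add(line);
        if (buffer.size() == maxLinesInMemory) { // buffer full: spill a sorted partition
          partitions.add(writeSorted(buffer));
          buffer.clear();
        }
      }
      if (!buffer.isEmpty()) partitions.add(writeSorted(buffer));
    }
    Path merged = merge(partitions);
    for (Path partition : partitions) Files.delete(partition);
    return merged;
  }

  /** Sorts one in-memory partition and writes it to its own temp file. */
  private static Path writeSorted(List<String> lines) throws IOException {
    Collections.sort(lines);
    Path file = Files.createTempFile("sort-partition", ".tmp");
    Files.write(file, lines, StandardCharsets.UTF_8);
    return file;
  }

  /** One open partition together with its smallest unread line. */
  private static final class Head {
    final BufferedReader reader;
    String line;

    Head(BufferedReader reader, String line) {
      this.reader = reader;
      this.line = line;
    }
  }

  /** K-way merge: repeatedly emit the smallest head line across all partitions. */
  private static Path merge(List<Path> partitions) throws IOException {
    Path out = Files.createTempFile("sort-merged", ".tmp");
    PriorityQueue<Head> heads = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
    try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
      for (Path partition : partitions) {
        BufferedReader reader = Files.newBufferedReader(partition, StandardCharsets.UTF_8);
        String first = reader.readLine();
        if (first == null) reader.close(); else heads.add(new Head(reader, first));
      }
      while (!heads.isEmpty()) {
        Head head = heads.poll();
        writer.write(head.line);
        writer.newLine();
        head.line = head.reader.readLine();
        if (head.line == null) head.reader.close(); else heads.add(head);
      }
    }
    return out;
  }
}
```

Calling `ExternalLineSort.sort(input, 2)` on a five-line file spills three partitions and merges them into one sorted temp file, whose path is returned to the caller in the same spirit as the method documented here returns a temp file name.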
-
verifyChecksum
private void verifyChecksum(Throwable priorException, OfflineSorter.ByteSequencesReader reader) throws IOException
Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
- Throws:
IOException
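The pattern described here, attaching the checksum outcome to an exception that is already in flight, can be illustrated with `Throwable.addSuppressed` and the JDK's `CRC32`. Everything in this sketch is an assumption made for illustration: `ChecksumOnFailure` is hypothetical, the data is an in-memory array rather than a reader, and the choice of CRC32 and of `IllegalStateException` for the suppressed entry is not taken from this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: record a checksum verdict on an existing failure. */
final class ChecksumOnFailure {

  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Adds the checksum result to priorException as a suppressed exception and returns it. */
  static <T extends Throwable> T annotate(T priorException, byte[] data, long expectedCrc) {
    long actual = crc32(data);
    String message = (actual == expectedCrc)
        ? "checksum passed (" + Long.toHexString(actual) + ")"
        : "checksum failed: expected " + Long.toHexString(expectedCrc)
            + ", actual " + Long.toHexString(actual);
    priorException.addSuppressed(new IllegalStateException(message));
    return priorException;
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    IOException failure = annotate(new IOException("unexpected end of partition"), data, 0xCBF43926L);
    // The original exception stays primary; the checksum verdict travels with it.
    System.out.println(failure.getMessage());
    System.out.println(failure.getSuppressed()[0].getMessage());
  }
}
```

Keeping the original exception primary means the caller still sees the first failure, while the suppressed entry tells them whether data corruption is a likely cause.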
-
mergePartitions
void mergePartitions(Directory trackingDir, List<Future<OfflineSorter.Partition>> segments) throws IOException
Merge the most recent maxTempFiles partitions into a new partition.
- Throws:
IOException
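One way to picture an intermediate merge is as a list operation: the newest partitions at the tail of the list are replaced by a single merged partition, which keeps the number of open files bounded. The sketch below does this in memory under stated assumptions: `TailMerge` is hypothetical, partitions are sorted lists of strings rather than files wrapped in futures, and the exact selection rule used by the class is not confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory sketch of merging the most recent partitions into one. */
final class TailMerge {

  /** Replaces the last maxTempFiles sorted partitions with their merged result. */
  static void mergeMostRecent(List<List<String>> partitions, int maxTempFiles) {
    int from = Math.max(0, partitions.size() - maxTempFiles);
    List<List<String>> tail = partitions.subList(from, partitions.size());
    List<String> merged = mergeSorted(tail);
    tail.clear();           // removes the merged inputs from the backing list
    partitions.add(merged); // the new partition becomes the most recent one
  }

  /** K-way merge of already sorted lists using a heap of {listIndex, position} cursors. */
  static List<String> mergeSorted(List<List<String>> sortedLists) {
    PriorityQueue<int[]> heads = new PriorityQueue<>(
        (a, b) -> sortedLists.get(a[0]).get(a[1]).compareTo(sortedLists.get(b[0]).get(b[1])));
    for (int i = 0; i < sortedLists.size(); i++) {
      if (!sortedLists.get(i).isEmpty()) heads.add(new int[] {i, 0});
    }
    List<String> out = new ArrayList<>();
    while (!heads.isEmpty()) {
      int[] head = heads.poll();
      List<String> source = sortedLists.get(head[0]);
      out.add(source.get(head[1]));
      if (++head[1] < source.size()) heads.add(head); // advance this list's cursor
    }
    return out;
  }

  public static void main(String[] args) {
    List<List<String>> partitions = new ArrayList<>(Arrays.asList(
        Arrays.asList("a", "z"),
        Arrays.asList("b", "c"),
        Arrays.asList("d"),
        Arrays.asList("aa", "y")));
    mergeMostRecent(partitions, 3);
    System.out.println(partitions); // the oldest partition is untouched, the last three are merged
  }
}
```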
-
readPartition
OfflineSorter.Partition readPartition(OfflineSorter.ByteSequencesReader reader) throws IOException, InterruptedException
Read in a single partition of data, setting isExhausted[0] to true if there are no more items.
- Throws:
IOException
InterruptedException
-
getWriter
protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount) throws IOException
Subclasses can override to change how byte sequences are written to disk.
- Throws:
IOException
-
getReader
protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, String name) throws IOException
Subclasses can override to change how byte sequences are read from disk.
- Throws:
IOException
-
getComparator
Returns the comparator in use to sort entries.
-
getPartition
private OfflineSorter.Partition getPartition(Future<OfflineSorter.Partition> future) throws IOException
- Throws:
IOException
-