Package org.jcodings.specific
Class CESU8Encoding
java.lang.Object
org.jcodings.Encoding
org.jcodings.AbstractEncoding
org.jcodings.MultiByteEncoding
org.jcodings.unicode.UnicodeEncoding
org.jcodings.specific.CESU8Encoding
- All Implemented Interfaces:
Cloneable
-
Field Summary
Fields
Modifier and Type | Field
private static final int[] | CESU8EncLen
(package private) static final int[][] | CESU8Trans
static final CESU8Encoding | INSTANCE
private static final int | INVALID_CODE_FE
private static final int | INVALID_CODE_FF
(package private) static final boolean | USE_INVALID_CODE_SCHEME
private static final int | VALID_CODE_LIMIT
-
Constructor Summary
Constructors
Modifier | Constructor
protected | CESU8Encoding()
-
Method Summary
Modifier and Type | Method | Description
int | codeToMbc(int code, byte[] bytes, int p) | Encodes a code point into its multibyte representation
int | codeToMbcLength(int code) | Returns the character length for a given code point. Oniguruma equivalent: code_to_mbclen
int[] | ctypeCodeRange(int ctype, IntHolder sbOut) | Returns the code range for a given character type. Oniguruma equivalent: get_ctype_code_range
String | getCharsetName() | The name of the equivalent Java Charset for this encoding.
boolean | isNewLine(byte[] bytes, int p, int end) | onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
boolean | isReverseMatchAllowed(byte[] bytes, int p, int end) | Returns true if it is safe to use the reverse Boyer-Moore fail-fast search algorithm. Oniguruma equivalent: is_allowed_reverse_match
int | leftAdjustCharHead(byte[] bytes, int p, int s, int end) | Seeks the previous character head in a stream. Oniguruma equivalent: left_adjust_char_head
int | length(byte[] bytes, int p, int end) | Returns the character length given the stream, the character position, and the stream end. Returns 1 for single-byte encodings. For multibyte encodings, performs sanity validation and returns the character length, or the number of missing characters in the stream otherwise.
private int | lengthForOneUptoSix(byte[] bytes, int p, int end, int b, int s) |
int | mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold) | onigenc_ascii_mbc_case_fold
int | mbcToCode(byte[] bytes, int p, int end) | Returns the code point for a character. Oniguruma equivalent: mbc_to_code
(package private) static byte | trail0(int code) |
(package private) static byte | trail0(long code) |
(package private) static byte | trailS(int code, int shift) |
(package private) static byte | trailS(long code, int shift) |
private static boolean | utf8IsLead(int c) |
Methods inherited from class org.jcodings.unicode.UnicodeEncoding
applyAllCaseFold, caseFoldCodesByString, caseMap, ctypeCodeRange, isCodeCType, isInCodeRange, propertyNameToCType
Methods inherited from class org.jcodings.MultiByteEncoding
isInRange, length, lengthForTwoUptoFour, mb2CodeToMbc, mb2CodeToMbcLength, mb2IsCodeCType, mb4CodeToMbc, mb4CodeToMbcLength, mb4IsCodeCType, mbnMbcCaseFold, mbnMbcToCode, missing, missing, safeLengthForUptoFour, safeLengthForUptoThree, safeLengthForUptoTwo, strCodeAt, strLength
Methods inherited from class org.jcodings.AbstractEncoding
asciiApplyAllCaseFold, asciiCaseFoldCodesByString, asciiMbcCaseFold, isCodeCTypeInternal
Methods inherited from class org.jcodings.Encoding
asciiToLower, asciiToUpper, digitVal, equals, getCharset, getIndex, getName, hashCode, isAlnum, isAlpha, isAscii, isAscii, isAsciiCompatible, isBlank, isCntrl, isDigit, isDummy, isFixedWidth, isGraph, isLower, isMbcAscii, isMbcCrnl, isMbcHead, isMbcWord, isNewLine, isPrint, isPunct, isSbWord, isSingleByte, isSpace, isUnicode, isUpper, isUTF8, isWord, isWordGraphPrint, isXDigit, load, load, maxLength, maxLengthDistance, mbcodeStartPosition, minLength, odigitVal, prevCharHead, rightAdjustCharHead, rightAdjustCharHeadWithPrev, setDummy, setName, setName, step, stepBack, strByteLengthNull, strLengthNull, strNCmp, toLowerCaseTable, toString, xdigitVal
-
Field Details
-
USE_INVALID_CODE_SCHEME
static final boolean USE_INVALID_CODE_SCHEME
-
INVALID_CODE_FE
private static final int INVALID_CODE_FE
-
INVALID_CODE_FF
private static final int INVALID_CODE_FF
-
VALID_CODE_LIMIT
private static final int VALID_CODE_LIMIT
-
CESU8EncLen
private static final int[] CESU8EncLen
-
CESU8Trans
static final int[][] CESU8Trans
-
INSTANCE
public static final CESU8Encoding INSTANCE
-
-
Constructor Details
-
CESU8Encoding
protected CESU8Encoding()
-
-
Method Details
-
getCharsetName
public String getCharsetName()
Description copied from class: Encoding
The name of the equivalent Java Charset for this encoding. Defaults to the name of the encoding. Subclasses can override this to provide a different name.
- Overrides:
getCharsetName in class UnicodeEncoding
- Returns:
the name of the equivalent Java Charset for this encoding
-
length
public int length(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns the character length given the stream, the character position, and the stream end. Returns 1 for single-byte encodings. For multibyte encodings, performs sanity validation and returns the character length, or the number of missing characters in the stream otherwise.
-
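To illustrate what "character length" means for CESU-8, here is a minimal standalone sketch. It is not the jcodings implementation: the class name, the method name charLength, and the INVALID marker are hypothetical, and it assumes that unpaired surrogates and truncated sequences are simply reported as invalid rather than as a count of missing characters.

```java
/** Standalone sketch of CESU-8 character length; not the jcodings code. */
final class Cesu8CharLengthSketch {
    /** Hypothetical marker for an invalid or truncated sequence. */
    static final int INVALID = -1;

    /** True if index i is inside the stream and holds a 10xxxxxx trail byte. */
    private static boolean isTrail(byte[] bytes, int i, int end) {
        return i < end && (bytes[i] & 0xC0) == 0x80;
    }

    /** Byte length (1, 2, 3 or 6) of the character starting at p, or INVALID. */
    static int charLength(byte[] bytes, int p, int end) {
        if (p >= end) return INVALID;
        int b = bytes[p] & 0xFF;
        if (b < 0x80) return 1;                               // ASCII
        if (b >= 0xC2 && b <= 0xDF) {                         // two-byte form
            return isTrail(bytes, p + 1, end) ? 2 : INVALID;
        }
        if (b >= 0xE0 && b <= 0xEF) {                         // three-byte form
            if (!isTrail(bytes, p + 1, end) || !isTrail(bytes, p + 2, end)) {
                return INVALID;
            }
            int second = bytes[p + 1] & 0xFF;
            if (b == 0xE0 && second < 0xA0) return INVALID;   // overlong form
            if (b != 0xED || second < 0xA0) return 3;         // ordinary BMP character
            if (second >= 0xB0) return INVALID;               // unpaired low surrogate
            // High surrogate: a low surrogate (ED B0..BF xx) must follow.
            boolean lowFollows = p + 5 < end
                    && (bytes[p + 3] & 0xFF) == 0xED
                    && (bytes[p + 4] & 0xF0) == 0xB0
                    && isTrail(bytes, p + 5, end);
            return lowFollows ? 6 : INVALID;
        }
        return INVALID;                                       // stray trail byte, C0/C1, or F0..FF
    }
}
```

Under these assumptions, the six bytes ED A0 81 ED B0 80 (U+10400) count as one character of length 6, while the four-byte UTF-8 form F0 90 90 80 is rejected.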
lengthForOneUptoSix
private int lengthForOneUptoSix(byte[] bytes, int p, int end, int b, int s)
-
isNewLine
public boolean isNewLine(byte[] bytes, int p, int end)
Description copied from class: AbstractEncoding
onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
- Overrides:
isNewLine in class AbstractEncoding
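The Oniguruma helper named in the description, onigenc_is_mbc_newline_0x0a, treats the single byte 0x0A as the newline. A minimal standalone sketch of that check follows; the class name is hypothetical and this is not the jcodings source. A single-byte test is enough for CESU-8 because 0x0A never occurs inside a multibyte sequence.

```java
/** Standalone sketch of a 0x0A newline test; not the jcodings code. */
final class NewlineSketch {
    /** True if there is a byte at position p and it is a line feed. */
    static boolean isNewLine(byte[] bytes, int p, int end) {
        return p < end && bytes[p] == 0x0a;
    }
}
```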
-
codeToMbcLength
public int codeToMbcLength(int code)
Description copied from class: Encoding
Returns the character length for a given code point. Oniguruma equivalent: code_to_mbclen
- Specified by:
codeToMbcLength in class Encoding
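For orientation, the CESU-8 format itself fixes the encoded length of each code point: one to three bytes inside the Basic Multilingual Plane, and six bytes for a supplementary code point, which is written as two three-byte surrogates. The sketch below shows that mapping under a hypothetical name, encodedLength. It does not model any special invalid-code handling the library may apply (the INVALID_CODE_* fields hint at one).

```java
/** Standalone sketch of CESU-8 encoded length; not the jcodings code. */
final class Cesu8CodeLengthSketch {
    /** Number of bytes CESU-8 uses for the given Unicode code point. */
    static int encodedLength(int code) {
        if (code < 0 || code > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + code);
        }
        if (code < 0x80) return 1;      // ASCII
        if (code < 0x800) return 2;     // two-byte form
        if (code < 0x10000) return 3;   // rest of the BMP
        return 6;                       // surrogate pair, 3 bytes per surrogate
    }
}
```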
-
mbcToCode
public int mbcToCode(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns the code point for a character. Oniguruma equivalent: mbc_to_code
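As a rough illustration of decoding, the following standalone sketch turns a CESU-8 sequence into a code point. It assumes well-formed input and performs no validation. The class and method names (Cesu8DecodeSketch, decode, unit3) are hypothetical, and the library's own handling of malformed bytes is not reproduced here.

```java
/** Standalone sketch of CESU-8 decoding for well-formed input; not the jcodings code. */
final class Cesu8DecodeSketch {
    /** Reads the 16-bit unit stored in the three bytes starting at p. */
    private static int unit3(byte[] b, int p) {
        return ((b[p] & 0x0F) << 12) | ((b[p + 1] & 0x3F) << 6) | (b[p + 2] & 0x3F);
    }

    /** Decodes the character starting at p into a Unicode code point. */
    static int decode(byte[] bytes, int p, int end) {
        int b0 = bytes[p] & 0xFF;
        if (b0 < 0x80) return b0;                                  // ASCII
        if ((b0 & 0xE0) == 0xC0) {                                 // two-byte form
            return ((b0 & 0x1F) << 6) | (bytes[p + 1] & 0x3F);
        }
        int unit = unit3(bytes, p);                                // three-byte form
        boolean secondUnitPresent = p + 5 < end && (bytes[p + 3] & 0xFF) == 0xED;
        if (Character.isHighSurrogate((char) unit) && secondUnitPresent) {
            int low = unit3(bytes, p + 3);
            if (Character.isLowSurrogate((char) low)) {
                // Combine the surrogate pair into one supplementary code point.
                return Character.toCodePoint((char) unit, (char) low);
            }
        }
        return unit;
    }
}
```

For example, the bytes ED A0 81 ED B0 80 decode to U+10400, and E2 82 AC decodes to U+20AC.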
-
trailS
static byte trailS(int code, int shift)
-
trail0
static byte trail0(int code)
-
trailS
static byte trailS(long code, int shift)
-
trail0
static byte trail0(long code)
-
codeToMbc
public int codeToMbc(int code, byte[] bytes, int p)
Description copied from class: Encoding
Encodes a code point into its multibyte representation
-
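The encoding direction can be sketched in the same spirit. The code below is a standalone illustration of the CESU-8 format, not the jcodings implementation. The helper trail is hypothetical and only loosely echoes the trailS and trail0 helpers listed on this page, whose exact behavior is not documented here. Supplementary code points are split into a UTF-16 surrogate pair, and each surrogate is written as a three-byte sequence.

```java
/** Standalone sketch of CESU-8 encoding; not the jcodings code. */
final class Cesu8EncodeSketch {
    /** Builds a 10xxxxxx trail byte from the six bits of code at the given shift. */
    private static byte trail(int code, int shift) {
        return (byte) (((code >>> shift) & 0x3F) | 0x80);
    }

    /** Writes a 16-bit unit in 0x0800..0xFFFF as three bytes and returns 3. */
    private static int putUnit(int unit, byte[] out, int p) {
        out[p] = (byte) ((unit >>> 12) | 0xE0);
        out[p + 1] = trail(unit, 6);
        out[p + 2] = trail(unit, 0);
        return 3;
    }

    /** Encodes code into out starting at p and returns the number of bytes written. */
    static int encode(int code, byte[] out, int p) {
        if (code < 0 || code > 0x10FFFF) {
            throw new IllegalArgumentException("not a Unicode code point: " + code);
        }
        if (code < 0x80) {
            out[p] = (byte) code;
            return 1;
        }
        if (code < 0x800) {
            out[p] = (byte) ((code >>> 6) | 0xC0);
            out[p + 1] = trail(code, 0);
            return 2;
        }
        if (code < 0x10000) return putUnit(code, out, p);
        // Supplementary plane: encode each UTF-16 surrogate separately.
        putUnit(Character.highSurrogate(code), out, p);
        putUnit(Character.lowSurrogate(code), out, p + 3);
        return 6;
    }
}
```

With this sketch, U+10400 is written as ED A0 81 ED B0 80, where UTF-8 would use the four bytes F0 90 90 80.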
mbcCaseFold
public int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold)
Description copied from class: AbstractEncoding
onigenc_ascii_mbc_case_fold
- Overrides:
mbcCaseFold in class UnicodeEncoding
- Parameters:
flag - case fold flag
pp - an IntHolder that points at the character head
fold - a buffer into which the case-folded character is extracted
Oniguruma equivalent: mbc_case_fold
-
ctypeCodeRange
public int[] ctypeCodeRange(int ctype, IntHolder sbOut)
Description copied from class: Encoding
Returns the code range for a given character type. Oniguruma equivalent: get_ctype_code_range
- Specified by:
ctypeCodeRange in class Encoding
-
utf8IsLead
private static boolean utf8IsLead(int c)
-
leftAdjustCharHead
public int leftAdjustCharHead(byte[] bytes, int p, int s, int end)
Description copied from class: Encoding
Seeks the previous character head in a stream. Oniguruma equivalent: left_adjust_char_head
- Specified by:
leftAdjustCharHead in class Encoding
- Parameters:
bytes - byte stream
p - position
s - stop
end - end
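One way to picture "seeking the previous character head" in CESU-8 is the standalone sketch below. It uses its own hypothetical parameter names (start for the lowest index that may be returned, pos for the index being adjusted) rather than mapping them onto p and s, and it is not the jcodings implementation. It assumes that a character head is any byte that is not a 10xxxxxx trail byte, and that a low surrogate directly preceded by a high surrogate belongs to the same six-byte character.

```java
/** Standalone sketch of moving back to a CESU-8 character head; not the jcodings code. */
final class Cesu8CharHeadSketch {
    /** True for any byte value that is not a 10xxxxxx trail byte. */
    static boolean isLead(int b) {
        return (b & 0xC0) != 0x80;
    }

    /** True if a low-surrogate sequence (ED B0..BF xx) starts at i. */
    private static boolean isLowSurrogateUnit(byte[] b, int i, int end) {
        return i + 1 < end && (b[i] & 0xFF) == 0xED && (b[i + 1] & 0xF0) == 0xB0;
    }

    /** True if a high-surrogate sequence (ED A0..AF xx) starts at i. */
    private static boolean isHighSurrogateUnit(byte[] b, int i) {
        return (b[i] & 0xFF) == 0xED && (b[i + 1] & 0xF0) == 0xA0;
    }

    /** Returns the index of the head of the character that contains pos. */
    static int leftAdjust(byte[] bytes, int start, int pos, int end) {
        if (pos <= start || pos >= end) return pos;
        int q = pos;
        // Step back over trail bytes until a lead byte (or start) is reached.
        while (q > start && !isLead(bytes[q] & 0xFF)) q--;
        // A low surrogate is the second half of a pair: move to the high surrogate.
        if (isLowSurrogateUnit(bytes, q, end) && q - 3 >= start
                && isHighSurrogateUnit(bytes, q - 3)) {
            q -= 3;
        }
        return q;
    }
}
```

In the byte sequence 41 ED A0 81 ED B0 80 42, adjusting from index 5 (inside the low surrogate) yields index 1, the first byte of the surrogate pair.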
-
isReverseMatchAllowed
public boolean isReverseMatchAllowed(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns true if it is safe to use the reverse Boyer-Moore fail-fast search algorithm. Oniguruma equivalent: is_allowed_reverse_match
- Specified by:
isReverseMatchAllowed in class Encoding
-