Package org.jsoup.parser
Class TokenQueue
- java.lang.Object
-
- org.jsoup.parser.TokenQueue
-
public class TokenQueue extends java.lang.Object
A character queue with parsing helpers.
-
-
Field Summary
Fields (Modifier and Type, Field)
private static java.lang.String[] CssIdentifierChars
private static java.lang.String[] ElementSelectorChars
private static char ESC
private int pos
private java.lang.String queue
-
Constructor Summary
Constructors (Constructor, Description)
TokenQueue(java.lang.String data)
    Create a new TokenQueue.
-
Method Summary
Methods (Modifier and Type, Method, Description)
void addFirst(java.lang.String seq)
    Add a string to the start of the queue.
void advance()
    Drops the next character off the queue.
java.lang.String chompBalanced(char open, char close)
    Pulls a balanced string off the queue.
java.lang.String chompTo(java.lang.String seq)
    Pulls a string off the queue (like consumeTo), and then pulls off the matched string (but does not return it).
java.lang.String chompToIgnoreCase(java.lang.String seq)
char consume()
    Consume one character off the queue.
void consume(java.lang.String seq)
    Consumes the supplied sequence off the queue.
java.lang.String consumeCssIdentifier()
    Consume a CSS identifier (ID or class) off the queue (letter, digit, -, _). See http://www.w3.org/TR/CSS2/syndata.html#value-def-identifier
java.lang.String consumeElementSelector()
    Consume a CSS element selector (tag name, but | instead of : for namespaces (or *| for wildcard namespace), to not conflict with :pseudo selects).
private java.lang.String consumeEscapedCssIdentifier(java.lang.String... matches)
java.lang.String consumeTo(java.lang.String seq)
    Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
java.lang.String consumeToAny(java.lang.String... seq)
    Consumes to the first sequence provided, or to the end of the queue.
java.lang.String consumeToIgnoreCase(java.lang.String seq)
boolean consumeWhitespace()
    Pulls the next run of whitespace characters off the queue.
java.lang.String consumeWord()
    Retrieves the next run of word type (letter or digit) off the queue.
static java.lang.String escapeCssIdentifier(java.lang.String in)
boolean isEmpty()
    Is the queue empty?
boolean matchChomp(java.lang.String seq)
    Tests if the queue matches the sequence (as with matches), and if it does, removes the matched string from the queue.
boolean matches(java.lang.String seq)
    Tests if the next characters on the queue match the sequence.
boolean matchesAny(char... seq)
boolean matchesAny(java.lang.String... seq)
    Tests if the next characters match any of the sequences.
private boolean matchesCssIdentifier(java.lang.String... matches)
boolean matchesWhitespace()
    Tests if queue starts with a whitespace character.
boolean matchesWord()
    Test if the queue matches a word character (letter or digit).
java.lang.String remainder()
    Consume and return whatever is left on the queue.
private int remainingLength()
java.lang.String toString()
static java.lang.String unescape(java.lang.String in)
    Unescape a \ escaped string.
-
-
-
Field Detail
-
queue
private java.lang.String queue
-
pos
private int pos
-
ESC
private static final char ESC
- See Also:
- Constant Field Values
-
ElementSelectorChars
private static final java.lang.String[] ElementSelectorChars
-
CssIdentifierChars
private static final java.lang.String[] CssIdentifierChars
-
-
Method Detail
-
isEmpty
public boolean isEmpty()
Is the queue empty?
- Returns:
- true if no data left in queue.
-
remainingLength
private int remainingLength()
-
addFirst
public void addFirst(java.lang.String seq)
Add a string to the start of the queue.
- Parameters:
seq
- string to add.
-
matches
public boolean matches(java.lang.String seq)
Tests if the next characters on the queue match the sequence. Case insensitive.
- Parameters:
seq
- String to check queue for.
- Returns:
- true if the next characters match.
-
matchesAny
public boolean matchesAny(java.lang.String... seq)
Tests if the next characters match any of the sequences. Case insensitive.
- Parameters:
seq
- list of strings to check for, case insensitively
- Returns:
- true if any matched, false if none did
-
matchesAny
public boolean matchesAny(char... seq)
-
matchChomp
public boolean matchChomp(java.lang.String seq)
Tests if the queue matches the sequence (as with matches), and if it does, removes the matched string from the queue.
- Parameters:
seq
- String to search for, and if found, remove from queue.
- Returns:
- true if found and removed, false if not found.
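The test-then-remove behaviour of matches and matchChomp can be sketched as follows. This is a minimal standalone illustration, not jsoup's source: the class MatchSketch and its fields are hypothetical stand-ins, and the case-insensitive comparison via String.regionMatches is an assumption about one reasonable way to do it.

```java
/** Hypothetical stand-in for illustration only; not the jsoup TokenQueue class. */
final class MatchSketch {
    private final String queue; // full input
    private int pos = 0;        // index of the next unconsumed character

    MatchSketch(String data) {
        this.queue = data;
    }

    /** True if the unconsumed input starts with seq, ignoring case. Consumes nothing. */
    boolean matches(String seq) {
        return queue.regionMatches(true, pos, seq, 0, seq.length());
    }

    /** If the unconsumed input starts with seq, skip past it and return true. */
    boolean matchChomp(String seq) {
        if (matches(seq)) {
            pos += seq.length();
            return true;
        }
        return false;
    }

    /** Consume and return everything that is left. */
    String remainder() {
        String rest = queue.substring(pos);
        pos = queue.length();
        return rest;
    }

    public static void main(String[] args) {
        MatchSketch q = new MatchSketch("<DIV>text");
        System.out.println(q.matchChomp("<div>")); // matched despite the case difference
        System.out.println(q.remainder());
    }
}
```

A failed matchChomp leaves the position untouched, so callers can try several alternatives in sequence.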
-
matchesWhitespace
public boolean matchesWhitespace()
Tests if queue starts with a whitespace character.
- Returns:
- true if the queue starts with whitespace
-
matchesWord
public boolean matchesWord()
Tests if the queue starts with a word character (letter or digit).
- Returns:
- true if the next character is a word character
-
advance
public void advance()
Drops the next character off the queue.
-
consume
public char consume()
Consume one character off the queue.
- Returns:
- first character on queue.
-
consume
public void consume(java.lang.String seq)
Consumes the supplied sequence off the queue. If the queue does not start with the supplied sequence, throws an illegal state exception; callers should check with matches() first. Case insensitive.
- Parameters:
seq
- sequence to remove from head of queue.
-
consumeTo
public java.lang.String consumeTo(java.lang.String seq)
Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
- Parameters:
seq
- String to end on (and not include in return, but leave on queue). Case sensitive.
- Returns:
- The matched data consumed from queue.
-
consumeToIgnoreCase
public java.lang.String consumeToIgnoreCase(java.lang.String seq)
-
consumeToAny
public java.lang.String consumeToAny(java.lang.String... seq)
Consumes to the first sequence provided, or to the end of the queue. Leaves the terminator on the queue.
- Parameters:
seq
- any number of terminators to consume to. Case insensitive.
- Returns:
- consumed string
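As a rough illustration of consuming up to the first of several terminators, the sketch below scans forward one character at a time until any terminator matches at the current position. The class ConsumeToAnySketch and its static helper are hypothetical and operate on a plain String rather than on a TokenQueue.

```java
/** Hypothetical helper for illustration only; not part of jsoup. */
final class ConsumeToAnySketch {

    /**
     * Returns the prefix of input that precedes the first position at which
     * any terminator matches (ignoring case), or all of input if none matches.
     */
    static String consumeToAny(String input, String... terminators) {
        int pos = 0;
        scan:
        while (pos < input.length()) {
            for (String t : terminators) {
                if (input.regionMatches(true, pos, t, 0, t.length())) {
                    break scan; // terminator found; it is not included in the result
                }
            }
            pos++;
        }
        return input.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println(consumeToAny("key=value;next", "=", ";"));
        System.out.println(consumeToAny("abcENDxyz", "end"));
    }
}
```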
-
chompTo
public java.lang.String chompTo(java.lang.String seq)
Pulls a string off the queue (like consumeTo), and then pulls off the matched string (but does not return it). If the queue runs out of characters before finding seq, returns as much as it can (and the queue will then report isEmpty() == true).
- Parameters:
seq
- String to match up to, and not include in return, and to pull off queue. Case sensitive.
- Returns:
- Data matched from queue.
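The difference between consumeTo (terminator stays on the queue) and chompTo (terminator is removed) can be sketched as below. ChompToSketch is a hypothetical stand-in written for this illustration, and the indexOf-based search is an assumption rather than a statement about jsoup's internals.

```java
/** Hypothetical stand-in for illustration only; not the jsoup TokenQueue class. */
final class ChompToSketch {
    private final String queue; // full input
    private int pos = 0;        // index of the next unconsumed character

    ChompToSketch(String data) {
        this.queue = data;
    }

    boolean isEmpty() {
        return pos >= queue.length();
    }

    /** Consume up to, but not including, seq (case sensitive); seq stays queued. */
    String consumeTo(String seq) {
        int offset = queue.indexOf(seq, pos);
        int end = (offset == -1) ? queue.length() : offset;
        String consumed = queue.substring(pos, end);
        pos = end;
        return consumed;
    }

    /** Like consumeTo, but also drops seq from the queue when it was found. */
    String chompTo(String seq) {
        String data = consumeTo(seq);
        if (queue.startsWith(seq, pos)) {
            pos += seq.length();
        }
        return data;
    }

    /** Consume and return everything that is left. */
    String remainder() {
        String rest = queue.substring(pos);
        pos = queue.length();
        return rest;
    }

    public static void main(String[] args) {
        ChompToSketch a = new ChompToSketch("title>body");
        System.out.println(a.consumeTo(">") + " | " + a.remainder());
        ChompToSketch b = new ChompToSketch("title>body");
        System.out.println(b.chompTo(">") + " | " + b.remainder());
    }
}
```

When the terminator is never found, both methods return the rest of the input and leave the queue empty.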
-
chompToIgnoreCase
public java.lang.String chompToIgnoreCase(java.lang.String seq)
-
chompBalanced
public java.lang.String chompBalanced(char open, char close)
Pulls a balanced string off the queue. E.g. if the queue is "(one (two) three) four", chompBalanced('(', ')') will return "one (two) three", and leave " four" on the queue. Unbalanced openers and closers can be quoted (with ' or ") or escaped (with \). Those escapes will be left in the returned string, which is suitable for regexes (where we need to preserve the escape), but unsuitable for contains text strings; use unescape for that.
- Parameters:
open
- opener
close
- closer
- Returns:
- data matched from the queue
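One way to picture balanced chomping is a depth counter that ignores quoted and escaped characters. The sketch below is an independent illustration under several assumptions: BalancedSketch is a hypothetical class, open and close are assumed to be different characters, any text before the first opener is skipped, and an unbalanced input raises IllegalArgumentException. None of these choices is claimed to match jsoup's behaviour in edge cases.

```java
/** Hypothetical stand-in for illustration only; not the jsoup TokenQueue class. */
final class BalancedSketch {
    private static final char ESC = '\\';
    private final String queue; // full input
    private int pos = 0;        // index of the next unconsumed character

    BalancedSketch(String data) {
        this.queue = data;
    }

    /** Returns the text between the first opener and its matching closer. */
    String chompBalanced(char open, char close) {
        int depth = 0;
        int start = -1;          // index just after the outermost opener
        boolean escaped = false; // previous character was an unescaped backslash
        char quote = 0;          // active quote character, or 0 outside quotes

        for (int i = pos; i < queue.length(); i++) {
            char c = queue.charAt(i);
            if (escaped) { escaped = false; continue; } // escaped char never counts
            if (c == ESC) { escaped = true; continue; }
            if (quote != 0) {                            // inside quotes: wait for the end
                if (c == quote) quote = 0;
                continue;
            }
            if ((c == '\'' || c == '"') && c != open && c != close) {
                quote = c;
                continue;
            }
            if (c == open) {
                if (depth++ == 0) start = i + 1;
            } else if (c == close && depth > 0) {
                if (--depth == 0) {
                    pos = i + 1;                         // closer is consumed, not returned
                    return queue.substring(start, i);    // escapes are left in place
                }
            }
        }
        throw new IllegalArgumentException("No balanced " + open + close + " pair found");
    }

    /** Consume and return everything that is left. */
    String remainder() {
        String rest = queue.substring(pos);
        pos = queue.length();
        return rest;
    }

    public static void main(String[] args) {
        BalancedSketch q = new BalancedSketch("(one (two) three) four");
        System.out.println(q.chompBalanced('(', ')'));
        System.out.println(q.remainder());
    }
}
```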
-
unescape
public static java.lang.String unescape(java.lang.String in)
Unescape a \ escaped string.
- Parameters:
in
- backslash escaped string
- Returns:
- unescaped string
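Backslash unescaping can be sketched with a single flag that remembers whether the previous character was an escape. This is a simplified standalone version under stated assumptions: UnescapeSketch is a hypothetical class, a doubled backslash yields one literal backslash, and a trailing lone backslash is dropped. jsoup's handling of such edge cases may differ.

```java
/** Hypothetical helper for illustration only; not part of jsoup. */
final class UnescapeSketch {
    private static final char ESC = '\\';

    /** Removes one level of backslash escaping from in. */
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean escaped = false; // true when the previous char was an unescaped backslash
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (!escaped && c == ESC) {
                escaped = true;  // drop the escape itself, keep whatever follows
                continue;
            }
            out.append(c);
            escaped = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("one \\( two"));
    }
}
```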
-
escapeCssIdentifier
public static java.lang.String escapeCssIdentifier(java.lang.String in)
-
consumeWhitespace
public boolean consumeWhitespace()
Pulls the next run of whitespace characters off the queue.
- Returns:
- true if any whitespace was consumed
-
consumeWord
public java.lang.String consumeWord()
Retrieves the next run of word type (letter or digit) off the queue.
- Returns:
- String of word characters from queue, or empty string if none.
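Run-based consumers such as consumeWhitespace and consumeWord amount to advancing the position while a character test holds. The sketch below assumes Character.isWhitespace and Character.isLetterOrDigit as the tests; both the class WordSketch and that choice of predicates are illustrative assumptions, not jsoup's definitions.

```java
/** Hypothetical stand-in for illustration only; not the jsoup TokenQueue class. */
final class WordSketch {
    private final String queue; // full input
    private int pos = 0;        // index of the next unconsumed character

    WordSketch(String data) {
        this.queue = data;
    }

    /** Skips a run of whitespace; returns true if at least one character was skipped. */
    boolean consumeWhitespace() {
        int start = pos;
        while (pos < queue.length() && Character.isWhitespace(queue.charAt(pos))) {
            pos++;
        }
        return pos > start;
    }

    /** Returns the next run of letters and digits, or "" if the next char is neither. */
    String consumeWord() {
        int start = pos;
        while (pos < queue.length() && Character.isLetterOrDigit(queue.charAt(pos))) {
            pos++;
        }
        return queue.substring(start, pos);
    }

    public static void main(String[] args) {
        WordSketch q = new WordSketch("div  class");
        System.out.println(q.consumeWord());
        System.out.println(q.consumeWhitespace());
        System.out.println(q.consumeWord());
    }
}
```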
-
consumeElementSelector
public java.lang.String consumeElementSelector()
Consume a CSS element selector (tag name, but | instead of : for namespaces (or *| for wildcard namespace), to not conflict with :pseudo selects).
- Returns:
- tag name
-
consumeCssIdentifier
public java.lang.String consumeCssIdentifier()
Consume a CSS identifier (ID or class) off the queue (letter, digit, -, _). See http://www.w3.org/TR/CSS2/syndata.html#value-def-identifier
- Returns:
- identifier
-
consumeEscapedCssIdentifier
private java.lang.String consumeEscapedCssIdentifier(java.lang.String... matches)
-
matchesCssIdentifier
private boolean matchesCssIdentifier(java.lang.String... matches)
-
remainder
public java.lang.String remainder()
Consume and return whatever is left on the queue.
- Returns:
- remainder of queue.
-
toString
public java.lang.String toString()
- Overrides:
toString
in class java.lang.Object
-
-