BetterTokenizer

main
Class BetterTokenizer

java.lang.Object
  main.BetterTokenizer

public class BetterTokenizer
extends java.lang.Object

The BetterTokenizer class takes an input stream and parses it into "tokens", allowing the tokens to be read one at a time. The parsing process is controlled by a table and a number of flags that can be set to various states. The stream tokenizer can recognize identifiers, numbers, quoted strings, and various comment styles.

Each byte read from the input stream is regarded as a character in the range '\u0000' through '\u00FF'. The character value is used to look up five possible attributes of the character: white space, alphabetic, numeric, string quote, and comment character. Each character can have zero or more of these attributes.

In addition, an instance has four flags. These flags indicate:

Whether line terminators are to be returned as tokens or treated as white space that merely separates tokens.
Whether C-style comments are to be recognized and skipped.
Whether C++-style comments are to be recognized and skipped.
Whether the characters of identifiers are converted to lowercase.

A typical application first constructs an instance of this class, sets up the syntax tables, and then calls the nextToken method in a loop until it returns the value TT_EOF.

@author James Gosling, modified by GvonD
@version 1.21g, 03/04/99
@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#TT_EOF
@since JDK1.0
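The typical usage described above might look like the following sketch. It uses the JDK's java.io.StreamTokenizer as a stand-in, on the assumption that BetterTokenizer keeps the same constructor, nextToken method and TT_EOF constant. TokenLoopDemo and countTokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TokenLoopDemo {
    /** Counts the tokens in the text using the default syntax table. */
    static int countTokens(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        int count = 0;
        try {
            // Loop until the tokenizer reports the end of the stream.
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // A word, a number, a quoted string and an ordinary character.
        System.out.println(countTokens("alpha 42 \"quoted text\" +"));
    }
}
```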

Field Summary
`double`	`nval` If the current token is a number, this field contains the value of that number.
`java.lang.String`	`sval` If the current token is a word token, this field contains a string giving the characters of the word token.
`static int`	`TT_EOF` A constant indicating that the end of the stream has been read.
`static int`	`TT_EOL` A constant indicating that the end of the line has been read.
`static int`	`TT_NUMBER` A constant indicating that a number token has been read.
`static int`	`TT_WORD` A constant indicating that a word token has been read.
`int`	`ttype` After a call to the `nextToken` method, this field contains the type of the token just read.

Constructor Summary
`BetterTokenizer(java.io.InputStream is)` Creates a stream tokenizer that parses the specified input stream.
`BetterTokenizer(java.io.Reader r)` Creates a tokenizer that parses the given character stream.

Method Summary
`void`	`commentChar(int ch)` Specifies that the character argument starts a single-line comment.
`void`	`eolIsSignificant(boolean flag)` Determines whether or not ends of line are treated as tokens.
`int`	`lineno()` Returns the current line number.
`void`	`lowerCaseMode(boolean fl)` Determines whether or not word tokens are automatically lowercased.
`int`	`nextToken()` Parses the next token from the input stream of this tokenizer.
`void`	`ordinaryChar(int ch)` Specifies that the character argument is "ordinary" in this tokenizer.
`void`	`ordinaryChars(int low, int hi)` Specifies that all characters c in the range `low <= c <= hi` are "ordinary" in this tokenizer.
`void`	`parseNumbers()` Specifies that numbers should be parsed by this tokenizer.
`void`	`pushBack()` Causes the next call to the `nextToken` method of this tokenizer to return the current value in the `ttype` field, and not to modify the value in the `nval` or `sval` field.
`void`	`quoteChar(int ch)` Specifies that matching pairs of this character delimit string constants in this tokenizer.
`void`	`resetSyntax()` Resets this tokenizer's syntax table so that all characters are "ordinary." See the `ordinaryChar` method for more information on a character being ordinary.
`void`	`slashSlashComments(boolean flag)` Determines whether or not the tokenizer recognizes C++-style comments.
`void`	`slashStarComments(boolean flag)` Determines whether or not the tokenizer recognizes C-style comments.
`java.lang.String`	`toString()` Returns the string representation of the current stream token.
`void`	`whitespaceChars(int low, int hi)` Specifies that all characters c in the range `low <= c <= hi` are white space characters.
`void`	`wordChars(int low, int hi)` Specifies that all characters c in the range `low <= c <= hi` are word constituents.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail

ttype

public int ttype

After a call to the nextToken method, this field contains the type of the token just read. For a single character token, its value is the single character, converted to an integer. For a quoted string token (see quoteChar(int)), its value is the quote character. Otherwise, its value is one of the following:

TT_WORD indicates that the token is a word.
TT_NUMBER indicates that the token is a number.
TT_EOL indicates that the end of line has been read. The field can only have this value if the eolIsSignificant method has been called with the argument true.
TT_EOF indicates that the end of the input stream has been reached.

@see java.io.BetterTokenizer#eolIsSignificant(boolean)
@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#quoteChar(int)
@see java.io.BetterTokenizer#TT_EOF
@see java.io.BetterTokenizer#TT_EOL
@see java.io.BetterTokenizer#TT_NUMBER
@see java.io.BetterTokenizer#TT_WORD
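As a rough illustration of how the ttype values listed above can be told apart, the sketch below switches on ttype after each call to nextToken. It relies on java.io.StreamTokenizer as a stand-in, assuming BetterTokenizer exposes the same ttype, sval and nval fields and the same constants. TtypeDemo and describe are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TtypeDemo {
    /** Labels every token in the text according to its ttype value. */
    static String describe(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder out = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.append("WORD(").append(st.sval).append(") ");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.append("NUMBER(").append(st.nval).append(") ");
                        break;
                    case '"':
                    case '\'':
                        // For quoted strings, ttype is the quote character itself.
                        out.append("STRING(").append(st.sval).append(") ");
                        break;
                    default:
                        // Single-character token: ttype is the character value.
                        out.append("CHAR(").append((char) st.ttype).append(") ");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("x = 3.5 'hi'"));
    }
}
```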

TT_EOF

public static final int TT_EOF

A constant indicating that the end of the stream has been read.

See Also: Constant Field Values

TT_EOL

public static final int TT_EOL

A constant indicating that the end of the line has been read.

See Also: Constant Field Values

TT_NUMBER

public static final int TT_NUMBER

A constant indicating that a number token has been read.

See Also: Constant Field Values

TT_WORD

public static final int TT_WORD

A constant indicating that a word token has been read.

See Also: Constant Field Values

sval

public java.lang.String sval

If the current token is a word token, this field contains a string giving the characters of the word token. When the current token is a quoted string token, this field contains the body of the string.

The current token is a word when the value of the ttype field is TT_WORD. The current token is a quoted string token when the value of the ttype field is a quote character.

@see java.io.BetterTokenizer#quoteChar(int)
@see java.io.BetterTokenizer#TT_WORD
@see java.io.BetterTokenizer#ttype
@since JDK1.0

nval

public double nval

If the current token is a number, this field contains the value of that number. The current token is a number when the value of the ttype field is TT_NUMBER.

@see java.io.BetterTokenizer#TT_NUMBER
@see java.io.BetterTokenizer#ttype

Constructor Detail

BetterTokenizer

public BetterTokenizer(java.io.InputStream is)

Creates a stream tokenizer that parses the specified input stream. The stream tokenizer is initialized to the following default state:

All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
All byte values '\u0000' through '\u0020' are considered to be white space.
'/' is a comment character.
Single quote '\'' and double quote '"' are string quote characters.
Numbers are parsed.
Ends of lines are treated as white space, not as separate tokens.
C-style and C++-style comments are not recognized.

@deprecated As of JDK version 1.1, the preferred way to tokenize an input stream is to convert it into a character stream, for example:

	Reader r = new BufferedReader(new InputStreamReader(is));
	BetterTokenizer st = new BetterTokenizer(r);

@param is an input stream.
@see java.io.BufferedReader
@see java.io.InputStreamReader
@see java.io.BetterTokenizer#BetterTokenizer(java.io.Reader)

BetterTokenizer

public BetterTokenizer(java.io.Reader r)

Creates a tokenizer that parses the given character stream.

@since JDK1.1

Method Detail

resetSyntax

public void resetSyntax()

Resets this tokenizer's syntax table so that all characters are "ordinary." See the ordinaryChar method for more information on a character being ordinary.

@see java.io.BetterTokenizer#ordinaryChar(int)

wordChars

public void wordChars(int low,
                      int hi)

Specifies that all characters c in the range low <= c <= hi are word constituents. A word token consists of a word constituent followed by zero or more word constituents or number constituents.

@param low the low end of the range.
@param hi the high end of the range.
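To illustrate the effect of wordChars, the sketch below optionally adds the underscore to the word constituents and returns the first word read. java.io.StreamTokenizer serves as a stand-in here, assuming BetterTokenizer starts from the same default table (letters only). WordCharsDemo and firstWord are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WordCharsDemo {
    /** Returns the first word token, optionally treating '_' as a word constituent. */
    static String firstWord(String text, boolean underscoreIsWordChar) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (underscoreIsWordChar) {
            st.wordChars('_', '_'); // a one-character range
        }
        try {
            st.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st.sval;
    }

    public static void main(String[] args) {
        System.out.println(firstWord("max_size = 8", false)); // stops at '_'
        System.out.println(firstWord("max_size = 8", true));  // keeps '_' in the word
    }
}
```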

whitespaceChars

public void whitespaceChars(int low,
                            int hi)

Specifies that all characters c in the range low <= c <= hi are white space characters. White space characters serve only to separate tokens in the input stream.

@param low the low end of the range.
@param hi the high end of the range.
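One possible use of whitespaceChars is to treat a delimiter such as the comma as a separator. The sketch below does this with java.io.StreamTokenizer as a stand-in, assuming BetterTokenizer handles white space the same way. WhitespaceDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WhitespaceDemo {
    /** Splits the text into words, treating commas as white space. */
    static String words(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.whitespaceChars(',', ','); // commas now only separate tokens
        List<String> found = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    found.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        System.out.println(words("red,green, blue"));
    }
}
```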

ordinaryChars

public void ordinaryChars(int low,
                          int hi)

Specifies that all characters c in the range low <= c <= hi are "ordinary" in this tokenizer. See the ordinaryChar method for more information on a character being ordinary.

@param low the low end of the range.
@param hi the high end of the range.
@see java.io.BetterTokenizer#ordinaryChar(int)

ordinaryChar

public void ordinaryChar(int ch)

Specifies that the character argument is "ordinary" in this tokenizer. It removes any special significance the character has as a comment character, word component, string delimiter, white space, or number character. When such a character is encountered by the parser, the parser treats it as a single-character token and sets the ttype field to the character value.

@param ch the character.
@see java.io.BetterTokenizer#ttype
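A common reason to call ordinaryChar is the default role of '/' as a comment character, which hides everything after it on the line. The sketch below compares token counts with and without the call. It uses java.io.StreamTokenizer as a stand-in and assumes BetterTokenizer has the same defaults. OrdinaryCharDemo and countTokens are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class OrdinaryCharDemo {
    /** Counts tokens, optionally stripping '/' of its comment meaning. */
    static int countTokens(String text, boolean slashIsOrdinary) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (slashIsOrdinary) {
            st.ordinaryChar('/'); // '/' becomes a single-character token
        }
        int count = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("6 / 3", false)); // '/' starts a comment
        System.out.println(countTokens("6 / 3", true));  // '/' is its own token
    }
}
```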

commentChar

public void commentChar(int ch)

Specifies that the character argument starts a single-line comment. All characters from the comment character to the end of the line are ignored by this stream tokenizer.

@param ch the character.
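As a small illustration, the sketch below registers '#' as a comment character so that the rest of the line is skipped. java.io.StreamTokenizer stands in for BetterTokenizer on the assumption that commentChar works the same way in both. CommentCharDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentCharDemo {
    /** Returns the words in the text, ignoring anything after a '#' on a line. */
    static String words(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.commentChar('#');
        List<String> found = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    found.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        System.out.println(words("key # note\nvalue"));
    }
}
```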

quoteChar

public void quoteChar(int ch)

Specifies that matching pairs of this character delimit string constants in this tokenizer.

When the nextToken method encounters a string constant, the ttype field is set to the string delimiter and the sval field is set to the body of the string.

If a string quote character is encountered, then a string is recognized, consisting of all characters after (but not including) the string quote character, up to (but not including) the next occurrence of that same string quote character, or a line terminator, or end of file. The usual escape sequences such as "\n" and "\t" are recognized and converted to single characters as the string is parsed.

@param ch the character.
@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#sval
@see java.io.BetterTokenizer#ttype
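The sketch below registers the backtick as an extra quote character and reads one string constant, including an escape sequence. It is written against java.io.StreamTokenizer as a stand-in, assuming BetterTokenizer treats quotes and escapes identically. QuoteCharDemo and backtickString are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class QuoteCharDemo {
    /** Returns the body of a leading backtick-quoted string, or null if there is none. */
    static String backtickString(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.quoteChar('`');
        try {
            // For a string constant, nextToken returns the delimiter itself.
            int type = st.nextToken();
            return type == '`' ? st.sval : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The input holds a backslash followed by 't'; the tokenizer turns it into a tab.
        System.out.println(backtickString("`a\\tb` tail"));
    }
}
```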

parseNumbers

public void parseNumbers()

Specifies that numbers should be parsed by this tokenizer. The syntax table of this tokenizer is modified so that each of the twelve characters:

	0 1 2 3 4 5 6 7 8 9 . -

has the "numeric" attribute.

When the parser encounters a word token that has the format of a double precision floating-point number, it treats the token as a number rather than a word, by setting the ttype field to the value TT_NUMBER and putting the numeric value of the token into the nval field.

@see java.io.BetterTokenizer#nval
@see java.io.BetterTokenizer#TT_NUMBER
@see java.io.BetterTokenizer#ttype
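To make the role of parseNumbers visible, the sketch below first clears the syntax table and then re-enables white space and number parsing before summing the numeric tokens. java.io.StreamTokenizer is used as a stand-in, assuming BetterTokenizer parses numbers the same way. ParseNumbersDemo and sum are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ParseNumbersDemo {
    /** Adds up every number token found in the text. */
    static double sum(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();           // every character is ordinary now
        st.whitespaceChars(0, ' '); // restore white space handling
        st.parseNumbers();          // digits, '.' and '-' become numeric again
        double total = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    total += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("3 -1.5 10"));
    }
}
```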

eolIsSignificant

public void eolIsSignificant(boolean flag)

Determines whether or not ends of line are treated as tokens. If the flag argument is true, this tokenizer treats end of lines as tokens; the nextToken method returns TT_EOL and also sets the ttype field to this value when an end of line is read.

A line is a sequence of characters ending with either a carriage-return character ('\r') or a newline character ('\n'). In addition, a carriage-return character followed immediately by a newline character is treated as a single end-of-line token.

If the flag is false, end-of-line characters are treated as white space and serve only to separate tokens.

@param flag true indicates that end-of-line characters are separate tokens; false indicates that end-of-line characters are white space.
@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#ttype
@see java.io.BetterTokenizer#TT_EOL
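The sketch below counts TT_EOL tokens in a text that mixes "\r\n" and "\n" line endings, which shows the carriage-return/newline pair being reported once. It uses java.io.StreamTokenizer as a stand-in, assuming BetterTokenizer follows the same end-of-line rules. EolDemo and countEols are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class EolDemo {
    /** Counts end-of-line tokens; returns 0 when they are not significant. */
    static int countEols(String text, boolean significant) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(significant);
        int eols = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    eols++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return eols;
    }

    public static void main(String[] args) {
        System.out.println(countEols("a b\r\nc\nd", true));
        System.out.println(countEols("a b\r\nc\nd", false));
    }
}
```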

slashStarComments

public void slashStarComments(boolean flag)

Determines whether or not the tokenizer recognizes C-style comments. If the flag argument is true, this stream tokenizer recognizes C-style comments. All text between successive occurrences of /* and */ is discarded.

If the flag argument is false, then C-style comments are not treated specially.

@param flag true indicates to recognize and ignore C-style comments.

slashSlashComments

public void slashSlashComments(boolean flag)

Determines whether or not the tokenizer recognizes C++-style comments. If the flag argument is true, this stream tokenizer recognizes C++-style comments. Any occurrence of two consecutive slash characters ('/') is treated as the beginning of a comment that extends to the end of the line.

If the flag argument is false, then C++-style comments are not treated specially.

@param flag true indicates to recognize and ignore C++-style comments.
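The two comment flags can be combined. The sketch below enables both slashSlashComments and slashStarComments and collects the words that remain. java.io.StreamTokenizer acts as a stand-in, assuming BetterTokenizer skips comments in the same way. SlashCommentsDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SlashCommentsDemo {
    /** Returns the words left after C-style and C++-style comments are skipped. */
    static String words(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.slashSlashComments(true); // skip from "//" to the end of the line
        st.slashStarComments(true);  // skip everything between "/*" and "*/"
        List<String> found = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    found.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        System.out.println(words("a // note\nb /* x */ c"));
    }
}
```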

lowerCaseMode

public void lowerCaseMode(boolean fl)

Determines whether or not word tokens are automatically lowercased. If the flag argument is true, then the value in the sval field is lowercased whenever a word token is returned (the ttype field has the value TT_WORD) by the nextToken method of this tokenizer.

If the flag argument is false, then the sval field is not modified.

@param fl true indicates that all word tokens should be lowercased.
@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#ttype
@see java.io.BetterTokenizer#TT_WORD
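A brief sketch of lowerCaseMode follows, again using java.io.StreamTokenizer as a stand-in on the assumption that BetterTokenizer lowercases sval in the same way. LowerCaseDemo and words are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LowerCaseDemo {
    /** Returns the word tokens, lowercased when the flag is set. */
    static String words(String text, boolean lower) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.lowerCaseMode(lower);
        List<String> found = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    found.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        System.out.println(words("Hello WORLD", true));
    }
}
```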

nextToken

public int nextToken()
              throws java.io.IOException

Parses the next token from the input stream of this tokenizer. The type of the next token is returned in the ttype field. Additional information about the token may be in the nval field or the sval field of this tokenizer.

Typical clients of this class first set up the syntax tables and then sit in a loop calling nextToken to parse successive tokens until TT_EOF is returned.

@return the value of the ttype field.
@exception IOException if an I/O error occurs.
@see java.io.BetterTokenizer#nval
@see java.io.BetterTokenizer#sval
@see java.io.BetterTokenizer#ttype

Throws: java.io.IOException

pushBack

public void pushBack()

Causes the next call to the nextToken method of this tokenizer to return the current value in the ttype field, and not to modify the value in the nval or sval field.

@see java.io.BetterTokenizer#nextToken()
@see java.io.BetterTokenizer#nval
@see java.io.BetterTokenizer#sval
@see java.io.BetterTokenizer#ttype
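pushBack is mostly useful for one-token lookahead. The sketch below reads a token, pushes it back, and shows that the next read yields the same token before moving on. It is written against java.io.StreamTokenizer as a stand-in, assuming BetterTokenizer implements pushBack identically. PushBackDemo and peekThenRead are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class PushBackDemo {
    /** Reads a word, pushes it back, then reads two more times. */
    static String peekThenRead(String text) {
        // Stand-in class: assumed to behave like BetterTokenizer.
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder out = new StringBuilder();
        try {
            st.nextToken();
            out.append(st.sval);             // first look at the token
            st.pushBack();                   // un-read it
            st.nextToken();
            out.append(' ').append(st.sval); // the same token again
            st.nextToken();
            out.append(' ').append(st.sval); // the following token
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(peekThenRead("alpha beta"));
    }
}
```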

lineno

public int lineno()

Returns the current line number.

@return the current line number of this stream tokenizer.

toString

public java.lang.String toString()

Returns the string representation of the current stream token.

@return a string representation of the token specified by the ttype, nval, and sval fields.
@see java.io.BetterTokenizer#nval
@see java.io.BetterTokenizer#sval
@see java.io.BetterTokenizer#ttype
