Javolution 6.0.0 java
javolution.io.UTF8StreamReader Class Reference

Public Member Functions

 UTF8StreamReader ()
 
 UTF8StreamReader (int capacity)
 
UTF8StreamReader setInput (InputStream inStream)
 
boolean ready () throws IOException
 
void close () throws IOException
 
int read () throws IOException
 
int read (char cbuf[], int off, int len) throws IOException
 
void read (Appendable dest) throws IOException
 
void reset ()
 
UTF8StreamReader setInputStream (InputStream inStream)
 

Private Member Functions

int read2 () throws IOException
 

Private Attributes

InputStream _inputStream
 
int _start
 
int _end
 
final byte[] _bytes
 
int _code
 
int _moreBytes
 

Detailed Description

A UTF-8 stream reader.

This reader supports surrogate char pairs (representing characters in the range [U+10000 .. U+10FFFF]). It can also be used to read 31-bit Unicode code points directly (see read()).
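The surrogate arithmetic used by the bulk read methods can be isolated in a few lines. The following is a minimal sketch, not part of Javolution: the class and method names (SurrogateSketch, toSurrogates) are hypothetical, and only the bit operations mirror the listings on this page.

```java
final class SurrogateSketch {
    /** Splits a supplementary code point (U+10000..U+10FFFF) into a high/low surrogate pair. */
    static char[] toSurrogates(int code) {
        if (code < 0x10000 || code > 0x10FFFF) {
            throw new IllegalArgumentException(
                    "Not a supplementary code point: U+" + Integer.toHexString(code));
        }
        int offset = code - 0x10000;                    // 20 significant bits remain
        char high = (char) ((offset >> 10) + 0xD800);   // top 10 bits
        char low = (char) ((offset & 0x3FF) + 0xDC00);  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F600);
        // prints "d83d de00"
        System.out.println(Integer.toHexString(pair[0]) + " " + Integer.toHexString(pair[1]));
    }
}
```

The result agrees with the JDK's own Character.toChars(int) for the same code point.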

Each invocation of one of the read() methods may cause one or more bytes to be read from the underlying byte-input stream. To enable the efficient conversion of bytes to characters, more bytes may be read ahead from the underlying stream than are necessary to satisfy the current read operation.
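The read-ahead behaviour can be illustrated without Javolution. The sketch below is an assumption-laden stand-in (ReadAheadSketch, next and bytesConsumedByOneRead are hypothetical names): it refills an internal byte buffer with one bulk read, so asking for a single byte may drain up to a full buffer from the source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadAheadSketch {
    private final InputStream in;
    private final byte[] buffer;
    private int start;
    private int end;

    ReadAheadSketch(InputStream in, int capacity) {
        this.in = in;
        this.buffer = new byte[capacity];
    }

    /** Returns the next byte (0..255), or -1 at end of stream. */
    int next() throws IOException {
        if (start >= end) {                  // buffer exhausted: refill with one bulk read
            start = 0;
            end = in.read(buffer, 0, buffer.length);
            if (end <= 0) {
                end = 0;
                return -1;
            }
        }
        return buffer[start++] & 0xFF;
    }

    /** Reads one byte through the sketch and reports how many bytes left the source. */
    static int bytesConsumedByOneRead(int sourceLength, int capacity) {
        InputStream source = new ByteArrayInputStream(new byte[sourceLength]);
        try {
            new ReadAheadSketch(source, capacity).next();
            return sourceLength - source.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesConsumedByOneRead(100, 16)); // prints 16
    }
}
```

A consequence for callers is that the underlying stream should not be read directly while a buffering reader is attached to it, since bytes already buffered would be skipped.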

Instances of this class can be reused for different input streams and can be part of a higher-level component (e.g. a parser) in order to avoid dynamic buffer allocation when the input source changes. Wrapping an instance in a java.io.BufferedReader is also unnecessary, because instances of this class embed their own data buffers.

Note: This reader is unsynchronized and does not fully check that the UTF-8 encoding is well-formed (e.g. it accepts UTF-8 sequences longer than necessary to encode a character).
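To make the "longer than necessary" case concrete, here is a small sketch of a lenient two-byte decode next to a check a caller could apply afterwards. The names (OverlongSketch, decodeTwoBytes, isOverlongTwoBytes) are hypothetical and the code is independent of Javolution; it only reuses the same masks as the decoding listing on this page.

```java
final class OverlongSketch {
    /** Leniently decodes a two-byte sequence (110xxxxx 10xxxxxx) without checking minimality. */
    static int decodeTwoBytes(byte lead, byte continuation) {
        if ((lead & 0xE0) != 0xC0 || (continuation & 0xC0) != 0x80) {
            throw new IllegalArgumentException("Not a two-byte UTF-8 sequence");
        }
        return ((lead & 0x1F) << 6) | (continuation & 0x3F);
    }

    /** A two-byte sequence is overlong when its value would have fit in a single byte. */
    static boolean isOverlongTwoBytes(int code) {
        return code < 0x80;
    }

    public static void main(String[] args) {
        int slash = decodeTwoBytes((byte) 0xC0, (byte) 0xAF);   // overlong form of '/'
        System.out.println((char) slash + " overlong=" + isOverlongTwoBytes(slash));
        int eAcute = decodeTwoBytes((byte) 0xC3, (byte) 0xA9);  // shortest form of U+00E9
        System.out.println(Integer.toHexString(eAcute) + " overlong=" + isOverlongTwoBytes(eAcute));
    }
}
```

Input from untrusted sources may therefore deserve a separate validation pass if overlong forms matter to the application (for example when filtering path separators).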

Author
Jean-Marie Dautelle
Version
2.0, December 9, 2004
See also
UTF8StreamWriter

Definition at line 44 of file UTF8StreamReader.java.

Constructor & Destructor Documentation

◆ UTF8StreamReader() [1/2]

javolution.io.UTF8StreamReader.UTF8StreamReader ( )

Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).

Definition at line 69 of file UTF8StreamReader.java.

69  {
70  _bytes = new byte[2048];
71  }

References javolution.io.UTF8StreamReader._bytes.

◆ UTF8StreamReader() [2/2]

javolution.io.UTF8StreamReader.UTF8StreamReader ( int  capacity)

Creates a UTF-8 reader having a byte buffer of specified capacity.

Parameters
capacity: the capacity of the byte buffer.

Definition at line 78 of file UTF8StreamReader.java.

78  {
79  _bytes = new byte[capacity];
80  }

References javolution.io.UTF8StreamReader._bytes.

Member Function Documentation

◆ close()

void javolution.io.UTF8StreamReader.close ( ) throws IOException

Closes and resets this reader for reuse.

Exceptions
IOException: if an I/O error occurs.

Definition at line 120 of file UTF8StreamReader.java.

120  {
121  if (_inputStream != null) {
122  _inputStream.close();
123  reset();
124  }
125  }

References javolution.io.UTF8StreamReader._inputStream, and javolution.io.UTF8StreamReader.reset().

◆ read() [1/3]

int javolution.io.UTF8StreamReader.read ( ) throws IOException

Reads a single character. This method will block until a character is available, an I/O error occurs or the end of the stream is reached.

Returns
the 31-bit Unicode code point of the character read, or -1 if the end of the stream has been reached.
Exceptions
IOException: if an I/O error occurs.

Definition at line 135 of file UTF8StreamReader.java.

135  {
136  byte b = _bytes[_start];
137  return ((b >= 0) && (_start++ < _end)) ? b : read2();
138  }

References javolution.io.UTF8StreamReader._bytes, javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._start, and javolution.io.UTF8StreamReader.read2().

◆ read() [2/3]

void javolution.io.UTF8StreamReader.read ( Appendable  dest) throws IOException

Reads characters into the specified appendable. This method will block until the end of the stream is reached.

Parameters
dest: the destination buffer.
Exceptions
IOException: if an I/O error occurs.

Definition at line 271 of file UTF8StreamReader.java.

271  {
272  if (_inputStream == null)
273  throw new IOException("No input stream or stream closed");
274  while (true) {
275  if (_start >= _end) { // Fills buffer.
276  _start = 0;
277  _end = _inputStream.read(_bytes, 0, _bytes.length);
278  if (_end <= 0) { // Done.
279  break;
280  }
281  }
282  byte b = _bytes[_start];
283  if (b >= 0) {
284  dest.append((char) b); // Most common case.
285  _start++;
286  } else {
287  int code = read2();
288  if (code < 0x10000) {
289  dest.append((char) code);
290  } else if (code <= 0x10ffff) { // Surrogates.
291  dest.append((char) (((code - 0x10000) >> 10) + 0xd800));
292  dest.append((char) (((code - 0x10000) & 0x3ff) + 0xdc00));
293  } else {
294  throw new CharConversionException("Cannot convert U+"
295  + Integer.toHexString(code)
296  + " to char (code greater than U+10FFFF)");
297  }
298  }
299  }
300  }

References javolution.io.UTF8StreamReader._bytes, javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._inputStream, javolution.io.UTF8StreamReader._start, and javolution.io.UTF8StreamReader.read2().

◆ read() [3/3]

int javolution.io.UTF8StreamReader.read ( char  cbuf[],
int  off,
int  len 
) throws IOException

Reads characters into a portion of an array. This method will block until some input is available, an I/O error occurs or the end of the stream is reached.

Note: Characters between U+10000 and U+10FFFF are represented by surrogate pairs (two char).

Parameters
cbuf: the destination buffer.
off: the offset at which to start storing characters.
len: the maximum number of characters to read.
Returns
the number of characters read, or -1 if the end of the stream has been reached.
Exceptions
IOException: if an I/O error occurs.

Definition at line 222 of file UTF8StreamReader.java.

222  {
223  if (_inputStream == null)
224  throw new IOException("No input stream or stream closed");
225  if (_start >= _end) { // Fills buffer.
226  _start = 0;
227  _end = _inputStream.read(_bytes, 0, _bytes.length);
228  if (_end <= 0) { // Done.
229  return _end;
230  }
231  }
232  final int off_plus_len = off + len;
233  for (int i = off; i < off_plus_len;) {
234  // assert(_start < _end)
235  byte b = _bytes[_start];
236  if ((b >= 0) && (++_start < _end)) {
237  cbuf[i++] = (char) b; // Most common case.
238  } else if (b < 0) {
239  if (i < off_plus_len - 1) { // Up to two 'char' can be read.
240  int code = read2();
241  if (code < 0x10000) {
242  cbuf[i++] = (char) code;
243  } else if (code <= 0x10ffff) { // Surrogates.
244  cbuf[i++] = (char) (((code - 0x10000) >> 10) + 0xd800);
245  cbuf[i++] = (char) (((code - 0x10000) & 0x3ff) + 0xdc00);
246  } else {
247  throw new CharConversionException("Cannot convert U+"
248  + Integer.toHexString(code)
249  + " to char (code greater than U+10FFFF)");
250  }
251  if (_start < _end) {
252  continue;
253  }
254  }
255  return i - off;
256  } else { // End of buffer (_start >= _end).
257  cbuf[i++] = (char) b;
258  return i - off;
259  }
260  }
261  return len;
262  }

References javolution.io.UTF8StreamReader._bytes, javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._inputStream, javolution.io.UTF8StreamReader._start, and javolution.io.UTF8StreamReader.read2().
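Since the setInput() example on this page assigns the reader to a java.io.Reader, the usual bulk-read loop applies. The sketch below is written against java.io.Reader only and exercised with a StringReader; DrainSketch and drain are hypothetical names. Judging from the listing above, a multi-byte sequence is only decoded when at least two slots remain in the destination, so a buffer of at least two chars (eight here) appears advisable; otherwise a call may return 0 without making progress.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class DrainSketch {
    /** Reads everything from the reader using the bulk read(char[], int, int) contract. */
    static String drain(Reader reader) {
        char[] cbuf = new char[8];           // room for a surrogate pair and more
        StringBuilder out = new StringBuilder();
        try {
            int n;
            while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
                out.append(cbuf, 0, n);      // only the first n chars are valid
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(new StringReader("hello, reader")));
    }
}
```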

◆ read2()

int javolution.io.UTF8StreamReader.read2 ( ) throws IOException
private

Definition at line 141 of file UTF8StreamReader.java.

141  {
142  if (_start < _end) {
143  byte b = _bytes[_start++];
144 
145  // Decodes UTF-8.
146  if ((b >= 0) && (_moreBytes == 0)) {
147  // 0xxxxxxx
148  return b;
149  } else if (((b & 0xc0) == 0x80) && (_moreBytes != 0)) {
150  // 10xxxxxx (continuation byte)
151  _code = (_code << 6) | (b & 0x3f); // Adds 6 bits to code.
152  if (--_moreBytes == 0) {
153  return _code;
154  } else {
155  return read2();
156  }
157  } else if (((b & 0xe0) == 0xc0) && (_moreBytes == 0)) {
158  // 110xxxxx
159  _code = b & 0x1f;
160  _moreBytes = 1;
161  return read2();
162  } else if (((b & 0xf0) == 0xe0) && (_moreBytes == 0)) {
163  // 1110xxxx
164  _code = b & 0x0f;
165  _moreBytes = 2;
166  return read2();
167  } else if (((b & 0xf8) == 0xf0) && (_moreBytes == 0)) {
168  // 11110xxx
169  _code = b & 0x07;
170  _moreBytes = 3;
171  return read2();
172  } else if (((b & 0xfc) == 0xf8) && (_moreBytes == 0)) {
173  // 111110xx
174  _code = b & 0x03;
175  _moreBytes = 4;
176  return read2();
177  } else if (((b & 0xfe) == 0xfc) && (_moreBytes == 0)) {
178  // 1111110x
179  _code = b & 0x01;
180  _moreBytes = 5;
181  return read2();
182  } else {
183  throw new CharConversionException("Invalid UTF-8 Encoding");
184  }
185  } else { // No more bytes in buffer.
186  if (_inputStream == null)
187  throw new IOException("No input stream or stream closed");
188  _start = 0;
189  _end = _inputStream.read(_bytes, 0, _bytes.length);
190  if (_end > 0) {
191  return read2(); // Continues.
192  } else { // Done.
193  if (_moreBytes == 0) {
194  return -1;
195  } else { // Incomplete sequence.
196  throw new CharConversionException(
197  "Unexpected end of stream");
198  }
199  }
200  }
201  }

References javolution.io.UTF8StreamReader._bytes, javolution.io.UTF8StreamReader._code, javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._inputStream, javolution.io.UTF8StreamReader._moreBytes, and javolution.io.UTF8StreamReader._start.

Referenced by javolution.io.UTF8StreamReader.read().
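The lead-byte dispatch in the listing above can also be written as a loop over a byte array, which may be easier to follow than the recursive form. This is a sketch under stated assumptions, not the library's code: Utf8DecodeSketch and decode are hypothetical names, errors are reported with IllegalArgumentException instead of CharConversionException, and buffer refilling is left out. Like the listing, it does not reject overlong sequences.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8DecodeSketch {
    /** Decodes UTF-8 bytes into code points using the same masks as the listing. */
    static int[] decode(byte[] bytes) {
        int[] out = new int[bytes.length];   // at most one code point per input byte
        int count = 0;
        int code = 0;
        int moreBytes = 0;                   // continuation bytes still expected
        for (byte b : bytes) {
            if (moreBytes == 0) {
                if (b >= 0) { out[count++] = b; }                                 // 0xxxxxxx
                else if ((b & 0xE0) == 0xC0) { code = b & 0x1F; moreBytes = 1; }  // 110xxxxx
                else if ((b & 0xF0) == 0xE0) { code = b & 0x0F; moreBytes = 2; }  // 1110xxxx
                else if ((b & 0xF8) == 0xF0) { code = b & 0x07; moreBytes = 3; }  // 11110xxx
                else if ((b & 0xFC) == 0xF8) { code = b & 0x03; moreBytes = 4; }  // 111110xx
                else if ((b & 0xFE) == 0xFC) { code = b & 0x01; moreBytes = 5; }  // 1111110x
                else { throw new IllegalArgumentException("Invalid UTF-8 lead byte"); }
            } else if ((b & 0xC0) == 0x80) {                                      // 10xxxxxx
                code = (code << 6) | (b & 0x3F);                                  // adds 6 bits
                if (--moreBytes == 0) { out[count++] = code; }
            } else {
                throw new IllegalArgumentException("Invalid UTF-8 continuation byte");
            }
        }
        if (moreBytes != 0) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        byte[] utf8 = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(decode(utf8))); // prints [97, 98, 99]
    }
}
```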

◆ ready()

boolean javolution.io.UTF8StreamReader.ready ( ) throws IOException

Indicates if this stream is ready to be read.

Returns
true if the next read() is guaranteed not to block for input; false otherwise.
Exceptions
IOException: if an I/O error occurs.

Definition at line 109 of file UTF8StreamReader.java.

109  {
110  if (_inputStream == null)
111  throw new IOException("Stream closed");
112  return ((_end - _start) > 0) || (_inputStream.available() != 0);
113  }

References javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._inputStream, and javolution.io.UTF8StreamReader._start.

◆ reset()

void javolution.io.UTF8StreamReader.reset ( )

Definition at line 302 of file UTF8StreamReader.java.

302  {
303  _code = 0;
304  _end = 0;
305  _inputStream = null;
306  _moreBytes = 0;
307  _start = 0;
308  }

References javolution.io.UTF8StreamReader._code, javolution.io.UTF8StreamReader._end, javolution.io.UTF8StreamReader._inputStream, javolution.io.UTF8StreamReader._moreBytes, and javolution.io.UTF8StreamReader._start.

Referenced by javolution.io.UTF8StreamReader.close(), and javolution.xml.internal.stream.XMLStreamReaderImpl.reset().

◆ setInput()

UTF8StreamReader javolution.io.UTF8StreamReader.setInput ( InputStream  inStream)

Sets the input stream to use for reading until this reader is closed. For example:

    Reader reader = new UTF8StreamReader().setInput(inStream);

is equivalent to, but reads twice as fast as:

    Reader reader = new java.io.InputStreamReader(inStream, "UTF-8");

Parameters
inStream: the input stream.
Returns
this UTF-8 reader.
Exceptions
IllegalStateException: if this reader is being reused and it has not been closed or reset.

Definition at line 95 of file UTF8StreamReader.java.

95  {
96  if (_inputStream != null)
97  throw new IllegalStateException("Reader not closed or reset");
98  _inputStream = inStream;
99  return this;
100  }

References javolution.io.UTF8StreamReader._inputStream.

Referenced by javolution.xml.internal.stream.XMLStreamReaderImpl.setInput(), and javolution.io.UTF8StreamReader.setInputStream().
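The bind, close, rebind life cycle implied by setInput(), close() and reset() can be sketched in isolation. The class below is a hypothetical stand-in (ReuseSketch and its helper methods are not Javolution APIs); it reproduces only the null-stream guard shown in the listings, with no buffering or decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReuseSketch {
    private InputStream input;   // null means closed or reset, ready for a new stream

    ReuseSketch setInput(InputStream in) {
        if (input != null) {
            throw new IllegalStateException("Reader not closed or reset");
        }
        input = in;
        return this;             // allows chaining, as in the example above
    }

    void close() throws IOException {
        if (input != null) {
            input.close();
            input = null;        // the real reader also clears its buffer state in reset()
        }
    }

    /** Returns true if binding a second stream without closing first is rejected. */
    static boolean rejectsRebindWithoutClose() {
        ReuseSketch r = new ReuseSketch().setInput(new ByteArrayInputStream(new byte[0]));
        try {
            r.setInput(new ByteArrayInputStream(new byte[0]));
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    /** Returns true if a new stream can be bound once the previous one is closed. */
    static boolean acceptsRebindAfterClose() {
        ReuseSketch r = new ReuseSketch().setInput(new ByteArrayInputStream(new byte[0]));
        try {
            r.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        r.setInput(new ByteArrayInputStream(new byte[0]));
        return true;
    }
}
```

In practice this means one reader instance per parser can be kept around, provided each use ends with close() or reset() before the next setInput().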

◆ setInputStream()

UTF8StreamReader javolution.io.UTF8StreamReader.setInputStream ( InputStream  inStream)
Deprecated:
Replaced by setInput(InputStream)

Definition at line 313 of file UTF8StreamReader.java.

313  {
314  return this.setInput(inStream);
315  }

References javolution.io.UTF8StreamReader.setInput().

Member Data Documentation

◆ _bytes

final byte [] javolution.io.UTF8StreamReader._bytes
private

◆ _code

int javolution.io.UTF8StreamReader._code
private

◆ _end

int javolution.io.UTF8StreamReader._end
private

◆ _inputStream

InputStream javolution.io.UTF8StreamReader._inputStream
private

◆ _moreBytes

int javolution.io.UTF8StreamReader._moreBytes
private

◆ _start

int javolution.io.UTF8StreamReader._start
private

The documentation for this class was generated from the following file:
UTF8StreamReader.java