12.3 Usage

12.3.1 Reading bytes

The simplest way to read bytes from a stream is to call fz_read_byte, which reads the next byte. Akin to the standard fgetc, it returns the next available byte, or -1 at the end of the data.

/* 
   fz_read_byte: Read the next byte from a stream. 
 
   stm: The stream to read from. 
 
   Returns -1 for end of stream, or the next byte. May 
   throw exceptions. 
*/ 
int fz_read_byte(fz_context *ctx, fz_stream *stm);

To read more than 1 byte at a time, there are two different options.

Firstly, and most efficiently, bytes can be read directly from the stream's underlying buffer. For a given fz_stream *stm, the current position in the stream is pointed to by stm->rp. Bytes can simply be read out, and the pointer incremented by the number read.

To do this, you must first know how many bytes there are available to be read out. This is achieved by calling fz_available. If there are no bytes already decoded and awaiting reading, this call will trigger a refill of the underlying buffer, which may take noticeable time.

/* 
   fz_available: Ask how many bytes are available immediately from 
   a given stream. 
 
   stm: The stream to read from. 
 
   max: A hint for the underlying stream; the maximum number of 
   bytes that we are sure we will want to read. If you do not know 
   this number, give 1. 
 
   Returns the number of bytes immediately available between the 
   read and write pointers. This number is guaranteed only to be 0 
   if we have hit EOF. The number of bytes returned here need have 
   no relation to max (could be larger, could be smaller). 
*/ 
size_t fz_available(fz_context *ctx, fz_stream *stm, size_t max);

To avoid needless work, a ‘max’ value can be supplied as a hint, telling any buffer refill operation that is triggered how many bytes are actually required. Specifying a max value does not guarantee you anything about the number of bytes actually made available.
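The direct-read pattern described above can be sketched with a small stand-in stream. Everything named demo_* here is hypothetical and exists only for illustration: it is not MuPDF's fz_stream, and the real refill logic is more involved. The sketch only models the idea of asking how many bytes are available, copying them out from the read pointer, and advancing that pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stream; NOT MuPDF's fz_stream. */
typedef struct {
    const unsigned char *src;  /* backing data to "decode" from */
    size_t src_len, src_pos;   /* total length, amount already buffered */
    unsigned char buf[4];      /* deliberately tiny internal buffer */
    unsigned char *rp, *wp;    /* read and write pointers into buf */
} demo_stream;

static void demo_open(demo_stream *s, const unsigned char *src, size_t len)
{
    s->src = src;
    s->src_len = len;
    s->src_pos = 0;
    s->rp = s->wp = s->buf;
}

/* Analogue of fz_available: refills only when the buffer is empty,
   and returns 0 only at the end of the data. */
static size_t demo_available(demo_stream *s, size_t max)
{
    (void)max; /* only a hint; this toy refill ignores it */
    if (s->rp == s->wp) {
        size_t n = s->src_len - s->src_pos;
        if (n > sizeof s->buf)
            n = sizeof s->buf;
        memcpy(s->buf, s->src + s->src_pos, n);
        s->src_pos += n;
        s->rp = s->buf;
        s->wp = s->buf + n;
    }
    return (size_t)(s->wp - s->rp);
}

/* Copy up to len bytes by reading directly from rp and advancing it. */
static size_t demo_copy(demo_stream *s, unsigned char *out, size_t len)
{
    size_t total = 0;
    while (total < len) {
        size_t n = demo_available(s, len - total);
        if (n == 0)
            break;              /* end of data */
        if (n > len - total)
            n = len - total;    /* more is available than we need */
        memcpy(out + total, s->rp, n);
        s->rp += n;
        total += n;
    }
    return total;
}
```

Note that the caller clamps the returned count itself, because the number of bytes available may be larger or smaller than the hint that was passed in.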

Some callers may find this awkward: having to call fz_available repeatedly until enough bytes have arrived to fill a buffer of the required length can be tedious. As an alternative, we therefore provide a simpler call, fz_read.

Designed to be similar to the standard fread call, this attempts to read as many bytes as possible into a supplied data block, returning the actual number of bytes successfully read.

/* 
   fz_read: Read from a stream into a given data block. 
 
   stm: The stream to read from. 
 
   data: The data block to read into. 
 
   len: The length of the data block (in bytes). 
 
   Returns the number of bytes read. May throw exceptions. 
*/ 
size_t fz_read(fz_context *ctx, fz_stream *stm, unsigned char *data, size_t len);

Typically the only reason that fz_read will not return the requested number of bytes is if we hit the end of the stream. This implies that calls to fz_read will block until such data is ready. For streams based on ‘fast’ sources like files or memory, this is an unimportant distinction.

For streams based on (say) an HTTP download, this might result in significant delays, and an unacceptable user experience. To alleviate this problem we have a mechanism whereby such streams can signal a temporary end of data by throwing the FZ_ERROR_TRYLATER error. See chapter 16 Progressive Mode for more details.

To facilitate reading without blocking (or using buffers larger than required), fz_available can be called to find out the number of bytes that can safely be requested.

If data within a stream is not required, it can be skipped over using fz_skip:

/* 
   fz_skip: Read from a stream discarding data. 
 
   stm: The stream to read from. 
 
   len: The number of bytes to read. 
 
   Returns the number of bytes read. May throw exceptions. 
*/ 
size_t fz_skip(fz_context *ctx, fz_stream *stm, size_t len);

As a special case, after a single byte is read, it can be pushed back into the stream, using fz_unread_byte:

/* 
   fz_unread_byte: Unread the single last byte successfully 
   read from a stream. Do not call this without having 
   successfully read a byte. 
*/ 
void fz_unread_byte(fz_context *ctx FZ_UNUSED, fz_stream *stm);

The act of reading a byte and then, if successful, pushing it back again is encapsulated in a convenience function, fz_peek_byte:

/* 
   fz_peek_byte: Peek at the next byte in a stream. 
 
   stm: The stream to peek at. 
 
   Returns -1 for EOF, or the next byte that will be read. 
*/ 
int fz_peek_byte(fz_context *ctx, fz_stream *stm);
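As a rough illustration of how peeking follows from reading and unreading, here is a minimal sketch over an in-memory byte range. The demo_* names and the two-pointer struct are assumptions made for this example; they stand in for the real stream and are not MuPDF's implementation.

```c
/* Hypothetical in-memory stream: bytes between rp and wp are unread. */
typedef struct {
    const unsigned char *rp;  /* next byte to read */
    const unsigned char *wp;  /* one past the last byte */
} demo_stream;

/* Return the next byte, or -1 at the end of the data (like fgetc). */
static int demo_read_byte(demo_stream *s)
{
    return s->rp == s->wp ? -1 : *s->rp++;
}

/* Push back the last byte read; only valid after a successful read. */
static void demo_unread_byte(demo_stream *s)
{
    s->rp--;
}

/* Peek: read a byte and, if that succeeded, push it back again. */
static int demo_peek_byte(demo_stream *s)
{
    int c = demo_read_byte(s);
    if (c != -1)
        demo_unread_byte(s);
    return c;
}
```

The check before unreading matters: at the end of the data no byte was consumed, so there is nothing to push back.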

12.3.2 Reading objects

Often, when parsing different document formats, it can be useful to read specific objects from streams, so convenience functions exist for this too. Firstly, integers of different size and endianness are catered for:

/* 
   fz_read_[u]int(16|24|32|64)(_le)? 
 
   Read a 16/24/32/64 bit signed/unsigned integer from stream, 
   in big or little-endian byte orders. 
 
   Throws an exception if EOF is encountered. 
*/ 
uint16_t fz_read_uint16(fz_context *ctx, fz_stream *stm); 
uint32_t fz_read_uint24(fz_context *ctx, fz_stream *stm); 
uint32_t fz_read_uint32(fz_context *ctx, fz_stream *stm); 
uint64_t fz_read_uint64(fz_context *ctx, fz_stream *stm); 
 
uint16_t fz_read_uint16_le(fz_context *ctx, fz_stream *stm); 
uint32_t fz_read_uint24_le(fz_context *ctx, fz_stream *stm); 
uint32_t fz_read_uint32_le(fz_context *ctx, fz_stream *stm); 
uint64_t fz_read_uint64_le(fz_context *ctx, fz_stream *stm); 
 
int16_t fz_read_int16(fz_context *ctx, fz_stream *stm); 
int32_t fz_read_int32(fz_context *ctx, fz_stream *stm); 
int64_t fz_read_int64(fz_context *ctx, fz_stream *stm); 
 
int16_t fz_read_int16_le(fz_context *ctx, fz_stream *stm); 
int32_t fz_read_int32_le(fz_context *ctx, fz_stream *stm); 
int64_t fz_read_int64_le(fz_context *ctx, fz_stream *stm);
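To make the byte-order distinction concrete, the following sketch decodes the same bytes both ways from a plain array. It assumes that the unsuffixed functions are big-endian, as the _le suffix on the others suggests. The helper names demo_uint_be and demo_uint_le are invented for this example and operate on memory rather than on a stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian: the first byte is the most significant.
   nbytes is expected to be between 1 and 4. */
static uint32_t demo_uint_be(const unsigned char *p, size_t nbytes)
{
    uint32_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Little-endian: the first byte is the least significant.
   nbytes is expected to be between 1 and 4. */
static uint32_t demo_uint_le(const unsigned char *p, size_t nbytes)
{
    uint32_t v = 0;
    for (size_t i = 0; i < nbytes; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}
```

For the bytes 0x12 0x34 0x56, a 24 bit read yields 0x123456 in big-endian order and 0x563412 in little-endian order, which is why both variants return their result in a 32 bit type.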

We have functions to read both C style strings, and newline/return terminated lines:

/* 
   fz_read_string: Read a null terminated string from the stream into 
   a buffer of a given length. The buffer will be null terminated. 
   Throws on failure (including the failure to fit the entire string 
   including the terminator into the buffer). 
*/ 
void fz_read_string(fz_context *ctx, fz_stream *stm, char *buffer, int len); 
 
/* 
   fz_read_line: Read a line from stream into the buffer until either a 
   terminating newline or EOF, which it replaces with a null byte ('\0'). 
 
   Returns buf on success, and NULL when end of file occurs while no characters 
   have been read. 
*/ 
char *fz_read_line(fz_context *ctx, fz_stream *stm, char *buf, size_t max);

12.3.3 Reading bits

Streams (or sections of streams) can be treated as a string of bits, packed either most significant or least significant bits first.

To read from an msb packed stream, use fz_read_bits:

/* 
   fz_read_bits: Read the next n bits from a stream (assumed to 
   be packed most significant bit first). 
 
   stm: The stream to read from. 
 
   n: The number of bits to read, between 1 and 8*sizeof(int) 
   inclusive. 
 
   Returns (unsigned int)-1 for EOF, or the required number of bits. 
*/ 
unsigned int fz_read_bits(fz_context *ctx, fz_stream *stm, int n);

Conversely, to read from an lsb packed stream, use fz_read_rbits:

/* 
   fz_read_rbits: Read the next n bits from a stream (assumed to 
   be packed least significant bit first). 
 
   stm: The stream to read from. 
 
   n: The number of bits to read, between 1 and 8*sizeof(int) 
   inclusive. 
 
   Returns (unsigned int)-1 for EOF, or the required number of bits. 
*/ 
unsigned int fz_read_rbits(fz_context *ctx, fz_stream *stm, int n);

Whichever of these is used, reading n bits will return the results in the lowest n bits of the returned value.

After reading bits using these functions, if a return to reading bytewise (or objectwise) is required, then fz_sync_bits must be called.

/* 
   fz_sync_bits: Called after reading bits to tell the stream 
   that we are about to return to reading bytewise. Resyncs 
   the stream to whole byte boundaries. 
*/ 
void fz_sync_bits(fz_context *ctx FZ_UNUSED, fz_stream *stm);

This function skips as many bits as required to align with a byte boundary.
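The difference between the two packings, and the effect of resyncing, can be sketched with a toy bit reader over a byte array. The demo_* names are hypothetical and this is not MuPDF's code: the sketch does no bounds checking or EOF handling, and the ordering it uses for lsb packed data (the first bit read lands in the lowest bit of the result) is one common convention, assumed here for illustration.

```c
#include <stddef.h>

/* Hypothetical bit cursor over a byte array (no bounds checking). */
typedef struct {
    const unsigned char *data;
    size_t bitpos;  /* number of bits consumed so far */
} demo_bits;

/* Read n bits, taking the most significant bit of each byte first. */
static unsigned int demo_read_bits_msb(demo_bits *b, int n)
{
    unsigned int v = 0;
    for (int i = 0; i < n; i++, b->bitpos++) {
        unsigned int bit = (b->data[b->bitpos / 8] >> (7 - b->bitpos % 8)) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* Read n bits, taking the least significant bit of each byte first. */
static unsigned int demo_read_bits_lsb(demo_bits *b, int n)
{
    unsigned int v = 0;
    for (int i = 0; i < n; i++, b->bitpos++) {
        unsigned int bit = (b->data[b->bitpos / 8] >> (b->bitpos % 8)) & 1u;
        v |= bit << i;
    }
    return v;
}

/* Skip any partially consumed byte so the cursor is byte aligned. */
static void demo_sync_bits(demo_bits *b)
{
    b->bitpos = (b->bitpos + 7) / 8 * 8;
}
```

For a first byte of 0xB4 (binary 10110100), reading 3 bits msb first gives 5 (binary 101), while reading 3 bits lsb first gives 4 (binary 100). In both cases the result sits in the lowest 3 bits of the returned value. After either read, syncing skips the remaining 5 bits of that byte.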

12.3.4 Reading whole streams

As a convenience function, MuPDF provides a mechanism for reading the entire contents of a stream into a fz_buffer.

/* 
   fz_read_all: Read all of a stream into a buffer. 
 
   stm: The stream to read from. 
 
   initial: Suggested initial size for the buffer. 
 
   Returns a buffer created from reading from the stream. May throw 
   exceptions on failure to allocate. 
*/ 
fz_buffer *fz_read_all(fz_context *ctx, fz_stream *stm, size_t initial);

This will throw an error (and hence not return any data) if an error is encountered during the decode of the stream. Sometimes it can be preferable to ‘do the best we can’ and tolerate problematic data. For such cases, we provide fz_read_best:

/* 
   fz_read_best: Attempt to read a stream into a buffer. If truncated 
   is NULL behaves as fz_read_all, otherwise does not throw exceptions 
   in the case of failure, but instead sets a truncated flag. 
 
   stm: The stream to read from. 
 
   initial: Suggested initial size for the buffer. 
 
   truncated: Flag to store success/failure indication in. 
 
   Returns a buffer created from reading from the stream. 
*/ 
fz_buffer *fz_read_best(fz_context *ctx, fz_stream *stm, size_t initial, int *truncated);

12.3.5 Seeking

Most stream operations simply advance the stream pointer as the stream is read. The current stream position can always be obtained using fz_tell (deliberately similar to the standard ftell call):

/* 
   fz_tell: return the current reading position within a stream 
*/ 
int64_t fz_tell(fz_context *ctx, fz_stream *stm);

Some streams allow you to seek within them, that is, to change the current stream pointer to a given offset. To do this, use fz_seek (deliberately similar to fseek):

/* 
   fz_seek: Seek within a stream. 
 
   stm: The stream to seek within. 
 
   offset: The offset to seek to. 
 
   whence: From where the offset is measured (see fseek). 
*/ 
void fz_seek(fz_context *ctx, fz_stream *stm, int64_t offset, int whence);

In the event that a stream does not support seeking, an error will be thrown.

As fz_seek and fz_tell work at byte granularity, care should be exercised when reading streams bitwise. Always call fz_sync_bits before expecting fz_tell to give you a value that you can safely fz_seek back to.

12.3.6 Meta data

Occasionally, it can be useful to interrogate the properties of a stream, for example the length of the stream, or whether it is coming from a progressive source (see chapter 16 Progressive Mode).

While not implemented currently, a stream user might in future want to query the MIME type of a stream, or its compression ratio.

To allow this, we have an extensible system to request Meta operations on a stream. The fz_stream_meta function allows such calls to be made, with a reason code to identify the required operation, and pointer and size parameters to identify data to be passed:

/* 
   fz_stream_meta: Perform a meta call on a stream (typically to 
   request meta information about a stream). 
 
   stm: The stream to query. 
 
   key: The meta request identifier. 
 
   size: Meta request specific parameter - typically the size of 
   the data block pointed to by ptr. 
 
   ptr: Meta request specific parameter - typically a pointer to 
   a block of data to be filled in. 
 
   Returns -1 if this stream does not support this meta operation, 
   or a meta operation specific return value. 
*/ 
int fz_stream_meta(fz_context *ctx, fz_stream *stm, int key, int size, void *ptr);
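One way to picture such an extensible call is a per-stream handler that is consulted for each request and answers -1 for anything it does not understand. The sketch below is an assumption about the general shape only: the demo_* types, the handler field and the key values are invented for illustration and are not MuPDF's real identifiers.

```c
#include <stddef.h>

/* Hypothetical request keys; NOT MuPDF's real identifiers. */
enum { DEMO_META_LENGTH = 1, DEMO_META_NAME = 2 };

typedef struct demo_stream demo_stream;
struct demo_stream {
    /* Per-stream handler; NULL if the stream supports no meta requests. */
    int (*meta)(demo_stream *stm, int key, int size, void *ptr);
    int length;
};

/* Dispatch a meta request; -1 means "not supported by this stream". */
static int demo_stream_meta(demo_stream *stm, int key, int size, void *ptr)
{
    if (stm->meta == NULL)
        return -1;
    return stm->meta(stm, key, size, ptr);
}

/* A handler that only understands the length request. */
static int demo_memory_meta(demo_stream *stm, int key, int size, void *ptr)
{
    (void)size;
    (void)ptr;  /* neither is needed by the length request */
    if (key == DEMO_META_LENGTH)
        return stm->length;
    return -1;
}
```

Because unknown keys fall through to -1, new request types can be added later without changing callers that only use the existing ones.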

12.3.7 Destruction

In common with most other MuPDF objects, fz_streams are reference counted.

As such, additional references can be taken using fz_keep_stream, and references can be released using fz_drop_stream; the stream is destroyed when its last reference is dropped.
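For readers unfamiliar with reference counting, the keep/drop discipline can be sketched as follows. The demo_* names are hypothetical, the counter is a plain int with no locking, and nothing here reflects how MuPDF actually stores or protects its reference counts.

```c
#include <stdlib.h>

/* Hypothetical reference counted object; NOT MuPDF's fz_stream. */
typedef struct {
    int refs;
} demo_obj;

/* Create an object holding one reference, owned by the caller. */
static demo_obj *demo_new(void)
{
    demo_obj *o = malloc(sizeof *o);
    if (o)
        o->refs = 1;
    return o;
}

/* Take an additional reference; returns the same pointer. */
static demo_obj *demo_keep(demo_obj *o)
{
    if (o)
        o->refs++;
    return o;
}

/* Release one reference; frees the object when none remain.
   Returns 1 if the object was freed, 0 otherwise. */
static int demo_drop(demo_obj *o)
{
    if (o && --o->refs == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

Each keep must eventually be balanced by exactly one drop, and a pointer must not be used after its holder has dropped it.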

Note that care must be taken not to use fz_stream objects simultaneously in more than one thread. Not only does reading in one thread disturb the point at which the next read will happen in another thread, but no protection is provided to make operations atomic, so the internal data can become corrupted and cause crashes.