Structured Text Device

The Structured Text device extracts the text from a given graphical stream, together with the position each piece of text occupies on the output page. It can optionally also record details of images and their positions in its output.


\begin{lstlisting}
/*
fz_new_stext_device: Create a device to extract the text ...
...e(fz_context *ctx, fz_stext_sheet *sheet, fz_stext_page *page);
\end{lstlisting}

This can be used as the basis for searching (including highlighting text as matches are found), for exporting text files (or files that combine text and images, such as HTML), or for more complex page analysis (such as identifying which regions of the page contain text and which contain graphics).

First create an initially empty fz_stext_sheet using fz_new_stext_sheet and an empty fz_stext_page using fz_new_stext_page, then pass both to fz_new_stext_device. Once the page contents have been run through that device, the sheet will contain the common styles used on the page, and the page will contain the extracted text along with its position.