How to Work with Content Stream Operators in Java

Accessing Page Content

Each Page in a Document stores its graphical content in a content stream: a sequence of PDF operators, each preceded by its operands, that position text, draw paths, and reference resources such as fonts and images.

Accessing the Content Stream

Read the raw content stream bytes from a page. The bytes contain the sequence of operators that render the page’s visual content:

try (Document doc = new Document("input.pdf")) {
    Page page = doc.getPages().get(1);
    // The content stream bytes are accessible via page.getContents()
    byte[] contentBytes = page.getContents().toByteArray();
}
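Before parsing, it can help to look at the first few bytes of the stream as text. The following is a minimal sketch in plain Java, assuming the bytes returned by the page are already decoded (not compressed). The class name `ContentPreviewSketch` and its `preview` method are hypothetical helpers, not part of the library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the library.
final class ContentPreviewSketch {

    /**
     * Decodes at most maxBytes of a content stream as ISO-8859-1.
     * ISO-8859-1 maps every byte to exactly one char, so no byte is lost
     * or merged even if the stream contains binary string data.
     */
    static String preview(byte[] contentBytes, int maxBytes) {
        int n = Math.min(contentBytes.length, Math.max(0, maxBytes));
        return new String(contentBytes, 0, n, StandardCharsets.ISO_8859_1);
    }
}
```

Calling `ContentPreviewSketch.preview(contentBytes, 200)` inside the try block would return the first 200 bytes as a string suitable for logging.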

ContentStreamParser

ContentStreamParser tokenizes and parses a content stream into operator objects, making it possible to inspect each drawing command in the stream:

// contentBytes is the byte[] read from page.getContents() in the previous example
ContentStreamParser parser = new ContentStreamParser(contentBytes);
// Iterate over parsed operators
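To illustrate what parsing a stream into operators involves, here is a minimal sketch in plain Java that groups each operator keyword with the operands that precede it. It is not the library's ContentStreamParser and makes simplifying assumptions: dictionaries and inline images are not handled, and array contents are reported as flat tokens. The names `OperatorScanSketch`, `Instruction`, and `scan` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the library's ContentStreamParser.
final class OperatorScanSketch {

    /** An operator keyword together with the operands that preceded it. */
    static final class Instruction {
        final String operator;
        final List<String> operands;
        Instruction(String operator, List<String> operands) {
            this.operator = operator;
            this.operands = operands;
        }
        @Override public String toString() { return operands + " " + operator; }
    }

    static List<Instruction> scan(byte[] contentBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so offsets stay aligned.
        String s = new String(contentBytes, StandardCharsets.ISO_8859_1);
        List<Instruction> result = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isWhite(c)) { i++; continue; }
            if (c == '%') { // comment: skip to end of line
                while (i < s.length() && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(') {
                i = endOfLiteralString(s, i);
            } else if (c == '<') { // hex string (dictionaries are out of scope)
                int close = s.indexOf('>', i);
                i = close < 0 ? s.length() : close + 1;
            } else if (c == '[' || c == ']') {
                i++;
            } else {
                i++; // the first char may be '/', which starts a name
                while (i < s.length() && !isWhite(s.charAt(i))
                        && "()<>[]/%".indexOf(s.charAt(i)) < 0) i++;
            }
            String token = s.substring(start, i);
            if (isOperand(token)) {
                operands.add(token);
            } else { // anything that is not an operand is treated as an operator keyword
                result.add(new Instruction(token, operands));
                operands = new ArrayList<>();
            }
        }
        return result;
    }

    private static boolean isWhite(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }

    /** Returns the index just past the ')' that balances the '(' at index open. */
    private static int endOfLiteralString(String s, int open) {
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') i++; // skip the escaped character
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i + 1;
        }
        return s.length(); // unterminated string: consume the rest
    }

    private static boolean isOperand(String t) {
        return "/(<[]".indexOf(t.charAt(0)) >= 0
                || t.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")
                || t.equals("true") || t.equals("false") || t.equals("null");
    }
}
```

Because operands come before their operator in PDF syntax, the sketch buffers operands and emits an instruction only when it reaches a keyword. A production parser would also need to handle dictionaries and inline image data.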

ContentStreamBuilder

ContentStreamBuilder creates new content streams for embedding into a page. Use it when constructing a page’s graphical content programmatically rather than modifying an existing stream. After building the stream, assign it back to the page and save the document.
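The bytes such a builder produces are ordinary operator sequences. The following minimal sketch in plain Java shows the idea for a single line of text. It is a stand-in for illustration, and the library's ContentStreamBuilder API may look quite different. The class `TextStreamSketch` and all of its methods are hypothetical. The sketch also assumes that the font resource name (for example `F1`) is already registered in the page's resources and that the text fits a single-byte encoding.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; the library's ContentStreamBuilder may differ.
final class TextStreamSketch {
    private final StringBuilder out = new StringBuilder();

    TextStreamSketch beginText() { out.append("BT\n"); return this; }

    /** Selects a font by resource name (assumed to exist in the page resources). */
    TextStreamSketch font(String resourceName, double size) {
        out.append('/').append(resourceName).append(' ').append(num(size)).append(" Tf\n");
        return this;
    }

    /** Moves the text position with the Td operator. */
    TextStreamSketch moveTo(double x, double y) {
        out.append(num(x)).append(' ').append(num(y)).append(" Td\n");
        return this;
    }

    /** Shows a string with Tj, escaping backslashes and parentheses. */
    TextStreamSketch show(String text) {
        out.append('(').append(escape(text)).append(") Tj\n");
        return this;
    }

    TextStreamSketch endText() { out.append("ET\n"); return this; }

    byte[] toBytes() { return out.toString().getBytes(StandardCharsets.ISO_8859_1); }

    private static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    /** Formats a number without an exponent, since PDF syntax has no exponent notation. */
    private static String num(double v) {
        if (v == Math.rint(v) && Math.abs(v) < 1e15) return Long.toString((long) v);
        return BigDecimal.valueOf(v).toPlainString();
    }
}
```

A call such as `new TextStreamSketch().beginText().font("F1", 12).moveTo(72, 720).show("Hello").endText().toBytes()` yields bytes that could then be handed to whatever mechanism the library provides for assigning a stream to a page.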

Operator Reference

PDF content stream operators are listed in ISO 32000-1:2008, Annex A, Table A.1. Common operators include text operators (BT, ET, Tf, Tj), graphics state operators (q, Q, cm), path construction operators (m, l, h), and path-painting operators (f, S).
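As an illustration of how these operators combine, the sketch below holds a small sample stream in a Java string, with a comment naming the role of each line, plus a simple check that q/Q and BT/ET are paired. The class `OperatorSampleSketch` is hypothetical, the font name `F1` is an assumed resource, and the check relies on plain whitespace splitting, so it is only valid for streams whose strings contain no whitespace.

```java
// Hypothetical sample for illustration; not part of the library.
final class OperatorSampleSketch {

    /** Each line is a set of operands followed by one operator. */
    static final String SAMPLE = String.join("\n",
            "q",                // save the graphics state
            "1 0 0 1 50 50 cm", // concatenate a matrix: translate by (50, 50)
            "0 0 m",            // begin a subpath at (0, 0)
            "100 0 l",          // line to (100, 0)
            "50 80 l",          // line to (50, 80)
            "h",                // close the subpath
            "S",                // stroke the path
            "0 0 m",            // begin a second subpath
            "20 0 l",
            "20 20 l",
            "h",
            "f",                // fill the path
            "Q",                // restore the saved graphics state
            "BT",               // begin a text object
            "/F1 12 Tf",        // select font resource F1 at size 12 (assumed resource)
            "(Hi) Tj",          // show a string
            "ET");              // end the text object

    /** Checks that q/Q are balanced and that BT/ET pairs are not nested. */
    static boolean isBalanced(String stream) {
        int depth = 0;
        boolean inText = false;
        for (String tok : stream.trim().split("\\s+")) {
            switch (tok) {
                case "q": depth++; break;
                case "Q": if (--depth < 0) return false; break;
                case "BT": if (inText) return false; inText = true; break;
                case "ET": if (!inText) return false; inText = false; break;
                default: break;
            }
        }
        return depth == 0 && !inText;
    }
}
```

The check reflects two rules from the specification: q and Q must be balanced within a content stream, and text objects cannot be nested.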

See Also