How to Work with Content Stream Operators in Java
Accessing Page Content
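Each Page in a Document holds its graphical content in a content stream. (In the PDF file
itself, a page's /Contents entry can be a single stream or an array of streams that are read
as one concatenated sequence.) The stream is a sequence of PDF operators, each preceded by its
operands, that position text, draw paths, and refer to named resources such as fonts and images.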
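For orientation, the sketch below holds a small hand-written content stream in a Java string and converts it to bytes. The stream text and the class name SampleContentStream are illustrative assumptions. Real streams are produced by a PDF writer and are usually compressed inside the file. Each operator comes after its operands.

```java
import java.nio.charset.StandardCharsets;

public class SampleContentStream {
    // Hand-written content stream for illustration only (assumed, not taken from a real file).
    static final String SAMPLE =
        "q\n" +                   // save graphics state
        "1 0 0 1 72 720 cm\n" +   // translate the coordinate system by (72, 720)
        "0 0 100 50 re\n" +       // append a 100 x 50 rectangle to the path
        "f\n" +                   // fill the path
        "Q\n" +                   // restore graphics state
        "BT\n" +                  // begin a text object
        "/F1 12 Tf\n" +           // select font resource /F1 at size 12
        "72 700 Td\n" +           // position the start of the text line
        "(Hello) Tj\n" +          // show a literal string
        "ET\n";                   // end the text object

    public static void main(String[] args) {
        // ISO-8859-1 maps every byte value 0-255 to exactly one char,
        // so converting between bytes and text this way is lossless.
        byte[] bytes = SAMPLE.getBytes(StandardCharsets.ISO_8859_1);
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(bytes.length + " bytes");
        System.out.print(text);
    }
}
```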
Accessing the Content Stream
Read the raw content stream bytes from a page. The bytes contain the sequence of operators that render the page’s visual content:
byte[] contentBytes;
try (Document doc = new Document("input.pdf")) {
    Page page = doc.getPages().get(1);
    // Copy the content stream bytes out of the page so they stay usable
    // after the try block closes the document
    contentBytes = page.getContents().toByteArray();
}
ContentStreamParser
ContentStreamParser tokenizes and parses a content stream into operator objects,
making it possible to inspect each operator in the stream.
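Tokenizing a content stream means splitting it into operands and operators, then attaching each run of operands to the operator that follows it. The class below is a hypothetical, library-independent sketch of that step. MiniContentLexer, Op and parse are invented names, not the ContentStreamParser API. The sketch takes a String, which you could obtain with new String(contentBytes, StandardCharsets.ISO_8859_1), and it assumes the data is already decompressed. It does not handle inline images (BI/ID/EI), and it treats the keywords true, false and null as operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified tokenizer for illustration; not the ContentStreamParser API. */
public class MiniContentLexer {

    /** An operator plus the operands that preceded it (PDF operators are postfix). */
    record Op(String operator, List<String> operands) {}

    static boolean isDelimiter(char c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    static List<Op> parse(String s) {
        List<Op> ops = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '%') {                      // comment: skip to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '(') {                      // literal string with nesting and escapes
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;                 // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
                operands.add(s.substring(start, Math.min(i, n)));
            } else if (c == '<' && i + 1 < n && s.charAt(i + 1) != '<') { // hex string <48656C>
                int end = s.indexOf('>', i);
                i = (end < 0) ? n : end + 1;
                operands.add(s.substring(start, i));
            } else if (c == '/') {                      // name object such as /F1
                i++;
                while (i < n && !Character.isWhitespace(s.charAt(i)) && !isDelimiter(s.charAt(i))) i++;
                operands.add(s.substring(start, i));
            } else if (isDelimiter(c)) {                // [ ] << >> { }: crude one-char operands
                operands.add(String.valueOf(c));
                i++;
            } else {                                    // number or bare keyword
                while (i < n && !Character.isWhitespace(s.charAt(i)) && !isDelimiter(s.charAt(i))) i++;
                String token = s.substring(start, i);
                if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                    operands.add(token);
                } else {                                // any other bare word ends an operator group
                    ops.add(new Op(token, List.copyOf(operands)));
                    operands.clear();
                }
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 72 700 Td (Hi \\(there\\)) Tj ET";
        for (Op op : parse(stream)) {
            System.out.println(op.operator() + " " + op.operands());
        }
    }
}
```

With the library itself, parsing starts from the bytes read earlier: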
ContentStreamParser parser = new ContentStreamParser(contentBytes);
// Iterate over parsed operators
ContentStreamBuilder
ContentStreamBuilder creates new content streams for embedding into a page.
Use it when constructing a page’s graphical content programmatically rather than
modifying an existing stream. After building the stream, assign it back to the page
and save the document.
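As an illustration of what building a stream involves, here is a minimal sketch of a fluent builder that appends operators, each after its operands, to a text buffer. MiniContentBuilder and all of its method names are hypothetical and are not the ContentStreamBuilder API. The sketch assumes the font resource name (here "F1") already exists in the page's resources and that the text is single-byte (Latin-1).

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

/** Hypothetical fluent builder for illustration; not the ContentStreamBuilder API. */
public class MiniContentBuilder {
    private final StringBuilder out = new StringBuilder();

    // PDF numbers are plain decimals (no 1.0E-5 style exponents), so format explicitly.
    private static String num(double v) {
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }

    // Append the operands, then the operator, one operator per line.
    private MiniContentBuilder op(String operator, String... operands) {
        for (String o : operands) out.append(o).append(' ');
        out.append(operator).append('\n');
        return this;
    }

    public MiniContentBuilder saveState()    { return op("q"); }
    public MiniContentBuilder restoreState() { return op("Q"); }
    public MiniContentBuilder concat(double a, double b, double c, double d, double e, double f) {
        return op("cm", num(a), num(b), num(c), num(d), num(e), num(f));
    }
    public MiniContentBuilder moveTo(double x, double y) { return op("m", num(x), num(y)); }
    public MiniContentBuilder lineTo(double x, double y) { return op("l", num(x), num(y)); }
    public MiniContentBuilder closePath() { return op("h"); }
    public MiniContentBuilder fill()      { return op("f"); }
    public MiniContentBuilder stroke()    { return op("S"); }
    public MiniContentBuilder beginText() { return op("BT"); }
    public MiniContentBuilder endText()   { return op("ET"); }

    // resourceName must match a font in the page's resources (assumed, not checked).
    public MiniContentBuilder setFont(String resourceName, double size) {
        return op("Tf", "/" + resourceName, num(size));
    }
    public MiniContentBuilder moveText(double tx, double ty) { return op("Td", num(tx), num(ty)); }

    // Escape the characters that are special inside a PDF literal string.
    public MiniContentBuilder showText(String text) {
        String escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
        return op("Tj", "(" + escaped + ")");
    }

    public String build() { return out.toString(); }

    // Assumes single-byte (Latin-1) text; other encodings depend on the font.
    public byte[] toBytes() { return build().getBytes(StandardCharsets.ISO_8859_1); }

    public static void main(String[] args) {
        String stream = new MiniContentBuilder()
            .saveState()
            .moveTo(10, 10).lineTo(110, 10).lineTo(60, 90).closePath().stroke()
            .restoreState()
            .beginText().setFont("F1", 12).moveText(72, 700).showText("Total (net)").endText()
            .build();
        System.out.print(stream);
    }
}
```

Wrapping the path in q/Q keeps any state changes local to the drawing. Attaching the resulting bytes to a page is library-specific and is not covered by this sketch.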
Operator Reference
PDF content stream operators are defined in ISO 32000-1:2008, Annex A, Table A.1. Common
operators include the text operators (BT, ET, Tf, Tj), the graphics state operators
(q, Q, cm), and the path construction and painting operators (m, l, h, f, S).
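For quick lookups while inspecting parsed operators, the operators listed above can be kept in a small map. The class OperatorInfo and its method describe are hypothetical helpers. The descriptions paraphrase the operator names in ISO 32000-1, Table A.1, and the list covers only the twelve operators named in this section.

```java
import java.util.Map;

/** Hypothetical lookup table for the operators named above (paraphrasing ISO 32000-1, Table A.1). */
public class OperatorInfo {
    // Keys are case-sensitive: "q" (save) and "Q" (restore) are different operators.
    static final Map<String, String> OPERATORS = Map.ofEntries(
        Map.entry("BT", "text: begin text object"),
        Map.entry("ET", "text: end text object"),
        Map.entry("Tf", "text: set text font and size"),
        Map.entry("Tj", "text: show text"),
        Map.entry("q",  "graphics state: save graphics state"),
        Map.entry("Q",  "graphics state: restore graphics state"),
        Map.entry("cm", "graphics state: concatenate matrix to current transformation matrix"),
        Map.entry("m",  "path: begin new subpath"),
        Map.entry("l",  "path: append straight line segment"),
        Map.entry("h",  "path: close subpath"),
        Map.entry("f",  "path: fill using nonzero winding number rule"),
        Map.entry("S",  "path: stroke path"));

    static String describe(String op) {
        return OPERATORS.getOrDefault(op, "not in this short list");
    }

    public static void main(String[] args) {
        for (String op : new String[] {"BT", "cm", "S", "Do"}) {
            System.out.println(op + " -> " + describe(op));
        }
    }
}
```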