How to Extract Text from OneNote Files in Python
Microsoft OneNote .one files are binary documents that cannot be read as plain text or parsed with generic XML tools. Aspose.Note FOSS for Python provides a pure-Python parser that loads .one files into a full document object model (DOM), making it straightforward to extract text, formatting metadata, and hyperlinks programmatically.
Benefits of Using Aspose.Note FOSS for Python
- No Microsoft Office required: read .one files on any platform, including Linux CI/CD servers
- Full text and formatting access: plain text, bold/italic/underline runs, font properties, and hyperlink URLs
- Free and open-source: MIT license, no usage fees or API keys
Step-by-Step Guide
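The basic flow is to load the section file, walk its pages, and read the Text of each RichText node. The following is a minimal sketch of that flow rather than a definitive implementation. The helper names text_by_page and load_and_extract are hypothetical, and the sketch assumes that page objects expose GetChildNodes in the same way the Document object does.

```python
def text_by_page(doc, rich_text_type):
    """Return one list of non-empty text fragments per page.

    `doc` is expected to iterate over pages. Each page is ASSUMED to offer
    GetChildNodes(node_type), mirroring the Document object.
    """
    pages = []
    for page in doc:
        fragments = [node.Text for node in page.GetChildNodes(rich_text_type)]
        # Drop empty and whitespace-only fragments
        pages.append([text for text in fragments if text and text.strip()])
    return pages


def load_and_extract(path):
    """Load a .one section file and return its text grouped by page."""
    # Imported lazily so text_by_page can be used without the package installed
    from aspose.note import Document, RichText

    return text_by_page(Document(str(path)), RichText)
```

Calling load_and_extract("MyNotes.one") should then yield one list of strings per page, which can be printed or written to a UTF-8 text file.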
Common Issues and Fixes
1. ImportError: No module named 'aspose'
Cause: The package is not installed in the active Python environment.
Fix:
    pip install aspose-note
    # Confirm the package is installed in the active environment:
    pip show aspose-note
2. FileNotFoundError when loading .one file
Cause: The file path is incorrect or the file does not exist.
Fix: Use an absolute path or verify the file exists before loading:
    from pathlib import Path
    from aspose.note import Document

    path = Path("MyNotes.one")
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path.resolve()}")
    doc = Document(str(path))
3. UnicodeEncodeError on Windows when printing
Cause: Windows terminals may use a legacy encoding that cannot render Unicode characters.
Fix: Reconfigure stdout at the start of your script:
    import sys

    # Switch stdout to UTF-8 when the stream supports reconfiguration
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
4. Empty text results
Cause: The .one file may be empty, contain only images or tables (no RichText nodes), or be a notebook file (.onetoc2) rather than a section file (.one).
Fix: Check the page count and inspect node types:
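    from aspose.note import Document

    doc = Document("MyNotes.one")
    print(f"Pages: {doc.Count()}")
    for page in doc:
        # Count the direct children of the page and list their class names
        children = list(page)
        print(f"  Children: {len(children)}", sorted({type(c).__name__ for c in children}))
5. IncorrectPasswordException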
Cause: The .one file is encrypted (password-protected).
Fix: Aspose.Note FOSS for Python does not support encrypted .one files, so there is no in-library workaround. The full-featured commercial Aspose.Note product supports decryption.
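When processing many files, it may be more convenient to skip encrypted sections than to let one of them stop the run. The sketch below shows one way to do that. The helper load_or_skip is hypothetical, and passing the loader and the exception type as arguments is a choice made here because the import location of IncorrectPasswordException is not confirmed by this article.

```python
def load_or_skip(path, loader, encrypted_errors):
    """Return the loaded document, or None if the file is encrypted.

    `loader` is a callable such as Document. `encrypted_errors` is the
    exception class (or tuple of classes) raised for encrypted files.
    """
    try:
        return loader(str(path))
    except encrypted_errors as exc:
        print(f"Skipping encrypted file {path}: {type(exc).__name__}")
        return None
```

Assuming the exception can be imported from aspose.note alongside Document, usage would look like load_or_skip("MyNotes.one", Document, IncorrectPasswordException).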
Frequently Asked Questions
Can I extract text from all pages at once?
Yes. doc.GetChildNodes(RichText) searches the entire document tree recursively, including all pages, outlines, and outline elements.
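As a minimal sketch, the single recursive call can be wrapped in a small helper that joins the results into one string. The helper name all_text and the separator parameter are hypothetical. Document, RichText, GetChildNodes, and Text are used as described in this article.

```python
def all_text(doc, rich_text_type, separator="\n"):
    """Join the text of every RichText node found anywhere in the document."""
    nodes = doc.GetChildNodes(rich_text_type)
    # Skip nodes whose Text is empty so the output has no blank entries
    return separator.join(node.Text for node in nodes if node.Text)


def extract_file(path):
    """Load a .one section file and return all of its text as one string."""
    # Imported lazily so all_text can be used without the package installed
    from aspose.note import Document, RichText

    return all_text(Document(str(path)), RichText)
```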
Does the library support .onetoc2 notebook files?
No. The library handles .one section files only. Notebook table-of-contents files (.onetoc2) are a different format and are not supported.
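When scanning a notebook folder, it can help to select section files by extension before loading anything. The sketch below uses only the standard library. The function name find_section_files is hypothetical.

```python
from pathlib import Path


def find_section_files(folder):
    """Return .one section files under `folder`, skipping .onetoc2 files."""
    root = Path(folder)
    # Path.suffix is ".onetoc2" for table-of-contents files, so an exact
    # comparison against ".one" leaves them out
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".one"
    )
```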
Can I extract text from tables?
Yes. TableCell nodes contain RichText children that can be read the same way:
    from aspose.note import Document, Table, TableRow, TableCell, RichText

    doc = Document("MyNotes.one")
    for table in doc.GetChildNodes(Table):
        for row in table.GetChildNodes(TableRow):
            for cell in row.GetChildNodes(TableCell):
                cell_text = " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
                print(cell_text, end="\t")
            print()
What Python versions are supported?
Python 3.10, 3.11, and 3.12.
Is the library thread-safe?
Each Document instance should be used from a single thread. For parallel extraction, create a separate Document per thread.
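One way to follow this rule is to create each Document inside the worker function, so no instance ever crosses a thread boundary. The following is a sketch under that assumption. The names extract_many and extract_one_file are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_many(paths, extract_one, max_workers=4):
    """Run extract_one(path) for each path in a thread pool.

    Each call is expected to create its own Document, so no Document
    instance is shared between threads.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths
        results = list(pool.map(extract_one, paths))
    return dict(zip(paths, results))


def extract_one_file(path):
    """Load one file in the calling thread and return its joined text."""
    # Imported lazily so extract_many can be used without the package installed
    from aspose.note import Document, RichText

    doc = Document(str(path))  # created and used inside this thread only
    return "\n".join(rt.Text for rt in doc.GetChildNodes(RichText))
```

A call such as extract_many(["A.one", "B.one"], extract_one_file) would return a dictionary mapping each path to its text. Because the parser is pure Python, threads on a standard CPython build are likely to help mainly when file I/O dominates, not parsing.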