How to Parse Tables in OneNote Files Using Python

This guide shows how to extract and process table data from Microsoft OneNote .one files in Python. It uses Document.GetChildNodes() to find tables and then walks the Table → TableRow → TableCell hierarchy. Aspose.Note FOSS for Python provides programmatic access to cell content, column metadata, and table tags.

Benefits

  1. Structured access: row and column counts, individual cell content, column widths
  2. No spreadsheet app required: extract table data from OneNote on any platform
  3. Free and open-source: MIT license, no API key

Step-by-Step Guide

Step 1: Install Aspose.Note FOSS for Python

Install the Aspose.Note FOSS package from PyPI using the pip command below:

pip install aspose-note

Step 2: Load the .one File

Instantiate Document with the path to a .one file to load its full content tree into memory:

from aspose.note import Document

doc = Document("MyNotes.one")
print(f"Pages: {len(list(doc))}")

Step 3: Find All Tables

Use GetChildNodes(Table) to retrieve every table from the entire document recursively:

from aspose.note import Document, Table

doc = Document("MyNotes.one")
tables = doc.GetChildNodes(Table)
print(f"Found {len(tables)} table(s)")

Step 4: Read Row and Cell Values

Iterate over the TableRow nodes of each table and the TableCell nodes of each row. Each cell contains RichText nodes whose .Text property gives the plain-text content:

from aspose.note import Document, Table, TableRow, TableCell, RichText

doc = Document("MyNotes.one")

for t, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"\nTable {t}: {len(table.Columns)} column(s)")
    for r, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cell_values = []
        for cell in row.GetChildNodes(TableCell):
            text = " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            cell_values.append(text)
        print(f"  Row {r}: {cell_values}")
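If the first row of a table holds column headings, the per-row lists built above can be turned into dictionaries keyed by heading. The sketch below works on plain lists of strings, so it runs without a .one file. rows_to_records is a hypothetical helper name, not part of Aspose.Note, and the sample data is invented for illustration.

```python
def rows_to_records(rows):
    """Turn [[header...], [values...], ...] into a list of dicts."""
    if not rows:
        return []
    header, *body = rows
    # Give blank header cells a positional fallback name.
    keys = [name or f"column_{i}" for i, name in enumerate(header, start=1)]
    records = []
    for row in body:
        # Pad short rows with "" and drop values beyond the header width.
        padded = list(row[:len(keys)]) + [""] * (len(keys) - len(row))
        records.append(dict(zip(keys, padded)))
    return records


# Hypothetical data standing in for the cell_values lists from the loop above.
sample = [["Name", "Qty"], ["Apples", "3"], ["Pears"]]
print(rows_to_records(sample))
```

Duplicate heading names would overwrite each other in the resulting dictionaries, so this approach suits tables whose headings are unique.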

Step 5: Read Column Widths

Each table exposes its column metadata through table.Columns and its border setting through table.IsBordersVisible. Read the Width of each column to get the column widths:

from aspose.note import Document, Table

doc = Document("MyNotes.one")
for i, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"Table {i} column widths (pts): {[col.Width for col in table.Columns]}")
    print(f"Borders visible: {table.IsBordersVisible}")

Step 6: Export to CSV

Export all table data to a CSV file by writing each row’s cell text values with csv.writer from Python’s standard csv module:

import csv
import io
from aspose.note import Document, Table, TableRow, TableCell, RichText

doc = Document("MyNotes.one")
buf = io.StringIO()
writer = csv.writer(buf)

for table in doc.GetChildNodes(Table):
    for row in table.GetChildNodes(TableRow):
        values = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in row.GetChildNodes(TableCell)
        ]
        writer.writerow(values)
    writer.writerow([])   # blank row between tables

with open("tables.csv", "w", encoding="utf-8", newline="") as f:
    f.write(buf.getvalue())
print("Saved tables.csv")
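When one CSV per table is preferable to a single file with blank separator rows, each table can be serialized on its own. This is a minimal sketch that assumes the rows are already plain lists of strings; table_to_csv_text and the sample data are hypothetical and not part of Aspose.Note. It also shows that csv.writer quotes cells containing commas automatically.

```python
import csv
import io


def table_to_csv_text(rows):
    """Serialize one table (a list of rows of strings) to CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical data: two tables, the first with a comma inside a cell.
sample_tables = [
    [["Name", "Note"], ["Smith, John", "ok"]],
    [["Item"], ["Pen"]],
]
csv_texts = [table_to_csv_text(rows) for rows in sample_tables]
for number, text in enumerate(csv_texts, start=1):
    print(f"--- table {number} ---")
    print(text, end="")
```

Each string can then be written to its own file, opened with newline="" as in the export code above so the line endings produced by csv.writer are not translated a second time.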

Common Issues and Fixes

Tables appear empty

Cause: The cells most likely contain Image nodes rather than RichText nodes, so joining the RichText .Text values yields an empty string.

Check: Use the following code to count RichText and Image nodes per cell and diagnose why table cells appear empty:

from aspose.note import Document, Table, TableRow, TableCell, RichText, Image

doc = Document("MyNotes.one")
for table in doc.GetChildNodes(Table):
    for row in table.GetChildNodes(TableRow):
        for cell in row.GetChildNodes(TableCell):
            texts = cell.GetChildNodes(RichText)
            images = cell.GetChildNodes(Image)
            print(f"  Cell: {len(texts)} text(s), {len(images)} image(s)")

Cell count per row doesn’t match table.Columns

table.Columns reflects the column metadata stored in the file. The actual number of cells per row may differ if rows have merged cells (the file format stores this at the binary level; the public API does not expose a merge flag).
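A quick way to spot affected rows is to compare each row's cell count with the column count. The sketch below assumes the rows have already been extracted into lists (as in Step 4) and that expected_columns comes from len(table.Columns); find_ragged_rows is a hypothetical helper and the sample rows are invented.

```python
def find_ragged_rows(rows, expected_columns):
    """Return (row_number, cell_count) for rows whose cell count differs."""
    return [
        (number, len(cells))
        for number, cells in enumerate(rows, start=1)
        if len(cells) != expected_columns
    ]


# Hypothetical data: the second row has one cell fewer than the others.
sample_rows = [["a", "b", "c"], ["merged", "c"], ["d", "e", "f"]]
print(find_ragged_rows(sample_rows, expected_columns=3))
```

A mismatch only indicates that a row deviates from the column metadata. It does not reveal which cells were merged.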

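ImportError / ModuleNotFoundError: No module named 'aspose'

Cause: The package is not installed in the Python environment that runs the script. Install it, then confirm the installation:
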
pip install aspose-note
pip show aspose-note  # confirm it is installed in the active environment
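If pip reports the package as installed but the import still fails, the script may be running under a different interpreter than the one pip installed into. The following sketch uses only the standard library to print the active interpreter and check whether a module can be located; module_available is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be located by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. no top-level "aspose") raises here.
        return False


print("Interpreter:", sys.executable)
print("aspose.note importable:", module_available("aspose.note"))
```

Running python -m pip install aspose-note with the interpreter printed above installs the package into that same environment.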

Frequently Asked Questions

Can I edit table data and save it back? No. Writing back to .one format is not supported. Changes made in-memory (e.g. via RichText.Replace()) cannot be persisted to the source file.

Are merged cells detected? The CompositeNode API does not expose merge metadata. Each TableCell is treated as a separate cell regardless of visual merging.

Can I count how many rows a table has? Yes: len(table.GetChildNodes(TableRow)).

