How to Parse Tables in OneNote Files Using Python

Microsoft OneNote lets users embed tables of structured data directly in pages. Aspose.Note FOSS for Python exposes each table through the Table → TableRow → TableCell hierarchy and provides programmatic access to all cell content, column metadata, and table tags.

Benefits

Structured access: row and column counts, individual cell content, column widths
No spreadsheet application required: extract table data from OneNote on any platform
Free and open source: MIT license, no API key

Step-by-Step Guide

Step 1: Install Aspose.Note FOSS for Python

pip install aspose-note

Step 2: Load the .one file

from aspose.note import Document

doc = Document("MyNotes.one")
print(f"Pages: {len(list(doc))}")

Step 3: Find all tables

Use GetChildNodes(Table) to retrieve every table in the document recursively:

from aspose.note import Document, Table

doc = Document("MyNotes.one")
tables = doc.GetChildNodes(Table)
print(f"Found {len(tables)} table(s)")

Step 4: Read row and cell values

Iterate over the TableRow and TableCell nodes. Each cell contains RichText nodes whose .Text property provides the plain-text content:

from aspose.note import Document, Table, TableRow, TableCell, RichText

doc = Document("MyNotes.one")

for t, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"\nTable {t}: {len(table.Columns)} column(s)")
    for r, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cell_values = []
        for cell in row.GetChildNodes(TableCell):
            text = " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            cell_values.append(text)
        print(f"  Row {r}: {cell_values}")
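
Once the cell values have been collected as lists of strings, it is often convenient to address them by column name. The following is a minimal sketch in plain Python, not part of the Aspose.Note API. It assumes the first row of a table is a header row; the helper name rows_to_records is hypothetical.

```python
def rows_to_records(rows):
    """Turn a list of rows (lists of cell strings) into a list of dicts.

    Assumption: the first row holds the column headers.
    """
    if not rows:
        return []
    header, *body = rows
    records = []
    for row in body:
        # Pad short rows so every header gets a value; zip() drops extra cells.
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Sample data standing in for the cell_values lists built in Step 4.
sample_rows = [["Name", "Qty"], ["Apples", "3"], ["Pears"]]
print(rows_to_records(sample_rows))
```

Rows shorter than the header are padded with empty strings, which keeps every record the same shape even when a row has fewer cells than expected.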

Step 5: Read column widths

from aspose.note import Document, Table

doc = Document("MyNotes.one")
for i, table in enumerate(doc.GetChildNodes(Table), start=1):
    print(f"Table {i} column widths (pts): {[col.Width for col in table.Columns]}")
    print(f"Borders visible: {table.IsBordersVisible}")
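
Raw widths in points can be hard to compare across tables. As a small illustration, the sketch below converts a list of widths into percentage shares of the total table width. It works on plain numbers such as those collected above; the helper name width_shares is hypothetical and not part of the library.

```python
def width_shares(widths):
    """Return each column's share of the total width, in percent."""
    total = sum(widths)
    if total <= 0:
        # Avoid division by zero for empty or zero-width input.
        return [0.0 for _ in widths]
    return [round(100 * w / total, 1) for w in widths]


# Sample widths standing in for [col.Width for col in table.Columns].
print(width_shares([120.0, 60.0, 60.0]))
```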

Step 6: Export to CSV

import csv, io
from aspose.note import Document, Table, TableRow, TableCell, RichText

doc = Document("MyNotes.one")
buf = io.StringIO()
writer = csv.writer(buf)

for table in doc.GetChildNodes(Table):
    for row in table.GetChildNodes(TableRow):
        values = [
            " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            for cell in row.GetChildNodes(TableCell)
        ]
        writer.writerow(values)
    writer.writerow([])   # blank row between tables

with open("tables.csv", "w", encoding="utf-8", newline="") as f:
    f.write(buf.getvalue())
print("Saved tables.csv")
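
The script above writes every table into a single file, separated by blank rows. If separate files are easier to consume downstream, one possible variation is to write one CSV per table. The sketch below assumes the tables have already been extracted into nested lists of strings; the function name write_tables_as_csv and the file-name prefix are illustrative choices.

```python
import csv
import tempfile
from pathlib import Path


def write_tables_as_csv(tables, out_dir, prefix="table"):
    """Write each table (a list of rows) to its own CSV file.

    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, rows in enumerate(tables, start=1):
        target = out_path / f"{prefix}_{index}.csv"
        # newline="" lets the csv module control line endings itself.
        with open(target, "w", encoding="utf-8", newline="") as f:
            csv.writer(f).writerows(rows)
        written.append(target)
    return written


# Demo with sample data written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_tables = [[["Name", "Qty"], ["Apples", "3"]], [["Note"], ["Hello"]]]
    for path in write_tables_as_csv(sample_tables, tmp):
        print(path.name)
```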

Common Issues and Solutions

Tables appear empty

Cause: the cells contain Image nodes instead of RichText nodes.

Check:

from aspose.note import Document, Table, TableRow, TableCell, RichText, Image

doc = Document("MyNotes.one")
for table in doc.GetChildNodes(Table):
    for row in table.GetChildNodes(TableRow):
        for cell in row.GetChildNodes(TableCell):
            texts = cell.GetChildNodes(RichText)
            images = cell.GetChildNodes(Image)
            print(f"  Cell: {len(texts)} text(s), {len(images)} image(s)")

Cell count per row does not match table.Columns

table.Columns reflects the column metadata stored in the file. The actual number of cells in a given row may differ if the row contains merged cells (the file format stores this at the binary level; the public API does not expose the merge flag).
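
Because the merge flag is not available, one workable approach is to detect the mismatch on the extracted data and pad short rows before exporting. The sketch below operates on plain lists of cell strings; both helper names are hypothetical, and padding on the right is an assumption, since the position of a merged cell cannot be recovered this way.

```python
def find_mismatched_rows(rows, column_count):
    """Return (row_number, cell_count) for rows whose length differs."""
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != column_count
    ]


def pad_rows(rows, column_count, fill=""):
    """Right-pad short rows to column_count; longer rows are left unchanged."""
    return [list(row) + [fill] * (column_count - len(row)) for row in rows]


# Sample data: a 3-column table where row 2 has only two cells.
sample_rows = [["A", "B", "C"], ["merged", "x"], ["1", "2", "3"]]
print(find_mismatched_rows(sample_rows, 3))
print(pad_rows(sample_rows, 3))
```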

ImportError: No module named 'aspose'

pip install aspose-note
pip show aspose-note  # confirm it is installed in the active environment
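
If the package is installed but the error persists, the script may be running under a different interpreter than the one pip installed into. The short check below uses only the standard library to print the active interpreter and to test whether the top-level aspose package can be found; the helper name is_importable is illustrative.

```python
import importlib.util
import sys


def is_importable(top_level_name):
    """Return True if this interpreter can locate the top-level package."""
    return importlib.util.find_spec(top_level_name) is not None


# Compare this path with the environment where pip installed the package.
print(f"Interpreter: {sys.executable}")
print(f"aspose importable: {is_importable('aspose')}")
```

Running the install as python -m pip install aspose-note with the same interpreter shown above is one way to keep the two in sync.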

Frequently Asked Questions

Can I edit table data and save it back? No. Writing back to the .one format is not supported. Changes made in memory (for example, via RichText.Replace()) cannot be persisted to the source file.

Are merged cells detected? The CompositeNode API does not expose merge metadata. Each TableCell is treated as a separate cell, regardless of visual merging.

Can I count the rows in a table? Yes: len(table.GetChildNodes(TableRow)).
