How to Parse MSG Files in Python

This guide shows how to load and parse Outlook MSG files in Python using MapiMessage.from_file(), access subject, body, recipients, and attachments, and inspect the low-level CFB structure with CFBReader.

Step-by-Step Guide

Step 1: Install the Package

Install the Aspose.Email FOSS package from PyPI using the pip command below (requires Python 3.10 or later):

pip install aspose-email-foss

Step 2: Import the MapiMessage Class

Import MapiMessage from aspose.email_foss.msg to access the MSG loading and parsing API:

from aspose.email_foss.msg import MapiMessage

Step 3: Load an MSG File

Call MapiMessage.from_file() with the path to an Outlook .msg file to load it into memory:

msg = MapiMessage.from_file("message.msg")

For lenient parsing of malformed files, pass the optional strict=False keyword argument to suppress strict validation errors:

msg = MapiMessage.from_file("message.msg", strict=False)

Step 4: Access Message Properties

Read the subject, body, body_html, and message_class properties to access message metadata and content:

print(f"Subject: {msg.subject}")
print(f"Body: {msg.body}")
print(f"HTML Body: {msg.body_html[:200] if msg.body_html else 'None'}")
print(f"Message Class: {msg.message_class}")

Step 5: List Attachments

Iterate over iter_attachments_info() to get each attachment's storage name and whether it is an embedded message:

for att in msg.iter_attachments_info():
    name = att.storage_name
    is_embedded = att.is_embedded_message
    print(f"Attachment: {name}, embedded={is_embedded}")
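To handle regular files and embedded messages separately, the loop above can be wrapped in a small helper. This is a minimal sketch: split_attachments is a hypothetical name, not part of the library, and it relies only on the storage_name and is_embedded_message attributes shown above. The stand-in objects exist purely so the sketch runs without an MSG file.

```python
from types import SimpleNamespace


def split_attachments(infos):
    """Return (files, embedded): storage names grouped by attachment kind."""
    files, embedded = [], []
    for info in infos:
        target = embedded if info.is_embedded_message else files
        target.append(info.storage_name)
    return files, embedded


# Stand-in objects for illustration only; with a real message,
# pass msg.iter_attachments_info() instead.
sample = [
    SimpleNamespace(storage_name="__attach_version1.0_#00000000", is_embedded_message=False),
    SimpleNamespace(storage_name="__attach_version1.0_#00000001", is_embedded_message=True),
]
files, embedded = split_attachments(sample)
print(f"files={len(files)}, embedded={len(embedded)}")
```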

Step 6: Inspect Low-Level CFB Structure

Use CFBReader.from_file() to inspect the raw Compound File Binary structure of the MSG file and iterate its directory streams:

from aspose.email_foss.cfb import CFBReader

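reader = CFBReader.from_file("message.msg")
try:
    # Report how many directory entries the container holds, then list its streams.
    print(f"Directory entries: {reader.directory_entry_count}")
    for entry in reader.iter_streams():
        print(f"  Stream: {entry}")
finally:
    # Release the underlying file handle even if iteration fails.
    reader.close()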

Common Issues and Fixes

CFBError when loading

The file did not pass the check for a valid CFB container. Verify that the file is a genuine Outlook MSG file in binary format; plain-text RFC 5322 files require a conversion step via MapiMessage.from_email_message() first.
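One quick way to tell a binary MSG file from a plain-text one before loading is to look at its first eight bytes, since every Compound File Binary container starts with the same fixed signature. The following is a minimal sketch using only the standard library; has_cfb_signature and looks_like_cfb are hypothetical helper names, not part of the package.

```python
from pathlib import Path

# Fixed 8-byte signature found at the start of every CFB container.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_cfb_signature(data):
    """Return True if the given bytes begin with the CFB signature."""
    return data[: len(CFB_SIGNATURE)] == CFB_SIGNATURE


def looks_like_cfb(path):
    """Return True if the file at `path` begins with the CFB signature."""
    with Path(path).open("rb") as handle:
        return has_cfb_signature(handle.read(len(CFB_SIGNATURE)))
```

A file that fails this check is not a CFB container at all; a file that passes can still be rejected later, because the signature says nothing about the rest of the structure.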

Body is empty but HTML body has content

Some messages store content only in HTML. Check msg.body_html when msg.body returns None.
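The fallback can be captured in a small helper, sketched below. It assumes only the body and body_html properties described in Step 4; preferred_body is a hypothetical name, and the stand-in object is there so the sketch runs without an MSG file.

```python
from types import SimpleNamespace


def preferred_body(msg):
    """Return (kind, content), preferring plain text and falling back to HTML."""
    if msg.body:
        return "text", msg.body
    if msg.body_html:
        return "html", msg.body_html
    return "none", ""


# Stand-in for a message that stores its content only as HTML;
# a loaded MapiMessage would be passed in the same way.
html_only = SimpleNamespace(body=None, body_html="<p>Hello</p>")
kind, content = preferred_body(html_only)
print(kind, content)
```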

Validation warnings

Access msg.validation_issues to see a tuple of compliance warnings for the loaded file.


Frequently Asked Questions (FAQ)

Can I read EML files?

Yes, via a conversion step. The library reads MSG (CFB) binary files directly. To load an EML string, first parse it with Python’s standard email.message_from_string() and then pass the result to MapiMessage.from_email_message().
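The conversion might look like the sketch below. The sample EML text is invented for illustration, and the final library call is left commented out so the standard-library part runs without the package installed.

```python
import email

# Invented sample message in RFC 5322 format, for illustration only.
EML_TEXT = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Quarterly report\r\n"
    "\r\n"
    "Please find the figures attached.\r\n"
)

# Standard-library step: parse the text into an email.message.Message object.
parsed = email.message_from_string(EML_TEXT)
print(parsed["Subject"])

# Library step, as described in the answer above:
# from aspose.email_foss.msg import MapiMessage
# msg = MapiMessage.from_email_message(parsed)
```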

Does loading read all attachment data into memory?

Yes. All attachment data including binary content is loaded into memory when MapiMessage.from_file() completes. iter_attachments_info() is a convenience iterator over the already-loaded attachments list.

Is it thread-safe?

Each MapiMessage instance is independent. Concurrent reads from separate instances are safe.
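A pattern consistent with this is to have each worker thread load its own instance rather than share one. The sketch below assumes that approach; subjects_in_parallel is a hypothetical helper, and the loader is passed in as a parameter so the example does not depend on the package being installed.

```python
from concurrent.futures import ThreadPoolExecutor


def subjects_in_parallel(paths, load, max_workers=4):
    """Load each path with `load` in a worker thread; return subjects in input order."""

    def read_subject(path):
        msg = load(path)  # each call produces its own independent instance
        return msg.subject

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_subject, paths))
```

With the library installed, the call would be subjects_in_parallel(["a.msg", "b.msg"], MapiMessage.from_file), where the file names are placeholders.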
