How to Parse MSG Files in Python
This guide shows how to load and parse Outlook MSG files in Python using MapiMessage.from_file(), access subject, body, recipients, and attachments, and inspect the low-level CFB structure with CFBReader.
Step-by-Step Guide
Step 1: Install the Package
Install the Aspose.Email FOSS package from PyPI using the pip command below (requires Python 3.10 or later):
pip install aspose-email-foss
Step 2: Import the MapiMessage Class
Import MapiMessage from aspose.email_foss.msg to access the MSG loading and parsing API:
from aspose.email_foss.msg import MapiMessage
Step 3: Load an MSG File
Call MapiMessage.from_file() with the path to an Outlook .msg file to load it into memory:
msg = MapiMessage.from_file("message.msg")
For lenient parsing of malformed files, pass the optional strict=False keyword argument to suppress strict validation errors:
msg = MapiMessage.from_file("message.msg", strict=False)
Step 4: Access Message Properties
Read the subject, body, body_html, and message_class properties to access message metadata and content:
print(f"Subject: {msg.subject}")
print(f"Body: {msg.body}")
print(f"HTML Body: {msg.body_html[:200] if msg.body_html else 'None'}")
print(f"Message Class: {msg.message_class}")
Step 5: List Attachments
Iterate over iter_attachments_info() to get each attachment's storage name and whether it is an embedded message:
for att in msg.iter_attachments_info():
    name = att.storage_name
    is_embedded = att.is_embedded_message
    print(f"Attachment: {name}, embedded={is_embedded}")
Step 6: Inspect Low-Level CFB Structure
Use CFBReader.from_file() to inspect the raw Compound File Binary structure of the MSG file and iterate its directory streams:
from aspose.email_foss.cfb import CFBReader
reader = CFBReader.from_file("message.msg")
print(f"Directory entries: {reader.directory_entry_count}")
for entry in reader.iter_streams():
    print(f" Stream: {entry}")
reader.close()
Common Issues and Fixes
CFBError when loading
The file is not a valid CFB container. Verify that the file is a genuine Outlook MSG file in binary format. Plain-text RFC 5322 (EML) files must first be converted via MapiMessage.from_email_message().
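A quick way to tell a binary MSG file from a plain-text message before loading it is to look at the first eight bytes, since every Compound File Binary container starts with a fixed signature. The sketch below uses only the standard library; the helper name looks_like_cfb is illustrative and not part of the library.

```python
from pathlib import Path

# Signature that opens every Compound File Binary (CFB) container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(path: str) -> bool:
    """Return True if the file starts with the 8-byte CFB signature."""
    with Path(path).open("rb") as fh:
        return fh.read(len(CFB_MAGIC)) == CFB_MAGIC
```

A False result suggests the file is probably an EML or other text format and needs the conversion route. A True result only confirms the container type, not that the contents form a valid MSG file.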
Body is empty but HTML body has content
Some messages store content only in HTML. Check msg.body_html when msg.body returns None.
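One possible fallback is sketched below: prefer msg.body and, when it is empty, derive a rough plain-text rendering from msg.body_html with the standard library's HTMLParser. The helper best_text is hypothetical, and the sketch assumes body_html is a str. The extraction is deliberately crude (it keeps all text nodes, including any script or style content), so treat it as a starting point.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def best_text(msg) -> str:
    """Return msg.body, or a rough plain-text version of msg.body_html."""
    if msg.body:
        return msg.body
    if not msg.body_html:
        return ""
    extractor = _TextExtractor()
    extractor.feed(msg.body_html)  # assumes body_html is a str
    extractor.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join(" ".join(extractor.parts).split())
```

Any object exposing body and body_html attributes works here, so the helper can be called with the msg returned by MapiMessage.from_file().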
Validation warnings
Access msg.validation_issues to see a tuple of compliance warnings for the loaded file.
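A minimal sketch for surfacing these warnings in logs or console output follows. It assumes only that validation_issues is an iterable whose entries can be converted with str(); the helper name format_validation_issues is illustrative.

```python
def format_validation_issues(msg) -> list[str]:
    """Return each entry of msg.validation_issues as a numbered display string."""
    return [
        f"[{number}] {issue}"
        for number, issue in enumerate(msg.validation_issues, start=1)
    ]
```

An empty list means the loaded file produced no warnings.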
Frequently Asked Questions (FAQ)
Can I read EML files?
Yes, via a conversion step. The library reads MSG (CFB) binary files directly. To load an EML string, first parse it with Python’s standard email.message_from_string() and then pass the result to MapiMessage.from_email_message().
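The conversion route can be sketched as follows. The sample message and the helper names parse_eml and eml_to_mapi are illustrative. Only email.message_from_string() and MapiMessage.from_email_message() come from the text above, and the library import is deferred so the parsing half runs without the package installed.

```python
import email

# Illustrative sample; any RFC 5322 text works the same way.
EML_TEXT = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Quarterly report\r\n"
    "\r\n"
    "Please find the figures attached.\r\n"
)


def parse_eml(text: str):
    """Parse RFC 5322 text with the standard library."""
    return email.message_from_string(text)


def eml_to_mapi(text: str):
    """Convert EML text to a MapiMessage (requires aspose-email-foss)."""
    # Imported lazily so parse_eml() stays usable without the package.
    from aspose.email_foss.msg import MapiMessage

    return MapiMessage.from_email_message(parse_eml(text))
```

To convert an .eml file from disk, read its contents as text and pass the string to eml_to_mapi().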
Does loading read all attachment data into memory?
Yes. All attachment data including binary content is loaded into memory when MapiMessage.from_file() completes. iter_attachments_info() is a convenience iterator over the already-loaded attachments list.
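Because everything is loaded up front, it may be worth checking the file size before loading when inputs are untrusted or potentially large. The sketch below is a standard-library guard. The 50 MiB ceiling and the helper name ensure_loadable_size are arbitrary illustrative choices, and the on-disk size only approximates the eventual memory footprint.

```python
import os

# Arbitrary illustrative ceiling (50 MiB); tune it to your own memory budget.
DEFAULT_MAX_BYTES = 50 * 1024 * 1024


def ensure_loadable_size(path: str, max_bytes: int = DEFAULT_MAX_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, above the {max_bytes}-byte limit")
    return size
```

Calling ensure_loadable_size("message.msg") before MapiMessage.from_file("message.msg") rejects oversized files before any attachment data is read.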
Is it thread-safe?
Each MapiMessage instance is independent. Concurrent reads from separate instances are safe.
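A minimal sketch of that pattern is shown below: each worker thread loads its own instance and reads from it, so no message object is shared between threads. The helper subjects_in_parallel and its loader parameter are illustrative. The loader is expected to behave like MapiMessage.from_file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def subjects_in_parallel(
    paths: Iterable[str],
    loader: Callable[[str], Any],
    max_workers: int = 4,
) -> dict[str, Any]:
    """Load each path in a worker thread and map the path to its subject."""

    def read_subject(path: str) -> Any:
        # Each call builds its own message object; nothing is shared across threads.
        return loader(path).subject

    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() yields results in the same order as the input paths.
        return dict(zip(path_list, pool.map(read_subject, path_list)))
```

For real files, pass MapiMessage.from_file as the loader, for example subjects_in_parallel(["a.msg", "b.msg"], MapiMessage.from_file).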