How to Parse MSG Files in Python
aspose-email-foss for Python provides a pure-Python API for reading Outlook MSG files without Microsoft Office dependencies. Load a file into a MapiMessage object to access all message data.
Step-by-Step Guide
Step 1: Install the Package
pip install aspose-email-foss
Requires Python 3.10 or later.
Step 2: Import the MapiMessage Class
from aspose.email_foss.msg.message import MapiMessage
Step 3: Load an MSG File
msg = MapiMessage.from_file("message.msg")
For lenient parsing of malformed files, pass strict=False:
msg = MapiMessage.from_file("message.msg", strict=False)
Step 4: Access Message Properties
print(f"Subject: {msg.subject()}")
print(f"Body: {msg.body()}")
print(f"HTML Body: {msg.body_html()[:200] if msg.body_html() else 'None'}")
print(f"Message Class: {msg.message_class()}")
Step 5: List Attachments
for att in msg.iter_attachments_info():
    name = att.storage_name()
    is_embedded = att.is_embedded_message()
    print(f"Attachment: {name}, embedded={is_embedded}")
Step 6: Inspect Low-Level CFB Structure
from aspose.email_foss.cfb.reader import CFBReader
reader = CFBReader.from_file("message.msg")
print(f"Directory entries: {reader.directory_entry_count()}")
for entry in reader.iter_streams():
    print(f" Stream: {entry}")
reader.close()
Common Issues and Fixes
CFBError when loading
The file is not a valid CFB container. Verify it is an actual Outlook MSG file, not an EML.
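A quick pre-check can tell the two cases apart before loading. The sketch below is not part of aspose-email-foss: looks_like_cfb is a hypothetical helper that compares the first eight bytes of a file with the standard Compound File Binary signature, using only the standard library.

```python
from pathlib import Path

# Standard 8-byte signature at the start of every Compound File Binary container.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(path: str) -> bool:
    """Return True if the file starts with the CFB signature (hypothetical helper)."""
    with Path(path).open("rb") as handle:
        return handle.read(len(CFB_SIGNATURE)) == CFB_SIGNATURE
```

An EML file is plain text that begins with mail headers, so the check returns False for it. A True result only means the container is CFB; it does not guarantee a well-formed MSG.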
Body is empty but HTML body has content
Some messages store content only in HTML. Check msg.body_html() when msg.body() returns None.
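One way to handle this is a small fallback helper. The sketch below is an illustration, not library code: best_text_body is a hypothetical name, it relies only on the body() and body_html() methods shown in Step 4, and it assumes body_html() returns a str. The HTML-to-text conversion is deliberately crude.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (does not skip <style>/<script> contents)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def best_text_body(msg) -> str:
    """Return msg.body() if present, else text extracted from msg.body_html()."""
    plain = msg.body()
    if plain:
        return plain
    html = msg.body_html()  # assumed to be a str or None
    if not html:
        return ""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(extractor.parts).split())
```

For anything beyond a preview or search index, a dedicated HTML-to-text library is likely a better choice than this extractor.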
Validation warnings
Call msg.validation_issues() to see a tuple of compliance warnings for the loaded file.
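As a rough sketch, the tuple can be turned into numbered lines for logging. format_validation_issues is a hypothetical helper; the element type of the tuple is not described here, so each entry is simply converted with str().

```python
def format_validation_issues(msg) -> list[str]:
    """Return one numbered line per entry in msg.validation_issues()."""
    # str() formatting is an assumption; the issue objects may expose richer fields.
    return [
        f"{index}. {issue}"
        for index, issue in enumerate(msg.validation_issues(), start=1)
    ]
```

This pairs naturally with a message loaded using strict=False, where the list shows what lenient parsing let through.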
Frequently Asked Questions (FAQ)
Can I read EML files?
Not directly. The library handles MSG (CFB) format. Convert EML content to an EmailMessage object first, then use MapiMessage.from_email_message().
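The first half of that conversion might look like the sketch below. It assumes the EmailMessage mentioned above is the standard library's email.message.EmailMessage, which this guide does not confirm; parse_eml is a hypothetical helper name.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def parse_eml(raw: bytes) -> EmailMessage:
    """Parse raw EML bytes into a standard-library EmailMessage."""
    # policy.default makes the parser return EmailMessage instead of the legacy Message class.
    return BytesParser(policy=policy.default).parsebytes(raw)
```

The returned object would then be passed to MapiMessage.from_email_message(), as described in the answer above.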
Does loading read all attachment data into memory?
No. Attachment metadata is read on demand. Use iter_attachments_info() for lightweight iteration.
Is it thread-safe?
Each MapiMessage instance is independent. Concurrent reads from separate instances are safe.
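A minimal sketch of that pattern follows: each worker thread loads its own message and no instance is shared. read_subjects is a hypothetical helper, and the loader is passed in as a parameter so the function itself depends only on the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def read_subjects(
    paths: Iterable[str],
    load: Callable[[str], Any],
    workers: int = 4,
) -> list:
    """Load each path in a worker thread and return the subjects in input order."""

    def subject_of(path: str):
        # Each task creates and uses its own message object; nothing is shared.
        return load(path).subject()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(subject_of, paths))
```

With the library installed, a call such as read_subjects(["a.msg", "b.msg"], MapiMessage.from_file) would fit this shape.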