How to Work with XML Processing in .NET

XLSX files store their data in XML parts inside an OPC ZIP package. Aspose.Cells FOSS for .NET processes these through four XML mapper classes: WorkbookXmlMapper, WorksheetXmlMapper, SharedStringTableXmlMapper, and StylesheetXmlMapper. Understanding these classes helps you diagnose parsing failures and correctly configure fault-tolerant loading. Install with dotnet add package Aspose.Cells_FOSS.

Step-by-Step Guide

Step 1: Install the Package

dotnet add package Aspose.Cells_FOSS

Step 2: Import the Namespace

using Aspose.Cells_FOSS;

Step 3: Understand the XML Mapper Responsibilities

Each mapper handles one XML part of the XLSX structure:

Mapper                      XML Part                  Handles
WorkbookXmlMapper           xl/workbook.xml           Workbook metadata, sheet list, defined names
WorksheetXmlMapper          xl/worksheets/sheetN.xml  Cell data, formulas, hyperlinks, validations, conditional formats
SharedStringTableXmlMapper  xl/sharedStrings.xml      De-duplicated string values
StylesheetXmlMapper         xl/styles.xml             Cell styles, fonts, fills, borders

These mappers are invoked automatically during Workbook construction and Save(). You do not instantiate them directly in application code.
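
To see which of these parts a given file actually contains, the package can be opened as a plain ZIP archive. The following is a minimal sketch that uses only System.IO.Compression from the .NET base library, not Aspose.Cells FOSS. The class and method names (XlsxPartLister, ListMapperParts) are hypothetical, and the filter assumes the conventional part names listed above; a package may legally use other names declared in its relationship files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of Aspose.Cells FOSS.
public static class XlsxPartLister
{
    // Returns the entry names that match the four part kinds handled by the
    // mappers, assuming the conventional OOXML part names.
    public static List<string> ListMapperParts(Stream xlsxStream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using var archive = new ZipArchive(xlsxStream, ZipArchiveMode.Read, leaveOpen: true);
        return archive.Entries
            .Select(e => e.FullName)
            .Where(name =>
                name == "xl/workbook.xml" ||
                name == "xl/sharedStrings.xml" ||
                name == "xl/styles.xml" ||
                (name.StartsWith("xl/worksheets/", StringComparison.Ordinal) &&
                 name.EndsWith(".xml", StringComparison.Ordinal)))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

Passing a stream from File.OpenRead("file.xlsx") to ListMapperParts returns the matching entry names in ordinal order. A missing xl/sharedStrings.xml is not necessarily an error, since a workbook with no shared strings may omit that part.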


Step 4: Handle XmlParsingException

XmlParsingException is thrown when a mapper encounters malformed XML that cannot be repaired. Set TryRepairXml = true in LoadOptions to activate the mapper’s fault-tolerant parsing path; the exception is then raised only for XML that the repair path cannot recover.

using System;
using Aspose.Cells_FOSS;

var opts = new LoadOptions
{
    TryRepairPackage = true,
    TryRepairXml = true,
};

try
{
    var wb = new Workbook("malformed.xlsx", opts);
    Console.WriteLine("Loaded: " + wb.Worksheets.Count + " sheet(s)");

    // LoadDiagnostics reports whether any repairs were applied during load.
    var diag = wb.LoadDiagnostics;
    if (diag.HasRepairs)
        Console.WriteLine("XML repairs applied. Data loss risk: " + diag.HasDataLossRisk);
}
catch (XmlParsingException ex)
{
    Console.WriteLine("Unrecoverable XML error: " + ex.Message);
}
catch (WorkbookLoadException ex)
{
    Console.WriteLine("Load failed: " + ex.Message);
}

Step 5: Use LoadDiagnostics to Identify XML Issues

After a successful load, check LoadDiagnostics.Issues for DiagnosticEntry records to understand which XML repairs were applied and whether any resulted in data loss.

using System;
using Aspose.Cells_FOSS;

var opts = new LoadOptions { TryRepairXml = true };
var wb = new Workbook("file.xlsx", opts);
var diag = wb.LoadDiagnostics;

foreach (var entry in diag.Issues)
{
    Console.WriteLine($"[{entry.Severity}] {entry.Code}");
    Console.WriteLine($"  Message: {entry.Message}");
    Console.WriteLine($"  RepairApplied: {entry.RepairApplied}  DataLossRisk: {entry.DataLossRisk}");
}

Common Issues and Fixes

XmlParsingException even with TryRepairXml = true. The XML is so malformed that the fault-tolerant parser cannot recover it. This can happen with files created by non-standard tools that produce syntactically invalid XML. There is no recovery path for these files.

Styles are missing after load. The StylesheetXmlMapper may have encountered a corrupt xl/styles.xml. Check LoadDiagnostics.Issues for entries whose Code relates to styles and whose DataLossRisk is true to see which style data was affected.

Shared strings appear as empty cells. A corrupt xl/sharedStrings.xml can cause cells that reference the shared string table to render as empty. Enable TryRepairXml to attempt recovery.
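
When a load fails or loses data, it helps to know which part of the package is malformed before deciding how to proceed. The sketch below is independent of Aspose.Cells FOSS: it walks every .xml and .rels entry with the standard System.Xml XmlReader and records the first well-formedness error per entry. The names XlsxXmlChecker and FindMalformedParts are hypothetical. It only checks XML syntax, so it says nothing about what the library's repair path can or cannot recover.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical diagnostic helper, not part of Aspose.Cells FOSS.
public static class XlsxXmlChecker
{
    // Maps each malformed entry name to a description of its first XML error.
    public static Dictionary<string, string> FindMalformedParts(Stream xlsxStream)
    {
        var errors = new Dictionary<string, string>();
        using var archive = new ZipArchive(xlsxStream, ZipArchiveMode.Read, leaveOpen: true);

        foreach (var entry in archive.Entries)
        {
            bool isXml = entry.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase)
                      || entry.FullName.EndsWith(".rels", StringComparison.OrdinalIgnoreCase);
            if (!isXml)
                continue;

            try
            {
                using var stream = entry.Open();
                using var reader = XmlReader.Create(stream);
                while (reader.Read()) { } // read to the end; throws on malformed XML
            }
            catch (XmlException ex)
            {
                errors[entry.FullName] =
                    $"line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
            }
        }

        return errors;
    }
}
```

An entry such as xl/styles.xml or xl/sharedStrings.xml appearing in the result points to the part behind the symptoms described above. An empty result means every part is well-formed XML, so any remaining problem is in the content rather than the syntax.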

Frequently Asked Questions

Can I implement a custom XML mapper?

No. The XML mapper classes are sealed internal infrastructure and are not designed for extension.

Why is the SharedStringTableXmlMapper separate?

The OOXML specification separates repeated string values into a shared string table to reduce file size. The mapper handles reading and writing this table independently from cell data.
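
The relationship between the two parts can be illustrated without the library. In SpreadsheetML, a cell with t="s" stores a zero-based index into the shared string table in its v element instead of the text itself. The sketch below resolves those indexes with System.Xml.Linq; the class and method names are hypothetical, it handles only plain and rich-text strings, and it is not how SharedStringTableXmlMapper is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical illustration, not part of Aspose.Cells FOSS.
public static class SharedStringDemo
{
    static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns cell reference -> text for every cell of type "s" in the sheet XML.
    public static Dictionary<string, string> ResolveStringCells(
        string sharedStringsXml, string sheetXml)
    {
        var sstRoot = XDocument.Parse(sharedStringsXml).Root
            ?? throw new ArgumentException("shared strings XML has no root element");
        var sheetRoot = XDocument.Parse(sheetXml).Root
            ?? throw new ArgumentException("sheet XML has no root element");

        // Each <si> is one table entry: either a single <t> or rich-text runs <r><t>.
        var table = sstRoot.Elements(Main + "si")
            .Select(si => string.Concat(
                si.Elements(Main + "t")
                  .Concat(si.Elements(Main + "r").Elements(Main + "t"))
                  .Select(t => t.Value)))
            .ToList();

        var result = new Dictionary<string, string>();
        foreach (var cell in sheetRoot.Descendants(Main + "c"))
        {
            if (cell.Attribute("t")?.Value != "s")
                continue; // not a shared-string cell

            var v = cell.Element(Main + "v");
            if (v != null && int.TryParse(v.Value, out var index)
                && index >= 0 && index < table.Count)
            {
                result[cell.Attribute("r")?.Value ?? ""] = table[index];
            }
        }
        return result;
    }
}
```

Because the cell holds only an index, a damaged or missing table leaves nothing to look up, which is consistent with string cells appearing empty when xl/sharedStrings.xml is corrupt.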

Does TryRepairXml fix all XML parsing issues?

No. TryRepairXml handles recoverable errors such as unclosed elements, missing namespaces, and truncated attribute values. Structurally valid but semantically inconsistent XML (e.g. formula tokens referencing non-existent cells) still parses without error, so it falls outside what TryRepairXml addresses.
