GroupDocs.Parser for Python via .NET 25.12 Release Notes

We’re happy to announce the first release of GroupDocs.Parser for Python via .NET 25.12 – a powerful document parsing and data extraction library that enables Python developers to extract text, images, attachments, barcodes, and structured content from a wide range of document formats.

What’s New in This Release

This initial release makes the .NET-based GroupDocs.Parser engine available to Python developers, covering text, image, attachment, barcode, and structured-content extraction.

Major Features

Text Extraction – Extract plain or formatted text from PDFs, Office documents, emails, e‑books, archives and more
Advanced Search – Page‑level text access and keyword search with case‑sensitive, whole‑word, and regular‑expression options
Structured Content Parsing – Parse document structure including headings, paragraphs, tables, and text areas
Template Parsing – Use templates to extract strongly‑typed fields from invoices, receipts and other business documents
Image Extraction – Extract embedded images from supported document and image formats
Attachment Extraction – Extract file attachments from documents
Barcode Scanning – Scan and extract barcodes from documents
OCR Support – Read text from scanned PDFs and raster images, with optional spell‑checking
Metadata Extraction – Extract document metadata including author, creation date, and other properties
Table of Contents – Extract table of contents from supported document formats
Hyperlink Extraction – Extract hyperlinks from documents (limited support)

Supported Document Formats

This release supports the following document families:

Word Processing – DOC, DOCX, RTF, TXT, ODT
PDF & Markup – PDF, HTML/MHTML, Markdown, XML
Spreadsheets – XLS, XLSX, ODS, CSV
Presentations – PPT, PPTX, ODP
Email & Notes – PST, OST, EML, MSG, ONE
eBooks & Web Content – EPUB, MOBI, AZW3, CHM, FB2
Images – JPEG, PNG, TIFF, GIF, BMP, SVG
Archives & Containers – ZIP, RAR, 7Z, TAR, GZ, BZ2
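
As a minimal sketch of how the families above could be used for a pre-flight check before parsing, the snippet below maps file extensions to the families listed here. The names `FORMAT_FAMILIES` and `format_family` are hypothetical helpers introduced for illustration and are not part of the GroupDocs.Parser API. The extension spellings (for example `.jpg` for JPEG, `.md` for Markdown) are assumptions derived from the list, not an official extension table.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup table built from the format list above (not part of the library).
FORMAT_FAMILIES = {
    "Word Processing": {".doc", ".docx", ".rtf", ".txt", ".odt"},
    "PDF & Markup": {".pdf", ".html", ".mhtml", ".md", ".xml"},
    "Spreadsheets": {".xls", ".xlsx", ".ods", ".csv"},
    "Presentations": {".ppt", ".pptx", ".odp"},
    "Email & Notes": {".pst", ".ost", ".eml", ".msg", ".one"},
    "eBooks & Web Content": {".epub", ".mobi", ".azw3", ".chm", ".fb2"},
    "Images": {".jpeg", ".jpg", ".png", ".tiff", ".tif", ".gif", ".bmp", ".svg"},
    "Archives & Containers": {".zip", ".rar", ".7z", ".tar", ".gz", ".bz2"},
}


def format_family(path: str) -> Optional[str]:
    """Return the family name for a file path, or None if its extension is not listed."""
    # Only the last suffix is considered, so "backup.tar.gz" is matched as ".gz".
    ext = Path(path).suffix.lower()
    for family, extensions in FORMAT_FAMILIES.items():
        if ext in extensions:
            return family
    return None
```

A caller might skip files for which `format_family` returns `None` rather than passing them to the parser. Matching on the extension alone says nothing about the actual file contents.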

Platform Support

Windows, Linux, and macOS
Python 3.5+

Installation

Download the package for your platform from the GroupDocs Releases website:

Windows x64 – GroupDocs.Parser for Python via .NET 25.12 Windows x64
Windows x32 – GroupDocs.Parser for Python via .NET 25.12 Windows x32
Linux – GroupDocs.Parser for Python via .NET 25.12 Linux
macOS – GroupDocs.Parser for Python via .NET 25.12 macOS
macOS ARM – GroupDocs.Parser for Python via .NET 25.12 macOS ARM

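After downloading the WHL package for your platform, install it with pip. In shells that do not expand wildcards, such as the Windows Command Prompt, replace the pattern below with the full name of the downloaded file: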

pip install groupdocs_parser_net-25.12-*.whl
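
To help pick the right download, the following sketch maps the values reported by Python's standard `platform` and `sys` modules to the package labels listed above. The functions `package_label` and `current_package_label` are hypothetical helpers, not part of GroupDocs.Parser. The mapping is a rough heuristic under stated assumptions: on Windows it follows the bitness of the Python interpreter, and on macOS it treats `arm64` as Apple silicon.

```python
import platform
import sys


def package_label(system: str, machine: str, is_64bit: bool) -> str:
    """Map platform details to one of the download labels listed above."""
    system = system.lower()
    machine = machine.lower()
    if system == "windows":
        # Assumption: the wheel has to match the interpreter's bitness, not only the OS.
        return "Windows x64" if is_64bit else "Windows x32"
    if system == "linux":
        # The release notes list a single Linux package without naming an architecture.
        return "Linux"
    if system == "darwin":
        return "macOS ARM" if machine == "arm64" else "macOS"
    raise ValueError("No package listed for platform: " + system)


def current_package_label() -> str:
    """Return the label that matches the running interpreter."""
    return package_label(
        platform.system(),
        platform.machine(),
        sys.maxsize > 2**32,  # True for a 64-bit Python build
    )
```

Calling `current_package_label()` in the target environment returns the label of the package to download. An unrecognized operating system raises `ValueError` instead of guessing.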

Getting Started

The following example extracts text from a PDF:

from groupdocs.parser import Parser

# Create a Parser instance for your document
with Parser("sample.pdf") as parser:
    # Extract text from the document
    text = parser.GetText()
    
    # Print all extracted text to the console
    print(text)