GroupDocs.Parser for Python via .NET 25.12 Release Notes

We’re happy to announce the first release of GroupDocs.Parser for Python via .NET 25.12 – a powerful document parsing and data extraction library that enables Python developers to extract text, images, attachments, barcodes, and structured content from a wide range of document formats.

What’s New in This Release

This initial release brings the .NET-powered parsing engine to Python, giving Python developers the document parsing capabilities listed below.

Major Features

  • Text Extraction – Extract plain or formatted text from PDFs, Office documents, emails, e‑books, archives and more
  • Advanced Search – Page‑level access and text search with case‑sensitive, whole‑word, and regular‑expression matching
  • Structured Content Parsing – Parse document structure including headings, paragraphs, tables, and text areas
  • Template Parsing – Use templates to extract strongly‑typed fields from invoices, receipts and other business documents
  • Image Extraction – Extract embedded images from supported document and image formats
  • Attachment Extraction – Extract file attachments from documents
  • Barcode Scanning – Scan and extract barcodes from documents
  • OCR Support – Read text from scanned PDFs and raster images, with optional spell‑checking
  • Metadata Extraction – Extract document metadata including author, creation date, and other properties
  • Table of Contents – Extract the table of contents from supported document formats
  • Hyperlink Extraction – Extract hyperlinks from documents (limited support)

Supported Document Formats

This release supports the following document families:

  • Word Processing – DOC, DOCX, RTF, TXT, ODT
  • PDF & Markup – PDF, HTML/MHTML, Markdown, XML
  • Spreadsheets – XLS, XLSX, ODS, CSV
  • Presentations – PPT, PPTX, ODP
  • Email & Notes – PST, OST, EML, MSG, ONE
  • eBooks & Web Content – EPUB, MOBI, AZW3, CHM, FB2
  • Images – JPEG, PNG, TIFF, GIF, BMP, SVG
  • Archives & Containers – ZIP, RAR, 7Z, TAR, GZ, BZ2
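
The families above can be turned into a simple lookup for pre-checking a file before handing it to the parser. The sketch below is illustrative only: SUPPORTED_FORMATS and document_family are hypothetical names, not part of the GroupDocs.Parser API, and the extension spellings for Markdown (md), JPEG (jpg), and TIFF (tif) are assumptions added for convenience.

```python
import os

# Hypothetical lookup built from the format list in these release notes.
# It is NOT part of the GroupDocs.Parser API.
SUPPORTED_FORMATS = {
    "Word Processing": ["doc", "docx", "rtf", "txt", "odt"],
    # "md" is an assumed extension for Markdown.
    "PDF & Markup": ["pdf", "html", "mhtml", "md", "xml"],
    "Spreadsheets": ["xls", "xlsx", "ods", "csv"],
    "Presentations": ["ppt", "pptx", "odp"],
    "Email & Notes": ["pst", "ost", "eml", "msg", "one"],
    "eBooks & Web Content": ["epub", "mobi", "azw3", "chm", "fb2"],
    # "jpg" and "tif" are assumed alternate spellings of JPEG and TIFF.
    "Images": ["jpeg", "jpg", "png", "tiff", "tif", "gif", "bmp", "svg"],
    "Archives & Containers": ["zip", "rar", "7z", "tar", "gz", "bz2"],
}

# Invert the table once so lookups go from extension to family.
_FAMILY_BY_EXTENSION = {
    extension: family
    for family, extensions in SUPPORTED_FORMATS.items()
    for extension in extensions
}


def document_family(path):
    """Return the document family for a path, or None if its extension is not listed."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return _FAMILY_BY_EXTENSION.get(extension)
```

The check looks only at the final extension, so a name such as backup.tar.gz is classified by gz. A file whose extension is missing or not listed returns None, which a caller could treat as "skip" or "try anyway".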

Platform Support

  • Windows, Linux, and macOS
  • Python 3.5+

Installation

Download the package for your platform from the GroupDocs Releases website:

  • Windows x64 – GroupDocs.Parser for Python via .NET 25.12 Windows x64
  • Windows x32 – GroupDocs.Parser for Python via .NET 25.12 Windows x32
  • Linux – GroupDocs.Parser for Python via .NET 25.12 Linux
  • macOS – GroupDocs.Parser for Python via .NET 25.12 macOS
  • macOS ARM – GroupDocs.Parser for Python via .NET 25.12 macOS ARM

After downloading the WHL package for your platform, install it with pip (if your shell does not expand wildcards, type the full file name):

pip install groupdocs_parser_net-25.12-*.whl
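
Choosing the right download can be scripted. The following sketch maps the running interpreter to one of the five package labels listed above, using only the standard library. The package_label helper is a hypothetical name, and treating arm64/aarch64 as the macOS ARM package is an assumption; these notes do not state which machine strings each package targets.

```python
import platform
import sys


def package_label(system, machine, is_64bit):
    """Map platform details to a package label from the download list, or None."""
    machine = machine.lower()
    if system == "Windows":
        return "Windows x64" if is_64bit else "Windows x32"
    if system == "Linux":
        return "Linux"
    if system == "Darwin":
        # Assumption: Apple Silicon machines report arm64 (or aarch64).
        return "macOS ARM" if machine in ("arm64", "aarch64") else "macOS"
    return None


if __name__ == "__main__":
    # The release notes list Python 3.5+ as the supported range.
    if sys.version_info < (3, 5):
        raise SystemExit("Python 3.5 or later is required")
    label = package_label(
        platform.system(), platform.machine(), sys.maxsize > 2**32
    )
    print("Download package:", label or "no matching package listed")
```

Run as a script, it prints the label to look for on the releases page. The function takes its inputs as arguments so the mapping can be checked without depending on the machine it runs on.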

Getting Started

A quick example that extracts text from a PDF:

from groupdocs.parser import Parser

# Create a Parser instance for your document
with Parser("sample.pdf") as parser:
    # Extract text from the document
    text = parser.GetText()
    
    # Print all extracted text to the console
    print(text)

Resources