Latest Release (December 2025)

We’re happy to announce the first release of GroupDocs.Parser for Python via .NET – a powerful document parsing and data extraction library that enables Python developers to extract text, images, attachments, barcodes, and structured content from a wide range of document formats.

What’s New

This initial release exposes the .NET-based GroupDocs.Parser engine to Python developers through a unified API for document parsing and data extraction.

Key Features

  • Rich text extraction & search – Extract plain or formatted text from PDFs, Office documents, emails, e‑books, archives and more, with page‑level access and advanced search options (case‑sensitive, whole‑word, regex)
  • Structured content & templates – Parse document structure (headings, paragraphs, tables, text areas) and use templates to pull out strongly‑typed fields from invoices, receipts and other business documents
  • Images, attachments & barcodes – Extract embedded images, file attachments and barcodes from supported document and image formats
  • OCR for scanned documents – Use OCR to read text from scanned PDFs and raster images, optionally combining it with spell‑checking for better recognition quality
  • Wide format & platform support – Work with dozens of document, image and archive formats on Windows, Linux and macOS

Supported Document Formats

GroupDocs.Parser for Python via .NET supports the following document families:

  • Word processing – DOC, DOCX, RTF, TXT, ODT and others
  • PDF & markup – PDF, HTML/MHTML, Markdown, XML
  • Spreadsheets – XLS, XLSX, ODS, CSV and related formats
  • Presentations – PPT, PPTX, ODP and similar formats
  • Email & notes – PST, OST, EML, MSG, ONE
  • E‑books & web content – EPUB, MOBI, AZW3, CHM, FB2
  • Images – JPEG, PNG, TIFF, GIF, BMP, SVG and more
  • Archives & containers – ZIP, RAR, 7Z, TAR, GZ, BZ2
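
The families above can be turned into a small lookup for routing files before parsing. The following is only a sketch: `FORMAT_FAMILIES` and `format_family` are hypothetical names that are not part of GroupDocs.Parser, the table covers only the formats named explicitly in the list, and the file-extension spellings (for example `md` for Markdown or `jpg` for JPEG) are assumptions.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table transcribed from the list above. Extension spellings are
# assumptions; formats covered only by "and others" are intentionally left out.
FORMAT_FAMILIES = {
    "Word processing": {"doc", "docx", "rtf", "txt", "odt"},
    "PDF & markup": {"pdf", "html", "htm", "mhtml", "mht", "md", "xml"},
    "Spreadsheets": {"xls", "xlsx", "ods", "csv"},
    "Presentations": {"ppt", "pptx", "odp"},
    "Email & notes": {"pst", "ost", "eml", "msg", "one"},
    "E-books & web content": {"epub", "mobi", "azw3", "chm", "fb2"},
    "Images": {"jpg", "jpeg", "png", "tif", "tiff", "gif", "bmp", "svg"},
    "Archives & containers": {"zip", "rar", "7z", "tar", "gz", "bz2"},
}


def format_family(filename: str) -> Optional[str]:
    """Return the family name for a file, or None if its extension is not listed."""
    # Only the last suffix is inspected, so "backup.tar.gz" is looked up as "gz".
    ext = Path(filename).suffix.lower().lstrip(".")
    for family, extensions in FORMAT_FAMILIES.items():
        if ext in extensions:
            return family
    return None
```

A match in this table only reflects the list in this announcement; it does not guarantee that every extraction feature is available for that format.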

Quick Start

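Download the package for your platform from the GroupDocs Releases website and install it using pip. The wildcard below stands for the platform-specific part of the file name; if your shell does not expand it, pass the full name of the downloaded wheel instead: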

pip install groupdocs_parser_net-25.12-*.whl

Extract text from a document:

from groupdocs.parser import Parser

# Create a Parser instance for your document
with Parser("sample.pdf") as parser:
    # Extract text from the document
    text = parser.GetText()
    
    # Print all extracted text to the console
    print(text)
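
Once text has been extracted, the search modes mentioned under Key Features (case‑sensitive, whole‑word, regex) can be approximated on the result with Python's standard `re` module. This is a minimal sketch that does not use the library's own search API, which this announcement does not show; `find_in_text` is a hypothetical helper, and it assumes the extracted text is available as a Python `str`.

```python
import re
from typing import List, Tuple


def find_in_text(
    text: str,
    query: str,
    *,
    case_sensitive: bool = False,
    whole_word: bool = False,
    regex: bool = False,
) -> List[Tuple[int, str]]:
    """Return (offset, matched_text) pairs for every occurrence of query in text."""
    # Treat the query literally unless the caller asks for a regular expression.
    pattern = query if regex else re.escape(query)
    if whole_word:
        # Require word boundaries on both sides of the match.
        pattern = rf"\b(?:{pattern})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text, flags)]


if __name__ == "__main__":
    sample = "Invoice 42: total due. Invoices are final."
    print(find_in_text(sample, "invoice"))
    print(find_in_text(sample, "invoice", whole_word=True))
    print(find_in_text(sample, r"\d+", regex=True))
```

Keeping the three options as independent keyword arguments lets them be combined, for example a case‑sensitive whole‑word search.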

Resources