GroupDocs.Parser for Python via .NET 25.12 Release Notes
We’re happy to announce the first release of GroupDocs.Parser for Python via .NET 25.12 – a powerful document parsing and data extraction library that enables Python developers to extract text, images, attachments, barcodes, and structured content from a wide range of document formats.
What’s New in This Release
This initial release introduces GroupDocs.Parser for Python via .NET, bringing comprehensive document parsing capabilities to Python developers through the .NET-powered parsing engine.
Major Features
- Text Extraction – Extract plain or formatted text from PDFs, Office documents, emails, e‑books, archives and more
- Advanced Search – Page‑level access with advanced search options including case‑sensitive, whole‑word, and regex support
- Structured Content Parsing – Parse document structure including headings, paragraphs, tables, and text areas
- Template Parsing – Use templates to extract strongly‑typed fields from invoices, receipts and other business documents
- Image Extraction – Extract embedded images from supported document and image formats
- Attachment Extraction – Extract file attachments from documents
- Barcode Scanning – Scan and extract barcodes from documents
- OCR Support – Read text from scanned PDFs and raster images, with optional spell‑checking
- Metadata Extraction – Extract document metadata including author, creation date, and other properties
- Table of Contents – Extract table of contents from supported document formats
- Hyperlink Extraction – Extract hyperlinks from documents (limited support)
Supported Document Formats
This release supports a comprehensive range of document families:
- Word Processing – DOC, DOCX, RTF, TXT, ODT
- PDF & Markup – PDF, HTML/MHTML, Markdown, XML
- Spreadsheets – XLS, XLSX, ODS, CSV
- Presentations – PPT, PPTX, ODP
- Email & Notes – PST, OST, EML, MSG, ONE
- eBooks & Web Content – EPUB, MOBI, AZW3, CHM, FB2
- Images – JPEG, PNG, TIFF, GIF, BMP, SVG
- Archives & Containers – ZIP, RAR, 7Z, TAR, GZ, BZ2
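The format families above can be turned into a simple pre-check that sorts files before they are handed to the parser. The following is a minimal sketch using only the Python standard library. The extension set is transcribed from the list above, and the aliases (md, htm, jpg, tif) are assumptions. The helper names are hypothetical and not part of the GroupDocs.Parser API, and a listed extension does not guarantee that a particular file can be parsed.

```python
from pathlib import Path

# Extensions transcribed from the format list in these release notes.
# Assumption: "Markdown" maps to .md, and .htm/.jpg/.tif are treated as
# aliases of HTML/JPEG/TIFF. Adjust to match the official format list.
LISTED_EXTENSIONS = {
    "doc", "docx", "rtf", "txt", "odt",            # Word processing
    "pdf", "html", "htm", "mhtml", "md", "xml",    # PDF & markup
    "xls", "xlsx", "ods", "csv",                   # Spreadsheets
    "ppt", "pptx", "odp",                          # Presentations
    "pst", "ost", "eml", "msg", "one",             # Email & notes
    "epub", "mobi", "azw3", "chm", "fb2",          # eBooks & web content
    "jpeg", "jpg", "png", "tiff", "tif", "gif", "bmp", "svg",  # Images
    "zip", "rar", "7z", "tar", "gz", "bz2",        # Archives & containers
}


def is_listed_format(path):
    """Return True if the file's final extension appears in the list above."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in LISTED_EXTENSIONS


def split_by_listed_format(paths):
    """Split paths into (listed, unlisted) while preserving input order."""
    listed, unlisted = [], []
    for path in paths:
        (listed if is_listed_format(path) else unlisted).append(path)
    return listed, unlisted
```

Only the final suffix is inspected, so a name such as archive.tar.gz is matched on gz. Files without an extension are reported as unlisted.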
Platform Support
- Windows, Linux, and macOS
- Python 3.5+
Installation
Download the package for your platform from the GroupDocs Releases website:
- Windows x64 – GroupDocs.Parser for Python via .NET 25.12 Windows x64
- Windows x32 – GroupDocs.Parser for Python via .NET 25.12 Windows x32
- Linux – GroupDocs.Parser for Python via .NET 25.12 Linux
- macOS – GroupDocs.Parser for Python via .NET 25.12 macOS
- macOS ARM – GroupDocs.Parser for Python via .NET 25.12 macOS ARM
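If it is unclear which of the packages above matches the current machine, the choice can be sketched with the standard library. This is an illustrative helper, not part of GroupDocs.Parser. The function names are hypothetical, the returned strings are the labels from the list above, and the sketch assumes the single Linux package applies to any Linux machine, which these notes do not state.

```python
import platform
import sys


def package_label(system, machine, is_64bit):
    """Map OS name, CPU architecture, and interpreter bitness to a package label."""
    system = system.lower()
    machine = machine.lower()
    if system == "windows":
        # The wheel must match the bitness of the Python interpreter.
        return "Windows x64" if is_64bit else "Windows x32"
    if system == "linux":
        # Assumption: one Linux package; its architecture is not stated here.
        return "Linux"
    if system == "darwin":
        # Apple Silicon machines report "arm64" as the machine type.
        return "macOS ARM" if machine in ("arm64", "aarch64") else "macOS"
    raise ValueError("No package listed for platform: " + system)


def current_package_label():
    """Return the package label for the interpreter running this script."""
    return package_label(
        platform.system(),
        platform.machine(),
        sys.maxsize > 2**32,  # True on a 64-bit Python build
    )
```

Calling current_package_label() returns the label to look for on the releases page. Platforms outside the list above raise ValueError rather than guessing.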
After downloading the appropriate WHL package for your platform, install it using pip:
pip install groupdocs_parser_net-25.12-*.whl
Getting Started
A quick example that extracts text from a PDF:
from groupdocs.parser import Parser

# Create a Parser instance for your document
with Parser("sample.pdf") as parser:
    # Extract text from the document
    text = parser.GetText()
    # Print all extracted text to the console
    print(text)