You can download any of the versions below for testing. The product functions normally, subject to an evaluation limitation. At the time of purchase, we provide a license file via email that allows the product to work at full capacity. If you would also like an evaluation license to test without any restrictions for 30 days, please follow the directions provided here.

Download JAR for Text Extraction & Parsing via Java High Code API

GroupDocs.Parser for Java is a document text extraction API. It extracts text and metadata from Microsoft Word, Excel, and PowerPoint documents, email messages, container files that hold other files (such as ZIP archives), plain text files, and HTML, without requiring any of the corresponding document readers to be installed. The API is designed for accurate, fast text extraction. It also provides convenient tools to detect text encodings such as UTF-32 LE, UTF-32 BE, UTF-16 LE, UTF-16 BE, and more.
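
To make the encoding-detection idea concrete, here is a minimal sketch of how the encodings named above can be told apart by their byte order mark (BOM). It uses only the Java standard library. The class `BomSniffer` and its `detect` method are hypothetical names chosen for illustration and are not part of the GroupDocs.Parser API, whose own detection logic is not described on this page.

```java
// Hypothetical helper, NOT part of GroupDocs.Parser: guesses a text encoding
// from the byte order mark (BOM) at the start of a byte array.
final class BomSniffer {

    // Returns a charset name for a recognized BOM, or null when no BOM is found.
    static String detect(byte[] data) {
        // UTF-32 LE must be tested before UTF-16 LE, because the UTF-32 LE BOM
        // (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    // Compares the leading bytes of data (as unsigned values) with the prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // UTF-16 LE BOM + "A"
        System.out.println(detect(sample));
    }
}
```

A BOM check like this is only a first step: files without a BOM need content-based heuristics, and a UTF-16 LE text whose first character is U+0000 would be misread as UTF-32 LE.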


Get Started

GroupDocs.Parser for Java does not require any external software or third-party tools to be installed. Follow one of the methods described in Installation and Configuration.

You can use the GroupDocs.Parser for Java API directly in your Maven-based project by adding the following configuration to pom.xml.

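<!-- Place these inside the <project> element; if your pom.xml already has
     <repositories> or <dependencies> sections, add only the inner entries. -->
<repositories>
    <repository>
        <id>groupdocs-artifacts-repository</id>
        <name>GroupDocs Artifacts Repository</name>
        <url>https://repository.groupdocs.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-parser</artifactId>
        <version>22.6</version>
    </dependency>
</dependencies>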

Product Page | Docs | Demos | API Reference | Examples | Blog | Free Support | Temporary License

Why download GroupDocs.Parser for Java?

GroupDocs.Parser for Java is an on-premises API that enables your Java applications to parse and extract data from a variety of file formats. It lets you extract hyperlinks, tables, barcodes, text, and images, as well as data from ZIP archives, email archives, PDF portfolios, and databases. GroupDocs.Parser for Java can also be used to define templates containing fixed, regex, and linked field positions for accurate data extraction.
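
As a rough illustration of what a regex field in a template means, the sketch below applies named regular expressions to already-extracted text and collects one value per field. It covers only the regex case (not fixed or linked positions) and uses the Java standard library. The class `RegexTemplate`, its methods, and the sample invoice text are hypothetical and do not reflect the GroupDocs.Parser template API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a regex-field template; NOT the GroupDocs.Parser API.
final class RegexTemplate {
    private final Map<String, Pattern> fields = new LinkedHashMap<>();

    // Registers a field; the regex must contain one capturing group for the value.
    RegexTemplate field(String name, String regex) {
        fields.put(name, Pattern.compile(regex));
        return this;
    }

    // Applies every field pattern to the text and keeps the first match of each.
    // Fields whose pattern does not match are left out of the result.
    Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : fields.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            if (matcher.find()) {
                result.put(entry.getKey(), matcher.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for text extracted from an invoice.
        String invoice = "Invoice No: INV-0042\nDate: 2022-06-07\nTotal: 137.80 USD";
        RegexTemplate template = new RegexTemplate()
                .field("number", "Invoice No:\\s*(\\S+)")
                .field("total", "Total:\\s*([0-9.]+)");
        System.out.println(template.parse(invoice));
    }
}
```

Keeping the field definitions separate from the parsing loop is what makes this a template: the same set of fields can be applied to many documents that share a layout.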

Text Extraction & Parsing Java On-Premise API Features

  • Document parsing via user-defined template
    • Create a user-defined template with data field and table definitions.
    • Parse documents via user-defined templates to extract data from invoices, tables, etc.
  • Supports extraction of various text elements, such as:
    • Plain text extraction
    • Formatted text extraction as simple text, HTML or Markdown (MD)
    • Structured text extraction in XML form
    • Text area extraction by specific coordinates or text style
    • Extract text around (in context of) a specific word
  • Supports various extraction modes, such as:
    • Accurate Text Extraction Mode: The default text extraction mode with the best possible text quality.
    • Raw Text Extraction Mode: A faster extraction mode whose text quality is lower than that of the accurate mode.
  • Extract the text of the whole document or extract only the desired document page.
  • Ability to search documents using specific keywords or via regular expression.
  • Supports metadata extraction & image extraction from Microsoft Word®, Excel®, PowerPoint®, PDF® & other document types.
  • Extract table of contents (TOC) from Microsoft Office® Word® & EPUB eBook formats.
  • Ability to extract data from containers (archives) such as ZIP files, PDF portfolios, and OST containers.
  • Ability to iterate through the form fields and extract PDF Form data.
  • Extract data from databases (e.g. SQLite) via JDBC.
  • Extract information from Microsoft OneNote® notebooks.
  • Extract all hyperlinks from the whole document, from a specific page, or from a specific page area.
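
Two of the features listed above, searching by keyword or regular expression and extracting the text around a specific word, can be sketched together in a few lines of standard-library Java. The class `ContextSearch` and its `find` method are hypothetical names used for illustration only. They operate on plain text that has already been extracted and are not the GroupDocs.Parser search API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of regex search with surrounding context;
// NOT the GroupDocs.Parser API.
final class ContextSearch {

    // Returns every regex match together with up to `radius` characters of
    // text on each side, clipped to the bounds of the input.
    static List<String> find(String text, String regex, int radius) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            int from = Math.max(0, matcher.start() - radius);
            int to = Math.min(text.length(), matcher.end() + radius);
            hits.add(text.substring(from, to));
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Order 17 shipped. Order 42 is pending.";
        for (String hit : find(text, "Order \\d+", 8)) {
            System.out.println(hit);
        }
    }
}
```

For a literal keyword rather than a pattern, passing `Pattern.quote(keyword)` as the regex avoids special characters being interpreted as regex syntax.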

Supported Document Parser File Formats

Microsoft Word®: DOC/DOT/DOCX/DOCM/DOTX/DOTM/RTF/TXT
OpenOffice Writer®: ODT/OTT
Microsoft Excel®: XLS/XLT/XLSX/XLSM/XLSB/XLTX/XLTM/XLA/XLAM
OpenOffice Calc®: ODS/OTS/CSV
Apple® iWork: NUMBERS
Microsoft PowerPoint®: PPT/PPS/POT/PPTX/PPTM/POTX/POTM/PPSX/PPSM
OpenOffice Impress®: ODP/OTP
Microsoft Outlook®: PST/OST/EML/MSG
Apple® Mail Message: EMLX
Microsoft OneNote®: ONE
Fixed Layout: PDF
Postscript: PS
Markup: XHTML/MHTML/MD/XML
eBook: CHM/EPUB/FB2
Archive: ZIP/RAR/TAR/GZ/BZ2
Image: BMP/GIF/JPG/JPEG/JPE/JP2/PNG/TIF/TIFF/DJVU/J2K/WEBP
Vector: SVG/SVGZ
Adobe Photoshop®: PSD
Medical Imaging: DICOM
Metafile: EMF/WMF
Database: JDBC

For details and limitations, please visit Supported Document Formats.

System Requirements

  • Microsoft Windows®: Windows Desktop & Server (x86, x64), Microsoft Azure
  • macOS: Mac OS X
  • Linux: Ubuntu, OpenSUSE, CentOS, and others
  • Java Versions: J2SE 7.0 (1.7), J2SE 8.0 (1.8) or above (for example Java 10)
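
If you want to confirm at startup that the running JVM meets the Java 7 minimum listed above, a small check along these lines may help. It relies on the standard `java.specification.version` system property, which reports values like `1.7` and `1.8` for older releases and `9`, `10`, `17` for newer ones. The class `JavaVersionCheck` and its methods are hypothetical names and are not part of GroupDocs.Parser.

```java
// Hypothetical startup check; NOT part of GroupDocs.Parser.
final class JavaVersionCheck {

    // Converts a java.specification.version value ("1.7", "1.8", "10", "17")
    // to its major version number.
    static int major(String spec) {
        String[] parts = spec.split("\\.");
        // Releases before Java 9 report "1.x"; later releases report the major number directly.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    // True when the version meets the Java 7 minimum stated in the requirements.
    static boolean isSupported(String spec) {
        return major(spec) >= 7;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java " + spec + " supported: " + isSupported(spec));
    }
}
```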

Direct Download

GroupDocs.Parser for Java 19.5
This ZIP file contains only assemblies for GroupDocs.Parser for Java 19.5.
Added: 5/29/2019
File Size: 102.8 MB

GroupDocs.Parser for Java 22.6
This ZIP file contains only assemblies for GroupDocs.Parser for Java.
Added: 6/7/2022
File Size: 137.8 MB

GroupDocs.Parser for Java 22.3
This ZIP file contains only assemblies for GroupDocs.Parser for Java.
Added: 3/18/2022
File Size: 137.8 MB

GroupDocs.Parser for Java 21.2
This ZIP file contains only assemblies for GroupDocs.Parser for Java.
Added: 2/27/2021
File Size: 123.8 MB

GroupDocs.Parser for Java 20.12
This ZIP file contains only assemblies for GroupDocs.Parser for Java.
Added: 12/30/2020
File Size: 117.7 MB

GroupDocs.Parser for Java 20.8
This ZIP file contains only assemblies for GroupDocs.Parser for Java 20.8.
Added: 8/19/2020
File Size: 118.0 MB

GroupDocs.Parser for Java 20.6
This ZIP file contains only assemblies for GroupDocs.Parser for Java 20.6.
Added: 6/30/2020
File Size: 113.4 MB

GroupDocs.Parser for Java 20.5
This ZIP file contains only assemblies for GroupDocs.Parser for Java 20.5.
Added: 5/14/2020
File Size: 101.2 MB

GroupDocs.Parser for Java 20.3
This ZIP file contains only assemblies for GroupDocs.Parser for Java 20.3.
Added: 4/1/2020
File Size: 101.0 MB

GroupDocs.Parser for Java 20.1
This ZIP file contains only assemblies for GroupDocs.Parser for Java 20.1.
Added: 2/4/2020
File Size: 101.0 MB

GroupDocs.Parser for Java 19.11
This ZIP file contains only assemblies for GroupDocs.Parser for Java 19.11.
Added: 12/3/2019
File Size: 104.4 MB

GroupDocs.Parser for Java 18.12
This ZIP file contains only assemblies for GroupDocs.Parser for Java 18.12.
Added: 12/11/2018
File Size: 97.9 MB

GroupDocs.Parser for Java 18.11
This ZIP file contains only assemblies for GroupDocs.Parser for Java 18.11.
Added: 11/8/2018
File Size: 96.9 MB