4.0.0
com.groupdocs
groupdocs-parser
GroupDocs.Parser
https://products.groupdocs.com/parser
23.9
GroupDocs.Parser for Java is a parsing class library that extracts data from documents in a variety of formats. The data extraction API returns either fast raw text or higher-quality formatted text from PDF, DOC, DOCX, PPT, PPTX, XLS, and XLSX files. The library also lets you create templates that describe the documents in your business workflow and apply them to those documents to extract the required data.
Features:
* Extract both raw and formatted text from supported file formats with a few lines of code;
* Extract metadata from supported file formats with a few lines of code;
* Extract attachments from container formats (PDF, email), including each attachment's name, path, media type, and content;
* Support encrypted document formats;
* Extract structured text;
* Extract images;
* Text Analysis API;
* Extract PDF form data;
* Tools for encoding detection;
* Tools for media type detection;
* Document data parsing API by template;
* ZIP archive support.
Supported document formats:
* Microsoft Word documents - DOC, DOT, DOCX, DOCM, DOTX, DOTM, TXT, RTF;
* Microsoft Excel spreadsheets - XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, CSV, XLA, XLAM, XML;
* Microsoft PowerPoint presentations - PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM;
* Microsoft OneNote - ONE;
* Open Document formats - ODP, ODS, ODT, OTT;
* Portable Document Format - PDF;
* Email - PST, OST, EML, EMLX, MSG;
* Ebook - EPUB, FB2, CHM;
* Archive - ZIP;
* Markup - HTML, XHTML, MHTML, MD, XML.
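As a rough illustration of how the format list above could be used to pre-filter files before handing them to a parser, the following sketch maps a file extension to the format families named in the list. It is not part of the GroupDocs.Parser API: the class name `FormatFamilies` and the method `familiesOf` are hypothetical, and the family labels are shorthand for the list headings. Because XML appears under both spreadsheets and markup, the method returns every matching family rather than a single one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not a GroupDocs.Parser class.
final class FormatFamilies {
    // Extensions copied from the supported-formats list, keyed by family label.
    private static final Map<String, List<String>> FAMILIES = new LinkedHashMap<>();

    static {
        FAMILIES.put("Word", List.of("doc", "dot", "docx", "docm", "dotx", "dotm", "txt", "rtf"));
        FAMILIES.put("Excel", List.of("xls", "xlt", "xlsx", "xlsm", "xlsb", "xltx", "xltm",
                "csv", "xla", "xlam", "xml"));
        FAMILIES.put("PowerPoint", List.of("ppt", "pps", "pot", "pptx", "pptm", "potx", "potm",
                "ppsx", "ppsm"));
        FAMILIES.put("OneNote", List.of("one"));
        FAMILIES.put("OpenDocument", List.of("odp", "ods", "odt", "ott"));
        FAMILIES.put("PDF", List.of("pdf"));
        FAMILIES.put("Email", List.of("pst", "ost", "eml", "emlx", "msg"));
        FAMILIES.put("Ebook", List.of("epub", "fb2", "chm"));
        FAMILIES.put("Archive", List.of("zip"));
        FAMILIES.put("Markup", List.of("html", "xhtml", "mhtml", "md", "xml"));
    }

    private FormatFamilies() {
    }

    // Returns every family whose extension list contains the file's extension
    // (case-insensitive). An empty list means the extension is not in the list.
    static List<String> familiesOf(String fileName) {
        List<String> matches = new ArrayList<>();
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return matches; // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> family : FAMILIES.entrySet()) {
            if (family.getValue().contains(ext)) {
                matches.add(family.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(familiesOf("report.DOCX"));
        System.out.println(familiesOf("data.xml"));
        System.out.println(familiesOf("photo.png"));
    }
}
```

An extension check like this is only a cheap first pass. It says nothing about the actual file content, which is what the library's media type detection tools are for.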
For more details on the GroupDocs.Parser for Java API, please visit the GroupDocs website at:
https://www.groupdocs.com/products/parser/java
Note: Without a license, GroupDocs.Parser for Java runs in evaluation mode. To test the product's full feature set, please request a free 30-day temporary license.