GroupDocs.Parser for Java 22.11 Release Notes
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
PARSERNET-1961 | Implement the ability to use OCR for images and PDF documents | New Feature |
PARSERNET-1903 | Implement the support for attachment extraction from presentations | New Feature |
PARSERNET-1904 | Implement the support for attachment extraction from spreadsheets | New Feature |
PARSERNET-1905 | Implement the support for attachment extraction from word processing documents | New Feature |
Public API and Backward Incompatible Changes
Implement the support for attachment extraction from presentations, spreadsheets and word processing documents
Description
These features add the ability to extract attachments from presentations, spreadsheets and word processing documents.
Public API changes
No public API changes.
Usage
The following example shows how to extract text from document attachments:
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.ContainerItem;
import com.groupdocs.parser.data.MetadataItem;
import com.groupdocs.parser.data.TextReader;
import com.groupdocs.parser.exceptions.UnsupportedDocumentFormatException;

// Create an instance of Parser class
try (Parser parser = new Parser(fileName)) {
    // Extract attachments from the container
    Iterable<ContainerItem> attachments = parser.getContainer();
    // Check if container extraction is supported; stop here if it isn't
    if (attachments == null) {
        System.out.println("Container extraction isn't supported");
        return;
    }
    // Iterate over attachments
    for (ContainerItem item : attachments) {
        // Print the file path
        System.out.println(item.getFilePath());
        // Print metadata
        for (MetadataItem metadata : item.getMetadata()) {
            System.out.println(String.format("%s: %s", metadata.getName(), metadata.getValue()));
        }
        try {
            // Create Parser object for the attachment content
            try (Parser attachmentParser = item.openParser()) {
                // Extract the attachment text (null means text extraction isn't supported)
                try (TextReader reader = attachmentParser.getText()) {
                    System.out.println(reader == null ? "No text" : reader.readToEnd());
                }
            }
        } catch (UnsupportedDocumentFormatException ex) {
            // The attachment format isn't supported
            System.out.println("Isn't supported.");
        }
    }
}
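The control flow of the example above can be tried without a document. In the sketch below, hypothetical in-memory stand-ins replace the GroupDocs types. `Item`, `UnsupportedFormat` and `openAndRead` are illustrative names, not library API. A `null` list models "container extraction isn't supported", a `null` text models "No text", and the exception models `UnsupportedDocumentFormatException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AttachmentWalkSketch {
    // Hypothetical stand-in for ContainerItem: path, metadata, and extractable text.
    // supported == false models an attachment format the parser cannot open.
    // text == null models "text extraction isn't supported" for that attachment.
    record Item(String path, Map<String, String> metadata, boolean supported, String text) {}

    // Hypothetical stand-in for UnsupportedDocumentFormatException.
    static class UnsupportedFormat extends Exception {}

    // Models item.openParser() followed by getText(): throws for unknown formats.
    static String openAndRead(Item item) throws UnsupportedFormat {
        if (!item.supported()) {
            throw new UnsupportedFormat();
        }
        return item.text();
    }

    // Mirrors the release-note example and returns the lines it would print.
    static List<String> walk(List<Item> attachments) {
        List<String> out = new ArrayList<>();
        if (attachments == null) {
            out.add("Container extraction isn't supported");
            return out;
        }
        for (Item item : attachments) {
            out.add(item.path());
            item.metadata().forEach((name, value) -> out.add(String.format("%s: %s", name, value)));
            try {
                String text = openAndRead(item);
                out.add(text == null ? "No text" : text);
            } catch (UnsupportedFormat ex) {
                out.add("Isn't supported.");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("notes.txt", Map.of("size", "12"), true, "hello"),
                new Item("blob.bin", Map.of(), false, null));
        walk(items).forEach(System.out::println);
    }
}
```

The early `return` on a `null` container is the important detail. Without it, the loop would dereference `null`.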
Implement the ability to use OCR for images and PDF documents
Description
This feature provides the ability to extract text and text areas from images and PDF documents using OCR.
Public API changes
GroupDocs.Parser.Options.Features public class was updated with changes as follows:
GroupDocs.Parser.Options.PageTextAreaOptions public class was updated with changes as follows:
- Added PageTextAreaOptions(boolean) and PageTextAreaOptions(boolean, OcrOptions) constructors;
- Added isUseOcr and OcrOptions properties.
GroupDocs.Parser.Options.TextOptions public class was updated with changes as follows:
- Added TextOptions(boolean, boolean) and TextOptions(boolean, boolean, OcrOptions) constructors;
- Added isUseOcr and OcrOptions properties.
GroupDocs.Parser.Options.ParserSettings public class was updated with changes as follows:
- Added ParserSettings(OcrConnectorBase) and ParserSettings(ILogger, OcrConnectorBase) constructors;
- Added OcrConnector property.
GroupDocs.Parser.Options.Parser public class was updated with changes as follows:
OcrConnectorBase, OcrEventHandler and OcrOptions classes were added to the GroupDocs.Parser.Options namespace.
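OCR support is pluggable. An OCR connector derived from OcrConnectorBase is passed to ParserSettings, and TextOptions decides per call whether OCR is used. The sketch below models that split with hypothetical stand-ins (`OcrEngine`, `Settings`, `Options`). It does not reproduce the actual abstract methods of OcrConnectorBase. It also assumes, for illustration only, that extraction returns `null` when OCR is not requested or no connector is configured.

```java
public class OcrConnectorSketch {
    // Hypothetical base class in the role of OcrConnectorBase: the engine-specific
    // recognition logic lives in a subclass (the real method set differs).
    abstract static class OcrEngine {
        abstract String recognize(byte[] image);
    }

    // Hypothetical settings object carrying the connector, like ParserSettings(OcrConnectorBase).
    record Settings(OcrEngine ocr) {}

    // Hypothetical options mirroring TextOptions(isRawModeEnabled, useOcr);
    // rawMode is carried for parity but ignored in this sketch.
    record Options(boolean rawMode, boolean useOcr) {}

    // Returns null when text cannot be extracted, following the "null means unsupported" convention.
    static String extractText(byte[] image, Settings settings, Options options) {
        if (!options.useOcr() || settings.ocr() == null) {
            return null; // assumption: a raster image has no text layer without OCR
        }
        return settings.ocr().recognize(image);
    }

    public static void main(String[] args) {
        // A fake engine standing in for a real OCR connector
        OcrEngine fake = new OcrEngine() {
            @Override
            String recognize(byte[] image) {
                return "recognized " + image.length + " bytes";
            }
        };
        byte[] scan = new byte[3];
        System.out.println(extractText(scan, new Settings(fake), new Options(false, true)));
        System.out.println(extractText(scan, new Settings(fake), new Options(false, false)));
    }
}
```

Keeping the engine in the settings and the on/off switch in the options lets a single configured Parser run both OCR and non-OCR extractions.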
Usage
The following example shows how to extract text from an image file:
// Create an instance of ParserSettings class with the OCR connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of TextOptions to use OCR (raw mode off, OCR on)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print the text, or a 'not supported' message if extraction isn't supported
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
See OCR Usage Basics for more details.