GroupDocs.Parser for Java 22.11 Release Notes

This page contains release notes for GroupDocs.Parser for Java 22.11.

Full List of Issues Covering all Changes in this Release

Key	Summary	Category
PARSERNET-1961	Implement the ability to use OCR for images and PDF documents	New Feature
PARSERNET-1903	Implement the support for attachment extraction from presentations	New Feature
PARSERNET-1904	Implement the support for attachment extraction from spreadsheets	New Feature
PARSERNET-1905	Implement the support for attachment extraction from word processing documents	New Feature

Public API and Backward Incompatible Changes

Implement the support for attachment extraction from presentations, spreadsheets and word processing documents

Description

These features add the ability to extract attachments from presentations, spreadsheets, and word processing documents.

Public API changes

No public API changes.

Usage

The following example shows how to extract text from document attachments:

// Create an instance of Parser class
try (Parser parser = new Parser(fileName)) {
    // Extract attachments from the container
    Iterable<ContainerItem> attachments = parser.getContainer();
    // Check if container extraction is supported
    if (attachments == null) {
        System.out.println("Container extraction isn't supported");
        // Stop here to avoid iterating over null (assumes the snippet runs inside a void method)
        return;
    }
    // Iterate over attachments
    for (ContainerItem item : attachments) {
        // Print the file path
        System.out.println(item.getFilePath());
        // Print metadata
        for (MetadataItem metadata : item.getMetadata()) {
            System.out.println(String.format("%s: %s", metadata.getName(), metadata.getValue()));
        }
        try {
            // Create a Parser object for the attachment content
            try (Parser attachmentParser = item.openParser()) {
                // Extract the attachment text
                try (TextReader reader = attachmentParser.getText()) {
                    System.out.println(reader == null ? "No text" : reader.readToEnd());
                }
            }
        } catch (UnsupportedDocumentFormatException ex) {
            System.out.println("Isn't supported.");
        }
    }
}

Implement the ability to use OCR for images and PDF documents

Description

This feature provides the ability to extract text and text areas using OCR.

Public API changes

GroupDocs.Parser.Options.Features public class was updated with changes as follows:

Added isPreview and isOcr properties.

GroupDocs.Parser.Options.PageTextAreaOptions public class was updated with changes as follows:

Added PageTextAreaOptions(boolean) and PageTextAreaOptions(boolean, OcrOptions) constructors;
Added isUseOcr and OcrOptions properties.

GroupDocs.Parser.Options.TextOptions public class was updated with changes as follows:

Added TextOptions(boolean, boolean) and TextOptions(boolean, boolean, OcrOptions) constructors;
Added isUseOcr and OcrOptions properties.

GroupDocs.Parser.Options.ParserSettings public class was updated with changes as follows:

Added ParserSettings(OcrConnectorBase) and ParserSettings(ILogger, OcrConnectorBase) constructors;
Added OcrConnector property.

GroupDocs.Parser.Parser public class was updated with changes as follows:

Added Parser(String, ParserSettings) and Parser(InputStream, ParserSettings) constructors.

The OcrConnectorBase, OcrEventHandler, and OcrOptions classes were added to the GroupDocs.Parser.Options namespace.

Usage

The following example shows how to extract text from an image file:

// Create an instance of ParserSettings class with OCR Connector
ParserSettings settings = new ParserSettings(new AsposeOcrOnPremise());
// Create an instance of Parser class with settings
try (Parser parser = new Parser(Constants.SampleScan, settings)) {
    // Create an instance of TextOptions to use OCR (raw mode off, OCR on)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    try (TextReader reader = parser.getText(options)) {
        // Print the text or a 'not supported' message
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
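
The example above passes a connector object to ParserSettings, and the OCR flag in TextOptions decides whether that connector is used. As a rough illustration of this plug-in arrangement, the following self-contained sketch models it with stand-in types. SketchOcrConnector, SketchSettings, SketchParser, and FakeOcrConnector are hypothetical names invented for this sketch; they are not GroupDocs.Parser classes, and the method shapes are assumptions. Returning null for "not supported" is likewise an assumption that mirrors the null checks in the examples on this page. The actual signatures of OcrConnectorBase, ParserSettings, and TextOptions should be taken from the API reference.

```java
// Stand-in types only: none of these classes belong to GroupDocs.Parser.

// Hypothetical connector base: a subclass wraps a concrete OCR engine.
abstract class SketchOcrConnector {
    // Turns raw image bytes into recognized text.
    abstract String recognizeText(byte[] imageData);
}

// Hypothetical settings object that carries an optional connector.
final class SketchSettings {
    private final SketchOcrConnector ocrConnector;

    SketchSettings(SketchOcrConnector ocrConnector) {
        this.ocrConnector = ocrConnector;
    }

    SketchOcrConnector getOcrConnector() {
        return ocrConnector;
    }
}

// Hypothetical parser that delegates to the connector when OCR is requested.
final class SketchParser {
    private final byte[] imageData;
    private final SketchSettings settings;

    SketchParser(byte[] imageData, SketchSettings settings) {
        this.imageData = imageData;
        this.settings = settings;
    }

    // Returns the recognized text, or null when OCR is switched off
    // or no connector was configured (assumed "not supported" convention).
    String getText(boolean useOcr) {
        SketchOcrConnector connector = settings.getOcrConnector();
        if (!useOcr || connector == null) {
            return null;
        }
        return connector.recognizeText(imageData);
    }
}

// Fake engine used for the demonstration; a real connector would call an OCR library.
final class FakeOcrConnector extends SketchOcrConnector {
    @Override
    String recognizeText(byte[] imageData) {
        return "recognized " + imageData.length + " bytes";
    }
}

class OcrConnectorSketch {
    public static void main(String[] args) {
        SketchSettings settings = new SketchSettings(new FakeOcrConnector());
        SketchParser parser = new SketchParser(new byte[] {1, 2, 3}, settings);
        String text = parser.getText(true);
        System.out.println(text == null ? "Text extraction isn't supported" : text);
    }
}
```

Keeping the OCR engine behind a connector means the parsing code does not change when the engine is swapped; only the object passed to the settings differs.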

See OCR Usage Basics for more details.