GroupDocs.Parser for .NET 24.6 Release Notes

Full List of Issues Covering all Changes in this Release

Key | Summary | Category
PARSERNET-2337 | Implement the ability to extract a text from images | Feature

Public API and Backward Incompatible Changes

Implement the ability to extract a text from images

Description

This feature extracts text from images and from PDFs that contain no plain text (for example, scanned documents). It works only on .NET Core 3.1 and later; .NET Framework isn’t supported. In this release, OCR supports English only.

Public API changes

The OcrConnectorBase public class was updated as follows:

Usage

The following example shows how to extract text from a scanned PDF using OCR:

using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create TextOptions with raw mode off (first argument) and OCR on (second argument)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text, or a message if text extraction isn't supported (reader is null)
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

TextOptions can be omitted if the file is an image:

using System;
using System.IO;
using GroupDocs.Parser;

// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR; no TextOptions are needed for an image file
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a message if text extraction isn't supported (reader is null)
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
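
The two cases above can be combined when the same code has to handle both scanned PDFs and image files. The following is only a minimal sketch and is not part of the release: the OcrBatch class, the NeedsOcrOptions helper, and its file-extension check are hypothetical names introduced for illustration. It relies only on the Parser, TextOptions, and GetText calls shown in the examples above.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

static class OcrBatch
{
    // Hypothetical helper: decides by file extension whether OCR must be requested explicitly.
    // Per the examples above, a PDF needs TextOptions with OCR enabled,
    // while an image file can be passed to the parameterless GetText().
    public static bool NeedsOcrOptions(string path)
    {
        return string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the extracted text, or null if text extraction isn't supported for the file.
    public static string ExtractText(string path)
    {
        using (Parser parser = new Parser(path))
        using (TextReader reader = NeedsOcrOptions(path)
            ? parser.GetText(new TextOptions(false, true)) // raw mode off, OCR on
            : parser.GetText())
        {
            // A null reader means the feature isn't supported; using tolerates a null resource.
            return reader?.ReadToEnd();
        }
    }
}
```

A call such as OcrBatch.ExtractText("scanned.pdf") returns null when text extraction isn't supported, which mirrors the null check in the examples above. The extension check is an assumption of this sketch; it does not detect whether a PDF actually lacks plain text.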