GroupDocs.Parser for .NET 24.6 Release Notes

This page contains the release notes for GroupDocs.Parser for .NET 24.6.

Full List of Issues Covering all Changes in this Release

Key	Summary	Category
PARSERNET-2337	Implement the ability to extract a text from images	Feature

Public API and Backward Incompatible Changes

Implement the ability to extract a text from images

Description

This feature makes it possible to extract text from images and from PDFs that contain no plain text, such as scanned documents. It works only on .NET Core 3.1 and later versions; .NET Framework isn't supported. In this release, OCR supports only the English language.

Public API changes

The public class OcrConnectorBase was updated as follows:

Added IsTextAreasSupported property.
Added IsTextPageSupported property.
Added IsTextSupported property.
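
As a rough illustration of how the new properties might be read, the sketch below prints what a given connector reports. The helper class OcrCapabilities and its Print method are hypothetical and not part of GroupDocs.Parser. The sketch assumes that the three properties are readable Boolean values and that OcrConnectorBase lives in the GroupDocs.Parser.Options namespace. These release notes do not say how a connector instance is obtained, so it is simply passed in as a parameter.

```csharp
using System;
using GroupDocs.Parser.Options; // assumed namespace of OcrConnectorBase

static class OcrCapabilities
{
    // Hypothetical helper (not part of the library): prints which
    // recognition modes the given connector reports as supported.
    // Assumes the three new properties are readable bool values.
    public static void Print(OcrConnectorBase connector)
    {
        Console.WriteLine($"Text: {connector.IsTextSupported}");
        Console.WriteLine($"Text page: {connector.IsTextPageSupported}");
        Console.WriteLine($"Text areas: {connector.IsTextAreasSupported}");
    }
}
```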

Usage

The following example shows how to extract text from a scanned PDF using OCR:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions with raw mode disabled (first argument)
    // and OCR enabled (second argument)
    TextOptions options = new TextOptions(false, true);
    // Extract the text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

TextOptions can be omitted if the file is an image:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract the text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}