GroupDocs.Parser for .NET 24.7 Release Notes

Full List of Issues Covering all Changes in this Release

Key              Summary                                                Category
PARSERNET-2393   Implement OCR support in the .NET Framework version    Feature
PARSERNET-2394   Implement the support for DjVu documents               Feature
PARSERNET-2429   Improve OCR functionality                              Improvement
PARSERNET-2404   Improve document page preview API                      Improvement
PARSERNET-2405   Improve the barcode extraction from multipage TIFF     Improvement

Public API and Backward Incompatible Changes

Implement OCR support in the .NET Framework version

Description

This feature makes it possible to extract text from images and PDFs in the .NET Framework version of GroupDocs.Parser.

To use the OCR functionality in .NET Framework, set PlatformTarget to x64. If the downloadable (MSI or ZIP) version of GroupDocs.Parser is used, see the readme.txt file for additional information.
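One way to apply the PlatformTarget setting is directly in the project file. This is a minimal fragment, assuming the change is made in the .csproj rather than through the Visual Studio project properties; the comment text is illustrative:

```xml
<PropertyGroup>
  <!-- Build the project as 64-bit, as required for OCR in the .NET Framework version -->
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```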

Public API changes

No API changes.

Usage

The following example shows how to extract text from images and PDFs:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR (raw mode off, OCR on)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

TextOptions can be omitted if the file is an image:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

Implement the support for DjVu documents

Description

This feature adds text extraction from DjVu documents.

Public API changes

No API changes.

Usage

The following example shows how to extract text from DjVu documents:

// Create an instance of Parser class
using (Parser parser = new Parser("book.djvu"))
{
    // Extract text from the document
    using (TextReader reader = parser.GetText())
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

Improve OCR functionality

Description

This improvement adds the ability to use the spell checker with the internal OCR.

Public API changes

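OcrOptions public class was updated to support the spell checker: as the usage example in this section shows, passing true as the third constructor argument enables it.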

OcrConnectorBase public class was updated with changes as follows:

  • IsTextPageSupported property was marked as Obsolete
  • RecognizeText(Stream, int, OcrOptions) and RecognizeTextAreas(Stream, int, Size, OcrOptions) methods were marked as Obsolete
  • RecognizeText(Stream, OcrOptions) and RecognizeTextAreas(Stream, Size, OcrOptions) methods were added

Usage

The following example shows how to extract text with OCR and the spell checker:

// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR and the spell checker
    TextOptions options = new TextOptions(false, true, new OcrOptions(null, null, true));
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

Improve document page preview API

Description

This improvement adds an API for generating document page previews.

Public API changes

Parser public class was updated with the page preview API; the usage example in this section calls its GetPagePreview method.

Added PagePreviewFormat public enum

Added PagePreviewOptions public class

Usage

The following example shows how to generate page previews for a PDF document:

// Create an instance of Parser class
using (Parser parser = new Parser("document.pdf"))
{
    // Get the page count
    int pageCount = parser.GetDocumentInfo().PageCount;
    // Iterate over the document pages
    for (int i = 0; i < pageCount; i++)
    {
        // Generate the preview of the document page
        using (Stream stream = parser.GetPagePreview(i))
        {
            // Save the preview to the PNG file
            using (Stream fileStream = File.Create($"page_{i}.png"))
            {
                stream.CopyTo(fileStream);
            }
        }
    }
}

Improve the barcode extraction from multipage TIFF

Description

This improvement adds the ability to extract barcodes from multipage TIFF images.

Public API changes

No API changes.

Usage

The following example shows how to extract barcodes from multipage TIFF images:

// Create an instance of Parser class ("multipage.tiff" is a placeholder file name)
using (Parser parser = new Parser("multipage.tiff"))
{
    // Check if the barcodes extraction is supported
    if (!parser.Features.Barcodes)
    {
        Console.WriteLine("Document doesn't support barcodes extraction.");
        return;
    }

    // Extract barcodes from the second page.
    IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes(1);

    // Iterate over barcodes
    foreach (PageBarcodeArea barcode in barcodes)
    {
        // Print the page index
        Console.WriteLine("Page: " + barcode.Page.Index.ToString());
        // Print the barcode value
        Console.WriteLine("Value: " + barcode.Value);
    }
}
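
To read barcodes from every page of a multipage TIFF rather than a single page, the page count can drive the per-page call. The following is a minimal sketch that combines only calls already used in these notes (GetDocumentInfo().PageCount and GetBarcodes(pageIndex)). The file name multipage.tiff is a placeholder, and the sketch assumes that GetDocumentInfo() reports the page count for TIFF images:

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class Program
{
    static void Main()
    {
        // "multipage.tiff" is a placeholder file name
        using (Parser parser = new Parser("multipage.tiff"))
        {
            // Check if the barcodes extraction is supported
            if (!parser.Features.Barcodes)
            {
                Console.WriteLine("Document doesn't support barcodes extraction.");
                return;
            }

            // Assumption: the page count is available for TIFF images
            int pageCount = parser.GetDocumentInfo().PageCount;

            // Page indexes are zero-based, as in the example above
            for (int i = 0; i < pageCount; i++)
            {
                foreach (PageBarcodeArea barcode in parser.GetBarcodes(i))
                {
                    // Print the page index and the barcode value
                    Console.WriteLine("Page: " + barcode.Page.Index + ", value: " + barcode.Value);
                }
            }
        }
    }
}
```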