GroupDocs.Parser for .NET 25.2 Release Notes

This page contains release notes for GroupDocs.Parser for .NET 25.2.

Full List of Issues Covering all Changes in this Release

Key	Summary	Category
PARSERNET-2398	Implement the ability to parse images by template	Feature
PARSERNET-2629	Implement the ability to set pages limit for search functionality	Improvement
PARSERNET-2619	PDF parser causing issue when using stream	Bug

Public API and Backward Incompatible Changes

Implement the ability to parse images by template

Description

This feature enables template-based parsing of images and scanned PDF files. It uses the built-in OCR, which can be replaced with a third-party one if desired. The functionality works the same way as for regular PDF files, with one limitation: only simple fields, tables, and barcodes are supported. Other field types are ignored.

Public API changes

OcrConnectorBase public class was updated with changes as follows:

RecognizeTextAreas(Stream, Page, OcrOptions) virtual method was added.
RecognizeTextAreas(Stream, IEnumerable<string>, Page, OcrOptions) virtual method was added.
RecognizeTextAreas(Stream, int, Size, OcrOptions) method was marked as Obsolete.
RecognizeTextAreas(Stream, Size, OcrOptions) method was marked as Obsolete.

ParseByTemplateOptions public class was added.

Template public class was updated with changes as follows:

IsOcrCompatible property was added.

Parser public class was updated with changes as follows:

ParseByTemplate(Template, ParseByTemplateOptions) method was added.
ParsePagesByTemplate(Template, ParseByTemplateOptions) method was added.

Usage

The following example shows how to parse a document by a template with forced OCR usage:

// Load a document template from the file
Template template = Template.Load("template.xml");

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Parse the document by the template (force OCR usage)
    DocumentData data = parser.ParseByTemplate(template, new ParseByTemplateOptions(true));

    // Print all extracted data
    for (int i = 0; i < data.Count; i++)
    {
        // Print the field's name and text
        Console.WriteLine(data[i].Name + ": " + data[i].Text);
    }
}
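
The IsOcrCompatible property listed in the API changes could be checked before forcing OCR. The following is only a minimal sketch under stated assumptions: it assumes IsOcrCompatible is a bool that reports whether the template can be used with OCR-based parsing, that ParseByTemplateOptions lives in the GroupDocs.Parser.Options namespace, and that ParseByTemplate returns null when template parsing isn't supported. The file names are hypothetical.

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;   // Assumption: ParseByTemplateOptions is declared here
using GroupDocs.Parser.Templates;

class OcrTemplateCheck
{
    static void Main()
    {
        // Hypothetical file names, used only for illustration
        Template template = Template.Load("template.xml");

        // Assumption: IsOcrCompatible is a bool; warn if the template
        // may not be fully usable with OCR-based parsing
        if (!template.IsOcrCompatible)
        {
            Console.WriteLine("The template may contain fields that aren't supported with OCR");
        }

        using (Parser parser = new Parser("scanned.pdf"))
        {
            // Parse the document by the template (force OCR usage)
            DocumentData data = parser.ParseByTemplate(template, new ParseByTemplateOptions(true));

            // Defensive check; assumption: null means parsing by template isn't supported
            if (data == null)
            {
                Console.WriteLine("Parsing by template isn't supported");
                return;
            }

            // Print the name and text of each extracted field
            for (int i = 0; i < data.Count; i++)
            {
                Console.WriteLine(data[i].Name + ": " + data[i].Text);
            }
        }
    }
}
```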

Implement the ability to set pages limit for search functionality

Description

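This improvement allows limiting the search to pages up to a maximum page index. Page indexes are zero-based, so a maximum page index of 0 restricts the search to the first page. For example, it can be used when you need to find keywords only on the first page of a document.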

Public API changes

SearchOptions public class was updated with changes as follows:

SearchOptions(bool, bool, bool, int, HighlightOptions, HighlightOptions) constructor was added.
MaxPageIndex property was added.

Usage

The following example shows how to limit the search to the first page of a document:

// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Search with a regular expression with case matching and max page index = 0 (only the first page)
    IEnumerable<SearchResult> sr = parser.Search("page number: [0-9]+", new SearchOptions(true, false, true, 0, null, null));

    // Check if search is supported
    if (sr == null)
    {
        Console.WriteLine("Search isn't supported");
        return;
    }

    // Iterate over search results
    foreach (SearchResult s in sr)
    {
        // Print an index and found text:
        Console.WriteLine(string.Format("At {0}: {1}", s.Position, s.Text));
    }
}