GroupDocs.Parser for .NET 25.2 Release Notes

Full List of Issues Covering all Changes in this Release

Key | Summary | Category
PARSERNET-2398 | Implement the ability to parse images by template | Feature
PARSERNET-2629 | Implement the ability to set pages limit for search functionality | Improvement
PARSERNET-2619 | PDF parser causing issue when using stream | Bug

Public API and Backward Incompatible Changes

Implement the ability to parse images by template

Description

This feature enables template-based parsing of images and scanned PDF files. Text is recognized with the built-in OCR, which can be replaced with a third-party one if desired. Parsing works the same way as for regular PDF files, with one limitation: only simple fields, tables, and barcodes are supported; other field types are ignored.

Public API changes

OcrConnectorBase public class was updated with changes as follows:

ParseByTemplateOptions public class was added

Template public class was updated with changes as follows:

Parser public class was updated with changes as follows:

Usage

The following example shows how to parse an image by a template:

// Load a document template from the file
Template template = Template.Load("template.xml");

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Parse the document by the template (force OCR usage)
    DocumentData data = parser.ParseByTemplate(template, new ParseByTemplateOptions(true));

    // Print all extracted data
    for (int i = 0; i < data.Count; i++)
    {
        // Print the field's name and text
        Console.WriteLine(data[i].Name + ": " + data[i].Text);
    }
}

Implement the ability to set pages limit for search functionality

Description

This improvement limits the search to pages up to a given maximum page index. The index is zero-based, so a value of 0 restricts the search to the first page. For example, it can be used when you need to find keywords only on the first page of a document.

Public API changes

SearchOptions public class was updated with changes as follows:

  • SearchOptions(bool, bool, bool, int, HighlightOptions, HighlightOptions) constructor was added.

  • MaxPageIndex property was added.

Usage

The following example shows how to limit the search to the first page of a document:

// Create an instance of Parser class
using(Parser parser = new Parser(filePath))
{
    // Search with a regular expression with case matching and max page index = 0 (only the first page)
    IEnumerable<SearchResult> sr = parser.Search("page number: [0-9]+", new SearchOptions(true, false, true, 0, null, null));
    // Check if search is supported
    if(sr == null)
    {
        Console.WriteLine("Search isn't supported");
        return;
    }

    // Iterate over search results
    foreach(SearchResult s in sr)
    {
        // Print the position and the found text
        Console.WriteLine(string.Format("At {0}: {1}", s.Position, s.Text));
    }
}