GroupDocs.Parser for .NET 25.2 Release Notes
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
PARSERNET-2398 | Implement the ability to parse images by template | Feature |
PARSERNET-2629 | Implement the ability to set pages limit for search functionality | Improvement |
PARSERNET-2619 | PDF parser causing issue when using stream | Bug |
Public API and Backward Incompatible Changes
Implement the ability to parse images by template
Description
This feature enables template-based parsing of images and scanned PDF files. It uses built-in OCR, which can be replaced with a third-party engine if desired. The functionality works the same way as for regular PDF files, with one limitation: only simple fields, tables, and barcodes are supported. Other field types are ignored.
Public API changes
OcrConnectorBase public class was updated with changes as follows:
RecognizeTextAreas(Stream, Page, OcrOptions) virtual method was added.
RecognizeTextAreas(Stream, IEnumerable<string>, Page, OcrOptions) virtual method was added.
RecognizeTextAreas(Stream, int, Size, OcrOptions) method was marked as Obsolete.
RecognizeTextAreas(Stream, Size, OcrOptions) method was marked as Obsolete.
ParseByTemplateOptions public class was added.
Template public class was updated with changes as follows:
IsOcrCompatible property was added.
Parser public class was updated with changes as follows:
ParseByTemplate(Template, ParseByTemplateOptions) method was added.
ParsePagesByTemplate(Template, ParseByTemplateOptions) method was added.
Usage
The following example shows how to parse an image by a template:
// Load a document template from the file
Template template = Template.Load("template.xml");

// Create an instance of Parser class
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Parse the document by the template (force OCR usage)
    DocumentData data = parser.ParseByTemplate(template, new ParseByTemplateOptions(true));

    // Print all extracted data
    for (int i = 0; i < data.Count; i++)
    {
        // Print the field's name and text
        Console.WriteLine(data[i].Name + ": " + data[i].Text);
    }
}
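Because only simple fields, tables, and barcodes are supported with OCR, it may be useful to check the template before parsing. The following is a minimal sketch, not a definitive implementation. It assumes that IsOcrCompatible is a boolean property reporting whether the template uses only OCR-supported field types, that ParseByTemplateOptions lives in the GroupDocs.Parser.Options namespace, and that ParseByTemplate returns null when the feature is not supported for the document. The file names are hypothetical.

```csharp
using System;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;   // assumed namespace of ParseByTemplateOptions
using GroupDocs.Parser.Templates;

class OcrTemplateGuard
{
    static void Main()
    {
        // Hypothetical file names, used for illustration only
        Template template = Template.Load("template.xml");

        // Assumption: IsOcrCompatible is a bool that is false when the template
        // contains field types other than simple fields, tables, and barcodes
        if (!template.IsOcrCompatible)
        {
            Console.WriteLine("Warning: some template fields are not supported with OCR and will be ignored");
        }

        using (Parser parser = new Parser("scanned-invoice.png"))
        {
            // true forces OCR usage, as in the usage example of this release
            DocumentData data = parser.ParseByTemplate(template, new ParseByTemplateOptions(true));

            // Assumption: null means parsing by template is not supported for this document
            if (data == null)
            {
                Console.WriteLine("Parsing by template isn't supported");
                return;
            }

            // Print the name and text of every extracted field
            for (int i = 0; i < data.Count; i++)
            {
                Console.WriteLine(data[i].Name + ": " + data[i].Text);
            }
        }
    }
}
```

The sketch warns rather than aborts, since unsupported field types are skipped while the supported ones are still extracted.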
Implement the ability to set pages limit for search functionality
Description
This improvement allows limiting the search to pages up to a maximum page index. The index is zero-based, so a value of 0 restricts the search to the first page. This is useful, for example, when you need to find keywords only on the first page of a document.
Public API changes
SearchOptions public class was updated with changes as follows:
SearchOptions(bool, bool, bool, int, HighlightOptions, HighlightOptions) constructor was added.
MaxPageIndex property was added.
Usage
The following example shows how to limit the search to the first page:
// Create an instance of Parser class
using (Parser parser = new Parser(filePath))
{
    // Search with a regular expression with case matching and max page index = 0 (only the first page)
    IEnumerable<SearchResult> sr = parser.Search("page number: [0-9]+", new SearchOptions(true, false, true, 0, null, null));

    // Check if search is supported
    if (sr == null)
    {
        Console.WriteLine("Search isn't supported");
        return;
    }

    // Iterate over search results
    foreach (SearchResult s in sr)
    {
        // Print an index and found text:
        Console.WriteLine(string.Format("At {0}: {1}", s.Position, s.Text));
    }
}
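The same constructor can be used for a plain-text search over more than one page. The following is a sketch under stated assumptions: the three boolean arguments are taken to mean match case, match whole word, and use regular expression (inferred from the comment in the example above), MaxPageIndex is assumed to be a readable int property and an inclusive zero-based limit, and the file name and keyword are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class FirstPagesSearch
{
    static void Main()
    {
        // Assumption: the limit is inclusive and zero-based,
        // so 1 covers the first two pages (indexes 0 and 1)
        const int maxPageIndex = 1;

        // Assumed argument order: matchCase, matchWholeWord, useRegularExpression,
        // maxPageIndex, leftHighlightOptions, rightHighlightOptions
        SearchOptions options = new SearchOptions(false, false, false, maxPageIndex, null, null);

        // Assumption: MaxPageIndex exposes the value passed to the constructor
        Console.WriteLine("Searching pages 0.." + options.MaxPageIndex);

        // Hypothetical file name, used for illustration only
        using (Parser parser = new Parser("sample.pdf"))
        {
            IEnumerable<SearchResult> results = parser.Search("invoice", options);

            // Search returns null when it is not supported for the document
            if (results == null)
            {
                Console.WriteLine("Search isn't supported");
                return;
            }

            // Print the position and text of each match
            foreach (SearchResult result in results)
            {
                Console.WriteLine(string.Format("At {0}: {1}", result.Position, result.Text));
            }
        }
    }
}
```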