GroupDocs.Parser for .NET 24.7 Release Notes
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
PARSERNET-2393 | Implement OCR support in the .NET Framework version | Feature |
PARSERNET-2394 | Implement the support for DjVu documents | Feature |
PARSERNET-2429 | Improve OCR functionality | Improvement |
PARSERNET-2404 | Improve document page preview API | Improvement |
PARSERNET-2405 | Improve the barcode extraction from multipage TIFF | Improvement |
Public API and Backward Incompatible Changes
Implement OCR support in the .NET Framework version
Description
This feature makes it possible to extract text from images and PDFs in the .NET Framework version of GroupDocs.Parser.
To use the OCR functionality in .NET Framework, set PlatformTarget to x64. If the downloadable (MSI or ZIP) version of GroupDocs.Parser is used, see the readme.txt file for additional information.
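For reference, PlatformTarget is a standard MSBuild property that is normally set in the project file. The fragment below is a minimal sketch, assuming an MSBuild-based .csproj; the surrounding project content is omitted, and the same setting can usually be changed from the Build tab of the project properties in Visual Studio.

```xml
<!-- Fragment of a .csproj file: build the project as a 64-bit (x64) assembly -->
<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```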
Public API changes
No API changes.
Usage
The following example shows how to extract text from a scanned PDF using OCR:
// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract a text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
TextOptions can be omitted if the file is an image:
// Create an instance of Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract a text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Implement the support for DjVu documents
Description
This feature makes it possible to extract text from DjVu documents.
Public API changes
No API changes.
Usage
The following example shows how to extract text from a DjVu document:
// Create an instance of Parser class
using (Parser parser = new Parser("book.djvu"))
{
    // Extract a text from the document
    using (TextReader reader = parser.GetText())
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Improve OCR functionality
Description
This improvement adds the ability to use the spell checker with the internal OCR.
Public API changes
OcrOptions public class was updated with changes as follows:
- Added OcrOptions(Rectangle, OcrEventHandler, bool) constructor
- Added UseSpellChecker property
OcrConnectorBase public class was updated with changes as follows:
- IsTextPageSupported property was marked as Obsolete
- RecognizeText(Stream, int, OcrOptions) and RecognizeTextAreas(Stream, int, Size, OcrOptions) methods were marked as Obsolete
- RecognizeText(Stream, OcrOptions) and RecognizeTextAreas(Stream, Size, OcrOptions) methods were added
Usage
The following example shows how to extract text from a scanned document using OCR with the spell checker enabled:
// Create an instance of Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR and spell checker
    TextOptions options = new TextOptions(false, true, new OcrOptions(null, null, true));
    // Extract a text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print a text or 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Improve document page preview API
Description
This improvement adds the API to generate the document page preview.
Public API changes
Parser public class was updated with changes as follows:
- GetPagePreview(int) and GetPagePreview(int, PagePreviewOptions) methods were added.
Added PagePreviewFormat public enum
Added PagePreviewOptions public class
Usage
The following example shows how to generate a preview of each page of a PDF document:
// Create an instance of Parser class
using (Parser parser = new Parser("document.pdf"))
{
    // Get the page count
    int pageCount = parser.GetDocumentInfo().PageCount;
    // Iterate over the document pages
    for (int i = 0; i < pageCount; i++)
    {
        // Generate the preview of the document page
        using (Stream stream = parser.GetPagePreview(i))
        {
            // Save the preview to the PNG file
            using (Stream fileStream = File.Create($"page_{i}.png"))
            {
                stream.CopyTo(fileStream);
            }
        }
    }
}
Improve the barcode extraction from multipage TIFF
Description
This improvement adds the ability to extract barcodes from multipage TIFF images.
Public API changes
No API changes.
Usage
The following example shows how to extract barcodes from multipage TIFF images:
// Create an instance of Parser class
// (Constants.SamplePdfWithBarcodes is a sample file path; pass the path to a multipage TIFF file here instead)
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Check if the barcodes extraction is supported
    if (!parser.Features.Barcodes)
    {
        Console.WriteLine("Document doesn't support barcodes extraction.");
        return;
    }
    // Extract barcodes from the second page (page indexes are zero-based)
    IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes(1);
    // Iterate over barcodes
    foreach (PageBarcodeArea barcode in barcodes)
    {
        // Print the page index
        Console.WriteLine("Page: " + barcode.Page.Index.ToString());
        // Print the barcode value
        Console.WriteLine("Value: " + barcode.Value);
    }
}
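As a variation on the example above, the following minimal sketch collects barcodes from every page of a multipage TIFF and prints them grouped by page. It relies on two assumptions: that the parameterless GetBarcodes() overload returns barcodes from all pages, and that it may return null when extraction isn't supported. The file name "multipage.tiff" and the class name BarcodesByPage are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class BarcodesByPage
{
    static void Main()
    {
        // "multipage.tiff" is a hypothetical file name used for illustration
        using (Parser parser = new Parser("multipage.tiff"))
        {
            // Check if the barcodes extraction is supported
            if (!parser.Features.Barcodes)
            {
                Console.WriteLine("Document doesn't support barcodes extraction.");
                return;
            }

            // Assumption: the parameterless overload returns barcodes from all pages
            IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes();
            if (barcodes == null)
            {
                Console.WriteLine("Barcodes extraction isn't supported.");
                return;
            }

            // Group the barcodes by zero-based page index and print one line per page
            foreach (var page in barcodes.GroupBy(b => b.Page.Index).OrderBy(g => g.Key))
            {
                string values = string.Join(", ", page.Select(b => b.Value));
                Console.WriteLine("Page " + page.Key + ": " + values);
            }
        }
    }
}
```

Grouping on the client side keeps the sketch to a single extraction call; calling GetBarcodes(int) once per page, as in the example above, is an equally valid approach when only specific pages are of interest.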