GroupDocs.Parser for .NET 24.6 Release Notes
This page contains release notes for GroupDocs.Parser for .NET 24.6.
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
PARSERNET-2337 | Implement the ability to extract text from images | Feature |
Public API and Backward Incompatible Changes
Implement the ability to extract text from images
Description
This feature extracts text from images and from PDFs that contain no plain text (for example, scanned documents). It works only on .NET Core 3.1 and later versions; .NET Framework isn’t supported. In this release, OCR supports only the English language.
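Because the feature isn't available on .NET Framework, an application that targets several runtimes may want to check the runtime before it attempts OCR. The following is a minimal sketch of such a guard, not part of the GroupDocs.Parser API: the class name `OcrRuntimeCheck` and the method `IsNetFramework` are hypothetical. It relies only on the standard `RuntimeInformation.FrameworkDescription` property, and it detects .NET Framework only; it does not try to tell .NET Core 3.1 apart from earlier .NET Core versions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
static class OcrRuntimeCheck
{
    // Returns true when the runtime description identifies .NET Framework.
    public static bool IsNetFramework(string frameworkDescription)
    {
        return frameworkDescription.StartsWith(".NET Framework", StringComparison.Ordinal);
    }

    static void Main()
    {
        // A human-readable name and version of the current runtime
        string description = RuntimeInformation.FrameworkDescription;

        if (IsNetFramework(description))
        {
            Console.WriteLine("OCR text extraction isn't supported on " + description);
        }
        else
        {
            Console.WriteLine("Running on " + description + " (not .NET Framework)");
        }
    }
}
```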
Public API changes
The public class OcrConnectorBase was updated as follows:
- Added IsTextAreasSupported property.
- Added IsTextPageSupported property.
- Added IsTextSupported property.
Usage
The following example shows how to extract text from images and PDFs:
    using System;
    using System.IO;
    using GroupDocs.Parser;
    using GroupDocs.Parser.Options;

    // Create an instance of the Parser class
    using (Parser parser = new Parser("scanned.pdf"))
    {
        // Create an instance of TextOptions to use OCR
        TextOptions options = new TextOptions(false, true);
        // Extract text using OCR
        using (TextReader reader = parser.GetText(options))
        {
            // Print the text or a 'not supported' message
            Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
        }
    }
TextOptions can be omitted if the file is an image:
    // Create an instance of the Parser class
    using (Parser parser = new Parser("scanned.jpg"))
    {
        // Extract text using OCR
        using (TextReader reader = parser.GetText())
        {
            // Print the text or a 'not supported' message
            Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
        }
    }
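The two cases above can be combined into one helper that picks the call based on the file extension. This is only a sketch under stated assumptions: the class `OcrTextExtractor`, the method `ExtractText`, and the `ImageExtensions` set are hypothetical names, and the extension list is illustrative rather than the library's list of supported image formats. The only GroupDocs.Parser calls used are the ones shown in the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
static class OcrTextExtractor
{
    // Illustrative extensions only (assumption), not an official list.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png", ".bmp" };

    // Returns the extracted text, or null when text extraction isn't supported.
    public static string ExtractText(string path)
    {
        bool isImage = ImageExtensions.Contains(Path.GetExtension(path));

        using (Parser parser = new Parser(path))
        // Images need no options; other files (such as scanned PDFs) request OCR explicitly.
        using (TextReader reader = isImage
            ? parser.GetText()
            : parser.GetText(new TextOptions(false, true)))
        {
            return reader == null ? null : reader.ReadToEnd();
        }
    }
}
```

A caller could then write `Console.WriteLine(OcrTextExtractor.ExtractText("scanned.pdf") ?? "Text extraction isn't supported");` and use the same call for image files.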