Extract data from image-based content with the GroupDocs.Parser for .NET 24.6 DLLs-only package. This release lets you extract text from images and PDFs in applications running on Windows, Linux, and macOS.
New Feature: Extract Text from Images and PDFs
This release of the .NET parser API can extract text from image files and from PDF documents that have no plain text content, such as scans. The feature uses OCR to convert image-based content into editable text. Here is how you can extract text from a PDF document in C#:
// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create TextOptions with raw mode disabled and OCR enabled
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text, or a 'not supported' message if no reader is returned
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
This code sample illustrates extracting text from images:
// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a 'not supported' message if no reader is returned
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
This functionality requires a development environment based on .NET Core 3.1 or later. Currently, OCR supports the English language only.
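If an application handles both scanned PDFs and images, the two snippets above can be folded into a single helper. The following is only a sketch: OcrTextHelper, ExtractText, and the extension check are hypothetical names and logic introduced for illustration, not part of the GroupDocs.Parser API. It assumes that PDFs need the explicit OCR option while images go through the default GetText() call, as in the samples above.

```csharp
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Hypothetical helper (not part of GroupDocs.Parser): wraps the two samples above.
static class OcrTextHelper
{
    // Returns the extracted text, or null when text extraction isn't supported.
    public static string ExtractText(string path)
    {
        // Assumption: only PDFs need OCR requested explicitly via TextOptions.
        bool isPdf = string.Equals(
            Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase);

        using (Parser parser = new Parser(path))
        using (TextReader reader = isPdf
            ? parser.GetText(new TextOptions(false, true))
            : parser.GetText())
        {
            // GetText returns null for unsupported documents, as in the samples above.
            return reader == null ? null : reader.ReadToEnd();
        }
    }
}

static class Program
{
    static void Main(string[] args)
    {
        // Each command-line argument is treated as a path to a PDF or image file.
        foreach (string path in args)
        {
            string text = OcrTextHelper.ExtractText(path);
            Console.WriteLine(text ?? "Text extraction isn't supported");
        }
    }
}
```

Returning null instead of printing inside the helper keeps the "not supported" decision with the caller, which matters when the text is passed on to further processing rather than written to the console.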
Public API Changes
The OcrConnectorBase class was updated with the IsTextAreasSupported, IsTextPageSupported, and IsTextSupported properties in the latest .NET API version.
You can view the list of all new features, enhancements, and bug fixes introduced in this release by visiting GroupDocs.Parser for .NET 24.6 Release Notes.