GroupDocs.Parser for .NET 24.7 (DLLs only) is now available. This package offers a refined set of document parsing features for developers building C# and VB.NET applications.
OCR-based Text Extraction in Images and PDFs
Use the built-in OCR functionality in this release of the C# document parsing API to extract text from scanned documents and images within your .NET Framework projects. The following code examples illustrate extracting text from a scanned PDF and from an image, respectively.
// Create an instance of the Parser class for a scanned PDF
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

// Create an instance of the Parser class for a scanned image
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Support for DjVu Documents
Parse and extract text from DjVu documents using version 24.7 of the GroupDocs.Parser for .NET API. This update extends the file formats your C# and VB.NET applications can handle. The following code example shows how to use this feature in your solutions.
// Create an instance of the Parser class
using (Parser parser = new Parser("book.djvu"))
{
    // Extract text from the document
    using (TextReader reader = parser.GetText())
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Enhanced OCR Functionality
Build full-featured document parsing solutions that use spell checking alongside OCR for better text accuracy. The code sample below shows how to integrate this functionality into your C# applications.
// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR and the spell checker
    TextOptions options = new TextOptions(false, true, new OcrOptions(null, null, true));
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text or a 'not supported' message
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Preview Document Pages
GroupDocs.Parser for .NET can now generate high-fidelity page previews, giving you a quick overview of the document content. The following code sample saves a preview of each page as a PNG file.
// Create an instance of Parser class
using (Parser parser = new Parser("document.pdf"))
{
    // Get the page count
    int pageCount = parser.GetDocumentInfo().PageCount;
    // Iterate over the document pages
    for (int i = 0; i < pageCount; i++)
    {
        // Generate the preview of the document page
        using (Stream stream = parser.GetPagePreview(i))
        {
            // Save the preview to the PNG file
            using (Stream fileStream = File.Create($"page_{i}.png"))
            {
                stream.CopyTo(fileStream);
            }
        }
    }
}
Extract Barcodes from Document Pages
With this release of the document parsing API, you can extract barcodes from individual pages within your multipage TIFF files. The following code example shows the feature in use, reading the barcodes on the second page of a sample document.
// Create an instance of the Parser class
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Check if barcode extraction is supported
    if (!parser.Features.Barcodes)
    {
        Console.WriteLine("Document doesn't support barcodes extraction.");
        return;
    }
    // Extract barcodes from the second page (page indexes are zero-based)
    IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes(1);
    // Iterate over the barcodes
    foreach (PageBarcodeArea barcode in barcodes)
    {
        // Print the page index
        Console.WriteLine("Page: " + barcode.Page.Index.ToString());
        // Print the barcode value
        Console.WriteLine("Value: " + barcode.Value);
    }
}
You can view the list of all new features, enhancements, and bug fixes introduced in this release by visiting GroupDocs.Parser for .NET 24.7 Release Notes.