GroupDocs.Parser for .NET v24.7 (MSI) is here! This update gives developers working on .NET Framework applications a range of new features for managing document parsing workflows.
Image and PDF Text Extraction with OCR
Extract text from scanned documents and images programmatically within your .NET Framework projects using the built-in OCR capabilities in the latest release of the C# document parsing API. The following code examples demonstrate extracting text from a scanned PDF and from an image, respectively.
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class for the scanned PDF
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR (the second argument enables OCR)
    TextOptions options = new TextOptions(false, true);
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}

// Create an instance of the Parser class for the scanned image
using (Parser parser = new Parser("scanned.jpg"))
{
    // Extract text using OCR
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
DjVu Document Support
Version 24.7 of GroupDocs.Parser for .NET lets you parse and extract text from DjVu documents, extending the range of file formats your C# and VB.NET applications can handle. The following code example shows how to use this feature in your solutions.
using System;
using System.IO;
using GroupDocs.Parser;

// Create an instance of the Parser class
using (Parser parser = new Parser("book.djvu"))
{
    // Extract text from the document
    using (TextReader reader = parser.GetText())
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
OCR Functionality Boost
Use spell-checking alongside OCR to improve the accuracy of recognized text in your document parsing solutions. The code sample below shows how to integrate this functionality into your C# applications.
using System;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class
using (Parser parser = new Parser("scanned.pdf"))
{
    // Create an instance of TextOptions to use OCR and the spell checker
    TextOptions options = new TextOptions(false, true, new OcrOptions(null, null, true));
    // Extract text using OCR
    using (TextReader reader = parser.GetText(options))
    {
        // Print the text, or a 'not supported' message if the reader is null
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}
Document Page Previews
You can now generate high-fidelity page previews with GroupDocs.Parser for .NET to get a quick visual overview of document content, as this code sample shows.
using System.IO;
using GroupDocs.Parser;

// Create an instance of the Parser class
using (Parser parser = new Parser("document.pdf"))
{
    // Get the page count
    int pageCount = parser.GetDocumentInfo().PageCount;
    // Iterate over the document pages (page indexes are zero-based)
    for (int i = 0; i < pageCount; i++)
    {
        // Generate the preview of the document page
        using (Stream stream = parser.GetPagePreview(i))
        {
            // Save the preview to a PNG file
            using (Stream fileStream = File.Create($"page_{i}.png"))
            {
                stream.CopyTo(fileStream);
            }
        }
    }
}
Barcode Extraction from Document Pages
This release of the document parsing API lets you extract barcodes from individual pages within multipage TIFF files. The code example below shows the page-level call, here reading barcodes from the second page of a sample document.
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

// Create an instance of the Parser class
// (Constants.SamplePdfWithBarcodes is not defined in this snippet; substitute the path to your own file)
using (Parser parser = new Parser(Constants.SamplePdfWithBarcodes))
{
    // Check whether barcode extraction is supported
    if (!parser.Features.Barcodes)
    {
        Console.WriteLine("Document doesn't support barcodes extraction.");
        return;
    }
    // Extract barcodes from the second page (page index 1)
    IEnumerable<PageBarcodeArea> barcodes = parser.GetBarcodes(1);
    // Iterate over the barcodes
    foreach (PageBarcodeArea barcode in barcodes)
    {
        // Print the page index
        Console.WriteLine("Page: " + barcode.Page.Index.ToString());
        // Print the barcode value
        Console.WriteLine("Value: " + barcode.Value);
    }
}
For the full list of new features, enhancements, and bug fixes introduced in this release, see the GroupDocs.Parser for .NET 24.7 Release Notes.