GroupDocs.Search for .NET 24.1 Release Notes

Major Features

This release includes the following enhancement:

  • Implement indexing of the extracted data in the distributed index

Full List of Issues Covering all Changes in this Release

Key             Summary                                                             Category
SEARCHNET-3040  Implement indexing of the extracted data in the distributed index   Enhancement

Public API and Backward Incompatible Changes

Implement indexing of the extracted data in the distributed index

This enhancement makes it possible to add data that has already been extracted from documents to the distributed index. You can therefore separate data extraction from indexing completely and keep full control over when and where the data is extracted.

Public API changes

The method void Add(GroupDocs.Search.Common.ExtractedData[], GroupDocs.Search.Options.IndexingOptions) has been added to the GroupDocs.Search.Scaling.Indexer class.

Use cases

The following example demonstrates how to add the extracted data to the search network.

C#

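using System;
using System.IO;
using GroupDocs.Search;
using GroupDocs.Search.Common;
using GroupDocs.Search.Options;
using GroupDocs.Search.Scaling;

// Assumes filePaths (string[]) lists the documents to process and
// node is an already configured search network node.
ExtractedData[] data = new ExtractedData[filePaths.Length];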

// Creation of the extractor object
Extractor extractor = new Extractor();
ExtractionOptions extractionOptions = new ExtractionOptions();
extractionOptions.ImageIndexingOptions.EnabledForSeparateImages = true;
extractionOptions.ImageIndexingOptions.EnabledForEmbeddedImages = true;
extractionOptions.ImageIndexingOptions.EnabledForContainerItemImages = true;
extractionOptions.OcrIndexingOptions.EnabledForSeparateImages = true;
extractionOptions.OcrIndexingOptions.EnabledForEmbeddedImages = true;
extractionOptions.OcrIndexingOptions.EnabledForContainerItemImages = true;
extractionOptions.UseRawTextExtraction = false;

for (int i = 0; i < filePaths.Length; i++)
{
    // Creation of the document object
    string filePath = filePaths[i];
    DateTime modificationDate = File.GetLastWriteTime(filePath);
    string fileName = Path.GetFileName(filePath);
    string extension = Path.GetExtension(filePath);
    // Open the file in a using block so the stream is closed even if extraction throws
    using (Stream stream = File.OpenRead(filePath))
    {
        Document document = Document.CreateFromStream(
            fileName,
            modificationDate,
            extension,
            stream);

        // Extraction of the data from the document
        data[i] = extractor.Extract(document, extractionOptions);
    }
}

// Indexing of the extracted data
Indexer indexer = node.Indexer;
IndexingOptions options = new IndexingOptions();
options.IsAsync = false;
indexer.Add(data, options);