GroupDocs.Search for .NET 25.1 Release Notes

Full List of Issues Covering all Changes in this Release

| Key | Summary | Category |
| --- | --- | --- |
| SEARCHNET-3252 | Implement the ability to separately extract data from documents in the search network | Feature |
| SEARCHNET-3405 | Raises ArgumentException when opening a POTX file | Fix |
| SEARCHNET-3406 | Raises ArgumentOutOfRangeException when opening a PDF file | Fix |

Public API and Backward Incompatible Changes

Implement the ability to separately extract data from documents in the search network

This feature lets you extract data from documents in the search network without indexing the extracted data. Because extraction is separated from indexing, you can create a temporary network that dedicates significant computing resources to extraction for a short period of time. Once extraction is finished, those resources can be freed, and only the smaller share of computing resources needed for searching the network remains in use.

Public API changes

Property GroupDocs.Search.Common.ExtractedData Data has been added to GroupDocs.Search.Scaling.Events.DataExtractedEventArgs class.
Method Void Extract(System.Collections.Generic.IList<GroupDocs.Search.Common.Document>, System.Collections.Generic.IList<System.String>, GroupDocs.Search.Options.IndexingOptions) has been added to GroupDocs.Search.Scaling.Indexer class.
Property GroupDocs.Search.Scaling.SearchNetworkStatus Status has been added to GroupDocs.Search.Scaling.Indexer class.
Event System.EventHandler ExtractionCompleted has been added to GroupDocs.Search.Scaling.Events.NodeEventHub class.
Field GroupDocs.Search.Scaling.SearchNetworkStatus Extracting has been added to GroupDocs.Search.Scaling.SearchNetworkStatus enum.
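
The following is a minimal sketch of how the new ExtractionCompleted event and the Indexer.Status property might be combined to wait for extraction to finish. It relies on assumptions that these notes do not confirm: that the node type is SearchNetworkNode in the GroupDocs.Search.Scaling namespace, that node.Events exposes the NodeEventHub members, and that extraction may still be running when the call that starts it returns. The ExtractionWaiter class and its RunAndWait method are hypothetical helpers, not part of the library.

```csharp
using System;
using System.Threading;
using GroupDocs.Search.Scaling; // Assumed namespace of SearchNetworkNode

// Hypothetical helper, not part of GroupDocs.Search.
static class ExtractionWaiter
{
    // Runs startExtraction, then waits until the node raises
    // ExtractionCompleted or the timeout elapses.
    // Returns true if the event was observed within the timeout.
    public static bool RunAndWait(
        SearchNetworkNode node,
        Action startExtraction,
        TimeSpan timeout)
    {
        using (ManualResetEventSlim completed = new ManualResetEventSlim(false))
        {
            EventHandler handler = (s, e) => completed.Set();

            // Subscribe before starting so the event cannot be missed
            node.Events.ExtractionCompleted += handler;
            try
            {
                startExtraction();
                bool signaled = completed.Wait(timeout);

                // Status is the new property on the Indexer class;
                // SearchNetworkStatus.Extracting is the new enum value
                Console.WriteLine("Indexer status: " + node.Indexer.Status);
                return signaled;
            }
            finally
            {
                node.Events.ExtractionCompleted -= handler;
            }
        }
    }
}
```

Under these assumptions, the helper could wrap the extraction call, for example RunAndWait(node, () => node.Indexer.Extract(documents, passwords, options), TimeSpan.FromMinutes(10)), before the document streams are closed. If Extract already blocks until all data has been extracted, the wait returns immediately and the helper is unnecessary.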

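Use cases

// Namespaces used by this fragment
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Search.Common;
using GroupDocs.Search.Options;

// 'node' (a configured search network node) and 'filePaths' (an array of
// document paths) are assumed to be defined elsewhere
List<ExtractedData> extractedData = new List<ExtractedData>();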

// Subscribing to the event to receive data extraction results
node.Events.DataExtracted += (s, e) =>
{
    extractedData.Add(e.Data);
};

Stream[] streams = new Stream[filePaths.Length];
Document[] documents = new Document[filePaths.Length];
string[] passwords = new string[filePaths.Length];
for (int i = 0; i < filePaths.Length; i++)
{
    string filePath = filePaths[i];
    DateTime modificationDate = File.GetLastWriteTime(filePath);
    string fileName = Path.GetFileName(filePath);
    string extension = Path.GetExtension(filePath);
    Stream stream = File.OpenRead(filePath);
    streams[i] = stream;
    Document document = Document.CreateFromStream(
        fileName,
        modificationDate,
        extension,
        stream);
    documents[i] = document;
}

IndexingOptions options = new IndexingOptions();
options.UseRawTextExtraction = false;
options.ImageIndexingOptions.EnabledForSeparateImages = true;
options.ImageIndexingOptions.EnabledForEmbeddedImages = true;
options.ImageIndexingOptions.EnabledForContainerItemImages = true;
options.OcrIndexingOptions.EnabledForSeparateImages = true;
options.OcrIndexingOptions.EnabledForEmbeddedImages = true;
options.OcrIndexingOptions.EnabledForContainerItemImages = true;

// Extracting data from documents
node.Indexer.Extract(documents, passwords, options);

for (int i = 0; i < streams.Length; i++)
{
    streams[i].Close();
}

// Adding extracted data to the index
node.Indexer.Add(extractedData, options);

Raises ArgumentException when opening a POTX file

Indexing of some specific POTX documents has been fixed.

Public API changes

None.

Use cases

None.

Raises ArgumentOutOfRangeException when opening a PDF file

Indexing of some specific PDF documents has been fixed.

Public API changes

None.

Use cases

None.