GroupDocs.Search for .NET 25.2 Release Notes
This page contains release notes for GroupDocs.Search for .NET 25.2.
Full List of Issues Covering all Changes in this Release
Key | Summary | Category |
---|---|---|
SEARCHNET-3445 | Implement the ability of custom splitting of text into words | Feature |
Public API and Backward Incompatible Changes
Implement the ability of custom splitting of text into words
This feature lets you plug in your own text segmentation, for example for Chinese, Japanese, or Korean text, by delegating word splitting to an external library. For a more detailed description, see the documentation article about Custom text segmenter.
Public API changes
The IWordSplitter interface has been added to the GroupDocs.Search.Common namespace.
The method System.Collections.Generic.IEnumerable<System.String> Split(System.String) has been added to the GroupDocs.Search.Common.IWordSplitter interface.
The property GroupDocs.Search.Common.IWordSplitter WordSplitter has been added to the GroupDocs.Search.Events.FileIndexingEventArgs class.
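Because IWordSplitter has a single Split method, a splitter does not have to depend on an external library. The following is a minimal sketch, not taken from the product documentation: the class name CjkUnigramWordSplitter is hypothetical, it treats only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as ideographs, and it assumes the index accepts a plain sequence of word strings with punctuation and whitespace dropped.

```csharp
using System.Collections.Generic;
using System.Text;
using GroupDocs.Search.Common;

// Hypothetical example class, not part of GroupDocs.Search.
// Emits every CJK ideograph as its own token and groups other
// letters and digits into ordinary words.
public class CjkUnigramWordSplitter : IWordSplitter
{
    public IEnumerable<string> Split(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            yield break;
        }

        var word = new StringBuilder();
        foreach (char c in text)
        {
            if (IsCjkIdeograph(c))
            {
                // Flush any pending non-CJK word, then emit the ideograph alone.
                if (word.Length > 0)
                {
                    yield return word.ToString();
                    word.Clear();
                }
                yield return c.ToString();
            }
            else if (char.IsLetterOrDigit(c))
            {
                word.Append(c);
            }
            else if (word.Length > 0)
            {
                // Whitespace or punctuation ends the current word.
                yield return word.ToString();
                word.Clear();
            }
        }

        if (word.Length > 0)
        {
            yield return word.ToString();
        }
    }

    // Assumption: only the basic CJK Unified Ideographs block is checked.
    private static bool IsCjkIdeograph(char c)
    {
        return c >= '\u4E00' && c <= '\u9FFF';
    }
}
```

An instance of such a class would be assigned to the WordSplitter property of FileIndexingEventArgs in a FileIndexing event handler. Single-character tokens are a crude fallback; a dictionary-based segmenter generally produces more useful words.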
Use cases
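// Namespaces used by this example. JiebaNet.Segmenter is assumed to come
// from the jieba.NET package, which provides JiebaSegmenter.
using System.Collections.Generic;
using GroupDocs.Search;
using GroupDocs.Search.Common;
using GroupDocs.Search.Results;
using JiebaNet.Segmenter;

// Implementing a custom word splitter
public class JiebaWordSplitter : IWordSplitter
{
    private readonly JiebaSegmenter segmenter;

    public JiebaWordSplitter()
    {
        segmenter = new JiebaSegmenter();
    }

    public IEnumerable<string> Split(string text)
    {
        // cutAll: false selects Jieba's accurate mode rather than full mode
        IEnumerable<string> segments = segmenter.Cut(text, cutAll: false);
        return segments;
    }
}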
...
string indexFolder = @"c:\MyIndex\";
string documentsFolder = @"c:\MyDocuments\";

// Creating an index in the specified folder
Index index = new Index(indexFolder);

// Using Jieba segmenter to break text into words
JiebaWordSplitter jiebaWordSplitter = new JiebaWordSplitter();
index.Events.FileIndexing += (s, e) =>
{
    if (e.DocumentFullPath.EndsWith("Chinese.txt"))
    {
        // We know that the text in this document is in Chinese
        e.WordSplitter = jiebaWordSplitter;
    }
};

// Indexing documents
index.Add(documentsFolder);

// Searching in the index
string query = "考虑"; // Consider
SearchResult result = index.Search(query);