<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-search</artifactId>
    <version>20.4</version>
</dependency>
compile(group: 'com.groupdocs', name: 'groupdocs-search', version: '20.4')
<dependency org="com.groupdocs" name="groupdocs-search" rev="20.4">
    <artifact name="groupdocs-search" ext="jar"/>
</dependency>
libraryDependencies += "com.groupdocs" % "groupdocs-search" % "20.4"

High Code Java API to Index & Search Documents


Product Page | Docs | Demos | API Reference | Examples | Blog | Free Support | Temporary License

GroupDocs.Search for Java is an on-premise Java API that helps you index document content & metadata, perform boolean, faceted, fuzzy, and homophone searches, use custom text extraction, apply search filters, and highlight results.

Search & Index Java On-premise API Features

Indexing API Features

  • Create search index, apply index settings, & subscribe to index events.
  • Supports indexing documents from file, stream, or a data structure.
  • Merge multiple search indexes into one.
  • Support is available for:
    • additional fields
    • regular characters (separators & letters)
    • blended characters (these special characters are indexed as separators as well as letters, e.g. hyphen)
    • characters indexed as a whole word
    • character replacement during indexing
    • custom text extractors
  • Index password-protected files.
  • Provides the compact and metadata index options.
  • Supports different levels of compression for the extracted text saved in the index.
  • Ability to filter documents during indexing.
  • Option to delete indexed paths from the index.
  • While indexing, convert all characters to lowercase or remove diacritics from text using character replacement.
  • Ability to specify desired set of characters as letters.
  • Implement a custom text extractor and use it for indexing.
  • Delete specific documents from the search index.
  • Delete indexed folders and files from the index.
  • Mark indexed documents with text labels without re-indexing.
  • Filter documents during search by the attributes applied to them.
  • Apply various types of filters while indexing, such as:
    • Creation Time Filter (i.e. skip files created earlier/later than a certain date, or outside the provided date range)
    • Modification Time Filter (same as Creation Time Filter but works on the document modification date)
    • File Path Filter (apply regex to skip the files with full paths not matching the specified pattern)
    • File Length Filter (specify lower/upper bound, or the range of acceptable file length in bytes)
    • File Extension Filter (only files matching the list of specified file extensions will be indexed)
    • Logical NOT Filter (invert the logic of an internal filter)
    • Logical AND Filter (composite filter that requires all internal filters to succeed)
    • Logical OR Filter (composite filter that requires at least one internal filter to succeed)
  • Rename any indexed document without having to re-index it during the update.
  • Add additional fields to indexed documents to associate more metadata.
  • Ability to save the document text in the Index.
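The Logical NOT, AND, and OR filters listed above build composite rules out of simpler filters. As a rough illustration of that composition logic only, here is a plain-Java sketch using `java.util.function.Predicate`. It is not the GroupDocs.Search filter API; the `FilterSketch` class and its `FileInfo` type are hypothetical stand-ins for a file being considered for indexing.

```java
import java.util.Locale;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration of composite indexing filters; not the GroupDocs.Search API.
class FilterSketch {

    // Minimal description of a candidate file (hypothetical type for this sketch).
    static final class FileInfo {
        final String path;
        final long lengthBytes;

        FileInfo(String path, long lengthBytes) {
            this.path = path;
            this.lengthBytes = lengthBytes;
        }
    }

    // File Extension Filter: accept files whose path ends with one of the extensions (case-insensitive).
    static Predicate<FileInfo> extension(String... extensions) {
        return f -> {
            String lower = f.path.toLowerCase(Locale.ROOT);
            for (String ext : extensions) {
                if (lower.endsWith(ext.toLowerCase(Locale.ROOT))) {
                    return true;
                }
            }
            return false;
        };
    }

    // File Length Filter: accept files whose size lies in [min, max] bytes.
    static Predicate<FileInfo> lengthRange(long min, long max) {
        return f -> f.lengthBytes >= min && f.lengthBytes <= max;
    }

    // File Path Filter: accept files whose full path matches the regular expression.
    static Predicate<FileInfo> pathMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        return f -> p.matcher(f.path).matches();
    }

    public static void main(String[] args) {
        // OR: .docx or .pdf; AND: between 1 KB and 10 MB; NOT: outside any "drafts" folder.
        Predicate<FileInfo> filter = extension(".docx").or(extension(".pdf"))
                .and(lengthRange(1_024, 10_485_760))
                .and(pathMatches(".*[\\\\/]drafts[\\\\/].*").negate());

        FileInfo report = new FileInfo("c:\\MyDocuments\\report.pdf", 50_000);
        FileInfo draft = new FileInfo("c:\\MyDocuments\\drafts\\plan.docx", 50_000);
        System.out.println(filter.test(report)); // true
        System.out.println(filter.test(draft));  // false
    }
}
```

Modelling each filter as a predicate makes the logical filters trivial: AND, OR, and NOT map directly onto `and`, `or`, and `negate`.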

Searching API Features

  • Supports various types of searches, such as:
    • Boolean Search: Supports AND, OR, NOT operators. Combine multiple Boolean search queries to compose complex queries.
    • Case Sensitive Search: Considers uppercase & lowercase characters as distinct.
    • Date Range Search: Finds dates that fall within the provided range, in the specified date format.
    • Faceted Search: Searches only within specified fields instead of whole document.
    • Fuzzy Search: Finds words even when they are misspelled, using fuzzy logic.
    • Homophone Search: Search for words which are similar in sound (pronunciation) to the searched word.
  • Fetch the text of indexed documents in HTML format.
  • Apply various filters while searching documents, such as:
    • File Path Filter (apply regex to fetch the files with full paths matching the specified pattern)
    • File Extension Filter (returns the files matching the list of specified file extensions)
    • Attribute Filter (returns the files that have the specified attributes associated with them)
    • Combined Filters (apply composite filters AND, OR, NOT to compose complex queries)
  • After a search, the words and phrases found within the document content can be highlighted.
  • Enable the keyboard layout correction option to replace characters typed in the wrong keyboard layout with the intended characters.
  • Search for different word forms, such as noun, adjective, and verb forms.
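Fuzzy search accepts index terms that are "close" to a possibly misspelled query word. The page does not say which similarity measure GroupDocs.Search uses, so the sketch below simply assumes Levenshtein edit distance as an illustration. The `FuzzySketch` class and the in-memory term list are hypothetical stand-ins for the index dictionary, not the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative fuzzy matching by Levenshtein edit distance; not the GroupDocs.Search API.
class FuzzySketch {

    // Minimum number of single-character insertions, deletions, or substitutions
    // needed to turn a into b (dynamic programming, keeping only two rows).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // The row just computed becomes the previous row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns every term within maxDistance edits of the query (case-insensitive).
    static List<String> fuzzyLookup(String query, List<String> terms, int maxDistance) {
        List<String> matches = new ArrayList<>();
        String q = query.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            if (levenshtein(q, term.toLowerCase(Locale.ROOT)) <= maxDistance) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("indexing", "searching", "highlight", "homophone");
        // "serching" is one insertion away from "searching"
        System.out.println(fuzzyLookup("serching", terms, 1)); // [searching]
    }
}
```

A real engine would avoid comparing the query against every term, but the acceptance rule (distance at or below a threshold) is the core idea.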

Search Dictionary Management API Features

  • Various types of dictionaries can be used & managed, such as:
    • Alias Dictionary
    • Alphabet Dictionary
    • Character Replacements Dictionary
    • Document Passwords Dictionary
    • Homophone Dictionary
    • Spelling Corrector
    • Stop Word Dictionary
    • Synonym Dictionary
    • Word Forms Provider
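Several of these dictionaries act on the query before it is matched against the index. To show the idea only, here is a plain-Java sketch in which a stop word dictionary drops common words and a synonym dictionary expands each remaining word into a group of alternatives. The `QueryExpansionSketch` class is hypothetical; it neither uses nor describes the GroupDocs.Search dictionary API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical query expansion with stop words and synonyms; not the GroupDocs.Search API.
class QueryExpansionSketch {

    // Each returned set holds one query word plus its synonyms; a document would match
    // the word if it contains any member of that set.
    static List<Set<String>> expand(String query, Set<String> stopWords,
                                    Map<String, Set<String>> synonyms) {
        List<Set<String>> groups = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty() || stopWords.contains(word)) {
                continue; // skip leading blanks and stop words
            }
            Set<String> group = new LinkedHashSet<>();
            group.add(word);
            group.addAll(synonyms.getOrDefault(word, Collections.<String>emptySet()));
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        Map<String, Set<String>> synonyms = new HashMap<>();
        synonyms.put("big", new LinkedHashSet<>(Arrays.asList("large", "huge")));
        System.out.println(expand("The big file", stopWords, synonyms));
        // [[big, large, huge], [file]]
    }
}
```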

Supported Document Search File Formats

Content indexing is supported for the following file formats:

Microsoft Word®: DOC/DOT/DOCX/DOCM/DOTX/DOTM/RTF/TXT
OpenOffice Writer®: ODT/OTT
Microsoft Excel®: XLS/XLT/XLSX/XLSM/XLSB/XLTX/XLTM/XLA/XLAM
OpenOffice Calc®: ODS/OTS/CSV/TSV/SpreadsheetML
Microsoft PowerPoint®: PPT/PPS/POT/PPTX/PPTM/POTX/POTM/PPSX/PPSM
OpenOffice Impress®: ODP
Microsoft Outlook®: PST/OST/EML/MSG
Apple® Mail Message: EMLX
Microsoft OneNote®: ONE
Markup: HTML/XHTML/MHTML/MD/XML
eBook: CHM/EPUB/FB2
Archive: ZIP
Fixed Layout: PDF

Metadata indexing is supported for the following file formats:

Microsoft Word®: DOC/DOT/DOCX/DOCM/DOTX/DOTM/RTF/TXT
OpenOffice Writer®: ODT/OTT
Microsoft Excel®: XLS/XLT/XLSX/XLSM/XLSB/XLTX/XLTM/XLA/XLAM
OpenOffice Calc®: ODS/OTS/CSV/TSV/SpreadsheetML
Microsoft PowerPoint®: PPT/PPS/POT/PPTX/PPTM/POTX/POTM/PPSX/PPSM
OpenOffice Impress®: ODP
Microsoft Outlook®: PST/OST/EML/MSG
Apple® Mail Message: EMLX
Microsoft OneNote®: ONE
Microsoft Project®: MPP
Microsoft Visio®: VSD/VSS
Markup: HTML/XHTML/MHTML/MD/XML
eBook: CHM/EPUB/FB2
Archive: ZIP
Audio: MP3/WAV
Video: AVI/MOV/QT/FLV/ASF
Image: BMP/GIF/JP2/PNG/WEBP/TIFF/JPG/DJVU
Adobe Photoshop®: PSD
Medical Imaging: DCM/DICOM
Metafile: EMF/WMF
Fixed Layout: PDF
BitTorrent: TORRENT

For details and limitations, please visit Supported Document Formats.

System Requirements

  • Microsoft Windows: Windows Desktop & Server (x86, x64), Microsoft Azure
  • macOS: Mac OS X
  • Linux: Ubuntu, OpenSUSE, CentOS, and others
  • Java Versions: J2SE 7.0 (1.7), J2SE 8.0 (1.8) or above (for example Java 10)

GroupDocs.Search for Java does not require any external software or third party tool to be installed. Just follow one of the ways as described in Installation and Configuration.

Get Started

GroupDocs hosts all Java APIs at the GroupDocs Repository. You can easily use GroupDocs.Search for Java API directly in your Maven projects with simple configurations. For the detailed instructions please visit Installation from GroupDocs Repository using Maven documentation page.

Sample Java code to use the Blended Characters in Search Indexing

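import com.groupdocs.search.*;
import com.groupdocs.search.dictionaries.*;
import com.groupdocs.search.results.*;

public class BlendedCharactersExample {
    public static void main(String[] args) {
        String indexFolder = "c:\\MyIndex\\";
        String documentFolder = "c:\\MyDocuments\\";

        // Creating an index in the specified folder
        Index index = new Index(indexFolder);

        // Setting the hyphen character type to blended before indexing,
        // so hyphenated words are indexed both whole and as separate parts
        index.getDictionaries().getAlphabet().setRange(new char[] { '-' }, CharacterType.Blended);

        // Indexing documents from the specified folder
        index.add(documentFolder);

        // Searching in the index: the full hyphenated name and each of its parts can be found
        SearchResult result1 = index.search("Elliot-Murray-Kynynmound");
        SearchResult result2 = index.search("Elliot");
        SearchResult result3 = index.search("Murray");
        SearchResult result4 = index.search("Kynynmound");
    }
}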
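Why can all four searches in the sample find the same document? As described in the feature list, a blended character is indexed both as a separator and as a letter, so the index receives the whole hyphenated word as well as each of its parts. The plain-Java sketch below mimics that tokenization rule under this assumption. The `BlendedTokenizer` class is hypothetical and is not how GroupDocs.Search is implemented internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tokenizer illustrating "blended" characters; not GroupDocs.Search internals.
class BlendedTokenizer {

    // Letters and digits are word characters; characters in `blended` stay inside a word
    // (acting as letters) and also split it into parts (acting as separators);
    // every other character is a plain separator.
    static Set<String> tokenize(String text, String blended) {
        Set<String> tokens = new LinkedHashSet<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' '; // sentinel flushes the last word
            if (Character.isLetterOrDigit(c) || blended.indexOf(c) >= 0) {
                word.append(c);
            } else if (word.length() > 0) {
                addWithParts(word.toString(), blended, tokens);
                word.setLength(0);
            }
        }
        return tokens;
    }

    // Adds the whole word followed by each part between blended characters.
    private static void addWithParts(String word, String blended, Set<String> tokens) {
        List<String> parts = new ArrayList<>();
        StringBuilder part = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (blended.indexOf(c) >= 0) {
                if (part.length() > 0) {
                    parts.add(part.toString());
                }
                part.setLength(0);
            } else {
                part.append(c);
            }
        }
        if (part.length() > 0) {
            parts.add(part.toString());
        }
        if (parts.isEmpty()) {
            return; // the "word" consisted only of blended characters, e.g. a lone hyphen
        }
        tokens.add(word);
        tokens.addAll(parts);
    }

    public static void main(String[] args) {
        // Hyphen marked as blended, as in the sample above
        System.out.println(tokenize("The Elliot-Murray-Kynynmound family", "-"));
        // [The, Elliot-Murray-Kynynmound, Elliot, Murray, Kynynmound, family]

        // Without blended characters the hyphen is only a separator
        System.out.println(tokenize("Elliot-Murray-Kynynmound", ""));
        // [Elliot, Murray, Kynynmound]
    }
}
```

With the hyphen treated as a plain separator, the full hyphenated name never becomes a single token, which is why the blended setting is needed for the first search to match it as a whole.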


Version   Release Date
24.4      April 22, 2024
24.2      February 6, 2024
24.1      January 15, 2024
23.6      June 15, 2023
23.3      March 24, 2023
22.11     November 30, 2022
22.10     October 24, 2022
21.2      January 25, 2022
20.8      January 25, 2022
19.2      January 25, 2022
18.12     January 25, 2022
21.8      August 18, 2021
21.3      March 18, 2021
20.11     November 19, 2020
20.6      June 23, 2020
20.4      April 16, 2020
19.12     December 11, 2019
19.5.1    July 15, 2019
19.5      May 31, 2019
19.3      March 7, 2019
18.11     November 1, 2018