Maven
<repositories>
<repository>
<id>repository.groupdocs.com</id>
<name>GroupDocs Repository</name>
<url>https://releases.groupdocs.com/java/repo/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>com.groupdocs</groupId>
<artifactId>groupdocs-search</artifactId>
<version>24.4</version>
</dependency>
</dependencies>
Gradle
repositories {
maven {
url 'https://releases.groupdocs.com/java/repo/'
}
}
implementation(group: 'com.groupdocs', name: 'groupdocs-search', version: '24.4')
Ivy
<ivysettings>
<settings defaultResolver="chain"/>
<resolvers>
<chain name="chain">
<ibiblio name="GroupDocs Repository" m2compatible="true" root="https://releases.groupdocs.com/java/repo/"/>
</chain>
</resolvers>
</ivysettings>
<dependency org="com.groupdocs" name="groupdocs-search" rev="24.4">
<artifact name="groupdocs-search" ext="jar"/>
</dependency>
SBT
resolvers += "GroupDocs Repository" at "https://releases.groupdocs.com/java/repo/"
libraryDependencies += "com.groupdocs" % "groupdocs-search" % "24.4"
High Code Java API to Index & Search Documents
Product Page | Docs | Demos | API Reference | Examples | Blog | Free Support | Temporary License
GroupDocs.Search for Java is an on-premise Java API that helps you index document content & metadata, perform searches (boolean, faceted, fuzzy, homophone), run custom text extraction, apply search filters, and highlight results.
Search & Index Java On-premise API Features
Indexing API Features
- Create a search index, apply index settings, & subscribe to index events.
- Supports indexing documents from file, stream, or a data structure.
- Merge multiple search indexes into one.
- Support is available for:
- additional fields
- regular characters (separators & letters)
- blended characters (these special characters are indexed as separators as well as letters, e.g. hyphen)
- characters indexed as a whole word
- character replacement during indexing
- custom text extractors
- Index password-protected files.
- Provides compact and metadata index options.
- Supports different levels of compression for storing extracted text in the index.
- Ability to filter documents during indexing.
- Option to delete indexed paths from the index.
- While indexing, convert all characters to lowercase or remove diacritics from text using character replacement.
- Ability to specify a desired set of characters as letters.
- Implement a custom text extractor and then use it for indexing.
- Delete selected documents from the search index.
- Remove indexed folders & files from the index.
- Mark indexed documents with text labels without re-indexing.
- Filter documents during the search via applied document attributes.
- Apply various types of filters while indexing, such as:
- Creation Time Filter (i.e. skip files created earlier/later than a certain date, or outside the provided date range)
- Modification Time Filter (same as Creation Time Filter but works on the document modification date)
- File Path Filter (apply regex to skip the files with full paths not matching the specified pattern)
- File Length Filter (specify lower/upper bound, or the range of acceptable file length in bytes)
- File Extension Filter (only files matching the list of specified file extensions will be indexed)
- Logical NOT Filter (invert the logic of an internal filter)
- Logical AND Filter (composite filter that requires all internal filters to succeed)
- Logical OR Filter (composite filter that requires at least one internal filter to succeed)
- Rename any indexed document without reindexing it during the update.
- Add additional fields to indexed documents to associate more metadata.
- Ability to save the document text in the Index.
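The logical NOT, AND, and OR filters listed above combine simpler filters into a single accept-or-skip decision. The sketch below illustrates that composition idea with plain `java.util.function.Predicate` objects (Java 8 or above). It does not use the GroupDocs.Search filter classes, and every name in it (`FilterSketch`, `FileInfo`, `extensionIn`, `largerThan`, `smallOfficeDocuments`) is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative only: models indexing filters as predicates over a file description.
// None of these names come from the GroupDocs.Search API.
final class FilterSketch {

    // Minimal stand-in for the file properties a filter might inspect.
    static final class FileInfo {
        final String path;
        final long lengthInBytes;

        FileInfo(String path, long lengthInBytes) {
            this.path = path;
            this.lengthInBytes = lengthInBytes;
        }
    }

    // Accepts files whose extension (the text after the last dot) is in the given list.
    static Predicate<FileInfo> extensionIn(String... extensions) {
        Set<String> allowed = new HashSet<>();
        for (String extension : extensions) {
            allowed.add(extension.toLowerCase(Locale.ROOT));
        }
        return file -> {
            int dot = file.path.lastIndexOf('.');
            if (dot < 0) {
                return false;
            }
            return allowed.contains(file.path.substring(dot + 1).toLowerCase(Locale.ROOT));
        };
    }

    // Accepts files strictly larger than the given number of bytes.
    static Predicate<FileInfo> largerThan(long bytes) {
        return file -> file.lengthInBytes > bytes;
    }

    // Composite filter: (PDF OR DOCX) AND NOT (larger than 1 MiB).
    static Predicate<FileInfo> smallOfficeDocuments() {
        Predicate<FileInfo> pdfOrDocx = extensionIn("pdf").or(extensionIn("docx"));
        Predicate<FileInfo> notHuge = largerThan(1_048_576L).negate();
        return pdfOrDocx.and(notHuge);
    }
}
```

Calling `FilterSketch.smallOfficeDocuments().test(file)` returns `true` only for a PDF or DOCX file of at most 1 MiB; a real index would apply the library's own filter objects in the same nested fashion.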
Searching API Features
- Supports various types of searches, such as:
- Boolean Search: Supports the AND, OR, & NOT operators; multiple Boolean search queries can be combined to compose complex queries.
- Case Sensitive Search: Considers uppercase & lowercase characters as distinct.
- Date Range Search: Searches within a provided date range, given in a specified date format.
- Faceted Search: Searches only within specified fields instead of the whole document.
- Fuzzy Search: Finds matches even when words are misspelled, using fuzzy logic.
- Homophone Search: Searches for words that sound similar (in pronunciation) to the searched word.
- Fetch the text of indexed documents in the HTML format.
- Apply various filters while searching documents, such as:
- File Path Filter (apply regex to fetch the files with full paths matching the specified pattern)
- File Extension Filter (returns the files matching the list of specified file extensions)
- Attribute Filter (returns the files that have the specified attributes associated with them)
- Combined Filters (apply composite filters AND, OR, NOT to compose complex queries)
- After a search, the words & phrases found within the document content can be highlighted.
- Enable the keyboard layout correction option to replace the unsupported keyword characters with the actual characters.
- Search for different word forms, such as nouns, adjectives, and verb forms.
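Fuzzy search, as described in the list above, tolerates misspelled words. One common way to measure how far a query is from an indexed word is the Levenshtein edit distance. The sketch below shows that technique in plain Java. The source does not say which algorithm GroupDocs.Search uses internally, so treat this as an illustration of the concept only; `FuzzySketch`, `editDistance`, `fuzzyMatches`, and `maxMistakes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a Levenshtein-based fuzzy matcher, not the GroupDocs.Search implementation.
final class FuzzySketch {

    // Number of single-character insertions, deletions, and substitutions
    // needed to turn a into b (classic dynamic programming with two rolling rows).
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Returns the indexed words within maxMistakes edits of the query, in input order.
    static List<String> fuzzyMatches(String query, List<String> indexedWords, int maxMistakes) {
        List<String> matches = new ArrayList<>();
        for (String word : indexedWords) {
            if (editDistance(query, word) <= maxMistakes) {
                matches.add(word);
            }
        }
        return matches;
    }
}
```

With a tolerance of two mistakes, the misspelled query `documnet` matches the indexed word `document` but not `index`. A production index would avoid comparing the query against every word one by one; the linear scan here is only for clarity.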
Search Dictionary Management API Features
- Various types of dictionaries can be used & managed, such as:
- Alias Dictionary
- Alphabet Dictionary
- Character Replacements Dictionary
- Document Passwords Dictionary
- Homophone Dictionary
- Spelling Corrector
- Stop Word Dictionary
- Synonym Dictionary
- Word Forms Provider
Supported Document Search File Formats
Content indexing is supported for the following file formats:
Microsoft Word®: DOC/DOT/DOCX/DOCM/DOTX/DOTM/RTF/TXT
OpenOffice Writer®: ODT/OTT
Microsoft Excel®: XLS/XLT/XLSX/XLSM/XLSB/XLTX/XLTM/XLA/XLAM
OpenOffice Calc®: ODS/OTS/CSV/TSV/SpreadsheetML
Microsoft PowerPoint®: PPT/PPS/POT/PPTX/PPTM/POTX/POTM/PPSX/PPSM
OpenOffice Impress®: ODP
Microsoft Outlook®: PST/OST/EML/MSG
Apple® Mail Message: EMLX
Microsoft OneNote®: ONE
Markup: HTML/XHTML/MHTML/MD/XML
eBook: CHM/EPUB/FB2
Archive: ZIP
Fixed Layout: PDF
Metadata indexing is supported for the following file formats:
Microsoft Word®: DOC/DOT/DOCX/DOCM/DOTX/DOTM/RTF/TXT
OpenOffice Writer®: ODT/OTT
Microsoft Excel®: XLS/XLT/XLSX/XLSM/XLSB/XLTX/XLTM/XLA/XLAM
OpenOffice Calc®: ODS/OTS/CSV/TSV/SpreadsheetML
Microsoft PowerPoint®: PPT/PPS/POT/PPTX/PPTM/POTX/POTM/PPSX/PPSM
OpenOffice Impress®: ODP
Microsoft Outlook®: PST/OST/EML/MSG
Apple® Mail Message: EMLX
Microsoft OneNote®: ONE
Microsoft Project®: MPP
Microsoft Visio®: VSD/VSS
Markup: HTML/XHTML/MHTML/MD/XML
eBook: CHM/EPUB/FB2
Archive: ZIP
Audio: MP3/WAV
Video: AVI/MOV/QT/FLV/ASF
Image: BMP/GIF/JP2/PNG/WEBP/TIFF/JPG/DJVU
Adobe Photoshop®: PSD
Medical Imaging: DCM/DICOM
Metadata: EMF/WMF
Fixed Layout: PDF
BitTorrent: TORRENT
For details and limitations, please visit Supported Document Formats.
System Requirements
- Microsoft Windows: Windows Desktop & Server (x86, x64), Microsoft Azure
- macOS: Mac OS X
- Linux: Ubuntu, OpenSUSE, CentOS, and others
- Java Versions: J2SE 7.0 (1.7), J2SE 8.0 (1.8), or above (for example, Java 10)
GroupDocs.Search for Java does not require any external software or third-party tool to be installed. Just follow one of the methods described in Installation and Configuration.
Get Started
GroupDocs hosts all Java APIs at the GroupDocs Repository. You can use the GroupDocs.Search for Java API directly in your Maven projects with a simple configuration. For detailed instructions, please visit the Installation from GroupDocs Repository using Maven documentation page.
Sample Java code to use the Blended Characters in Search Indexing
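// Package names below follow the GroupDocs.Search for Java API reference; verify them against your version.
import com.groupdocs.search.Index;
import com.groupdocs.search.dictionaries.CharacterType;
import com.groupdocs.search.results.SearchResult;

public class BlendedCharactersExample {
    public static void main(String[] args) {
        String indexFolder = "c:\\MyIndex\\";
        String documentFolder = "c:\\MyDocuments\\";
        // Creating an index in the specified folder
        Index index = new Index(indexFolder);
        // Setting hyphen character type to blended, so it is indexed as a letter and as a separator
        index.getDictionaries().getAlphabet().setRange(new char[] { '-' }, CharacterType.Blended);
        // Indexing documents from the specified folder
        index.add(documentFolder);
        // Searching in the index: the full hyphenated name and each of its parts can be queried
        SearchResult result1 = index.search("Elliot-Murray-Kynynmound");
        SearchResult result2 = index.search("Elliot");
        SearchResult result3 = index.search("Murray");
        SearchResult result4 = index.search("Kynynmound");
    }
}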
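The sample above relies on the hyphen being a blended character, that is, one indexed both as a separator and as a letter. The following plain-Java sketch illustrates why that lets the whole hyphenated name and each of its parts be found. It is a conceptual model only, not how GroupDocs.Search tokenizes text internally; `BlendedSketch` and `terms` are hypothetical names, and lowercasing the terms is an assumption made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: produces index terms when one character is treated as "blended".
final class BlendedSketch {

    // Returns the terms for the text: each run of letters or digits, plus each longer
    // run in which the blended character is kept as if it were a letter.
    static Set<String> terms(String text, char blended) {
        Set<String> terms = new LinkedHashSet<>();
        StringBuilder whole = new StringBuilder(); // current run, blended character kept as a letter
        StringBuilder part = new StringBuilder();  // current run, blended character acting as a separator
        for (int i = 0; i <= text.length(); i++) {
            // A trailing space acts as a sentinel separator that flushes the last run.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                whole.append(c);
                part.append(c);
            } else if (c == blended) {
                whole.append(c);    // as a letter: the run continues
                flush(part, terms); // as a separator: the part ends here
            } else {
                flush(part, terms);
                flush(whole, terms);
            }
        }
        return terms;
    }

    // Adds the lowercased run to the terms (if non-empty) and resets the buffer.
    private static void flush(StringBuilder run, Set<String> terms) {
        if (run.length() > 0) {
            terms.add(run.toString().toLowerCase(Locale.ROOT));
            run.setLength(0);
        }
    }
}
```

For the input `Elliot-Murray-Kynynmound` with `'-'` as the blended character, the resulting terms are `elliot`, `murray`, `kynynmound`, and `elliot-murray-kynynmound`, which mirrors the four queries in the sample.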
Version | Release Date |
---|---|
24.4 | April 22, 2024 |
24.2 | February 6, 2024 |
24.1 | January 15, 2024 |
23.6 | June 15, 2023 |
23.3 | March 24, 2023 |
22.11 | November 30, 2022 |
22.10 | October 24, 2022 |
21.2 | January 25, 2022 |
20.8 | January 25, 2022 |
19.2 | January 25, 2022 |
18.12 | January 25, 2022 |
21.8 | August 18, 2021 |
21.3 | March 18, 2021 |
20.11 | November 19, 2020 |
20.6 | June 23, 2020 |
20.4 | April 16, 2020 |
19.12 | December 11, 2019 |
19.5.1 | July 15, 2019 |
19.5 | May 31, 2019 |
19.3 | March 7, 2019 |
18.11 | November 1, 2018 |