public final class NoteTextExtractor extends TextExtractor implements ISearchable, IHighlightExtractor, IRegexSearchable, IPageTextExtractor
Provides the text extractor for OneNote documents.
Extracting text from a OneNote section:
// Create a text extractor for OneNote sections
// (stream is an InputStream opened on a OneNote section (.one) file)
NoteTextExtractor extractor = new NoteTextExtractor(stream);
// Extract all the text
System.out.println(extractor.extractAll());
Extracting text by pages:
// Create a text extractor for OneNote sections
NoteTextExtractor extractor = new NoteTextExtractor(stream);
// Iterate over the pages
for (int pageIndex = 0; pageIndex < extractor.getPageCount(); pageIndex++) {
    // Extract the text from the page at index pageIndex
    System.out.println(extractor.extractPage(pageIndex));
}
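The page loop above can be wrapped in a helper that gathers every page into a list. To stay self-contained, this sketch does not use the GroupDocs library: PageSource is a hypothetical interface that mirrors only the two documented IPageTextExtractor methods, getPageCount() and extractPage(int), and an in-memory implementation stands in for a real NoteTextExtractor.

```java
import java.util.ArrayList;
import java.util.List;

public class PageCollector {
    // Hypothetical stand-in for the two documented IPageTextExtractor methods.
    // This is NOT the GroupDocs interface itself.
    interface PageSource {
        int getPageCount();
        String extractPage(int pageIndex);
    }

    // Collects the text of every page, using zero-based indices as in the loop above.
    static List<String> extractAllPages(PageSource source) {
        int count = source.getPageCount();
        List<String> pages = new ArrayList<>(count);
        for (int pageIndex = 0; pageIndex < count; pageIndex++) {
            pages.add(source.extractPage(pageIndex));
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a OneNote section with two pages (sample data).
        String[] fakePages = {"Meeting notes", "Action items"};
        PageSource source = new PageSource() {
            public int getPageCount() { return fakePages.length; }
            public String extractPage(int pageIndex) { return fakePages[pageIndex]; }
        };
        List<String> pages = extractAllPages(source);
        System.out.println(pages.size() + " pages: " + pages); // prints "2 pages: [Meeting notes, Action items]"
    }
}
```

Reading getPageCount() once before the loop avoids re-querying the count on every iteration; the per-page text is otherwise identical to what the loop above prints.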
Constructor | Description |
---|---|
NoteTextExtractor(InputStream stream) | Initializes a new instance of the NoteTextExtractor class from a stream. |
NoteTextExtractor(InputStream stream, LoadOptions loadOptions) | Initializes a new instance of the NoteTextExtractor class from a stream, using the specified load options. |
NoteTextExtractor(String fileName) | Initializes a new instance of the NoteTextExtractor class from a file. |
NoteTextExtractor(String fileName, LoadOptions loadOptions) | Initializes a new instance of the NoteTextExtractor class from a file, using the specified load options. |
Modifier and Type | Method | Description |
---|---|---|
protected void | dispose(boolean disposing) | Releases the unmanaged resources used by the extractor. |
List<String> | extractHighlights(HighlightOptions... highlightOptions) | Extracts highlights according to the specified highlight options. |
String | extractPage(int pageIndex) | Reads all characters from the page at pageIndex and returns them as a string. |
int | getPageCount() | Gets the total number of pages. |
protected String | prepareLine() | Returns a line of the text. |
void | reset() | Resets the current document. |
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords) | Searches the document for the keywords using the specified search engine. |
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords) | Searches the document for the keywords. |
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions) | Searches the document for matches of the regular expression. |
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, extractText, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
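The search and searchWithRegex methods return void: matches are delivered to the ISearchHandler that is passed in. The sketch below illustrates that callback style using only java.util.regex. MatchHandler, onMatch and RegexSearchSketch.searchWithRegex are hypothetical names chosen for this illustration; the members of the real ISearchHandler and RegexSearchOptions types are not documented on this page and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSearchSketch {
    // Hypothetical callback standing in for ISearchHandler (its real members are not shown here).
    interface MatchHandler {
        void onMatch(int position, String matchedText);
    }

    // Scans the text and reports each regex match to the handler instead of returning a list.
    static void searchWithRegex(String text, String expression, MatchHandler handler) {
        Matcher matcher = Pattern.compile(expression).matcher(text);
        while (matcher.find()) {
            handler.onMatch(matcher.start(), matcher.group());
        }
    }

    public static void main(String[] args) {
        String pageText = "Call 555-0100 or 555-0199 before Friday.";
        List<String> found = new ArrayList<>();
        // The handler decides what to do with each match; here it just records position:text.
        searchWithRegex(pageText, "\\d{3}-\\d{4}", (position, match) -> found.add(position + ":" + match));
        System.out.println(found); // prints [5:555-0100, 17:555-0199]
    }
}
```

A callback design lets the caller stop, filter, or stream results without the extractor building a full result list first.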
public NoteTextExtractor(String fileName)
Initializes a new instance of the NoteTextExtractor class from a file.
Parameters:
fileName - The path to the file.

public NoteTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the NoteTextExtractor class from a file, using the specified load options.
Parameters:
fileName - The path to the file.
loadOptions - The options for loading the file.

public NoteTextExtractor(InputStream stream)
Initializes a new instance of the NoteTextExtractor class from a stream.
Parameters:
stream - The stream of the document.

public NoteTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the NoteTextExtractor class from a stream, using the specified load options.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the document.

public int getPageCount()
Gets the total number of pages.
Specified by:
getPageCount in interface IPageTextExtractor
public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches the document for the keywords.
Specified by:
search in interface ISearchable
Parameters:
options - The search options.
handler - An instance of the search handler.
keywords - A collection of words to search for.

public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches the document for the keywords using the specified search engine.
Specified by:
search in interface ISearchable
Parameters:
options - The search options.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search for.

public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches the document for matches of the regular expression.
Specified by:
searchWithRegex in interface IRegexSearchable
Parameters:
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - The search options.

public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights according to the specified highlight options.
Specified by:
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions.

public void reset()
Resets the current document. After a call to reset(), the extractLine method returns the first line of the document.
Overrides:
reset in class TextExtractor
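reset() rewinds the extractor so that line-by-line reading starts over from the first line. The sketch below models that behaviour with a hypothetical in-memory LineCursor class rather than the GroupDocs API. It assumes, for illustration only, that extractLine() returns null once the last line has been read; that is the usual convention for line readers but is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class ResetSketch {
    // Hypothetical in-memory model of an extractor's line cursor (not the GroupDocs class).
    static class LineCursor {
        private final String[] lines;
        private int next = 0;

        LineCursor(String text) {
            this.lines = text.split("\n");
        }

        // Assumed contract: returns the next line, or null when no lines remain.
        String extractLine() {
            return next < lines.length ? lines[next++] : null;
        }

        // Rewinds the cursor so the next extractLine() call returns the first line again.
        void reset() {
            next = 0;
        }
    }

    // Reads every remaining line until extractLine() returns null.
    static List<String> readAll(LineCursor cursor) {
        List<String> result = new ArrayList<>();
        String line;
        while ((line = cursor.extractLine()) != null) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        LineCursor cursor = new LineCursor("Title\nFirst bullet\nSecond bullet");
        System.out.println(readAll(cursor));      // prints [Title, First bullet, Second bullet]
        System.out.println(cursor.extractLine()); // prints null: the cursor is exhausted
        cursor.reset();
        System.out.println(cursor.extractLine()); // prints Title again after reset()
    }
}
```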
public String extractPage(int pageIndex)
Reads all characters from the page at pageIndex and returns them as a string.
Specified by:
extractPage in interface IPageTextExtractor
Parameters:
pageIndex - The zero-based index of the page.

protected void dispose(boolean disposing)
Releases the unmanaged resources used by the extractor.
Overrides:
dispose in class TextExtractor
Parameters:
disposing - true if invoked from the public dispose() method; otherwise, false.

protected String prepareLine()
Returns a line of the text.
Overrides:
prepareLine in class TextExtractor
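NoteTextExtractor plugs into TextExtractor through two protected hooks: prepareLine() supplies text line by line and dispose(boolean) releases resources. The sketch below shows one common way such hooks fit together, a template-method base class whose public extractLine() delegates to prepareLine() and whose close() calls dispose(true). BaseExtractor and InMemoryExtractor are hypothetical, and this wiring is an assumption for illustration, not TextExtractor's documented internals.

```java
public class HookSketch {
    // Hypothetical base class; how TextExtractor actually wires these methods is not documented here.
    static abstract class BaseExtractor implements AutoCloseable {
        private boolean disposed;

        // Public entry point that delegates to the subclass hook.
        public String extractLine() {
            if (disposed) {
                throw new IllegalStateException("Extractor is disposed");
            }
            return prepareLine();
        }

        // Subclass hook: returns the next line, or null at the end (assumed convention).
        protected abstract String prepareLine();

        // Subclass hook: disposing is true when called from close(), false otherwise.
        protected void dispose(boolean disposing) {
            disposed = true;
        }

        @Override
        public void close() {
            dispose(true);
        }

        public boolean isDisposed() {
            return disposed;
        }
    }

    // Hypothetical subclass that serves lines from memory.
    static class InMemoryExtractor extends BaseExtractor {
        private final String[] lines;
        private int next;
        boolean releasedManagedState;

        InMemoryExtractor(String... lines) {
            this.lines = lines;
        }

        @Override
        protected String prepareLine() {
            return next < lines.length ? lines[next++] : null;
        }

        @Override
        protected void dispose(boolean disposing) {
            if (disposing) {
                releasedManagedState = true; // a real extractor would close owned streams here
            }
            super.dispose(disposing);
        }
    }

    public static void main(String[] args) {
        // try-with-resources calls close(), which triggers dispose(true).
        try (InMemoryExtractor extractor = new InMemoryExtractor("Heading", "Body")) {
            System.out.println(extractor.extractLine()); // prints Heading
            System.out.println(extractor.extractLine()); // prints Body
        }
    }
}
```

Keeping the disposed check in the base class means every subclass gets the same "use after close" behaviour without repeating it in each prepareLine() implementation.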
Copyright © 2019. All rights reserved.