public final class EpubTextExtractor extends EpubTextExtractorBase implements IHighlightExtractor, ISearchable, IRegexSearchable, IStructuredExtractor
Provides the text extractor for EPUB documents.
Extracts a line of characters from a document:
// Create a text extractor for EPUB documents
EpubTextExtractor extractor = new EpubTextExtractor(stream);
// Extract a line of the text
String line = extractor.extractLine();
// If the line is null, then the end of the file is reached
while (line != null) {
    // Print a line to the console
    System.out.println(line);
    // Extract another line
    line = extractor.extractLine();
}
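The null-terminated loop above can be wrapped in a small helper that gathers every line into a list. The following is a minimal sketch, not part of the library: `LineCollector` and `collectLines` are hypothetical names, and a `Supplier<String>` stands in for the extractor so that the block runs without the library on the classpath. With the real class, a method reference such as `extractor::extractLine` could presumably be passed instead, assuming that method declares no checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class LineCollector {
    // Hypothetical helper: calls the supplier until it returns null,
    // mirroring the "null means end of file" contract of extractLine().
    public static List<String> collectLines(Supplier<String> nextLine) {
        List<String> lines = new ArrayList<>();
        String line = nextLine.get();
        while (line != null) {
            lines.add(line);
            line = nextLine.get();
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for an extractor (an assumption for illustration only)
        Iterator<String> fake = Arrays.asList("Chapter 1", "It was a dark night.").iterator();
        Supplier<String> standIn = () -> fake.hasNext() ? fake.next() : null;

        for (String line : collectLines(standIn)) {
            System.out.println(line);
        }
    }
}
```

Collecting into a list trades memory for convenience; for large documents the streaming loop shown above remains the lighter option.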
Extracts all characters from a document:
// Create a text extractor for EPUB documents
EpubTextExtractor extractor = new EpubTextExtractor(stream);
// Extract all of the text
System.out.println(extractor.extractAll());
For more detailed work with a document, use the EpubPackage
class. Each EPUB document contains one or more packages.
The getCount method returns the total number of packages:
int packageCount = extractor.getCount();
The get_Item method returns the package at the given index:
EpubPackage epubPackage = extractor.get_Item(0);
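The two accessors above combine naturally into an index loop that visits every package. The following is a minimal sketch under stated assumptions: `PackageWalker`, `IndexedSource`, and `collectAll` are hypothetical names introduced for illustration, and the interface only mirrors the `getCount()` / `get_Item(int)` pair shown above so that the block runs without the library. The members of EpubPackage are not described here, so the sketch does not call any of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PackageWalker {
    // Hypothetical stand-in for the getCount()/get_Item(int) pair; not a library type.
    public interface IndexedSource<P> {
        int getCount();
        P get_Item(int index);
    }

    // Visits indexes 0 .. getCount() - 1 and gathers each item in order.
    public static <P> List<P> collectAll(IndexedSource<P> source) {
        List<P> items = new ArrayList<>();
        for (int i = 0; i < source.getCount(); i++) {
            items.add(source.get_Item(i));
        }
        return items;
    }

    public static void main(String[] args) {
        // Strings play the role of packages here (an assumption for illustration only)
        List<String> fake = Arrays.asList("package-0", "package-1");
        IndexedSource<String> standIn = new IndexedSource<String>() {
            public int getCount() { return fake.size(); }
            public String get_Item(int index) { return fake.get(index); }
        };
        System.out.println(collectAll(standIn)); // prints [package-0, package-1]
    }
}
```

Against the real extractor, the same loop would presumably read `for (int i = 0; i < extractor.getCount(); i++)` with `extractor.get_Item(i)` in the body.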
Constructor | Description
---|---
EpubTextExtractor(InputStream stream) | Initializes a new instance of the EpubTextExtractor class.
EpubTextExtractor(String fileName) | Initializes a new instance of the EpubTextExtractor class.
Modifier and Type | Method | Description
---|---|---
List<String> | extractHighlights(HighlightOptions... highlightOptions) | Extracts highlights.
protected String | extractItem(String itemPath) | Extracts text from a document item.
void | extractStructured(StructuredHandler handler) | Extracts structured text.
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords) | Searches for the keywords.
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords) | Searches for the keywords.
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions) | Searches for the expression.
get_Item, getCount, openContainerItem, prepareLine, reset
checkDisposed, close, dispose, dispose, extractAll, extractLine, extractText, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public EpubTextExtractor(String fileName)
Initializes a new instance of the EpubTextExtractor class.
fileName - The path to the file.
public EpubTextExtractor(InputStream stream)
Initializes a new instance of the EpubTextExtractor class.
stream - The stream of the document.
public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches for the keywords.
search in interface ISearchable
options - Options for searching.
handler - An instance of the search handler.
keywords - A collection of words to search for.
public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches for the keywords.
search in interface ISearchable
options - Options for searching.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search for.
public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches for the expression.
searchWithRegex in interface IRegexSearchable
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - Options for searching.
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
highlightOptions - A collection of HighlightOptions.
public void extractStructured(StructuredHandler handler)
Extracts structured text.
extractStructured in interface IStructuredExtractor
handler - Structured text extraction handler.
protected String extractItem(String itemPath)
Extracts text from a document item.
extractItem in class EpubTextExtractorBase
itemPath - A path to the document's item.
Copyright © 2018. All rights reserved.