public class Extractor extends Object
Provides functionality for fast extraction of text and metadata from documents.
Extracting metadata:
// Create an extractor
Extractor extractor = new Extractor();
// Extract the metadata
MetadataCollection metadata = extractor.extractMetadata(fileName);
// If a file format isn't supported
if (metadata == null) {
// Print a message
System.out.println("The document format is not supported");
}
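The null check above can also be expressed with an `Optional`. The following is a minimal sketch and not part of the library: `NullSafeExtraction` and `tryExtract` are hypothetical names, and the extraction call is passed in as a function so that the snippet compiles without the library on the classpath.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of the library.
final class NullSafeExtraction {

    // Applies the extraction function and wraps a null result
    // (for example, an unsupported format) in an empty Optional.
    static <T> Optional<T> tryExtract(Function<String, T> extract, String fileName) {
        return Optional.ofNullable(extract.apply(fileName));
    }

    public static void main(String[] args) {
        // Stand-in for the real extraction call: returns null for anything but ".docx".
        Function<String, String> standIn =
                name -> name.endsWith(".docx") ? "metadata of " + name : null;

        String message = tryExtract(standIn, "report.xyz")
                .orElse("The document format is not supported");
        System.out.println(message);
    }
}
```

With the real class, the call would presumably read `tryExtract(extractor::extractMetadata, fileName)`, which selects the `String` overload documented on this page.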
Modifier and Type | Field and Description |
---|---|
static Extractor | DEFAULT - A default extractor. |
Constructor and Description |
---|
Extractor() - Initializes a new instance of the Extractor class. |
Extractor(MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver) - Initializes a new instance of the Extractor class. |
Extractor(MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver, DocumentFormatter documentFormatter) - Initializes a new instance of the Extractor class. |
Modifier and Type | Method and Description |
---|---|
String | extractFormattedText(InputStream stream) - Extracts formatted text. |
String | extractFormattedText(InputStream stream, LoadOptions loadOptions) - Extracts formatted text. |
String | extractFormattedText(String fileName) - Extracts formatted text. |
String | extractFormattedText(String fileName, LoadOptions loadOptions) - Extracts formatted text. |
MetadataCollection | extractMetadata(InputStream stream) - Extracts the metadata. |
MetadataCollection | extractMetadata(InputStream stream, LoadOptions loadOptions) - Extracts the metadata. |
MetadataCollection | extractMetadata(String fileName) - Extracts the metadata. |
MetadataCollection | extractMetadata(String fileName, LoadOptions loadOptions) - Extracts the metadata. |
String | extractText(InputStream stream) - Extracts text. |
String | extractText(InputStream stream, LoadOptions loadOptions) - Extracts text. |
String | extractText(String fileName) - Extracts text. |
String | extractText(String fileName, LoadOptions loadOptions) - Extracts text. |
EncodingDetector | getEncodingDetector() - Gets an encoding detector. |
MediaTypeDetector | getMediaTypeDetector() - Gets a media type detector. |
protected void | sendNotificationMessage(INotificationReceiver receiver, NotificationMessage message) - Sends a notification message to the receiver and to the factory receiver (if present). |
public static final Extractor DEFAULT
A default extractor.
public Extractor()
Initializes a new instance of the Extractor class.
public Extractor(MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver)
Initializes a new instance of the Extractor class.
mediaTypeDetector - An instance of the MediaTypeDetector.
encodingDetector - An instance of the EncodingDetector.
notificationReceiver - An INotificationReceiver to process messages.
public Extractor(MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver, DocumentFormatter documentFormatter)
Initializes a new instance of the Extractor class.
mediaTypeDetector - An instance of the MediaTypeDetector.
encodingDetector - An instance of the EncodingDetector.
notificationReceiver - An INotificationReceiver to process messages.
documentFormatter - An instance of the DocumentFormatter.
public MediaTypeDetector getMediaTypeDetector()
Gets a media type detector.
Returns: MediaTypeDetector.
public EncodingDetector getEncodingDetector()
Gets an encoding detector.
Returns: EncodingDetector.
public MetadataCollection extractMetadata(String fileName)
Extracts the metadata.
fileName - The name of the file.
public MetadataCollection extractMetadata(String fileName, LoadOptions loadOptions)
Extracts the metadata.
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the extension of the file or by the content of the file.
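The fallback order described above (an explicit media type first, then the file extension, then the file content) can be pictured with the sketch below. It is an illustration only and not the library's detection logic: `MediaTypeFallback`, `resolve`, the extension table and the two signatures it checks are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; the library's MediaTypeDetector may work differently.
final class MediaTypeFallback {

    // Assumed extension table, deliberately tiny.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("txt", "text/plain");
    }

    // Returns the explicit type if given, else a type guessed from the
    // extension, else a type guessed from the first bytes, else null.
    static String resolve(String explicitType, Path file) throws IOException {
        if (explicitType != null) {
            return explicitType;
        }
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
            String byExtension = BY_EXTENSION.get(ext);
            if (byExtension != null) {
                return byExtension;
            }
        }
        return sniff(file);
    }

    // Checks the leading bytes for the PDF ("%PDF") and ZIP ("PK") signatures.
    private static String sniff(Path file) throws IOException {
        byte[] head = new byte[4];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.read(head);
        }
        String prefix = read > 0
                ? new String(head, 0, read, StandardCharsets.ISO_8859_1)
                : "";
        if (prefix.startsWith("%PDF")) {
            return "application/pdf";
        }
        if (prefix.startsWith("PK")) {
            return "application/zip";
        }
        return null;
    }
}
```

Returning null for an unrecognized file mirrors the null result that the metadata example at the top of this page treats as "format not supported"; that parallel is a design choice of the sketch, not documented behavior.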
public MetadataCollection extractMetadata(InputStream stream)
Extracts the metadata.
stream - The stream of the document.
public MetadataCollection extractMetadata(InputStream stream, LoadOptions loadOptions)
Extracts the metadata.
stream - The stream of the document.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the content of the file.
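A stream carries no file name, so only its content is available for detection. One common way to inspect the leading bytes of a stream without consuming them is `mark`/`reset` on a `BufferedInputStream`. The sketch below shows that general technique; `StreamPeek` and `peek` are hypothetical names, and nothing here is taken from the library's implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the library.
final class StreamPeek {

    // Reads up to 'limit' leading bytes and rewinds, so a later reader
    // still sees the stream from its start.
    static byte[] peek(BufferedInputStream in, int limit) throws IOException {
        in.mark(limit);
        byte[] buffer = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buffer, total, limit - total);
            if (n < 0) {
                break; // end of stream before 'limit' bytes were available
            }
            total += n;
        }
        in.reset();
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        byte[] head = peek(in, 4);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
        // 'in' is back at position 0 and can be passed on unchanged.
    }
}
```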
public String extractText(String fileName)
Extracts text.
fileName - The name of the file.
public String extractText(String fileName, LoadOptions loadOptions)
Extracts text.
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the content of the file.
public String extractText(InputStream stream)
Extracts text.
stream - The stream of the document.
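When a stream overload is used, the caller typically owns the stream. Below is a minimal sketch of opening a file, handing the stream to an extraction call and closing it afterwards. `StreamExtraction` and `withStream` are hypothetical names, and the extraction call is passed in as a function so that the snippet compiles without the library. Whether the extractor closes the stream itself is not stated on this page, so the sketch closes it explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper, not part of the library.
final class StreamExtraction {

    // Opens the file, applies the extraction function to the stream,
    // and closes the stream even if the function throws.
    static <T> T withStream(Path file, Function<InputStream, T> extract) throws IOException {
        try (InputStream stream = Files.newInputStream(file)) {
            return extract.apply(stream);
        }
    }
}
```

With the real class, a call might look like `StreamExtraction.withStream(path, extractor::extractText)`, which would select the `InputStream` overload documented above.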
public String extractText(InputStream stream, LoadOptions loadOptions)
Extracts text.
stream - The stream of the document.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the content of the file.
public String extractFormattedText(String fileName)
Extracts formatted text.
fileName - The name of the file.
public String extractFormattedText(String fileName, LoadOptions loadOptions)
Extracts formatted text.
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the content of the file.
public String extractFormattedText(InputStream stream)
Extracts formatted text.
stream - The stream of the document.
public String extractFormattedText(InputStream stream, LoadOptions loadOptions)
Extracts formatted text.
stream - The stream of the document.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the content of the file.
protected void sendNotificationMessage(INotificationReceiver receiver, NotificationMessage message)
Sends a notification message to the receiver and to the factory receiver (if present).
receiver - The notification receiver.
message - The message with a notification.
Copyright © 2019. All rights reserved.