public class ExtractorFactory extends Object implements IContainerFactory
Provides the functionality for creating extractors for documents.
ExtractorFactory provides the functionality to create instances of extractor classes.
It contains the following methods:
Method | Description |
---|---|
createTextExtractor | Creates a text extractor for the file. If the document format is not detected, the method returns null. |
createFormattedTextExtractor | Creates a formatted text extractor for the file. If the document format is not detected, the method returns null. |
createContainer | Creates a container object for the file. If the document format is not detected, the method returns null. |
createMetadataExtractor | Creates a metadata extractor for the file. If the document format is not detected, the method returns null. |
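Because every create method reports an undetected format by returning null rather than throwing, callers that prefer java.util.Optional can adapt the result. The following is a minimal sketch, not part of the library: `CheckedSupplier`, `tryCreate`, and `fakeCreateTextExtractor` are hypothetical names, and the fake method only imitates the null-on-unknown-format contract. The supplier may throw IOException because the String overloads declare FileNotFoundException, which is a subclass of it.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Optional;

public class NullableFactoryAdapter {

    // A supplier that may throw IOException; FileNotFoundException is a subclass,
    // so both the String and the InputStream overloads would fit.
    @FunctionalInterface
    interface CheckedSupplier<T> {
        T get() throws IOException;
    }

    // Converts a nullable factory result into an Optional:
    // null ("format not detected") becomes Optional.empty().
    static <T> Optional<T> tryCreate(CheckedSupplier<T> create) throws IOException {
        return Optional.ofNullable(create.get());
    }

    // Hypothetical stand-in for a create method: recognizes only ".txt" files.
    static String fakeCreateTextExtractor(String fileName) throws FileNotFoundException {
        return fileName.endsWith(".txt") ? "extractor for " + fileName : null;
    }

    public static void main(String[] args) throws IOException {
        String message = tryCreate(() -> fakeCreateTextExtractor("notes.bin"))
                .map(e -> "Created " + e)
                .orElse("The document format is not supported");
        System.out.println(message); // prints "The document format is not supported"
    }
}
```

With the real factory the call might look like `tryCreate(() -> factory.createTextExtractor(fileName))`, assuming the enclosing method handles or declares IOException.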
The document format is detected by a MediaTypeDetector. By default, all supported document formats are detected. To change this behavior, pass a custom MediaTypeDetector instance to the factory constructor.
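How a restricted detector narrows what the factory recognizes can be sketched in plain Java. Assumptions: the `Detector` interface, the extension tables, and `byExtension` are hypothetical stand-ins, not the MediaTypeDetector or CellsMediaTypeDetector API, and this page does not say whether those classes inspect extensions, content, or both. The MIME strings are the registered types for the listed formats.

```java
import java.util.Locale;
import java.util.Map;

public class RestrictedDetectorSketch {

    // Hypothetical stand-in for a media type detector: returns a MIME type
    // for a file name, or null when the format is not recognized.
    interface Detector {
        String detect(String fileName);
    }

    // A broad table covering several document families.
    static final Map<String, String> ALL = Map.of(
            "pdf", "application/pdf",
            "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "xls", "application/vnd.ms-excel",
            "csv", "text/csv");

    // A narrow table: spreadsheet formats only, so everything else maps to null.
    static final Map<String, String> SPREADSHEETS = Map.of(
            "xlsx", ALL.get("xlsx"),
            "xls", ALL.get("xls"),
            "csv", ALL.get("csv"));

    // Builds a detector that looks up the lower-cased file extension in a table.
    static Detector byExtension(Map<String, String> table) {
        return fileName -> {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) {
                return null; // no extension, nothing to look up
            }
            return table.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
        };
    }

    public static void main(String[] args) {
        Detector all = byExtension(ALL);
        Detector cells = byExtension(SPREADSHEETS);
        System.out.println(all.detect("report.pdf"));    // application/pdf
        System.out.println(cells.detect("report.pdf"));  // null: not a spreadsheet
        System.out.println(cells.detect("Budget.XLSX")); // the .xlsx MIME type
    }
}
```

Because the spreadsheet-only detector returns null for a PDF, a factory built on such a detector would also return null, which lands in the "format is not supported" branch of the examples.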
Formatted text extractors use a PlainDocumentFormatter by default. To use a different formatter, pass a DocumentFormatter instance to the factory constructor.
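Swapping the formatter follows the strategy pattern: the extraction logic stays fixed and only the injected formatter decides how document structure is rendered. The sketch is illustrative only; `Formatter`, `PlainFormatter`, `MarkdownFormatter`, and `render` are hypothetical and do not reproduce the DocumentFormatter, PlainDocumentFormatter, or MarkdownDocumentFormatter APIs. It assumes Java 11 or later for `String.repeat`.

```java
import java.util.List;

public class FormatterStrategySketch {

    // Hypothetical stand-in for a document formatter: turns structural
    // elements into text.
    interface Formatter {
        String heading(String text, int level);
        String paragraph(String text);
    }

    // Plain text: structure is dropped, only the words remain.
    static final class PlainFormatter implements Formatter {
        public String heading(String text, int level) { return text + "\n"; }
        public String paragraph(String text) { return text + "\n"; }
    }

    // Markdown: headings get '#' prefixes, paragraphs end with a blank line.
    static final class MarkdownFormatter implements Formatter {
        public String heading(String text, int level) { return "#".repeat(level) + " " + text + "\n\n"; }
        public String paragraph(String text) { return text + "\n\n"; }
    }

    // The "extractor" logic is identical for every formatter; only the
    // injected strategy changes the output.
    static String render(Formatter f, String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder(f.heading(title, 1));
        for (String p : paragraphs) {
            sb.append(f.paragraph(p));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> body = List.of("First paragraph.", "Second paragraph.");
        System.out.print(render(new PlainFormatter(), "Report", body));
        System.out.print(render(new MarkdownFormatter(), "Report", body));
    }
}
```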
Creating a text extractor:
// Create a factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
TextExtractor extractor = factory.createTextExtractor(fileName);
// Print the document text, or a message if the file format isn't supported
System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
Creating a formatted text extractor:
// Create a factory
ExtractorFactory factory = new ExtractorFactory();
// Create a formatted text extractor
TextExtractor extractor = factory.createFormattedTextExtractor(fileName);
// Print the formatted document text, or a message if the file format isn't supported
System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
Creating a formatted text extractor with a Markdown formatter:
// Create a factory with MarkdownDocumentFormatter as the default formatter
ExtractorFactory factory = new ExtractorFactory(new MarkdownDocumentFormatter());
// Create a formatted text extractor
TextExtractor extractor = factory.createFormattedTextExtractor(fileName);
// Print the Markdown-formatted document text, or a message if the file format isn't supported
System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
Creating a formatted text extractor only for spreadsheets:
// Create a factory that detects only spreadsheet media types
ExtractorFactory factory = new ExtractorFactory(null, new CellsMediaTypeDetector());
// Create a formatted text extractor
TextExtractor extractor = factory.createFormattedTextExtractor(fileName);
// Print the formatted document text, or a message if the file format isn't supported
System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
Creating a container:
// Create a factory
ExtractorFactory factory = new ExtractorFactory();
// Create a container
Container container = factory.createContainer(fileName);
// If the file format isn't supported
if (container == null) {
    // Print a message
    System.out.println("The document format is not supported");
}
Constructor | Description |
---|---|
ExtractorFactory() | Initializes a new instance of the ExtractorFactory class. |
ExtractorFactory(DocumentFormatter documentFormatter) | Initializes a new instance of the ExtractorFactory class. |
ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector) | Initializes a new instance of the ExtractorFactory class. |
ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector) | Initializes a new instance of the ExtractorFactory class. |
ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver) | Initializes a new instance of the ExtractorFactory class. |
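The overloads resemble the telescoping-constructor pattern, where each shorter constructor supplies defaults for the parameters it omits. The spreadsheet example passes null for the formatter, which suggests null is accepted there; treating null as "use the default" is an assumption in this sketch. `Formatter`, `Detector`, and `EncodingGuesser` are hypothetical stand-ins for DocumentFormatter, MediaTypeDetector, and EncodingDetector, and the default names are made up.

```java
public class TelescopingFactorySketch {

    // Hypothetical collaborators; each exposes only a name for demonstration.
    interface Formatter { String name(); }
    interface Detector { String name(); }
    interface EncodingGuesser { String name(); }

    static final Formatter DEFAULT_FORMATTER = () -> "plain";
    static final Detector DEFAULT_DETECTOR = () -> "all-formats";
    static final EncodingGuesser DEFAULT_ENCODING = () -> "default-encoding";

    private final Formatter formatter;
    private final Detector detector;
    private final EncodingGuesser encoding;

    // Each shorter overload delegates to the next longer one.
    public TelescopingFactorySketch() { this(null); }
    public TelescopingFactorySketch(Formatter f) { this(f, null); }
    public TelescopingFactorySketch(Formatter f, Detector d) { this(f, d, null); }

    // Only the longest overload assigns fields; a null argument is
    // assumed to mean "use the default".
    public TelescopingFactorySketch(Formatter f, Detector d, EncodingGuesser e) {
        this.formatter = f != null ? f : DEFAULT_FORMATTER;
        this.detector = d != null ? d : DEFAULT_DETECTOR;
        this.encoding = e != null ? e : DEFAULT_ENCODING;
    }

    public Formatter getFormatter() { return formatter; }
    public Detector getDetector() { return detector; }
    public EncodingGuesser getEncoding() { return encoding; }

    public static void main(String[] args) {
        TelescopingFactorySketch f = new TelescopingFactorySketch(null, () -> "spreadsheets-only");
        System.out.println(f.getFormatter().name() + ", " + f.getDetector().name());
        // prints "plain, spreadsheets-only"
    }
}
```

Funnelling all assignments through one constructor keeps the defaulting rules in a single place, which is also why getters such as getDocumentFormatter can always return a non-null value under this assumption.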
Modifier and Type | Method and Description |
---|---|
Container | createContainer(InputStream stream): Creates a container. |
Container | createContainer(InputStream stream, LoadOptions loadOptions): Creates a container. |
Container | createContainer(String fileName): Creates a container. |
Container | createContainer(String fileName, LoadOptions loadOptions): Creates a container. |
TextExtractor | createFormattedTextExtractor(InputStream stream): Creates a formatted text extractor. |
TextExtractor | createFormattedTextExtractor(InputStream stream, LoadOptions loadOptions): Creates a formatted text extractor. |
TextExtractor | createFormattedTextExtractor(String fileName): Creates a formatted text extractor. |
TextExtractor | createFormattedTextExtractor(String fileName, LoadOptions loadOptions): Creates a formatted text extractor. |
MetadataExtractor | createMetadataExtractor(InputStream stream): Creates a metadata extractor. |
MetadataExtractor | createMetadataExtractor(InputStream stream, LoadOptions loadOptions): Creates a metadata extractor. |
MetadataExtractor | createMetadataExtractor(String fileName): Creates a metadata extractor. |
MetadataExtractor | createMetadataExtractor(String fileName, LoadOptions loadOptions): Creates a metadata extractor. |
TextExtractor | createTextExtractor(InputStream stream): Creates a text extractor. |
TextExtractor | createTextExtractor(InputStream stream, LoadOptions loadOptions): Creates a text extractor. |
TextExtractor | createTextExtractor(String fileName): Creates a text extractor. |
TextExtractor | createTextExtractor(String fileName, LoadOptions loadOptions): Creates a text extractor. |
DocumentFormatter | getDocumentFormatter(): Gets a document formatter. |
EncodingDetector | getEncodingDetector(): Gets an encoding detector. |
MediaTypeDetector | getMediaTypeDetector(): Gets a media type detector. |
protected void | sendNotificationMessage(INotificationReceiver receiver, NotificationMessage message): Sends a notification message to the receiver and to the factory's receiver (if present). |
public ExtractorFactory()
Initializes a new instance of the ExtractorFactory class.
public ExtractorFactory(DocumentFormatter documentFormatter)
Initializes a new instance of the ExtractorFactory class.
Parameters:
documentFormatter - An instance of the DocumentFormatter.
public ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector)
Initializes a new instance of the ExtractorFactory class.
Parameters:
documentFormatter - An instance of the DocumentFormatter.
mediaTypeDetector - An instance of the MediaTypeDetector.
public ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector)
Initializes a new instance of the ExtractorFactory class.
Parameters:
documentFormatter - An instance of the DocumentFormatter.
mediaTypeDetector - An instance of the MediaTypeDetector.
encodingDetector - An instance of the EncodingDetector.
public ExtractorFactory(DocumentFormatter documentFormatter, MediaTypeDetector mediaTypeDetector, EncodingDetector encodingDetector, INotificationReceiver notificationReceiver)
Initializes a new instance of the ExtractorFactory class.
Parameters:
documentFormatter - An instance of the DocumentFormatter.
mediaTypeDetector - An instance of the MediaTypeDetector.
encodingDetector - An instance of the EncodingDetector.
notificationReceiver - An INotificationReceiver that processes notification messages.
public DocumentFormatter getDocumentFormatter()
Gets a document formatter.
Returns: An instance of the DocumentFormatter.
public MediaTypeDetector getMediaTypeDetector()
Gets a media type detector.
Returns: An instance of the MediaTypeDetector.
public EncodingDetector getEncodingDetector()
Gets an encoding detector.
Returns: An instance of the EncodingDetector.
public TextExtractor createTextExtractor(String fileName) throws FileNotFoundException
Creates a text extractor.
Parameters:
fileName - The name of the file.
Returns: A text extractor, or null if the document format is not detected.
Throws:
FileNotFoundException
public TextExtractor createTextExtractor(String fileName, LoadOptions loadOptions) throws FileNotFoundException
Creates a text extractor.
Parameters:
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the file extension or by the file content.
Returns: A text extractor, or null if the document format is not detected.
Throws:
FileNotFoundException
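The lookup order described for loadOptions (an explicit media type first, otherwise the file extension or the content) can be sketched as a fallback chain. This is a hypothetical illustration, not the library's detector: `Options` stands in for LoadOptions, the extension table is deliberately tiny, and checking the extension before the content is an assumption, since the description above only says "or". The signatures `%PDF` and `PK\3\4` are the real leading bytes of PDF files and ZIP archives.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

public class MediaTypeFallbackSketch {

    // Hypothetical stand-in for LoadOptions: carries only an optional media type.
    static final class Options {
        final String mediaType;
        Options(String mediaType) { this.mediaType = mediaType; }
    }

    static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "zip", "application/zip",
            "txt", "text/plain");

    // Assumed resolution order: explicit option, then extension, then content.
    static String resolve(String fileName, byte[] header, Options options) {
        if (options != null && options.mediaType != null) {
            return options.mediaType;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String byExt = BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
            if (byExt != null) {
                return byExt;
            }
        }
        return sniff(header);
    }

    // Content detection by leading "magic" bytes; null means not detected.
    static String sniff(byte[] header) {
        if (startsWith(header, "%PDF".getBytes(StandardCharsets.US_ASCII))) return "application/pdf";
        if (startsWith(header, new byte[] {'P', 'K', 3, 4})) return "application/zip";
        return null;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] pdfHeader = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolve("scan", pdfHeader, new Options(null)));          // application/pdf (content)
        System.out.println(resolve("a.txt", pdfHeader, null));                      // text/plain (extension)
        System.out.println(resolve("a.txt", pdfHeader, new Options("text/csv")));   // text/csv (explicit)
    }
}
```

A production detector would also need to look inside ZIP archives, because DOCX and XLSX files are ZIP containers as well.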
public TextExtractor createTextExtractor(InputStream stream)
Creates a text extractor.
Parameters:
stream - The stream of the document.
Returns: A text extractor, or null if the document format is not detected.
public TextExtractor createTextExtractor(InputStream stream, LoadOptions loadOptions)
Creates a text extractor.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the document. If loadOptions.MediaType is null, the media type is detected by the document content.
Returns: A text extractor, or null if the document format is not detected.
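A stream carries no file name, so only its content is available for detection. One common way to sniff content without losing it is to wrap the stream in a BufferedInputStream and use mark and reset, as in the sketch below. Whether the library itself rewinds or consumes the caller's stream is not stated on this page; `peekHeader` and `detect` are hypothetical helpers, and the two signatures checked are only examples.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamSniffSketch {

    static final int HEADER_SIZE = 8;

    // Reads up to HEADER_SIZE bytes for detection, then rewinds the stream
    // with mark/reset so the caller can still read the document from the start.
    static byte[] peekHeader(BufferedInputStream in) throws IOException {
        in.mark(HEADER_SIZE);
        byte[] buf = new byte[HEADER_SIZE];
        int total = 0;
        while (total < HEADER_SIZE) {
            int n = in.read(buf, total, HEADER_SIZE - total);
            if (n < 0) {
                break; // end of stream before HEADER_SIZE bytes
            }
            total += n;
        }
        in.reset();
        return Arrays.copyOf(buf, total);
    }

    // Content-only detection: returns a MIME type or null if not detected.
    static String detect(BufferedInputStream in) throws IOException {
        byte[] h = peekHeader(in);
        if (h.length >= 4 && h[0] == '%' && h[1] == 'P' && h[2] == 'D' && h[3] == 'F') return "application/pdf";
        if (h.length >= 4 && h[0] == 'P' && h[1] == 'K' && h[2] == 3 && h[3] == 4) return "application/zip";
        return null;
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream("%PDF-1.7 body".getBytes(StandardCharsets.US_ASCII));
        BufferedInputStream in = new BufferedInputStream(raw);
        System.out.println(detect(in));       // application/pdf
        System.out.println((char) in.read()); // %: the stream was rewound
    }
}
```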
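public TextExtractor createFormattedTextExtractor(String fileName) throws FileNotFoundException
Creates a formatted text extractor.
Parameters:
fileName - The name of the file.
Returns: A formatted text extractor, or null if the document format is not detected.
Throws:
FileNotFoundException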
public TextExtractor createFormattedTextExtractor(String fileName, LoadOptions loadOptions) throws FileNotFoundException
Creates a formatted text extractor.
Parameters:
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the file extension or by the file content.
Returns: A formatted text extractor, or null if the document format is not detected.
Throws:
FileNotFoundException
public TextExtractor createFormattedTextExtractor(InputStream stream)
Creates a formatted text extractor.
Parameters:
stream - The stream of the document.
Returns: A formatted text extractor, or null if the document format is not detected.
public TextExtractor createFormattedTextExtractor(InputStream stream, LoadOptions loadOptions)
Creates a formatted text extractor.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the document. If loadOptions.MediaType is null, the media type is detected by the document content.
Returns: A formatted text extractor, or null if the document format is not detected.
public MetadataExtractor createMetadataExtractor(String fileName) throws FileNotFoundException
Creates a metadata extractor.
Parameters:
fileName - The name of the file.
Returns: A metadata extractor, or null if the document format is not detected.
Throws:
FileNotFoundException
public MetadataExtractor createMetadataExtractor(String fileName, LoadOptions loadOptions) throws FileNotFoundException
Creates a metadata extractor.
Parameters:
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the file extension or by the file content.
Returns: A metadata extractor, or null if the document format is not detected.
Throws:
FileNotFoundException
public MetadataExtractor createMetadataExtractor(InputStream stream)
Creates a metadata extractor.
Parameters:
stream - The stream of the document.
Returns: A metadata extractor, or null if the document format is not detected.
public MetadataExtractor createMetadataExtractor(InputStream stream, LoadOptions loadOptions)
Creates a metadata extractor.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the document. If loadOptions.MediaType is null, the media type is detected by the document content.
Returns: A metadata extractor, or null if the document format is not detected.
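public Container createContainer(String fileName) throws FileNotFoundException
Creates a container.
Specified by: createContainer in interface IContainerFactory
Parameters:
fileName - The name of the file.
Returns: A container, or null if the document format is not detected.
Throws:
FileNotFoundException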
public Container createContainer(String fileName, LoadOptions loadOptions) throws FileNotFoundException
Creates a container.
Specified by: createContainer in interface IContainerFactory
Parameters:
fileName - The name of the file.
loadOptions - The options for loading the file. If loadOptions.MediaType is null, the media type is detected by the file extension or by the file content.
Returns: A container, or null if the document format is not detected.
Throws:
FileNotFoundException
public Container createContainer(InputStream stream)
Creates a container.
Specified by: createContainer in interface IContainerFactory
Parameters:
stream - The stream of the document.
Returns: A container, or null if the document format is not detected.
public Container createContainer(InputStream stream, LoadOptions loadOptions)
Creates a container.
Specified by: createContainer in interface IContainerFactory
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the document. If loadOptions.MediaType is null, the media type is detected by the document content.
Returns: A container, or null if the document format is not detected.
protected void sendNotificationMessage(INotificationReceiver receiver, NotificationMessage message)
Sends a notification message to the receiver and to the factory's receiver (if present).
Parameters:
receiver - The notification receiver.
message - The notification message.
Copyright © 2018. All rights reserved.
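The delivery rule for sendNotificationMessage (the per-call receiver plus the factory's own receiver, when one was supplied through the notificationReceiver constructor parameter) can be sketched as follows. `Receiver`, `receive`, and `send` are hypothetical stand-ins for INotificationReceiver and NotificationMessage, the messages are plain strings, and the delivery order is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class NotificationDispatchSketch {

    // Hypothetical stand-in for INotificationReceiver; the message is a plain
    // String here instead of a NotificationMessage.
    interface Receiver {
        void receive(String message);
    }

    // Receiver supplied once to the factory constructor; may be null.
    private final Receiver factoryReceiver;

    NotificationDispatchSketch(Receiver factoryReceiver) {
        this.factoryReceiver = factoryReceiver;
    }

    // Delivers the message to the per-call receiver first, then to the
    // factory-level receiver; either one may be null and is then skipped.
    protected void send(Receiver receiver, String message) {
        if (receiver != null) {
            receiver.receive(message);
        }
        if (factoryReceiver != null) {
            factoryReceiver.receive(message);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        NotificationDispatchSketch factory = new NotificationDispatchSketch(m -> log.add("factory: " + m));
        factory.send(m -> log.add("call: " + m), "first");
        factory.send(null, "second");
        System.out.println(log); // [call: first, factory: first, factory: second]
    }
}
```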