The NLP Annotator index stage performs Natural Language Processing tasks.
Note: This stage is deprecated as of Fusion 5.2.0.
You can choose from three NLP implementations: OpenNLP, SpaCy, or Spark NLP. Setup and behavior differ depending on the implementation.
OpenNLP
The OpenNLP implementation is ready to use out of the box. Simply specify "opennlp" as the "Model ID" property.
These annotation tasks are supported:
- NER
- Sentence detection
- POS tagging
- Shallow parsing (chunking)
SpaCy
The SpaCy implementation is also ready to use out of the box; the default implementation uses the en_core_web_sm model. Specify "spacy" as the "Model ID" property.
These annotation tasks are supported:
- NER
- Sentence detection
- POS tagging
The label schemes used for each annotation task are documented at https://spacy.io/api/annotation.
Spark NLP
The Spark NLP implementation requires that you first download a Spark NLP model and upload it to Fusion:
- Download a model from https://nlp.johnsnowlabs.com/docs/en/pipelines. Note: Only the pre-trained NER model is supported. If choosing an NER model, download NerDLModel instead of NerCRFModel.
- Upload the model to Fusion using the following curl command:

curl -u [username]:[password] \
  -X POST \
  "https://[fusion host]/api/ai/ml-models?modelId=[desired model ID]&type=spark-nlp" \
  -F "file=@/path/to/model.zip"
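If you prefer to script the upload rather than use curl, the same request can be sketched in Python with only the standard library. The host, model ID, and file contents below are placeholders, and the function only constructs the multipart/form-data request the endpoint expects; sending it (with Basic auth for [username]:[password]) is left to urllib.request:

```python
import io
import uuid

def build_model_upload_request(fusion_host, model_id, filename, zip_bytes):
    """Build the URL, headers, and multipart/form-data body equivalent
    to the curl upload command above. All argument values here are
    placeholders, not real Fusion hosts or model IDs."""
    boundary = uuid.uuid4().hex
    url = ("https://%s/api/ai/ml-models?modelId=%s&type=spark-nlp"
           % (fusion_host, model_id))
    body = io.BytesIO()
    # One form part named "file", matching curl's -F "file=@...".
    body.write(("--%s\r\n" % boundary).encode())
    body.write(('Content-Disposition: form-data; name="file"; '
                'filename="%s"\r\n' % filename).encode())
    body.write(b"Content-Type: application/zip\r\n\r\n")
    body.write(zip_bytes)
    body.write(("\r\n--%s--\r\n" % boundary).encode())
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return url, headers, body.getvalue()

# Placeholder values; send the result with urllib.request plus a Basic
# Authorization header for your Fusion credentials.
url, headers, body = build_model_upload_request(
    "fusion.example.com", "my_spark_nlp_model", "model.zip", b"ZIPDATA")
```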
For example, if you want to use the "Explain Document ML" model:
- Download the latest version of the "Explain Document ML" model (explain_document_ml_en_2.1.0_2.4_1563203154682.zip at the time of this writing).
- Upload the model to Fusion:

curl -u [username]:[password] \
  -X POST \
  "https://[fusion host]/api/ai/ml-models?modelId=explain_document_ml&type=spark-nlp" \
  -F "file=@/path/to/explain_document_ml_en_2.1.0_2.4_1563203154682.zip"

- When configuring this stage, specify "explain_document_ml" as the Model ID.
For Spark NLP, the supported annotation tasks depend on the model used.
- Add an NLP Annotator index stage.
- Supply the Model ID ("opennlp", "spacy", or the model ID given to the uploaded Spark NLP model).
- Configure the index pipeline stage.
- Specify the source, label pattern, and target (destination) fields:
  - Source field: the raw text from which named entities are to be extracted.
  - Label pattern: a regex pattern that matches the NER/POS labels. For example, PER. will match extracted named entities with the label PERSON, while NN. will match tagged nouns.
  - Target field: the destination for the outcome of the extraction/tagging.
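To illustrate how a label pattern selects annotations, here is a minimal sketch using Python's standard re module. The annotation tuples and the select helper are illustrative only, not the stage's actual data model:

```python
import re

# Hypothetical annotator output: (text, label) pairs.
annotations = [
    ("Alice", "PERSON"),    # NER: person entity
    ("Lucidworks", "ORG"),  # NER: organization entity
    ("stages", "NNS"),      # POS: plural noun (Penn Treebank tag)
    ("runs", "VBZ"),        # POS: present-tense verb
]

def select(label_pattern, annotations):
    """Return the text of annotations whose label matches the pattern."""
    rx = re.compile(label_pattern)
    return [text for text, label in annotations if rx.match(label)]

print(select(r"PER.", annotations))  # the PERSON entity: ['Alice']
print(select(r"NN.", annotations))   # noun tags such as NNS: ['stages']
```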
Configuration
Tip: When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.