- 1. Configure the index pipeline
- 2. Configure the query pipeline
- Pipeline Setup Examples
- Evaluate the query pipeline
Before beginning this procedure, train a machine learning model using either the FAQ method or the cold start method.
Note: For instructions for Fusion 5.3 and up, see Configure The Smart Answers Pipelines (5.3 and Up).
Regardless of how you set up your model, the deployment procedure is the same.
The following default index and query pipelines for Smart Answers are automatically created when you create a Fusion app:
| Default index pipelines | Default query pipelines |
| --- | --- |
| question-answering: For encoding one field. | question-answering: Calculates vector distances between an encoded query and one document vector field. Should be used together with the question-answering index pipeline. |
| question-answering-dual-fields: For encoding two fields (question and answer pairs, for example). | question-answering-dual-fields: Calculates vector distances between an encoded query and two document vector fields. After that, the scores are ensembled. Should be used together with the question-answering-dual-fields index pipeline. |
| See Configure the index pipeline below. | See Configure the query pipeline below. |
1. Configure the index pipeline
- Open the Index Workbench.
- Load or create your datasource using the default question-answering index pipeline.
- In the Machine Learning stage, change the value of Model ID to match the model deployment name you chose when you configured the model training job.
- In the Model input transformation script field, enter the script below, replacing the documentFeatureField variable value (body_t by default) with the document field name to be processed and encoded into dense vectors:

  ```javascript
  /* Name of the document field to feed into the encoder. */
  var documentFeatureField = "body_t"

  /* Model input construction. */
  var modelInput = new java.util.HashMap()
  modelInput.put("text", doc.getFirstFieldValue(documentFeatureField))
  modelInput.put("pipeline", "index")
  modelInput.put("compress", "true")
  modelInput.put("unidecode", "true")
  modelInput.put("lowercase", "false")
  modelInput
  ```

- Save the datasource.
- Index your data.
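After indexing, you can optionally spot-check that the encoded fields were written to the collection. The following is a minimal sketch, assuming you query the collection with standard Solr request parameters and kept the default output field names:

```
q=*:*&rows=1&fl=id,compressed_document_vector_s,document_clusters_ss
```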
2. Configure the query pipeline
- Open the Query Workbench.
- Load one of the default question-answering query pipelines.
- In the Query Fields stage, update Return Fields to return additional fields that should be displayed with each answer, such as fields corresponding to the title, text, or ID. It is recommended that you remove the asterisk (*) and specify each individual field you want to return, because returning too many fields affects runtime performance; a sample field list is shown after these steps. Note: Do not remove compressed_document_vector_s, document_clusters_ss, or score, because these fields are required by later stages.
- In the Machine Learning stage, change the Model ID value to match the model deployment name you chose when you configured the model training job.
- Save the query pipeline.
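For example, a Return Fields value for the single-field setup might look like the list below. The title_t and answer_t fields are hypothetical display fields; substitute the fields your documents actually use.

```
id,title_t,answer_t,compressed_document_vector_s,document_clusters_ss,score
```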
Pipeline Setup Examples
Example 1: Index and retrieve the question and answer separately
Depending on your search page design, you might want to display the best-matched questions and answers in separate sections, or you might want to retrieve only answers to serve to a chatbot application. In either case, index the questions and answers separately, as different documents.
For example, in the picture below, we construct the input file for the index pipeline so that the text of each question or answer is stored in answer_t, and we add an additional field type_s whose value is "question" or "answer" to separate the two types.
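As a minimal sketch, input documents for this setup could look like the following. The answer_t and type_s fields come from this example; the IDs and text are hypothetical placeholders.

```json
[
  { "id": "faq-1-q", "type_s": "question", "answer_t": "How do I reset my password?" },
  { "id": "faq-1-a", "type_s": "answer",   "answer_t": "Open your account settings and choose Reset password." }
]
```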
In the Machine Learning stage, we specify documentFeatureField as answer_t in the Model input transformation script so that compressed_document_vector_s is generated based on this field.
At search time, we can apply a filter query on the type_s field to return either a question or an answer.
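For example, to return only answers (as a chatbot application would), add a standard Solr filter query such as the one below; use type_s:question instead to return only questions.

```
fq=type_s:answer
```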
You can achieve a similar result by using the default question-answering index and query pipelines.
(For more detail, see Smart Answers Detailed Pipeline Setup.)
Example 2: Index and retrieve the question and answer together
If you prefer to show the question and answer together in one document (that is, treat the question as the title and the answer as the description), you can index them together in the same document. This is similar to the default setup of the question-answering-dual-fields index and query pipelines.
For example, in the picture below, we added two Machine Learning stages and named them Answers Encoding and Questions Encoding respectively.
In the Questions Encoding stage, we specify documentFeatureField to be question_t, and in the Model output transformation script we change the default values for compressedVectorField, vectorField, clustersField, and distancesField to compressed_question_vector_s, question_vector_ds, question_clusters_is, and question_distances_ds respectively.
In the Answers Encoding stage, we specify documentFeatureField to be answer_t, and change the default values for compressedVectorField, vectorField, clustersField, and distancesField to compressed_answer_vector_s, answer_vector_ds, answer_clusters_ss, and answer_distances_ds respectively.
(For more detail, see Smart Answers Detailed Pipeline Setup.)
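The renames described above are made at the top of each stage's Model output transformation script. The sketch below shows only the changed variable declarations and assumes the rest of the default script is left unchanged.

```javascript
// Questions Encoding stage: changed variables in the Model output transformation script.
var compressedVectorField = "compressed_question_vector_s"
var vectorField = "question_vector_ds"
var clustersField = "question_clusters_is"
var distancesField = "question_distances_ds"

// Answers Encoding stage: changed variables in the Model output transformation script.
var compressedVectorField = "compressed_answer_vector_s"
var vectorField = "answer_vector_ds"
var clustersField = "answer_clusters_ss"
var distancesField = "answer_distances_ds"
```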
The indexed document is shown in the picture below.
Since we have two dense vectors generated in the index (compressed_question_vector_s and compressed_answer_vector_s), at query time we need to compute the query-to-question distance and the query-to-answer distance. This can be set up as shown in the picture below. We added two Vectors Distance per Query/Document stages and named them QQ Distance and QA Distance respectively. In the QQ Distance stage, we changed the default values for Document Vector Field and Document Vectors Distance Field to compressed_question_vector_s and qq_distance respectively. In the QA Distance stage, we changed the default values for Document Vector Field and Document Vectors Distance Field to compressed_answer_vector_s and qa_distance respectively.
Now we have two distances (query-to-question and query-to-answer), and we can ensemble them together with the Solr score to get a final ranking score. This is especially recommended when you have a limited FAQ dataset and want to utilize both question and answer information. The ensemble can be done in the Compute mathematical expression stage, as shown below.
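Conceptually, the ensemble is a weighted combination of the two vector distances and the Solr score. The sketch below illustrates only the arithmetic: the weights are placeholders, the distances are converted to similarities so that higher is better, and the actual configuration is an expression entered in the Compute mathematical expression stage rather than a script.

```javascript
// Illustration of the ensemble arithmetic with placeholder weights.
function ensembleScore(qqDistance, qaDistance, solrScore) {
  var qqWeight = 0.3;   // weight for the query-to-question similarity
  var qaWeight = 0.5;   // weight for the query-to-answer similarity
  var solrWeight = 0.2; // weight for the Solr relevance score
  return qqWeight * (1 - qqDistance) + qaWeight * (1 - qaDistance) + solrWeight * solrScore;
}
```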
Evaluate the query pipeline
The Smart Answers Evaluate Pipeline job (Evaluate QnA Pipeline job in Fusion 5.1 and 5.2) evaluates the rankings of results from any Smart Answers pipeline and finds the best set of weights in the ensemble score. See Evaluate a Smart Answers Pipeline for setup instructions.