Step 1: Run the Dataset
Access the Dataset
Navigate to the Evaluation Datasets section and select the relevant dataset for the agent you want to optimize.
Run the Dataset
Click Run Dataset to initiate the test. This sends every input to the agent, which processes each one according to the defined workflows and expected outputs.
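Conceptually, running a dataset just feeds each input to the agent and records its output alongside the expected result. The sketch below is illustrative only; the `agent` callable and the dataset fields (`input`, `expected`) are assumptions, not the product's real API.

```python
def run_dataset(agent, dataset):
    """Send every input to the agent and collect actual vs. expected outputs."""
    results = []
    for case in dataset:
        output = agent(case["input"])  # the agent processes one input at a time
        results.append({
            "input": case["input"],
            "expected": case["expected"],
            "actual": output,
        })
    return results

# Example with a toy agent that simply echoes its input:
dataset = [
    {"input": "refund order 42", "expected": "refund order 42"},
    {"input": "cancel order 7", "expected": "escalate"},
]
results = run_dataset(lambda text: text, dataset)
```

The collected `results` list is what the evaluation steps below inspect: each entry pairs the agent's actual output with the expected one.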
Step 2: Get Evaluation Results
Access Workflow Accuracy
Go to the Workflow Accuracy section once the dataset has finished running.
Select the recent evaluation run to view the accuracy results.
View Evaluation Metrics
Check key metrics, such as Workflow Match Accuracy and Workflow Accuracy, to see how well the agent performed across different workflows.
Workflow Match Accuracy shows the success rate for each workflow, highlighting specific areas where the agent may be underperforming.
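A per-workflow success rate like this is straightforward to compute from the collected results: for each workflow, divide the number of cases where the agent's output matched the expected output by the total number of cases. This is a minimal sketch assuming a simple exact-match criterion and illustrative field names (`workflow`, `expected`, `actual`), not the product's actual scoring logic.

```python
from collections import defaultdict

def workflow_match_accuracy(results):
    """Per-workflow success rate: matched cases / total cases."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for r in results:
        totals[r["workflow"]] += 1
        if r["actual"] == r["expected"]:  # exact match; real scoring may be fuzzier
            matches[r["workflow"]] += 1
    return {wf: matches[wf] / totals[wf] for wf in totals}

results = [
    {"workflow": "refunds", "expected": "approve", "actual": "approve"},
    {"workflow": "refunds", "expected": "deny", "actual": "approve"},
    {"workflow": "billing", "expected": "escalate", "actual": "escalate"},
]
print(workflow_match_accuracy(results))  # {'refunds': 0.5, 'billing': 1.0}
```

A low score for one workflow (here, `refunds` at 50%) is exactly the kind of signal the next steps use to focus the analysis.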
Get Detailed Results
Click on Get Evaluation Results to access a more detailed breakdown of each step and how the agent’s outputs compare to the expected results.
Step 3: Analyze What Went Wrong
Identify Underperforming Workflows
Review workflows with lower accuracy scores to identify which types of tasks the agent struggles with.
Examine Step-by-Step Results
For each underperforming workflow, analyze the specific steps where the agent’s output did not match the expected result.
Look at individual errors or mismatches to understand common issues, such as misinterpretations, missing information, or incorrect formatting.
Identify Patterns
Determine if there are recurring issues across multiple steps or workflows. This can highlight areas where the agent requires improvement, such as better handling of ambiguous data or following structured prompts.
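One simple way to surface recurring issues is to count mismatches grouped by workflow and step: a step that fails repeatedly across cases is a stronger optimization target than a one-off error. The sketch below assumes hypothetical per-step result records (`workflow`, `step`, `expected`, `actual`); it is not the product's detailed-results format.

```python
from collections import Counter

def mismatch_patterns(step_results):
    """Count mismatches by (workflow, step) to surface recurring failures."""
    counter = Counter(
        (r["workflow"], r["step"])
        for r in step_results
        if r["actual"] != r["expected"]
    )
    return counter.most_common()  # most frequent failure points first

step_results = [
    {"workflow": "refunds", "step": "lookup", "expected": "ok", "actual": "ok"},
    {"workflow": "refunds", "step": "decide", "expected": "deny", "actual": "approve"},
    {"workflow": "billing", "step": "decide", "expected": "deny", "actual": "approve"},
    {"workflow": "refunds", "step": "decide", "expected": "deny", "actual": "hold"},
]
print(mismatch_patterns(step_results))
# [(('refunds', 'decide'), 2), (('billing', 'decide'), 1)]
```

Here the `decide` step fails in both workflows, which suggests a shared cause (for example, an ambiguous prompt) rather than a workflow-specific one.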
Step 4: Optimize the Agent
Refine Training Data or Workflow Logic
Based on your analysis, update the agent’s training data to address specific weaknesses.
Adjust the agent’s workflow logic or response templates to better align with expected outputs.
Improve Expected Outputs or Prompts
Update expected outputs and prompts as needed to provide clearer guidance for the agent.
Ensure that the evaluation criteria accurately reflect the desired outcomes and do not unintentionally penalize acceptable variations.
Test Changes in a Smaller Dataset (Optional)
If you made significant changes, consider testing them on a smaller subset of the dataset first to confirm the updates are effective before committing to a full run.
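If the evaluation tool does not provide subset runs directly, a quick way to build one is a reproducible random sample, so repeated checks use the same cases. This is a generic sketch; the `fraction` and `seed` parameters are illustrative choices.

```python
import random

def sample_subset(dataset, fraction=0.2, seed=0):
    """Draw a reproducible random subset for a quick regression check."""
    rng = random.Random(seed)  # fixed seed so the same subset is drawn each time
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)

dataset = [{"input": f"case {i}"} for i in range(50)]
subset = sample_subset(dataset, fraction=0.2)
print(len(subset))  # 10
```

Fixing the seed matters: it lets you re-run the same subset after each tweak and attribute any score change to the tweak, not to sampling noise.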
Step 5: Run the Dataset Again
Repeat the Dataset Run
Run the dataset again to test the optimized agent and verify whether your changes have improved performance.
Compare Results
Check the updated accuracy scores and evaluation results to confirm improvements in the workflows and steps that were previously underperforming.
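Comparing two runs amounts to taking the per-workflow accuracy delta between the before and after scores. A minimal sketch, assuming each run's scores are available as a simple workflow-to-accuracy mapping:

```python
def compare_runs(before, after):
    """Report the accuracy delta per workflow between two evaluation runs."""
    return {
        wf: round(after.get(wf, 0.0) - before.get(wf, 0.0), 3)
        for wf in sorted(set(before) | set(after))
    }

before = {"refunds": 0.5, "billing": 1.0}
after = {"refunds": 0.8, "billing": 1.0}
print(compare_runs(before, after))  # {'billing': 0.0, 'refunds': 0.3}
```

Positive deltas confirm the optimization helped; a zero or negative delta on a previously healthy workflow is a regression worth investigating before the next iteration.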
Iterate as Needed
Continue this process iteratively, refining the agent based on each evaluation cycle, until you achieve satisfactory performance across all workflows.