Beam Academy: AI-Agent-Fundamentals

Steps 1-4: Creating a testing dataset and running it

Step 1: Create Testing Dataset

  1. Navigate to Evaluation Datasets

  • In the Beam AI Evaluation Framework, go to the Evaluation Datasets section from the main menu.

  2. Select the Relevant Agent

  • Choose the agent you want to evaluate (e.g., Order Processing Agent) from the list of available agents.

  3. Create a New Dataset

  • Click on Add Record or an equivalent option to create a new testing dataset.

  • Name the dataset meaningfully, so it’s clear what scenarios it will cover (e.g., “Order Processing Test Cases” or “Common User Queries”).

  4. Save the Dataset

  • Once the dataset is created, ensure it is saved. You can always return to this dataset to add more inputs as needed.
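
If it helps to draft the dataset before entering it in the UI, the following minimal Python sketch shows one way to organize the same information (dataset name, target agent, and a list of inputs to be added in Step 2). The field names are illustrative assumptions, not Beam AI's actual schema.

```python
# Minimal sketch of a testing dataset drafted locally before it is entered
# in the Evaluation Datasets section. Field names ("name", "agent", "inputs")
# are illustrative assumptions, not Beam AI's schema.
from dataclasses import dataclass, field

@dataclass
class EvaluationDataset:
    name: str                                     # meaningful name, e.g. "Order Processing Test Cases"
    agent: str                                    # the agent under evaluation
    inputs: list = field(default_factory=list)    # sample inputs defined in Step 2

dataset = EvaluationDataset(
    name="Order Processing Test Cases",
    agent="Order Processing Agent",
)
```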

Step 2: Define Sample Inputs

  1. Access the Created Dataset

  • Open the dataset you just created. You’ll see an interface to add specific test inputs for the agent.

  2. Add Sample Inputs

  • For each scenario, click on Add Input to start defining individual sample inputs.

  • Descriptive Name: Provide each input with a clear, descriptive name to indicate the scenario it represents (e.g., "Order Inquiry with Missing Data").

  • Attachments: You can add attachments if the test case requires additional files or documents for the agent to process.

  • Dataset Selection: Ensure that each input is assigned to the correct dataset. This links the input directly to the testing dataset you created.

  3. Vary Input Types and Complexity

  • Include a diverse set of inputs to cover various use cases:

    • Standard cases the agent is expected to handle regularly.

    • Edge cases, like incomplete or ambiguous data, to test how robustly the agent performs.

    • Errors or typos that real users might make.

  4. Define Expected Agent Workflow

  • For each input, specify the Expected Workflow ID. This is the workflow the agent should follow when processing this input, ensuring the input is handled according to the correct sequence or process.
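
To make these points concrete, here is a small Python sketch of how a set of sample inputs might be drafted before entering them in the dataset. The fields (descriptive name, attachments, dataset assignment, expected workflow ID) follow the concepts above, but the field names and workflow IDs are hypothetical, not Beam AI's schema.

```python
# Minimal sketch of the sample inputs defined in Step 2. Field names such as
# "expected_workflow_id" mirror the concepts above but are assumptions, not
# Beam AI's actual schema; the workflow IDs are hypothetical.
sample_inputs = [
    {
        "name": "Standard Order Inquiry",                # descriptive name (standard case)
        "input": "Where is my order #10234?",
        "attachments": [],                               # optional files for the agent to process
        "dataset": "Order Processing Test Cases",        # links the input to the testing dataset
        "expected_workflow_id": "order-status-lookup",   # workflow the agent should follow
    },
    {
        "name": "Order Inquiry with Missing Data",       # edge case: ambiguous, no order number
        "input": "I never got my order.",
        "attachments": [],
        "dataset": "Order Processing Test Cases",
        "expected_workflow_id": "order-clarification",
    },
    {
        "name": "Order Inquiry with Typo",                # simulated user error
        "input": "Wher is my ordr #10234?",
        "attachments": [],
        "dataset": "Order Processing Test Cases",
        "expected_workflow_id": "order-status-lookup",
    },
]
```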

Step 3: Setting Up a Dataset Run

  1. Create a Dataset Run

  • After adding inputs, click on the dataset name (e.g., "My Dataset") to create a dataset run.

  • Click on Add Record in the Dataset Runs section. This will prepare the dataset for an evaluation run.

  2. Review Dataset Inputs

  • Ensure the dataset run includes all inputs you've defined. This is the setup that will be used to evaluate the agent's responses.
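
As a rough mental model, a dataset run is a snapshot of the dataset's inputs queued for one evaluation pass. The sketch below (plain Python, hypothetical field names, not Beam AI's schema) shows that relationship together with a quick review of what the run will cover.

```python
# Minimal sketch of a dataset run as a snapshot of the dataset's inputs.
# Field names are illustrative assumptions, not Beam AI's schema.
dataset_run = {
    "dataset": "Order Processing Test Cases",
    "inputs": [case["name"] for case in sample_inputs],  # sample_inputs from Step 2
    "status": "ready",
}

# Review step: confirm every defined input is included before running.
print(f"Run for '{dataset_run['dataset']}' covers {len(dataset_run['inputs'])} inputs:")
for name in dataset_run["inputs"]:
    print(f"  - {name}")
```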

Step 4: Running the Dataset

  1. Open Dataset Run Side Window

  • After setting up the dataset run, click on it to open the side window, which displays the run details, including the list of inputs.

  2. Run the Dataset

  • In the side window, click Run Dataset to initiate the evaluation process. This will send all inputs in the dataset to the agent, allowing you to assess its responses.
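
Conceptually, running the dataset sends each input to the agent and records how it was handled so the result can be compared with the expected workflow. The Python sketch below illustrates that loop; run_agent is a hypothetical stand-in for the agent under evaluation, not a Beam AI function.

```python
# Conceptual sketch of a dataset run: send each input to the agent and compare
# the workflow it actually followed with the expected one. `run_agent` is a
# hypothetical stand-in for the agent under evaluation.
def run_agent(text: str) -> dict:
    # Hypothetical agent call; replace with the real agent invocation.
    return {"workflow_id": "order-status-lookup", "response": f"Echo: {text}"}

def run_dataset(sample_inputs: list[dict]) -> list[dict]:
    results = []
    for case in sample_inputs:
        outcome = run_agent(case["input"])
        results.append({
            "name": case["name"],
            "expected_workflow_id": case["expected_workflow_id"],
            "actual_workflow_id": outcome["workflow_id"],
            "response": outcome["response"],
            "workflow_match": outcome["workflow_id"] == case["expected_workflow_id"],
        })
    return results
```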

Best Practices for Creating Evaluation Datasets

To ensure comprehensive testing and reliable evaluation results, follow these best practices:

  1. Include Realistic Scenarios

  • Use inputs that reflect real-world use cases the agent will encounter.

  • Capture a variety of scenarios to understand how the agent performs under standard conditions.

  2. Cover Edge Cases

  • Include uncommon or extreme inputs that the agent might encounter.

  • Examples of edge cases could be missing data, unexpected formats, or high input volume.

  • Testing these cases helps ensure the agent can handle diverse situations robustly.

  3. Vary Input Types

  • Include different types of inputs (e.g., text, numbers, dates) to test how the agent responds to varied data formats.

  • This ensures the agent performs consistently regardless of input type.

  4. Simulate Common Errors

  • Add inputs that include common user mistakes, like typos or incomplete information.

  • This allows you to observe if the agent responds appropriately to erroneous inputs.
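
One lightweight way to apply these practices is to tag each input with the category it exercises and check coverage before starting a run. The Python sketch below assumes that tagging convention; it is an illustrative convention, not a Beam AI feature.

```python
# Minimal sketch of a coverage check: tag each input with the best-practice
# category it exercises and verify the dataset touches all of them before
# running it. The "category" field is an illustrative convention, not part of Beam AI.
REQUIRED_CATEGORIES = {"realistic", "edge_case", "varied_type", "common_error"}

sample_inputs = [
    {"name": "Standard Order Inquiry",          "category": "realistic"},
    {"name": "Order Inquiry with Missing Data", "category": "edge_case"},
    {"name": "Order Placed via CSV Attachment", "category": "varied_type"},
    {"name": "Order Inquiry with Typo",         "category": "common_error"},
]

covered = {case["category"] for case in sample_inputs}
missing = REQUIRED_CATEGORIES - covered
if missing:
    print(f"Dataset is missing best-practice coverage for: {sorted(missing)}")
else:
    print("Dataset covers all best-practice categories.")
```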
