Beam Academy: AI-Agent-Fundamentals

Steps 1-4: Creating a testing dataset and running it

Step 1: Create Testing Dataset

  1. Navigate to Evaluation Datasets

  • In the Beam AI Evaluation Framework, go to the Evaluation Datasets section from the main menu.

  2. Select the Relevant Agent

  • Choose the agent you want to evaluate (e.g., Order Processing Agent) from the list of available agents.

  3. Create a New Dataset

  • Click on Add Record or an equivalent option to create a new testing dataset.

  • Name the dataset meaningfully, so it’s clear what scenarios it will cover (e.g., “Order Processing Test Cases” or “Common User Queries”).

  4. Save the Dataset

  • Once the dataset is created, ensure it is saved. You can always return to this dataset to add more inputs as needed.
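
If it helps to draft the dataset before entering it in the UI, the following minimal Python sketch shows one way to organize the same information (dataset name, target agent, and a list of inputs to be added in Step 2). The field names are illustrative assumptions, not Beam AI's actual schema.

```python
# Minimal sketch of a testing dataset drafted locally before it is entered
# in the Evaluation Datasets section. Field names ("name", "agent", "inputs")
# are illustrative assumptions, not Beam AI's schema.
from dataclasses import dataclass, field

@dataclass
class EvaluationDataset:
    name: str                                     # meaningful name, e.g. "Order Processing Test Cases"
    agent: str                                    # the agent under evaluation
    inputs: list = field(default_factory=list)    # sample inputs defined in Step 2

dataset = EvaluationDataset(
    name="Order Processing Test Cases",
    agent="Order Processing Agent",
)
```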

Step 2: Define Sample Inputs

  1. Access the Created Dataset

  • Open the dataset you just created. You’ll see an interface to add specific test inputs for the agent.

  2. Add Sample Inputs

  • For each scenario, click on Add Input to start defining individual sample inputs.

  • Descriptive Name: Provide each input with a clear, descriptive name to indicate the scenario it represents (e.g., "Order Inquiry with Missing Data").

  • Attachments: You can add attachments if the test case requires additional files or documents for the agent to process.

  • Dataset Selection: Ensure that each input is assigned to the correct dataset. This links the input directly to the testing dataset you created.

  3. Vary Input Types and Complexity

  • Include a diverse set of inputs to cover various use cases:

    • Standard cases the agent is expected to handle regularly.

    • Edge cases, like incomplete or ambiguous data, to test how robustly the agent performs.

    • Errors or typos that real users might make.

  4. Define Expected Agent Workflow

  • For each input, specify the Expected Workflow ID. This is the workflow the agent should follow when processing this input, ensuring the input is handled according to the correct sequence or process.
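
To make these points concrete, here is a small Python sketch of how a set of sample inputs might be drafted before entering them in the dataset. The fields (descriptive name, attachments, dataset assignment, expected workflow ID) follow the concepts above, but the field names and workflow IDs are hypothetical, not Beam AI's schema.

```python
# Minimal sketch of the sample inputs defined in Step 2. Field names such as
# "expected_workflow_id" mirror the concepts above but are assumptions, not
# Beam AI's actual schema; the workflow IDs are hypothetical.
sample_inputs = [
    {
        "name": "Standard Order Inquiry",                # descriptive name (standard case)
        "input": "Where is my order #10234?",
        "attachments": [],                               # optional files for the agent to process
        "dataset": "Order Processing Test Cases",        # links the input to the testing dataset
        "expected_workflow_id": "order-status-lookup",   # workflow the agent should follow
    },
    {
        "name": "Order Inquiry with Missing Data",       # edge case: ambiguous, no order number
        "input": "I never got my order.",
        "attachments": [],
        "dataset": "Order Processing Test Cases",
        "expected_workflow_id": "order-clarification",
    },
    {
        "name": "Order Inquiry with Typo",                # simulated user error
        "input": "Wher is my ordr #10234?",
        "attachments": [],
        "dataset": "Order Processing Test Cases",
        "expected_workflow_id": "order-status-lookup",
    },
]
```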

Step 3: Setting Up a Dataset Run

  1. Create a Dataset Run

  • After adding inputs, click on the dataset name (e.g., "My Dataset") to create a dataset run.

  • Click on Add Record in the Dataset Runs section. This will prepare the dataset for an evaluation run.

  2. Review Dataset Inputs

  • Ensure the dataset run includes all inputs you've defined. This is the setup that will be used to evaluate the agent's responses.
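
As a rough mental model, a dataset run is a snapshot of the dataset's inputs queued for one evaluation pass. The sketch below (plain Python, hypothetical field names, not Beam AI's schema) shows that relationship together with a quick review of what the run will cover.

```python
# Minimal sketch of a dataset run as a snapshot of the dataset's inputs.
# Field names are illustrative assumptions, not Beam AI's schema.
dataset_run = {
    "dataset": "Order Processing Test Cases",
    "inputs": [case["name"] for case in sample_inputs],  # sample_inputs from Step 2
    "status": "ready",
}

# Review step: confirm every defined input is included before running.
print(f"Run for '{dataset_run['dataset']}' covers {len(dataset_run['inputs'])} inputs:")
for name in dataset_run["inputs"]:
    print(f"  - {name}")
```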

Step 4: Running the Dataset

  1. Open Dataset Run Side Window

  • After setting up the dataset run, click on it to open the side window, which displays the run details, including the list of inputs.

  2. Run the Dataset

  • In the side window, click Run Dataset to initiate the evaluation process. This will send all inputs in the dataset to the agent, allowing you to assess its responses.
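
Conceptually, running the dataset sends each input to the agent and records how it was handled so the result can be compared with the expected workflow. The Python sketch below illustrates that loop; run_agent is a hypothetical stand-in for the agent under evaluation, not a Beam AI function.

```python
# Conceptual sketch of a dataset run: send each input to the agent and compare
# the workflow it actually followed with the expected one. `run_agent` is a
# hypothetical stand-in for the agent under evaluation.
def run_agent(text: str) -> dict:
    # Hypothetical agent call; replace with the real agent invocation.
    return {"workflow_id": "order-status-lookup", "response": f"Echo: {text}"}

def run_dataset(sample_inputs: list[dict]) -> list[dict]:
    results = []
    for case in sample_inputs:
        outcome = run_agent(case["input"])
        results.append({
            "name": case["name"],
            "expected_workflow_id": case["expected_workflow_id"],
            "actual_workflow_id": outcome["workflow_id"],
            "response": outcome["response"],
            "workflow_match": outcome["workflow_id"] == case["expected_workflow_id"],
        })
    return results
```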

Best Practices for Creating Evaluation Datasets

To ensure comprehensive testing and reliable evaluation results, follow these best practices:

  1. Include Realistic Scenarios

  • Use inputs that reflect real-world use cases the agent will encounter.

  • Capture a variety of scenarios to understand how the agent performs under standard conditions.

  2. Cover Edge Cases

  • Include uncommon or extreme inputs that the agent might encounter.

  • Examples of edge cases could be missing data, unexpected formats, or high input volume.

  • Testing these cases helps ensure the agent can handle diverse situations robustly.

  3. Vary Input Types

  • Include different types of inputs (e.g., text, numbers, dates) to test how the agent responds to varied data formats.

  • This ensures the agent performs consistently regardless of input type.

  4. Simulate Common Errors

  • Add inputs that include common user mistakes, like typos or incomplete information.

  • This allows you to observe if the agent responds appropriately to erroneous inputs.
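
One lightweight way to apply these practices is to tag each input with the category it exercises and check coverage before starting a run. The Python sketch below assumes that tagging convention; it is an illustrative convention, not a Beam AI feature.

```python
# Minimal sketch of a coverage check: tag each input with the best-practice
# category it exercises and verify the dataset touches all of them before
# running it. The "category" field is an illustrative convention, not part of Beam AI.
REQUIRED_CATEGORIES = {"realistic", "edge_case", "varied_type", "common_error"}

sample_inputs = [
    {"name": "Standard Order Inquiry",          "category": "realistic"},
    {"name": "Order Inquiry with Missing Data", "category": "edge_case"},
    {"name": "Order Placed via CSV Attachment", "category": "varied_type"},
    {"name": "Order Inquiry with Typo",         "category": "common_error"},
]

covered = {case["category"] for case in sample_inputs}
missing = REQUIRED_CATEGORIES - covered
if missing:
    print(f"Dataset is missing best-practice coverage for: {sorted(missing)}")
else:
    print("Dataset covers all best-practice categories.")
```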
