Test your system prompts across different models and scenarios without building a full agent. Perfect for prompt optimization and A/B testing.

Getting Started

1. Configure Your Model

Choose your preferred LLM provider and model:
Available Models:
  • OpenAI: GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
  • Anthropic: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
  • Google: Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro
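Note that these are display names. If you later reproduce a run directly against a provider API, you will need the corresponding API model identifiers. A rough mapping (illustrative only; exact identifiers vary by provider and release, so check each provider's documentation):

# Display name -> API model identifier (illustrative; verify current IDs with each provider)
MODEL_IDS = {
    "GPT-4o": "gpt-4o",
    "GPT-4o mini": "gpt-4o-mini",
    "Claude 3.5 Sonnet": "claude-3-5-sonnet-20240620",
    "Claude 3 Haiku": "claude-3-haiku-20240307",
    "Gemini 1.5 Pro": "gemini-1.5-pro",
    "Gemini 1.5 Flash": "gemini-1.5-flash",
}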

2. Input Your System Prompt

Enter your complete system prompt in the text area; this is what guides the model’s behavior during testing. Example:
You are a helpful customer service agent for a delivery company. 
Your goal is to resolve customer issues quickly and empathetically.

Guidelines:
1. Always ask for the order number first
2. Acknowledge the customer's frustration
3. Provide clear next steps
4. Offer compensation when appropriate

Keep responses under 100 words and maintain a professional tone.
Remove Placeholder Variables: Make sure to remove any placeholder text like {customer_name} or {order_id} from your prompt. Use static examples instead.
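A quick way to catch leftover placeholders before you run a test is a simple pattern check. The sketch below uses Python's standard re module and assumes curly-brace placeholders like the examples above:

import re

def find_placeholders(prompt: str) -> list[str]:
    # Match {snake_case}-style placeholders such as {customer_name} or {order_id}
    return re.findall(r"\{[A-Za-z_][A-Za-z0-9_]*\}", prompt)

system_prompt = "You are a helpful agent. Greet {customer_name} and look up {order_id}."
leftover = find_placeholders(system_prompt)
if leftover:
    print("Replace these placeholders with static examples:", leftover)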

3. Configure Model Parameters

Temperature (0.0 - 1.0)
  • 0.0-0.3: Consistent, predictable responses
  • 0.4-0.7: Balanced creativity and consistency
  • 0.8-1.0: More creative and varied responses
Response Format
  • Text: Standard text responses
  • JSON: Structured JSON output (specify schema in prompt)
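These settings map directly onto the underlying provider parameters. For example, a low-temperature JSON run reproduced against the OpenAI API might look like the sketch below (assuming the official openai Python SDK; the JSON keys are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,                          # consistent, predictable responses
    response_format={"type": "json_object"},  # the prompt itself must mention JSON
    messages=[
        {
            "role": "system",
            "content": 'You are a support agent. Respond only with a JSON object with keys "reply" and "needs_escalation".',
        },
        {"role": "user", "content": "My package never arrived."},
    ],
)
print(response.choices[0].message.content)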

4. Set Evaluation Context

This is crucial for accurate testing. Add any information from your prompt that the evaluator needs to know.

What to Include:
  • Key guidelines or rules from your prompt
  • Expected response format or structure
  • Specific goals or success criteria
  • Any constraints or limitations
Example Evaluation Context:
The agent should:
- Always ask for order number first
- Acknowledge customer frustration
- Keep responses under 100 words
- Maintain professional tone
- Offer compensation when issues warrant it

Success criteria:
- Issue resolution within 3 exchanges
- Customer satisfaction maintained
- Company policies followed
Why This Matters: The simulated user and evaluator don’t see your system prompt during testing. The evaluation context ensures they understand what behavior to expect and how to measure success.
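To make this concrete, here is a rough illustration of how an LLM-as-judge evaluator could be prompted with your evaluation context instead of your system prompt. This is a sketch of the idea, not the platform's internal implementation:

evaluation_context = """The agent should:
- Always ask for order number first
- Acknowledge customer frustration
- Keep responses under 100 words
- Maintain professional tone
"""

def build_evaluator_prompt(transcript: str) -> str:
    # The evaluator never sees the real system prompt, only this context and the conversation
    return (
        "You are evaluating a customer service conversation.\n"
        f"Expected behavior:\n{evaluation_context}\n"
        f"Transcript:\n{transcript}\n"
        "Rate how well the agent met the expected behavior and explain your reasoning."
    )

print(build_evaluator_prompt("USER: My package is two days late!\nAGENT: I'm sorry about the delay..."))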

5. Advanced Features

Tool Calls & Function Calling

If your prompt involves tool calls or function calling, use our Custom Chat Agent setup instead, which provides full control over tool definitions and execution.

Best Practices

Prompt Clarity

Clear Instructions
  • Use specific, actionable guidelines
  • Include examples of good responses
  • Define success criteria explicitly

Testing Strategy

Effective Testing
  • Start with edge cases
  • Test across different user personas
  • Compare performance across models

Example Workflow

  1. Input your system prompt
  2. Select OpenAI GPT-4o with temperature 0.3
  3. Add evaluation context about expected behavior
  4. Choose scenarios like “frustrated delivery customer”
  5. Run simulations across multiple personas
  6. Analyze results and iterate on your prompt
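If you want to script the same loop outside the UI, the sketch below shows the shape of it (assuming the openai Python SDK; the scenarios and opening messages are illustrative):

from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful customer service agent for a delivery company. "
    "Always ask for the order number first and keep responses under 100 words."
)

# Scenario name -> opening user message (a full harness would simulate multi-turn users)
SCENARIOS = {
    "frustrated delivery customer": "My package was supposed to arrive two days ago. Where is it?!",
    "address change request": "Hi, I need to change the delivery address on an order I placed yesterday.",
}

for name, opening_message in SCENARIOS.items():
    reply = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": opening_message},
        ],
    )
    print(f"--- {name} ---\n{reply.choices[0].message.content}\n")

From there you would pass each transcript to an evaluator (as in step 4), compare results across models, and iterate on your prompt.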

Common Use Cases

  • Customer Service: Testing support agent responses
  • Content Creation: Evaluating writing assistant prompts
  • Educational: Testing tutoring or explanation prompts
  • Classification: Testing categorization and tagging prompts
  • Summarization: Testing document or conversation summaries
Ready to test more complex scenarios? Check out our Custom Chat Agent guide for tool calling, multi-turn conversations, and custom API integrations.