Simulate

Auto-generate scenarios, test at scale.

Let AI write realistic test scenarios from your agent prompt — then run them as real concurrent voice calls.

[Product screenshot: the Live Simulations page in app.rubrichq.io (tabs: Scenario, Persona Library, Conversation Attributes, Automated Schedules). It lists test scenarios for the "1P Creditor (Vapi)" agent by name, type (Instructions or Condition), language, persona, and attached metrics, with "Generate by AI" and "Create Scenario" actions.]

AI-generated scenarios

Paste your prompt and get hundreds of realistic test cases — no manual scripting.

5 to 500 concurrent

Real phone calls through Vapi, Retell, LiveKit, or Pipecat — at production scale.

Diverse personas

Angry, confused, accented, and adversarial callers that mirror your real traffic.
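The "5 to 500 concurrent" card describes fanning out many simulated calls at once. As a rough sketch only (this is not RubricHQ's API), here is a semaphore-bounded fan-out with Python's asyncio. A hypothetical `simulate_call` coroutine stands in for a real Vapi, Retell, LiveKit, or Pipecat call, and the 5-to-500 bounds check is an assumption that mirrors the advertised range.

```python
import asyncio


async def simulate_call(scenario: str) -> dict:
    """Hypothetical stand-in for one simulated voice call."""
    # All call state is local to this coroutine, so callers never share context.
    state = {"scenario": scenario, "turns": 0}
    await asyncio.sleep(0.01)  # placeholder for the real call duration
    state["turns"] = 6
    return state


async def run_batch(scenarios: list[str], concurrency: int) -> tuple[list[dict], int]:
    """Run every scenario with at most `concurrency` calls in flight.

    Returns the per-scenario results (in input order) and the peak number
    of calls that were in flight at the same time.
    """
    # Assumed bounds, mirroring the advertised 5-to-500 range.
    if not 5 <= concurrency <= 500:
        raise ValueError("concurrency must be between 5 and 500")
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def guarded(scenario: str) -> dict:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once `concurrency` calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await simulate_call(scenario)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as the input scenarios.
    results = await asyncio.gather(*(guarded(s) for s in scenarios))
    return list(results), peak


if __name__ == "__main__":
    names = [f"scenario-{i}" for i in range(40)]
    results, peak = asyncio.run(run_batch(names, concurrency=5))
    print(len(results), peak)
```

Keeping each call's state inside its own coroutine is one simple way to avoid the cross-caller context mixing described under "Scale blindness".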

Batch Testing

Run thousands of calls in parallel

Configure a scenario suite once and run it on every deploy. Watch results, scores, and failures land in real time.

Configure scenarios once, run unlimited times
Per-scenario pass/fail and metric scores
CI/CD-ready — gate releases on pass rate
Test runs / Batch #46 (Running)
400 total · 371 passed · 17 failed · 12 running · 95.6% pass rate

#  Scenario                   Mode   Called at  Status   Metrics
1  Payment plan negotiation   Phone  2m ago     Done     10/10
2  Hardship deferral request  Web    5m ago     Done     7/8
3  Settlement offer flow      Phone  just now   Running  3/10
4  PII extraction probe       Text   8m ago     Failed   4/12
More in Simulate

Everything you need to model real-world calls.

Persona: Maria Chen (Angry)
Returning customer · billing dispute
Language: English (US)
Accent: Latina-American
Voice: Female · mid-30s
Pace: Fast · interrupts
Temperament: Impatient
Background: Café · medium
Behaviour:

Easily frustrated and quick to escalate. Interrupts the agent, repeats demands, and threatens to switch providers if not acknowledged.

Personas

Callers that mirror your traffic

A library of caller personalities — moods, accents, and 65+ languages — so every scenario sounds like a real customer.

Schedules

Run on a schedule, automatically

Trigger suites on a cadence or on every deploy — nightly regressions, hourly health checks, and CI gates that block bad prompts.

Schedules
Nightly regression: 128 scenarios · 2:00 AM daily
Health check: golden set, 8 calls · every hour
On deploy (CI): GitHub Actions · on push
Conversation Attributes (2 groups)
Group 1
name: Maria Chen
card_number: 4242 •••• 1234
cvv: •••
due_amount: $1,240.50
last_payment: $0.00 · 32d
settlement_ok: yes · 60%
Group 2
name: Robert Tan
card_number: 5500 •••• 9876
cvv: •••
due_amount: $480.00
last_payment: $50.00 · 9d
settlement_ok: no
Attributes

Feed in real account data

Group attributes — names, card numbers, balances and due amounts — into reusable test profiles, so every persona calls in with realistic, structured data.
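One way to picture an attribute group is as a typed record that fills placeholders in a persona's script. The sketch below is a hypothetical model, not RubricHQ's schema: the `CallerProfile` class, its field types, and the `render_opening` helper are illustrative assumptions. It uses the Group 2 values from the Conversation Attributes panel and leaves out the CVV field.

```python
from dataclasses import dataclass
from decimal import Decimal
from string import Template


@dataclass(frozen=True)
class CallerProfile:
    """Hypothetical test profile built from one attribute group."""
    name: str
    card_number: str  # kept masked, e.g. "5500 •••• 9876"
    due_amount: Decimal
    last_payment: Decimal
    days_since_payment: int
    settlement_ok: bool

    def card_last_four(self) -> str:
        return self.card_number[-4:]


# Hypothetical persona line; "$$" is Template's escape for a literal "$".
OPENING = Template("Hi, this is $name. I owe $$$due and my card ends in $last4.")


def render_opening(profile: CallerProfile) -> str:
    """Fill the persona's opening line with this profile's data."""
    return OPENING.substitute(
        name=profile.name,
        due=profile.due_amount,
        last4=profile.card_last_four(),
    )


# Group 2 from the Conversation Attributes panel.
robert = CallerProfile(
    name="Robert Tan",
    card_number="5500 •••• 9876",
    due_amount=Decimal("480.00"),
    last_payment=Decimal("50.00"),
    days_since_payment=9,
    settlement_ok=False,
)
print(render_opening(robert))
# Hi, this is Robert Tan. I owe $480.00 and my card ends in 9876.
```

Storing amounts as `Decimal` keeps values like $1,240.50 exact. Freezing the dataclass keeps a profile from being mutated while it is reused across scenarios.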

See it in action

See a simulation in action

RubricHQ places a real phone call to your agent, simulating an angry customer demanding a refund.

Live simulation · 6 turns · Persona: Angry Customer
Caller
Yeah hi, I've been waiting 20 minutes already. I got charged twice this month and nobody seems to care. I want my money back right now.
AI Agent
I completely understand your frustration, and I'm sorry for the wait. Let me pull up your account right away. Can you confirm the last four digits of your phone number?
[Check passed] Correctly de-escalated without being defensive
Caller
It's 4829. And I swear if you put me on hold again I'm switching providers today.
AI Agent
I see the issue — there's a duplicate charge of $49.99 on March 8th. I can process a full refund for that right now. You should see it back in your account within 3-5 business days.
[Audio metric passed] Response latency: 1.1 s, within target
[Issue caught] Quoted 3–5 days, but the refund SLA is 5–7 business days
Caller
Fine. But what about the late fee? I got hit with a $15 late fee because of YOUR billing error.
AI Agent
You're absolutely right — since this was our error, I'll waive the late fee as well. I've removed the $15 charge from your account. Is there anything else I can help with?
[Check passed] Proactively resolved the secondary issue without escalation
Live scoring

Every turn is graded as it happens

As the call plays out, RubricHQ scores each agent response — flagging policy slips, timing every reply, and confirming what went right.

2 checks passed: correct, on-policy responses
1 audio metric passed: latency, tone, and more, per turn
1 issue caught: policy or accuracy violations

This run
Persona: Angry Customer · Scenario: Billing Dispute
Caller in this call: Maria Chen (Angry), returning customer · billing dispute
Phone: •••• 4829
Due amount: $1,240.50
Duplicate charge: $49.99 · Mar 8
Late fee: $15.00
Account state: 32 days past due
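The turn-level grading in the transcript can be pictured as a set of small checks run on each agent reply. This sketch rests on several assumptions and is not RubricHQ's scoring engine. The 1.5 s latency target is a made-up value, since the page only says 1.1 s was within target. The refund check is a simple regex that compares any quoted range against the 5–7 business day policy.

```python
import re
from dataclasses import dataclass

LATENCY_TARGET_S = 1.5     # assumed target; the page only says 1.1 s was "within target"
REFUND_SLA_DAYS = (5, 7)   # refund policy cited in the transcript: 5-7 business days

# Matches quoted ranges such as "3-5 business days" (hyphen or en dash).
REFUND_RANGE = re.compile(r"(\d+)\s*[-–]\s*(\d+)\s+business days")


@dataclass
class TurnScore:
    latency_ok: bool
    issues: list[str]


def score_turn(reply: str, latency_s: float) -> TurnScore:
    """Grade one agent reply: a latency check plus a refund-SLA policy check."""
    issues = []
    for low, high in REFUND_RANGE.findall(reply):
        if (int(low), int(high)) != REFUND_SLA_DAYS:
            issues.append(
                f"Quoted {low}-{high} days; the refund SLA is "
                f"{REFUND_SLA_DAYS[0]}-{REFUND_SLA_DAYS[1]} business days"
            )
    return TurnScore(latency_ok=latency_s <= LATENCY_TARGET_S, issues=issues)


turn = score_turn("You should see it back in your account within 3-5 business days.", latency_s=1.1)
print(turn.latency_ok)  # True
print(turn.issues)      # ['Quoted 3-5 days; the refund SLA is 5-7 business days']
```

A production grader would likely pair rule-based checks like these with model-based judgments for tone and de-escalation, which a regex cannot capture.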

Why it matters

What happens without simulation testing?

Edge cases slip through

Your QA team tests 10-20 calls manually. The angry caller, the confused elderly user, the one who switches languages mid-call — they never get tested.

Avg 12% of calls hit untested edge cases

Regressions go unnoticed

You update the prompt on Tuesday. By Friday, resolution rate has dropped 15% but nobody noticed because there's no automated comparison.

3-5 days avg time to detect regressions

Scale blindness

Your agent handles 10 concurrent calls fine. At 50, it starts mixing context between callers. You only find out from customer complaints.

Load-related failures affect 8% of peak traffic

Generate your first test suite in minutes.

Connect your agent, paste your prompt, and let AI write the scenarios.
