Background A/B Testing
Handit.ai automatically tests optimizations against your current prompt using a small portion of your data in the background. This approach is cost-efficient, never exposes users to experimental responses, and provides clear comparison metrics in the Release Hub.
A/B testing happens invisibly in the background using only a representative sample of your data. Users always get responses from your current prompt while we test optimizations on a subset and compare evaluation results.
How It Works
Our background A/B testing process is simple and efficient:
Sample Selection
We select a small, representative portion of your production queries for testing
Background Processing
Your current prompt and the optimized prompt both process the same sampled inputs in parallel
Evaluation Comparison
Our evaluators score both responses, providing direct quality metrics
Release Hub Results
Statistical comparisons appear in Release Hub for your review and deployment decisions
Cost-Efficient Testing
Smart Sampling Approach:
Instead of testing on all your data, we use a strategic sampling method:
- Sample Size: Only 10-15% of your production queries
- Representative Data: Automatically selected to match your traffic patterns
- Cost Impact: Minimal increase in evaluation costs
- Statistical Validity: Sufficient sample size for reliable results
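As a rough illustration, here is a minimal sketch of how a deterministic 10-15% sample could be drawn so that the subset stays representative of your traffic. The hash-based approach and the `in_background_sample` helper are assumptions made for this example, not Handit.ai's internal implementation:

```python
import hashlib

SAMPLE_RATE = 0.15  # illustrative 15% background sample


def in_background_sample(query_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically decide whether a query joins the background A/B sample.

    Hashing the query ID maps every query to a stable value in [0, 1),
    so the sample stays evenly spread across traffic without storing state.
    """
    digest = hashlib.sha256(query_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 hash bits -> [0, 1)
    return bucket < rate
```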
Example Testing Volume:
If you process 1,000 queries daily:
- Production queries: 1,000 (normal processing, users get current prompt responses)
- Background testing: 150 queries (15% sample for A/B testing)
- Additional cost: ~15% evaluation increase
- User impact: Zero (completely invisible)
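The arithmetic behind that example is straightforward; a quick check in Python:

```python
daily_queries = 1_000
sample_rate = 0.15

background_tests = int(daily_queries * sample_rate)  # 150 queries re-run in the background
cost_increase = background_tests / daily_queries     # 0.15 -> roughly a 15% increase in evaluations
```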
User Security & Zero Impact
Complete User Protection:
Your users are never exposed to experimental prompts:
- User Experience: Users always receive responses from your current production prompt
- No Delays: Background testing runs in parallel, no performance impact
- No Risk: Experimental responses are only used for evaluation, never sent to users
- Privacy: Same data handling as your normal operations
Background Testing Flow:
When a user submits a query:
- User gets: Response from current production prompt (immediately)
- Background: The same query is processed by the optimized prompt (for evaluation only)
- Evaluation: Both responses scored by evaluators
- Results: Metrics stored for Release Hub comparison
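To make that flow concrete, here is a minimal async sketch of the same sequence. All names here (`run_prompt`, `score_with_evaluators`, `store_ab_result`, and the prompt labels) are hypothetical stand-ins rather than Handit.ai's SDK; the point is only that the optimized prompt runs off the user's request path:

```python
import asyncio
import random

SAMPLE_RATE = 0.15  # illustrative 15% background sample

# --- Hypothetical stand-ins for the real prompt, evaluator, and storage calls ---
async def run_prompt(prompt_name: str, query: str) -> str:
    return f"[{prompt_name}] answer to: {query}"      # placeholder LLM call

async def score_with_evaluators(response: str) -> float:
    return round(random.uniform(3.0, 5.0), 2)         # placeholder evaluator score

async def store_ab_result(scores: dict) -> None:
    print("stored for Release Hub:", scores)          # placeholder result storage

# --- The flow described above ---
async def handle_query(query: str) -> str:
    # 1. The user always gets the current production prompt's response, immediately.
    production_response = await run_prompt("current-prompt", query)

    # 2. Sampled queries are also sent to the optimized prompt in the background;
    #    this task never blocks or alters the user's response.
    if random.random() < SAMPLE_RATE:  # see the hash-based sampler above for a deterministic variant
        asyncio.create_task(evaluate_candidate(query, production_response))

    return production_response

async def evaluate_candidate(query: str, production_response: str) -> None:
    # 3. Both responses are scored by the same evaluators ...
    candidate_response = await run_prompt("optimized-prompt", query)
    scores = {
        "current": await score_with_evaluators(production_response),
        "optimized": await score_with_evaluators(candidate_response),
    }
    # 4. ... and the metrics are stored for the Release Hub comparison.
    await store_ab_result(scores)
```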
Release Hub Metrics
Clear Performance Comparison:
Background A/B testing provides the metrics you see in Release Hub:
Example Comparison Results:
Customer Support Optimization - Empathy Enhancement
Current Prompt Performance:
- Empathy Score: 3.2 out of 5 average
- Helpfulness Score: 4.1 out of 5 average
- Overall Quality: 3.8 out of 5 average
Optimized Prompt Performance:
- Empathy Score: 4.4 out of 5 average (+38% improvement)
- Helpfulness Score: 4.3 out of 5 average (+5% improvement)
- Overall Quality: 4.2 out of 5 average (+11% improvement)
Statistical Confidence: 95% (based on 287 background evaluations)
Release Hub Recommendation: ✅ Ready for deployment
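The improvement percentages above follow directly from the averaged scores; for reference, this is the calculation:

```python
def improvement(current: float, optimized: float) -> str:
    """Percent change of the optimized average score relative to the current one."""
    return f"{(optimized - current) / current:+.0%}"

print(improvement(3.2, 4.4))  # +38%  (empathy)
print(improvement(4.1, 4.3))  # +5%   (helpfulness)
print(improvement(3.8, 4.2))  # +11%  (overall quality)
```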
Automatic Process
No Setup Required:
Background A/B testing starts automatically when optimizations are generated:
- Trigger: Self-improving AI creates new optimization
- Testing: Background A/B test begins automatically
- Duration: Runs until statistical significance is achieved (typically 3-7 days)
- Results: Appear in Release Hub when ready
- Decision: You choose when to deploy based on proven results
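Handit.ai does not document the exact statistical test it runs, but as an illustration of what "statistical significance at 95% confidence" can mean for two sets of evaluator scores, a one-sided Welch's t-test is a common choice. The `significant_at_95` helper below is an assumption for this sketch, not the platform's implementation:

```python
from scipy import stats  # requires SciPy >= 1.6 for the `alternative` argument

def significant_at_95(current_scores: list[float], optimized_scores: list[float]) -> bool:
    """Illustrative check: is the optimized prompt's mean evaluator score
    higher than the current prompt's at 95% confidence?"""
    # Welch's t-test (unequal variances), one-sided: H1 = optimized mean > current mean.
    result = stats.ttest_ind(optimized_scores, current_scores,
                             equal_var=False, alternative="greater")
    return result.pvalue < 0.05
```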
Testing Status:
You can monitor progress in your dashboard:
- Sample collection progress
- Current confidence levels
- Preliminary results
- Estimated completion time
Next Steps
Ready to see background A/B testing in action?
- Enable self-improving AI to automatically generate optimizations that are tested in the background
- Monitor background testing progress in your platform dashboard
- Review results in Release Hub when statistical significance is achieved
- Deploy proven optimizations with confidence
Background A/B testing is automatic. When you enable optimization, the system automatically tests improvements using cost-efficient sampling while keeping your users completely protected.