Evaluation Quickstart
Give your autonomous engineer the data it needs to detect issues and create fixes. Set up comprehensive AI quality assessment in under 5 minutes.
Prerequisites: Node.js and a Handit.ai Account. For complete setup, including tracing and autonomous fixes, use our Main Quickstart.
Set up evaluation
Step 1: Install the CLI
npm install -g @handit.ai/cli
Step 2: Configure evaluators
handit-cli evaluators-setup
The CLI will guide you through:
- Connecting evaluation models (GPT-4, Llama, etc.)
- Configuring evaluators for the quality dimensions you care about
- Setting evaluation percentages (how often to evaluate)
- Linking evaluators to your AI components automatically
Setup complete! Your evaluation system is active. Quality scores will appear in your dashboard within minutes.
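An evaluation percentage behaves like a sampling rate. If it is applied as one, then at 10% roughly one in ten interactions would be scored. The TypeScript below is a minimal sketch of that idea. It assumes each interaction is sampled independently at random. `shouldEvaluate` is a hypothetical helper for illustration only and is not part of the Handit.ai CLI or SDK.

```typescript
// Hypothetical sampler (not a Handit.ai API): decides whether one AI
// interaction gets evaluated, assuming independent random sampling.
// `percentage` is the configured evaluation percentage (0-100).
// `random` defaults to Math.random and can be replaced for testing.
function shouldEvaluate(
  percentage: number,
  random: () => number = Math.random,
): boolean {
  const rate = Math.min(Math.max(percentage, 0), 100) / 100; // clamp to [0, 1]
  // random() returns a value in [0, 1), so 0% never evaluates and 100% always does.
  return random() < rate;
}

// Example: simulate 1,000 interactions at a 10% evaluation percentage.
let evaluated = 0;
for (let i = 0; i < 1000; i++) {
  if (shouldEvaluate(10)) evaluated++;
}
console.log(`Evaluated ${evaluated} of 1000 interactions`); // about 100; varies per run
```

A higher percentage gives the autonomous engineer more data to work with. Under this sketch's assumptions, it also means more evaluation model calls.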
Verify setup
✅ Check your dashboard: go to dashboard.handit.ai, where you should see:
- Quality scores appearing for your AI interactions
- Evaluation trends in Agent Performance
- Individual interaction breakdowns
Understanding your evaluation data
Quality Monitoring: Your dashboard shows live evaluation scores as your AI processes requests, giving your autonomous engineer data to identify problems quickly.
Pattern Recognition: Your autonomous engineer analyzes quality patterns to understand when performance drops and what improvements would have the biggest impact.
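A simple way to flag a quality drop is to compare the most recent scores with the window of scores just before them. The sketch below is hypothetical. `detectDrop`, the window size, and the threshold are illustrative choices and do not describe how Handit.ai's autonomous engineer actually analyzes patterns.

```typescript
// Hypothetical drop detector (illustrative, not Handit.ai's method).
// It compares the mean of the last `window` scores with the mean of the
// `window` scores before them, and flags a drop of at least `threshold`.
function detectDrop(scores: number[], window: number, threshold: number): boolean {
  // At least two full windows are needed before any comparison is possible.
  if (window <= 0 || scores.length < 2 * window) return false;
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const recent = scores.slice(-window);
  const baseline = scores.slice(-2 * window, -window);
  return mean(baseline) - mean(recent) >= threshold;
}

// Example: scores fall from about 0.88 to about 0.58 across two windows of 3.
const recentScores = [0.9, 0.85, 0.9, 0.6, 0.55, 0.6];
console.log(detectDrop(recentScores, 3, 0.15)); // true
```

Real systems typically add noise handling, such as minimum sample sizes or statistical tests, so that a few low scores do not trigger alerts. Those refinements are omitted here.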
Targeted Insights: Rather than a single generic score, you see specific assessments across quality dimensions such as completeness, accuracy, and empathy, which enables targeted fixes.
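Per-dimension scores let you identify the weakest area instead of relying on one blended number. This sketch assumes a hypothetical `EvaluationResult` record with a score from 0 to 1 for each dimension. The field names and the scale are illustrative and do not reflect Handit.ai's actual data format.

```typescript
// Hypothetical shape of one evaluator result (field names and the 0-1
// score scale are assumptions for illustration).
interface EvaluationResult {
  interactionId: string;
  dimension: string; // e.g. "completeness", "accuracy", "empathy"
  score: number; // assumed range: 0 (worst) to 1 (best)
}

// Average the scores for each quality dimension.
function averageByDimension(results: EvaluationResult[]): Record<string, number> {
  const sums: Record<string, number> = {};
  const counts: Record<string, number> = {};
  for (const r of results) {
    sums[r.dimension] = (sums[r.dimension] || 0) + r.score;
    counts[r.dimension] = (counts[r.dimension] || 0) + 1;
  }
  const averages: Record<string, number> = {};
  for (const dimension of Object.keys(sums)) {
    averages[dimension] = sums[dimension] / counts[dimension];
  }
  return averages;
}

// Return the dimension with the lowest average: the most likely target for a fix.
function weakestDimension(averages: Record<string, number>): string | undefined {
  let weakest: string | undefined;
  for (const dimension of Object.keys(averages)) {
    if (weakest === undefined || averages[dimension] < averages[weakest]) {
      weakest = dimension;
    }
  }
  return weakest;
}

const results: EvaluationResult[] = [
  { interactionId: "a1", dimension: "completeness", score: 1 },
  { interactionId: "a2", dimension: "completeness", score: 0.5 },
  { interactionId: "a1", dimension: "accuracy", score: 0.875 },
  { interactionId: "a1", dimension: "empathy", score: 0.25 },
  { interactionId: "a2", dimension: "empathy", score: 0.75 },
];
const averages = averageByDimension(results);
console.log(averages); // completeness 0.75, accuracy 0.875, empathy 0.5
console.log(weakestDimension(averages)); // empathy
```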
Next steps
Enable Autonomous Fixes: Connect GitHub Integration so your autonomous engineer can create pull requests with proven improvements.
Custom Evaluators: Create Custom Quality Assessments for your specific use case.
Advanced Features: Explore LLM as Judges and other advanced capabilities.
Your autonomous engineer can now detect quality issues! Add GitHub integration to enable autonomous fixes based on this evaluation data.
Troubleshooting
No evaluation data: Verify your AI is receiving traffic and evaluation percentages are above 0%.
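If evaluations are sampled independently, low traffic combined with a low percentage can legitimately produce no scores for a while. That independence is an assumption, not confirmed Handit.ai behavior. For n interactions at percentage p, the expected number of evaluations is n × p / 100, and the chance of seeing none is (1 - p/100)^n. The hypothetical helpers below compute both.

```typescript
// Hypothetical helpers, assuming each interaction is evaluated
// independently with probability percentage / 100.
function expectedEvaluations(interactions: number, percentage: number): number {
  return interactions * (percentage / 100);
}

function probabilityOfNoEvaluations(interactions: number, percentage: number): number {
  return Math.pow(1 - percentage / 100, interactions);
}

console.log(expectedEvaluations(10, 5)); // 0.5
console.log(probabilityOfNoEvaluations(10, 5).toFixed(2)); // 0.60
```

With 10 interactions at 5%, you would expect half an evaluation and have roughly a 60% chance of seeing none. Send more traffic or raise the percentage before concluding that setup failed.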
Model issues: Check that your API keys are valid and have sufficient credits. Re-run handit-cli evaluators-setup to reconfigure.
For help, visit our Support page or join our Discord community.