
Evaluation Quickstart

Give your autonomous engineer the data it needs to detect issues and create fixes. Set up comprehensive AI quality assessment in under 5 minutes.

Prerequisites: Node.js and a Handit.ai account. For complete setup including tracing and autonomous fixes, use our Main Quickstart.

Set up evaluation

Step 1: Install the CLI

terminal
npm install -g @handit.ai/cli

Step 2: Configure evaluators

terminal
handit-cli evaluators-setup

The CLI will guide you through:

  • Connecting evaluation models (GPT-4, Llama, etc.)
  • Configuring evaluators for the quality dimensions you care about
  • Setting evaluation percentages, i.e. how often interactions are evaluated (see the sketch after this list)
  • Linking evaluators to your AI components automatically

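To build intuition for evaluation percentages: the percentage is a sampling rate that trades evaluation coverage against evaluation-model cost. The sketch below is illustrative only and is not Handit's implementation; the shouldEvaluate helper and the 10% rate are hypothetical.

sample-rate.ts
// Illustrative only: how a percentage-based evaluation sample works conceptually.
// NOT Handit's implementation; the names here are hypothetical.

/** Decide whether to evaluate this interaction, given a sampling rate (0-100). */
function shouldEvaluate(evaluationPercentage: number): boolean {
  return Math.random() * 100 < evaluationPercentage;
}

// Example: evaluate roughly 10% of interactions to control evaluation cost.
const sampled = Array.from({ length: 1000 }, () => shouldEvaluate(10));
const rate = sampled.filter(Boolean).length / sampled.length;
console.log(`~${(rate * 100).toFixed(1)}% of interactions sampled for evaluation`);

Higher percentages give your autonomous engineer more data to work with, at higher evaluation-model cost; a common approach is to start low and raise the rate for critical components.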
Setup complete! Your evaluation system is active. Quality scores will appear in your dashboard within minutes.

Verify setup

✅ Check your dashboard: Go to dashboard.handit.ai and confirm that you see:

  • Quality scores appearing for your AI interactions
  • Evaluation trends in Agent Performance
  • Individual interaction breakdowns

Understanding your evaluation data

Quality Monitoring: Your dashboard shows live evaluation scores as your AI processes requests, giving your autonomous engineer data to identify problems quickly.

Pattern Recognition: Your autonomous engineer analyzes quality patterns to understand when performance drops and what improvements would have the biggest impact.

Targeted Insights: Rather than generic scores, you see specific assessments across quality dimensions—completeness, accuracy, empathy—enabling targeted fixes.
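For illustration, here is roughly what an LLM-as-judge assessment across those dimensions looks like. This is a minimal sketch, not Handit's actual API: DimensionScores, callJudgeModel, and judgeResponse are hypothetical names, and the canned JSON stands in for a real model call.

llm-judge.ts
// Illustrative sketch of an LLM-as-judge evaluation across quality dimensions.
// The type and helpers below are hypothetical, not Handit's actual API.

type DimensionScores = {
  completeness: number; // 0-10: does the answer cover everything asked?
  accuracy: number;     // 0-10: is the content factually correct?
  empathy: number;      // 0-10: is the tone appropriate for the user?
};

// Stand-in for a real model call (e.g. your OpenAI or Llama client).
// It returns canned JSON here so the sketch runs as-is.
async function callJudgeModel(prompt: string): Promise<string> {
  return JSON.stringify({ completeness: 8, accuracy: 9, empathy: 7 });
}

async function judgeResponse(userInput: string, aiOutput: string): Promise<DimensionScores> {
  const prompt = [
    'Score the assistant response on completeness, accuracy, and empathy (0-10 each).',
    'Reply with JSON only, e.g. {"completeness": 8, "accuracy": 9, "empathy": 7}.',
    `User input: ${userInput}`,
    `Assistant response: ${aiOutput}`,
  ].join('\n');
  return JSON.parse(await callJudgeModel(prompt)) as DimensionScores;
}

judgeResponse('How do I reset my password?', 'Click "Forgot password" on the login page.')
  .then((scores) => console.log(scores));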

[Image: AI Agent Tracing Dashboard]

Next steps

Enable Autonomous Fixes: Connect GitHub Integration so your autonomous engineer can create pull requests with proven improvements.

Custom Evaluators: Create Custom Quality Assessments for your specific use case.

Advanced Features: Explore LLM as Judges and other advanced capabilities.

Your autonomous engineer can now detect quality issues! Add GitHub integration to enable autonomous fixes based on this evaluation data.

Troubleshooting

No evaluation data: Verify your AI is receiving traffic and evaluation percentages are above 0%.

Model issues: Check that your evaluation model API keys are valid and have sufficient credits. Re-run handit-cli evaluators-setup to reconfigure.

For help, visit our Support page or join our Discord community.
