Complete Handit.ai Quickstart
The Open Source Engine that Auto-Improves Your AI.
Handit evaluates every agent decision, auto-generates better prompts and datasets, A/B-tests the fix, and lets you control what goes live.
What you’ll build: A fully observable, continuously evaluated, and automatically optimizing AI system that improves itself based on real production data.
Overview: The Complete Journey
Here’s what we’ll accomplish in three phases:
Phase 1: AI Observability ⏱️ 5 minutes
Set up comprehensive tracing to see inside your AI agents and understand what they’re doing
Phase 2: Quality Evaluation ⏱️ 10 minutes
Add automated evaluation to continuously assess performance across multiple quality dimensions
Phase 3: Self-Improving AI ⏱️ 15 minutes
Enable automatic optimization that generates better prompts, tests them, and provides proven improvements
The Result: Complete visibility into performance with automated optimization recommendations based on real production data.
Prerequisites
Before we start, make sure you have:
- A Handit.ai Account  (sign up if needed)
- 15-30 minutes to complete the setup
Phase 1: AI Observability (5 minutes)
Let’s add comprehensive tracing to see exactly what your AI is doing.
Step 1: Install the SDK
Install the Handit.ai SDK for your language; install commands and language-specific setup are covered in the Tracing Quickstart linked at the end of this phase.
Step 2: Get Your Integration Token
- Log into your Handit.ai Dashboard 
- Go to Settings → Integrations
- Copy your integration token
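Store the token as an environment variable rather than hard-coding it; the Troubleshooting section below assumes it is set this way. The variable name HANDIT_API_KEY in this sketch is an assumption, so use whatever name your deployment prefers.
```python
import os

# HANDIT_API_KEY is an assumed variable name; keep the token out of source control.
HANDIT_API_KEY = os.environ["HANDIT_API_KEY"]
```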
Step 3: Add Basic Tracing
Now let’s instrument your main agent function, LLM calls, and tool usage with tracing. You’ll need to set up four key components (a minimal sketch follows below):
- Initialize the Handit.ai service
- Start tracing at the beginning of each agent run
- Track LLM calls and tool calls inside your workflow
- End tracing when the agent run completes
Important: Each node in your workflow should have a unique node_name to properly track its execution in the dashboard.
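The exact calls depend on which SDK and version you use, so treat the following as a minimal Python sketch of the four components above. The package name handit_sdk, the HanditTracker class, the start_tracing / track_node / end_tracing methods, and the HANDIT_API_KEY environment variable are assumptions for illustration; check the Tracing Quickstart for the exact API.
```python
import os

# Assumed import, class, and method names; they may differ in your SDK version.
from handit_sdk import HanditTracker

# 1. Initialize the Handit.ai service with your integration token.
tracker = HanditTracker(api_key=os.environ["HANDIT_API_KEY"])


def search_knowledge_base(question: str) -> str:
    """Stand-in tool call used only to make this sketch self-contained."""
    return "relevant documents for: " + question


def call_llm(question: str, context: str) -> str:
    """Stand-in LLM call used only to make this sketch self-contained."""
    return f"Answer to '{question}' based on: {context}"


def run_agent(user_question: str) -> str:
    # 2. Start tracing at the beginning of the agent run (assumed method name).
    trace_id = tracker.start_tracing(agent_name="customer-support-agent")

    # 3. Track each LLM call and tool call; every node needs a unique node_name.
    docs = search_knowledge_base(user_question)
    tracker.track_node(
        trace_id=trace_id,
        node_name="knowledge-base-search",
        input=user_question,
        output=docs,
    )

    answer = call_llm(user_question, docs)
    tracker.track_node(
        trace_id=trace_id,
        node_name="response-generator",
        input={"question": user_question, "context": docs},
        output=answer,
    )

    # 4. End tracing when the agent run completes (assumed method name).
    tracker.end_tracing(trace_id=trace_id)
    return answer
```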
Phase 1 Complete! 🎉 You now have full observability with every operation, timing, input, output, and error visible in your dashboard.
➡️ Want to dive deeper? Check out our detailed Tracing Quickstart for advanced features and best practices.
Phase 2: Quality Evaluation (10 minutes)
Now let’s add automated evaluation to continuously assess quality across multiple dimensions.
Step 1: Connect Evaluation Models
- Go to Settings → Model Tokens
- Add your OpenAI or other model credentials
- These models will act as “judges” to evaluate responses
Step 2: Create Focused Evaluators
Create separate evaluators for each quality aspect. Critical principle: One evaluator = one quality dimension.
- Go to Evaluation → Evaluation Suite
- Click Create New Evaluator
Example Evaluator 1: Response Completeness
You are evaluating whether an AI response completely addresses the user's question.
Focus ONLY on completeness - ignore other quality aspects.
User Question: {input}
AI Response: {output}
Rate on a scale of 1-10:
1-2 = Missing major parts of the question
3-4 = Addresses some parts but incomplete
5-6 = Addresses most parts adequately
7-8 = Addresses all parts well
9-10 = Thoroughly addresses every aspect
Output format:
Score: [1-10]
Reasoning: [Brief explanation]
Example Evaluator 2: Accuracy Check
You are checking if an AI response contains accurate information.
Focus ONLY on factual accuracy - ignore other aspects.
User Question: {input}
AI Response: {output}
Rate on a scale of 1-10:
1-2 = Contains obvious false information
3-4 = Contains questionable claims
5-6 = Mostly accurate with minor concerns
7-8 = Accurate information
9-10 = Completely accurate and verifiable
Output format:
Score: [1-10]
Reasoning: [Brief explanation]
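Handit runs these evaluators for you once they are connected, but it helps to see what an LLM-as-judge evaluator does conceptually. The sketch below is not Handit's internal implementation: it simply fills the {input}/{output} placeholders, sends the prompt to a judge model via the OpenAI SDK, and parses the Score and Reasoning lines. The gpt-4o-mini model name is an assumption; use whichever model token you connected in Step 1.
```python
import re
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Condensed version of Example Evaluator 1 above, with {input}/{output} placeholders.
COMPLETENESS_EVALUATOR = (
    "You are evaluating whether an AI response completely addresses the user's question.\n"
    "Focus ONLY on completeness - ignore other quality aspects.\n\n"
    "User Question: {input}\n"
    "AI Response: {output}\n\n"
    "Rate on a scale of 1-10.\n\n"
    "Output format:\nScore: [1-10]\nReasoning: [Brief explanation]"
)


def judge(evaluator_template: str, user_input: str, ai_output: str) -> tuple[int, str]:
    """Fill the evaluator template, ask a judge model, and parse Score/Reasoning."""
    prompt = evaluator_template.format(input=user_input, output=ai_output)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; use the model token you connected
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    score = int(re.search(r"Score:\s*(\d+)", text).group(1))
    reasoning_match = re.search(r"Reasoning:\s*(.*)", text)
    return score, reasoning_match.group(1) if reasoning_match else ""
```
Keeping each evaluator to a single dimension, as in the two examples above, is what makes these scores interpretable and comparable over time.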
Step 3: Associate Evaluators to Your LLM Nodes
- Go to Agent Performance
- Select your LLM node (e.g., “response-generator”)
- Click Manage Evaluators in the menu
- Add your evaluators
Step 4: Monitor Results
View real-time evaluation results in:
- Tracing tab: Individual evaluation scores
- Agent Performance: Quality trends over time
Phase 2 Complete! 🎉 Continuous evaluation is now running across multiple quality dimensions with real-time insights into performance trends.
➡️ Want more sophisticated evaluators? Check out our detailed Evaluation Quickstart for advanced techniques.
Phase 3: Self-Improving AI (15 minutes)
Finally, let’s enable automatic optimization that generates better prompts and provides proven improvements.
Step 1: Connect Optimization Models
- Go to Settings → Model Tokens
- Select optimization model tokens
- Self-improving AI automatically activates once configured
Automatic Activation: Once optimization tokens are configured, the system automatically begins analyzing evaluation data and generating optimizations. No additional setup required!
Step 2: Monitor Optimization Results
The system is now automatically generating and testing improved prompts. Monitor results in two places:
Agent Performance Dashboard:
- View agent performance metrics
- Compare current vs optimized versions
- See improvement percentages
Release Hub:
- Go to Optimization → Release Hub
- View detailed prompt comparisons
- See statistical confidence and recommendations
Step 3: Deploy Optimizations
- Review Recommendations in Release Hub
- Compare Performance between current and optimized prompts
- Mark as Production for prompts you want to deploy
- Fetch via SDK in your application
Fetch Optimized Prompts:
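A snippet was referenced here but not shown; below is a hedged Python sketch of fetching the prompt currently marked as Production. The fetch_optimized_prompt method, the model_id parameter, and the handit_sdk import are assumptions; the Optimization Quickstart documents the exact call for your SDK.
```python
import os

# Assumed import and method names; see the Optimization Quickstart for the exact API.
from handit_sdk import HanditTracker

tracker = HanditTracker(api_key=os.environ["HANDIT_API_KEY"])

# Fetch the prompt currently marked as Production in the Release Hub for a given node.
# "response-generator" matches the node_name used when tracing that LLM call.
optimized_prompt = tracker.fetch_optimized_prompt(model_id="response-generator")

# Fall back to a local default if nothing has been promoted yet.
system_prompt = optimized_prompt or "You are a helpful customer-support assistant."
```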
Phase 3 Complete! 🎉 You now have a self-improving AI that automatically detects quality issues, generates better prompts, tests them in the background, and provides proven improvements.
➡️ Want advanced optimization features? Check out our detailed Optimization Quickstart for CI/CD integration and deployment strategies.
What You’ve Accomplished
Congratulations! You now have a complete AI observability and optimization system:
✅ Full Observability
- Complete visibility into operations
- Real-time monitoring of all LLM calls and tools
- Detailed execution traces with timing and error tracking
✅ Continuous Evaluation
- Automated quality assessment across multiple dimensions
- Real-time evaluation scores and trends
- Quality insights to identify improvement opportunities
✅ Self-Improving AI
- Automatic detection of quality issues
- AI-generated prompt optimizations
- Background A/B testing with statistical confidence
- Production-ready improvements delivered via SDK
Next Steps
- Join our Discord community  for support
- Check out GitHub Issues  for additional help
- Explore Tracing to monitor your AI agents
- Set up Evaluation to grade your AI outputs
- Configure Optimization for continuous improvement
Resources
- Tracing Documentation - Monitor AI agent performance
- Evaluation Documentation - Grade AI outputs automatically
- Optimization Documentation - Improve prompts continuously
- Visit our GitHub Issues  page
Ready to transform your AI? Visit beta.handit.ai  to get started with the complete Handit.ai platform today.
Troubleshooting
Tracing Not Working?
- Verify your API key is correct and set as environment variable
- Ensure you’re calling the tracing functions correctly (initialize, start tracing, track nodes, end tracing)
Evaluations Not Running?
- Confirm model tokens are valid and have sufficient credits
- Verify LLM nodes are receiving traffic
- Check evaluation percentages are > 0%
Optimizations Not Generating?
- Ensure evaluation data shows quality issues (scores below threshold)
- Verify optimization model tokens are configured
- Confirm sufficient evaluation data has been collected
Need Help?
- Visit our Support page
- Join our Discord community 
- Check individual quickstart guides for detailed troubleshooting