
Complete Handit.ai Quickstart

The Open Source Engine that Auto-Improves Your AI.
Handit evaluates every agent decision, auto-generates better prompts and datasets, A/B-tests the fix, and lets you control what goes live.

What you’ll build: A fully observable, continuously evaluated, and automatically optimizing AI system that improves itself based on real production data.

Overview: The Complete Journey

Here’s what we’ll accomplish in three phases:

Phase 1: AI Observability ⏱️ 5 minutes

Set up comprehensive tracing to see inside your AI agents and understand what they’re doing

Phase 2: Quality Evaluation ⏱️ 10 minutes

Add automated evaluation to continuously assess performance across multiple quality dimensions

Phase 3: Self-Improving AI ⏱️ 15 minutes

Enable automatic optimization that generates better prompts, tests them, and provides proven improvements

The Result: Complete visibility into performance with automated optimization recommendations based on real production data.

Prerequisites

Before we start, make sure you have:

Phase 1: AI Observability (5 minutes)

Let’s add comprehensive tracing to see exactly what your AI is doing.

Step 1: Install the SDK

Step 2: Get Your Integration Token

  1. Log into your Handit.ai Dashboard 
  2. Go to Settings → Integrations
  3. Copy your integration token

Step 3: Add Basic Tracing

Now let's add tracing around your main agent function, LLM calls, and tool usage. You'll need to set up four key components (see the sketch after the note below):

  1. Initialize the Handit.ai service

  2. Start a trace at the beginning of each agent run

  3. Track LLM calls and tool executions inside the workflow

  4. End the trace when the run completes

⚠️

Important: Each node in your workflow should have a unique node_name to properly track its execution in the dashboard.
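
Here is a minimal Python sketch of those four components. The module name (`handit_ai`) and helper names (`configure`, `start_tracing`, `track_node`, `end_tracing`) are assumptions for illustration only; check the Tracing Quickstart for the exact API in your language.

```python
import os

# Assumed module and helper names for illustration -- the real SDK's API may differ.
from handit_ai import configure, start_tracing, track_node, end_tracing

# 1. Initialize the service with your integration token (from Settings -> Integrations)
configure(api_key=os.environ["HANDIT_API_KEY"])

def call_llm(question: str) -> str:
    ...  # your existing LLM call

def lookup_order(question: str) -> str:
    ...  # your existing tool

def run_agent(user_question: str) -> str:
    # 2. Start a trace for this agent run
    trace = start_tracing(agent_name="customer-support-agent")
    try:
        # 3. Track each LLM call and tool call -- give every node a unique node_name
        answer = call_llm(user_question)
        track_node(trace, node_name="response-generator",
                   input=user_question, output=answer)

        order = lookup_order(user_question)
        track_node(trace, node_name="order-lookup",
                   input=user_question, output=order)
        return answer
    finally:
        # 4. End the trace, even if a step raised an error
        end_tracing(trace)
```

The try/finally ensures the trace is always closed, so failed runs still show up in the dashboard with their errors.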

Phase 1 Complete! 🎉 You now have full observability with every operation, timing, input, output, and error visible in your dashboard.

➡️ Want to dive deeper? Check out our detailed Tracing Quickstart for advanced features and best practices.

Phase 2: Quality Evaluation (10 minutes)

Now let’s add automated evaluation to continuously assess quality across multiple dimensions.

Step 1: Connect Evaluation Models

  1. Go to Settings → Model Tokens
  2. Add your OpenAI or other model credentials
  3. These models will act as “judges” to evaluate responses

Step 2: Create Focused Evaluators

Create separate evaluators for each quality aspect. Critical principle: One evaluator = one quality dimension.

  1. Go to Evaluation → Evaluation Suite
  2. Click Create New Evaluator

Example Evaluator 1: Response Completeness

You are evaluating whether an AI response completely addresses the user's question. Focus ONLY on completeness - ignore other quality aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Missing major parts of the question
3-4 = Addresses some parts but incomplete
5-6 = Addresses most parts adequately
7-8 = Addresses all parts well
9-10 = Thoroughly addresses every aspect

Output format:
Score: [1-10]
Reasoning: [Brief explanation]

Example Evaluator 2: Accuracy Check

You are checking if an AI response contains accurate information. Focus ONLY on factual accuracy - ignore other aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Contains obvious false information
3-4 = Contains questionable claims
5-6 = Mostly accurate with minor concerns
7-8 = Accurate information
9-10 = Completely accurate and verifiable

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
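
The {input} and {output} placeholders are filled automatically from each traced LLM call, so you normally never format these templates yourself. If you want to sanity-check a template locally before saving it, a quick substitution like the (abridged) Python sketch below shows what the judge model will actually see.

```python
# Illustration only: the platform fills {input}/{output} from each traced LLM call.
# This local substitution just previews a template before you save it.
COMPLETENESS_TEMPLATE = """\
You are evaluating whether an AI response completely addresses the user's question.
Focus ONLY on completeness - ignore other quality aspects.

User Question: {input}
AI Response: {output}

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
"""

preview = COMPLETENESS_TEMPLATE.format(
    input="What is your refund policy for damaged items?",
    output="We offer refunds within 30 days of purchase.",
)
print(preview)
```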

Step 3: Associate Evaluators to Your LLM Nodes

  1. Go to Agent Performance
  2. Select your LLM node (e.g., “response-generator”)
  3. Click Manage Evaluators in the menu
  4. Add your evaluators

Step 4: Monitor Results

View real-time evaluation results in:

  • Tracing tab: Individual evaluation scores
  • Agent Performance: Quality trends over time

Tracing Dashboard - Individual Evaluation Scores

Agent Performance Dashboard - Quality Trends

Phase 2 Complete! 🎉 Continuous evaluation is now running across multiple quality dimensions with real-time insights into performance trends.

➡️ Want more sophisticated evaluators? Check out our detailed Evaluation Quickstart for advanced techniques.

Phase 3: Self-Improving AI (15 minutes)

Finally, let’s enable automatic optimization that generates better prompts and provides proven improvements.

Step 1: Connect Optimization Models

  1. Go to Settings → Model Tokens
  2. Select optimization model tokens
  3. Self-improving AI automatically activates once configured

Automatic Activation: Once optimization tokens are configured, the system automatically begins analyzing evaluation data and generating optimizations. No additional setup required!

Step 2: Monitor Optimization Results

The system is now automatically generating and testing improved prompts. Monitor results in two places:

Agent Performance Dashboard:

  • View agent performance metrics
  • Compare current vs optimized versions
  • See improvement percentages

Release Hub:

  • Go to Optimization → Release Hub
  • View detailed prompt comparisons
  • See statistical confidence and recommendations

Release Hub - Prompt Performance Comparison

Step 3: Deploy Optimizations

  1. Review Recommendations in Release Hub
  2. Compare Performance between current and optimized prompts
  3. Mark as Production for prompts you want to deploy
  4. Fetch via SDK in your application

Fetch Optimized Prompts:
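
The exact call depends on the SDK for your language; the sketch below assumes a hypothetical `fetch_optimized_prompt(model_id=...)` helper that returns the prompt currently marked as Production for a node, with a local fallback so your agent keeps working if the fetch fails.

```python
import os

# Assumed module and helper names for illustration -- check the Optimization
# Quickstart for the exact fetch call in your SDK.
from handit_ai import configure, fetch_optimized_prompt

configure(api_key=os.environ["HANDIT_API_KEY"])

DEFAULT_PROMPT = "You are a helpful customer support assistant."

def get_system_prompt() -> str:
    """Return the prompt marked as Production in the Release Hub, or a local fallback."""
    try:
        optimized = fetch_optimized_prompt(model_id="response-generator")
        return optimized or DEFAULT_PROMPT
    except Exception:
        # Network issues or missing configuration: keep serving the current prompt
        return DEFAULT_PROMPT
```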

Phase 3 Complete! 🎉 You now have a self-improving AI that automatically detects quality issues, generates better prompts, tests them in the background, and provides proven improvements.

➡️ Want advanced optimization features? Check out our detailed Optimization Quickstart for CI/CD integration and deployment strategies.

What You’ve Accomplished

Congratulations! You now have a complete AI observability and optimization system:

✅ Full Observability

  • Complete visibility into operations
  • Real-time monitoring of all LLM calls and tools
  • Detailed execution traces with timing and error tracking

✅ Continuous Evaluation

  • Automated quality assessment across multiple dimensions
  • Real-time evaluation scores and trends
  • Quality insights to identify improvement opportunities

✅ Self-Improving AI

  • Automatic detection of quality issues
  • AI-generated prompt optimizations
  • Background A/B testing with statistical confidence
  • Production-ready improvements delivered via SDK

Next Steps

Resources

Ready to transform your AI? Visit beta.handit.ai to get started with the complete Handit.ai platform today.

Troubleshooting

Tracing Not Working?

  • Verify your API key is correct and set as an environment variable
  • Ensure you're calling the tracing functions correctly

Evaluations Not Running?

  • Confirm model tokens are valid and have sufficient credits
  • Verify LLM nodes are receiving traffic
  • Check evaluation percentages are > 0%

Optimizations Not Generating?

  • Ensure evaluation data shows quality issues (scores below threshold)
  • Verify optimization model tokens are configured
  • Confirm sufficient evaluation data has been collected

Need Help?
