CLI Evaluator Setup
Set up comprehensive AI evaluation in minutes using the Handit CLI. This guide covers the complete CLI workflow for connecting evaluators, managing model tokens, and configuring evaluation settings.
Prerequisites: You need the Handit CLI installed and a Handit.ai account. If you haven’t installed the CLI yet, run npm install -g @handit.ai/cli.
Quick Setup
Step 1: Run the Evaluators Setup Command
handit-cli evaluators-setup
The CLI will walk you through an interactive setup process:
Connect Evaluation Models
The CLI will prompt you to add model tokens for evaluation:
- OpenAI: GPT-4, GPT-3.5-turbo for high-quality evaluation
- Together AI: Llama models for cost-effective evaluation
- Other providers: Anthropic, Cohere, and more
Configure Model Tokens
For each model you want to use:
- Enter your API key when prompted
- Set usage preferences (evaluation vs optimization)
- Configure rate limits and budgets
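If you prefer to stage keys in your shell before running the interactive setup, a minimal sketch looks like the following. The CLI normally writes these variables for you, and the HANDIT_EVALUATION_* names are taken from the generated configuration shown later in this guide, so treat this as optional and verify the names against your own setup.
# Optional: stage evaluation keys in the current shell before running setup
export HANDIT_EVALUATION_OPENAI_KEY=your-openai-key
export HANDIT_EVALUATION_TOGETHER_KEY=your-together-key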
Connect Existing Evaluators
The CLI will show your existing evaluators and help you:
- Associate them with specific LLM nodes
- Set evaluation percentages (how often to run each evaluator)
- Configure priority levels
Review and Apply Configuration
The CLI will show a summary of your setup and apply the configuration to your Handit.ai account.
Advanced Configuration
Managing Multiple Evaluators
You can run the CLI setup multiple times to add or modify evaluators:
# Run evaluator setup to modify configuration
handit-cli evaluators-setup
The CLI will guide you through:
- Adding new evaluation models (OpenAI, Together AI, etc.)
- Connecting evaluators to LLM nodes in your project
- Setting evaluation percentages for each evaluator
- Configuring model tokens for evaluation
Example Configuration
After setup, your evaluators might look like this:
Evaluator: "Response Completeness"
├── Evaluation Percentage: 10% (recommended for general quality)
├── Priority: Normal
└── Associated Nodes: customer-service-response
Evaluator: "Hallucination Detection"
├── Evaluation Percentage: 20% (higher for critical accuracy)
├── Priority: High
└── Associated Nodes: customer-service-response, technical-support-response
CLI-Generated Configuration
After running the setup, the CLI creates or updates configuration files in your project:
Environment Variables
# Evaluation model tokens (automatically configured)
HANDIT_EVALUATION_OPENAI_KEY=your-openai-key
HANDIT_EVALUATION_TOGETHER_KEY=your-together-key
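To confirm the keys are visible in your current shell (assuming a Unix-like environment), a quick check is:
# List any evaluation keys currently set in this shell
printenv | grep HANDIT_EVALUATION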
Configuration File
{
  "evaluators": [
    {
      "id": "completeness-check",
      "name": "Response Completeness",
      "associatedNodes": ["customer-service-response"],
      "evaluationPercentage": 10,
      "priority": "normal",
      "modelId": "gpt-4"
    }
  ],
  "models": [
    {
      "id": "gpt-4",
      "provider": "openai",
      "model": "gpt-4",
      "usage": "evaluation"
    }
  ]
}
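As a point of reference, the "Hallucination Detection" evaluator from the example above would appear as a second entry in the evaluators array. The field names mirror the generated file; the id value and the lowercase "high" priority string are illustrative assumptions.
{
  "id": "hallucination-detection",
  "name": "Hallucination Detection",
  "associatedNodes": ["customer-service-response", "technical-support-response"],
  "evaluationPercentage": 20,
  "priority": "high",
  "modelId": "gpt-4"
}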
Best Practices
Evaluation Percentage Guidelines
✅ Recommended Percentages
- Critical quality aspects: 15-25%
- General quality checks: 5-15%
- Format/compliance checks: 5-10%
- Experimental evaluators: 1-5%
⚠️ Consider Your Traffic
- High traffic: Start with lower percentages
- Low traffic: Use higher percentages for faster insights
- Cost-sensitive: Balance evaluation frequency with budget (see the sizing sketch after this list)
- Critical applications: Prioritize accuracy over cost
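As a quick sizing sketch (the traffic figure is hypothetical), the evaluation percentage maps directly to the number of evaluator calls, which is what drives evaluation cost:
# Hypothetical: 10,000 LLM calls per day with one evaluator at 10%
# results in roughly 1,000 evaluation calls per day to your evaluation model.
echo $(( 10000 * 10 / 100 ))   # 1000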
Model Selection Strategy
For High Accuracy Needs:
- Primary: GPT-4 (highest quality evaluations)
- Backup: Claude-3 Opus (alternative high-quality option)
For Cost-Effective Evaluation:
- Primary: GPT-3.5-turbo (good balance of cost/quality)
- Backup: Llama-2-70B (open source alternative)
For Specific Use Cases:
- Code evaluation: GPT-4 or Claude-3
- Creative content: GPT-4 or Claude-3
- Factual accuracy: GPT-4 with web search
- Multilingual: GPT-4 or Claude-3
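In the generated configuration file, a models section that pairs a high-accuracy primary with a cost-effective fallback might look like the sketch below. It reuses the field names shown earlier; the "together" provider identifier and the llama-2-70b model string are assumptions, so check the values the CLI actually writes for your account.
"models": [
  {
    "id": "gpt-4",
    "provider": "openai",
    "model": "gpt-4",
    "usage": "evaluation"
  },
  {
    "id": "llama-2-70b",
    "provider": "together",
    "model": "llama-2-70b",
    "usage": "evaluation"
  }
]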
Troubleshooting
Common CLI Issues
Command not found:
# Reinstall the CLI
npm uninstall -g @handit.ai/cli
npm install -g @handit.ai/cli
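If the command is still missing after reinstalling, confirm the package is installed globally and that npm's global bin directory is on your PATH (these are standard npm commands, not Handit-specific):
# Check that the package is present in npm's global tree
npm list -g @handit.ai/cli
# Global binaries live under this prefix; make sure its bin directory is on your PATH
npm config get prefix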
Authentication errors:
- Ensure you’re logged into your Handit.ai account
- Check that your account has evaluation permissions
- Verify your integration token is valid
Model token validation fails:
- Double-check your API keys are correct
- Ensure the API keys have sufficient permissions
- Verify the model provider is supported
Configuration Issues
Evaluators not running:
- Check that LLM nodes are receiving traffic
- Verify evaluation percentages are > 0%
- Ensure model tokens have sufficient credits
Inconsistent results:
- Review evaluator prompts for clarity
- Consider if criteria are too subjective
- Test with known good/bad examples
Next Steps
- Monitor results in your Agent Performance dashboard
- Create Custom Evaluators for specific needs
- Set up Optimization to automatically improve based on evaluation results
- Explore Advanced Evaluation Features
Ready for optimization? Once your evaluations are running, set up automated optimization to improve your AI based on the evaluation results.