
CLI Evaluator Setup

Set up comprehensive AI evaluation in minutes using the Handit CLI. This guide covers the complete CLI workflow for connecting evaluators, managing model tokens, and configuring evaluation settings.

Prerequisites: You need the Handit CLI installed and a Handit.ai account. If you haven’t installed the CLI yet, run npm install -g @handit.ai/cli.

Quick Setup

Step 1: Run the Evaluators Setup Command

terminal
handit-cli evaluators-setup

The CLI will walk you through an interactive setup process:

Connect Evaluation Models

The CLI will prompt you to add model tokens for evaluation:

  • OpenAI: GPT-4, GPT-3.5-turbo for high-quality evaluation
  • Together AI: Llama models for cost-effective evaluation
  • Other providers: Anthropic, Cohere, and more

Configure Model Tokens

For each model you want to use:

  • Enter your API key when prompted
  • Set usage preferences (evaluation vs optimization)
  • Configure rate limits and budgets

Connect Existing Evaluators

The CLI will show your existing evaluators and help you:

  • Associate them with specific LLM nodes
  • Set evaluation percentages (how often to run each evaluator)
  • Configure priority levels (see the configuration sketch after this list)
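
A single association ends up in your project configuration roughly like this. The field names follow the generated handit-evaluators.config.json shown later in this guide; the id and model values here are illustrative:

handit-evaluators.config.json (excerpt)
{
  "evaluators": [
    {
      "id": "hallucination-detection",
      "name": "Hallucination Detection",
      "associatedNodes": ["customer-service-response", "technical-support-response"],
      "evaluationPercentage": 20,
      "priority": "high",
      "modelId": "gpt-4"
    }
  ]
}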

Review and Apply Configuration

The CLI will show a summary of your setup and apply the configuration to your Handit.ai account.

Advanced Configuration

Managing Multiple Evaluators

You can run the CLI setup multiple times to add or modify evaluators:

terminal
# Run evaluator setup to modify configuration
handit-cli evaluators-setup

The CLI will guide you through:

  • Adding new evaluation models (OpenAI, Together AI, etc.)
  • Connecting evaluators to LLM nodes in your project
  • Setting evaluation percentages for each evaluator
  • Configuring model tokens for evaluation

Example Configuration

After setup, your evaluators might look like this:

Evaluator: "Response Completeness" ├── Evaluation Percentage: 10% (recommended for general quality) ├── Priority: Normal └── Associated Nodes: customer-service-response Evaluator: "Hallucination Detection" ├── Evaluation Percentage: 20% (higher for critical accuracy) ├── Priority: High └── Associated Nodes: customer-service-response, technical-support-response

CLI-Generated Configuration

After running the setup, the CLI creates or updates configuration files in your project:

Environment Variables

.env
# Evaluation model tokens (automatically configured)
HANDIT_EVALUATION_OPENAI_KEY=your-openai-key
HANDIT_EVALUATION_TOGETHER_KEY=your-together-key

Configuration File

handit-evaluators.config.json
{ "evaluators": [ { "id": "completeness-check", "name": "Response Completeness", "associatedNodes": ["customer-service-response"], "evaluationPercentage": 10, "priority": "normal", "modelId": "gpt-4" } ], "models": [ { "id": "gpt-4", "provider": "openai", "model": "gpt-4", "usage": "evaluation" } ] }

Best Practices

Evaluation Percentage Guidelines

✅ Recommended Percentages

  • Critical quality aspects: 15-25%
  • General quality checks: 5-15%
  • Format/compliance checks: 5-10%
  • Experimental evaluators: 1-5%

⚠️ Consider Your Traffic

  • High traffic: Start with lower percentages
  • Low traffic: Use higher percentages for faster insights
  • Cost sensitive: Balance evaluation frequency with budget (a quick volume estimate follows this list)
  • Critical applications: Prioritize accuracy over cost
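
To size these numbers, multiply your daily traffic by the evaluation percentage. A minimal back-of-envelope sketch, with purely illustrative figures:

terminal
# Approximate daily evaluator runs = daily requests * evaluation percentage
# e.g. 10,000 requests/day at a 10% evaluation percentage:
echo $((10000 * 10 / 100))   # -> 1000 evaluations/day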

Model Selection Strategy

For High Accuracy Needs:

  • Primary: GPT-4 (highest quality evaluations)
  • Backup: Claude-3 Opus (alternative high-quality option)

For Cost-Effective Evaluation:

  • Primary: GPT-3.5-turbo (good balance of cost/quality; a sample configuration follows this section)
  • Backup: Llama-2-70B (open source alternative)

For Specific Use Cases:

  • Code evaluation: GPT-4 or Claude-3
  • Creative content: GPT-4 or Claude-3
  • Factual accuracy: GPT-4 with web search
  • Multilingual: GPT-4 or Claude-3
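
As a sketch, the cost-effective strategy above might map to a models section like this. The field names follow the generated configuration shown earlier; the provider identifier and exact model ids for Together AI are assumptions:

handit-evaluators.config.json (excerpt)
{
  "models": [
    { "id": "gpt-3.5-turbo", "provider": "openai", "model": "gpt-3.5-turbo", "usage": "evaluation" },
    { "id": "llama-2-70b", "provider": "together", "model": "llama-2-70b", "usage": "evaluation" }
  ]
}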

Troubleshooting

Common CLI Issues

Command not found:

# Reinstall the CLI
npm uninstall -g @handit.ai/cli
npm install -g @handit.ai/cli

Authentication errors:

  • Ensure you’re logged into your Handit.ai account
  • Check that your account has evaluation permissions
  • Verify your integration token is valid

Model token validation fails:

  • Double-check your API keys are correct (a quick provider-side check follows this list)
  • Ensure the API keys have sufficient permissions
  • Verify the model provider is supported
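
To rule out a bad key before re-running the setup, you can call the provider's API directly. This is a standard OpenAI API request, not a Handit command, and it assumes your key is exported as OPENAI_API_KEY:

terminal
# A successful response listing models means the key is valid
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"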

Configuration Issues

Evaluators not running:

  • Check that LLM nodes are receiving traffic
  • Verify evaluation percentages are > 0% (the check below can surface zero-percentage evaluators)
  • Ensure model tokens have sufficient credits
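
A quick way to spot evaluators that will never run is to look for zero percentages in the generated config. This assumes jq and the default handit-evaluators.config.json filename:

terminal
# Print any evaluator configured with a 0% evaluation percentage
jq '.evaluators[] | select(.evaluationPercentage == 0)' handit-evaluators.config.json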

Inconsistent results:

  • Review evaluator prompts for clarity
  • Consider if criteria are too subjective
  • Test with known good/bad examples

Next Steps

Ready for optimization? Once your evaluations are running, set up automated optimization to improve your AI based on the evaluation results.
