
Overview

The Health Check endpoint allows UserTrace to verify that your agent is operational and ready to handle test requests. This endpoint is called before starting simulations to ensure system readiness.

Endpoint

GET /health

Request Headers

Accept: application/json
Authentication is not required for the health check endpoint, because it provides only basic status information.

Request

A plain GET request; no request body is required:
curl -X GET -H "Accept: application/json" https://your-sandbox.com/health
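For scripted checks, the same request can be made from Python. This is a minimal sketch using only the standard library: `fetch_health` is a hypothetical helper name, and its 10-second default timeout simply mirrors the UserTrace timeout described under Response Time. Because `urllib` raises `HTTPError` for non-2xx codes, the sketch reads the JSON body from the error object so that a 503 `unhealthy` payload is still returned to the caller.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def fetch_health(base_url: str, timeout: float = 10.0) -> Tuple[int, dict]:
    """GET {base_url}/health and return (HTTP status code, parsed JSON body).

    Hypothetical helper: a 503 (unhealthy) still carries a JSON body,
    so it is returned instead of being raised.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/health",
        headers={"Accept": "application/json"},
        method="GET",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here; the error object is file-like.
        raw = err.read().decode("utf-8")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            # Some failures (e.g. a proxy error page) may not be JSON at all.
            return err.code, {}
```

Usage: `code, body = fetch_health("https://your-sandbox.com")`, then inspect `body["status"]` as well as `code`.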

Response Format

status (string, required): Overall system status. Values: healthy, unhealthy, degraded
timestamp (string, required): ISO 8601 timestamp of when the health check was performed
version (string): Current version of your agent or API
uptime (integer, optional): System uptime in seconds
dependencies (object, optional): Status of external dependencies
metrics (object, optional): System performance metrics
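A response body can be checked against this field list with a small validator. The sketch below treats only `status` and `timestamp` as required, since those are the fields marked required; the function name and the type strictness for the other fields are assumptions. Note that `datetime.fromisoformat` accepts a common subset of ISO 8601 rather than the full standard.

```python
from datetime import datetime
from typing import List

ALLOWED_STATUSES = ("healthy", "degraded", "unhealthy")
# Other documented fields and their expected JSON types (hypothetical strictness).
OPTIONAL_TYPES = {"version": str, "uptime": int, "dependencies": dict, "metrics": dict}


def validate_health_payload(payload: dict) -> List[str]:
    """Return a list of problems found in a /health response body (empty if valid)."""
    problems = []
    status = payload.get("status")
    if status not in ALLOWED_STATUSES:
        problems.append(f"status must be one of {list(ALLOWED_STATUSES)}, got {status!r}")

    ts = payload.get("timestamp")
    if not isinstance(ts, str):
        problems.append("timestamp is required and must be a string")
    else:
        try:
            # Before Python 3.11, fromisoformat() rejects a trailing "Z", so map it to +00:00.
            datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {ts!r}")

    for field, expected in OPTIONAL_TYPES.items():
        if field in payload:
            value = payload[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"{field} must be of type {expected.__name__}")
    return problems
```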

Response Examples

Healthy System

{
  "status": "healthy",
  "timestamp": "2024-01-21T10:30:00Z",
  "version": "1.0.0",
  "uptime": 3600,
  "dependencies": {
    "database": "healthy",
    "llm_service": "healthy",
    "external_apis": "healthy"
  },
  "metrics": {
    "response_time_avg": 250,
    "error_rate": 0.1,
    "memory_usage": 45,
    "cpu_usage": 30
  }
}

Degraded System

{
  "status": "degraded",
  "timestamp": "2024-01-21T10:30:00Z",
  "version": "1.0.0",
  "uptime": 7200,
  "dependencies": {
    "database": "healthy",
    "llm_service": "healthy",
    "external_apis": "unhealthy"
  },
  "metrics": {
    "response_time_avg": 800,
    "error_rate": 5.2,
    "memory_usage": 78,
    "cpu_usage": 85
  }
}

Minimal Response

{
  "status": "healthy",
  "timestamp": "2024-01-21T10:30:00Z",
  "version": "1.0.0"
}

Status Codes

HTTP Status | Status Value | Description
200 | healthy | System is fully operational
200 | degraded | System is operational but performance is impacted
503 | unhealthy | System is not operational
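On the server side, this table reduces to a lookup. A sketch, assuming your handler computes the `status` value first and derives the HTTP code from it; the names are illustrative only. Because `degraded` is still reported as 200, a client that cares about degraded performance has to read the `status` field in the body rather than rely on the HTTP code alone.

```python
# Mapping taken from the Status Codes table: both healthy and degraded
# are reported with HTTP 200; only unhealthy uses 503.
STATUS_TO_HTTP = {"healthy": 200, "degraded": 200, "unhealthy": 503}


def http_status_for(status: str) -> int:
    """Return the HTTP status code to send with a given `status` value."""
    try:
        return STATUS_TO_HTTP[status]
    except KeyError:
        # Anything outside the three documented values is a bug in the caller.
        raise ValueError(f"unknown health status: {status!r}") from None
```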

Error Responses

Service Unavailable (503)

{
  "status": "unhealthy",
  "timestamp": "2024-01-21T10:30:00Z",
  "version": "1.0.0",
  "error": {
    "message": "Database connection failed",
    "details": "Unable to connect to primary database"
  }
}

Internal Server Error (500)

{
  "error": {
    "message": "Health check service unavailable",
    "type": "internal_error",
    "code": "health_check_failed"
  }
}

Implementation Guidelines

Health Check Logic

Your health check should verify:
  1. Basic Connectivity: Server is responding to requests
  2. Dependencies: Database, LLM services, external APIs
  3. Resources: Memory, CPU, disk space within acceptable limits
  4. Configuration: Required environment variables and settings
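One way to fold per-dependency results into the overall `status` is sketched below. The policy is an assumption, not something UserTrace prescribes: an unhealthy critical dependency makes the whole system `unhealthy`, while any other problem yields `degraded`. That is consistent with the examples on this page (a failed database connection produces the 503 response; unhealthy external APIs produce `degraded`), but the `CRITICAL` set is hypothetical and should reflect your own agent. Resource and configuration checks can be fed in as additional named entries.

```python
from typing import Iterable, Mapping

# Hypothetical: which dependencies are critical is a decision for your agent.
CRITICAL = frozenset({"database", "llm_service"})


def overall_status(dependencies: Mapping[str, str],
                   critical: Iterable[str] = CRITICAL) -> str:
    """Fold per-dependency statuses into the overall `status` value.

    - a critical dependency that is "unhealthy"  -> "unhealthy"
    - any other entry that is not "healthy"      -> "degraded"
    - everything "healthy" (or no entries)       -> "healthy"
    """
    critical = set(critical)
    overall = "healthy"
    for name, state in dependencies.items():
        if state == "unhealthy" and name in critical:
            return "unhealthy"
        if state != "healthy":
            overall = "degraded"
    return overall
```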

Response Time

  • Health checks should respond within 2 seconds
  • UserTrace times out requests after 10 seconds
  • Consider caching health status for frequently called checks
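Caching can be as simple as remembering the last result for a fixed time-to-live. In this sketch, `CachedCheck` and its 30-second default TTL are hypothetical choices, and the clock is injectable so the behavior can be tested without sleeping. It is not thread-safe as written: two concurrent requests that both see an expired entry may each run the underlying check.

```python
import time
from typing import Callable, Optional


class CachedCheck:
    """Run an expensive check at most once per `ttl` seconds and reuse the result.

    Not thread-safe: two concurrent callers may both refresh an expired entry.
    If the check raises, nothing is cached and the exception propagates.
    """

    def __init__(self, check: Callable[[], str], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check      # zero-argument callable returning a status string
        self._ttl = ttl          # hypothetical default; tune to your dependencies
        self._clock = clock      # injectable so tests need not sleep
        self._value: Optional[str] = None
        self._expires_at = float("-inf")

    def __call__(self) -> str:
        now = self._clock()
        if now >= self._expires_at:
            self._value = self._check()
            self._expires_at = now + self._ttl
        return self._value
```

Usage: wrap each expensive probe once at startup, for example `db_check = CachedCheck(ping_database, ttl=30)`, where `ping_database` stands for your own function, and call `db_check()` from the handler.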

Frequency

  • UserTrace calls the health check endpoint before each simulation
  • Implement lightweight checks that don’t impact system performance
  • Avoid expensive operations in health check logic
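To stay within the 2-second target even when a dependency hangs, the checks can run in parallel under a shared time budget. This sketch uses `concurrent.futures` from the standard library; the `run_checks` name and the 1.5-second default budget are assumptions. A check that raises or misses the budget is reported as `unhealthy` rather than delaying the response.

```python
import concurrent.futures as cf
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], str]],
               budget: float = 1.5) -> Dict[str, str]:
    """Run dependency checks in parallel within a shared time budget (seconds).

    A check that raises, or is still running when the budget expires,
    is reported as "unhealthy" instead of delaying the response.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(checks)))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    done, _ = cf.wait(futures.values(), timeout=budget)
    results = {}
    for name, fut in futures.items():
        if fut in done and fut.exception() is None:
            results[name] = fut.result()
        else:
            results[name] = "unhealthy"
    # Return without waiting for stragglers; Python cannot kill a running
    # thread, so a hung check keeps its worker busy until it finishes.
    pool.shutdown(wait=False)
    return results
```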

Best Practices

  1. Keep it Simple: Health checks should be lightweight and fast
  2. Include Dependencies: Check critical external services
  3. Use Caching: Cache expensive dependency checks
  4. Return Meaningful Data: Include metrics that help diagnose issues
  5. Handle Failures Gracefully: Always return a response, even for partial failures
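Putting these practices together, a minimal standard-library server might look like the following. It is a sketch, not a reference implementation: `make_server`, `default_checks`, and the `VERSION` constant are hypothetical, and a production agent would more likely add the route to its existing web framework. The point it illustrates is practice 5: every path through `do_GET` writes a JSON response, falling back to a 500 body shaped like the internal-error example on this page when the check logic itself raises.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

VERSION = "1.0.0"  # hypothetical: report your real agent/API version
HTTP_FOR = {"healthy": 200, "degraded": 200, "unhealthy": 503}

# A check function returns (overall status, per-dependency statuses).
CheckFn = Callable[[], Tuple[str, Dict[str, str]]]


def default_checks() -> Tuple[str, Dict[str, str]]:
    """Hypothetical placeholder; replace with real dependency checks."""
    return "healthy", {"database": "healthy"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path.split("?", 1)[0] != "/health":
            self._send(404, {"error": {"message": "Not found"}})
            return
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        try:
            status, deps = self.server.checks()
            body = {"status": status, "timestamp": now,
                    "version": VERSION, "dependencies": deps}
            # An unrecognized status string is answered with 503.
            self._send(HTTP_FOR.get(status, 503), body)
        except Exception as exc:
            # Handle failures gracefully: log, then still answer with JSON.
            self.log_error("health check failed: %r", exc)
            self._send(500, {"error": {
                "message": "Health check service unavailable",
                "type": "internal_error",
                "code": "health_check_failed",
            }})

    def _send(self, code: int, payload: dict) -> None:
        data = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(checks: CheckFn = default_checks,
                host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    server = HTTPServer((host, port), HealthHandler)
    server.checks = checks  # read by HealthHandler.do_GET via self.server
    return server
```

Usage: `make_server().serve_forever()`, then `curl http://127.0.0.1:8080/health` from another terminal.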
Monitoring: Consider using the health check endpoint for your own monitoring and alerting systems.