Troubleshooting

Common issues, diagnostic commands, and resolution steps for QBITEL Bridge deployments.

Diagnostic Commands

# Check pod status
kubectl get pods -n qbitel-service-mesh

# View pod logs
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine -f

# Describe a failing pod
kubectl describe pod -n qbitel-service-mesh <pod-name>

# Check service endpoints
kubectl get svc -n qbitel-service-mesh

# Health check
curl http://localhost:8000/health

# Check Kubernetes events
kubectl get events -n qbitel-service-mesh --sort-by='.lastTimestamp'
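If the health endpoint returns JSON with per-component statuses (an assumed schema — adjust the field names to your deployment's actual /health payload), a small helper can flag degraded components from the curl output:

```python
import json

def unhealthy_components(health_json: str) -> dict:
    """Return components whose status is not 'ok' or 'healthy'.

    Assumes a response shape like (hypothetical -- verify against your
    deployment's actual /health payload):
      {"status": "degraded", "components": {"llm": "unavailable", "db": "ok"}}
    """
    payload = json.loads(health_json)
    return {
        name: status
        for name, status in payload.get("components", {}).items()
        if status not in ("ok", "healthy")
    }

# Example: feed it the body returned by `curl http://localhost:8000/health`
sample = '{"status": "degraded", "components": {"llm": "unavailable", "db": "ok"}}'
print(unhealthy_components(sample))  # → {'llm': 'unavailable'}
```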

Common Issues

AI Engine Not Starting

Symptoms: Pod in CrashLoopBackOff or container exits immediately.

Common Causes:

  • Missing or invalid DATABASE_URL environment variable
  • Database not reachable from the pod
  • Insufficient memory (requires 2 GB+ for ML models)
  • Missing Python dependencies

Resolution:

# Check pod logs for the error
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine --previous

# Verify database connectivity (actually opens a connection, not just the import)
kubectl exec -n qbitel-service-mesh deployment/qbitel-engine -- \
  python -c "import os, sqlalchemy; sqlalchemy.create_engine(os.environ['DATABASE_URL']).connect(); print('DB OK')"

# Check memory limits
kubectl describe pod -n qbitel-service-mesh <pod-name> | grep -A5 "Limits"
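Before suspecting the network, it is worth checking that DATABASE_URL is even well-formed. A standard-library sketch that catches the obvious mistakes (it validates the URL's shape only; it does not prove the database accepts connections):

```python
from urllib.parse import urlsplit

def check_database_url(url: str) -> list:
    """Return a list of obvious structural problems with a DATABASE_URL value."""
    problems = []
    parts = urlsplit(url)
    if not parts.scheme:
        problems.append("missing scheme (e.g. postgresql://)")
    if not parts.hostname:
        problems.append("missing hostname")
    if parts.scheme and "://" not in url:
        problems.append("malformed scheme separator")
    return problems

print(check_database_url("postgresql://qbitel:secret@db:5432/qbitel"))  # → []
print(check_database_url("db:5432/qbitel"))  # flags missing hostname/separator
```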

LLM Provider Unavailable

Symptoms: Copilot and zero-touch features return errors; health check shows "llm: unavailable".

Common Causes:

  • Ollama service not running
  • Incorrect OLLAMA_URL configuration
  • Model not downloaded

Resolution:

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Pull the required model
ollama pull llama3.2:8b

# Verify the model is available
ollama list
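The `/api/tags` response lists installed models under a `models` key. A helper like the following can confirm a required model is present from that JSON (a sketch; the field names follow Ollama's documented tags response):

```python
import json

def model_available(tags_json: str, model: str) -> bool:
    """Check whether `model` appears in an Ollama /api/tags response."""
    tags = json.loads(tags_json)
    names = {m.get("name", "") for m in tags.get("models", [])}
    # Accept an exact tag match, or a bare model name matching any installed tag.
    return model in names or any(n.split(":")[0] == model for n in names)

sample = '{"models": [{"name": "llama3.2:8b"}]}'
print(model_available(sample, "llama3.2:8b"))  # → True
print(model_available(sample, "mistral"))      # → False
```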

Protocol Discovery Failures

Symptoms: Discovery requests return low confidence or no results.

Common Causes:

  • Insufficient traffic samples (need 10+ messages for reliable discovery)
  • Traffic data not properly base64-encoded
  • Confidence threshold set too high

Resolution:

# Lower the confidence threshold
curl -X POST http://localhost:8000/api/v1/discover \
  -H "Content-Type: application/json" \
  -d '{"packet_data": "...", "metadata": {"confidence_threshold": 0.5}}'

# Sanity-check that your payload round-trips through base64
echo -n "your data" | base64 | base64 -d
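Encoding errors are easier to catch in code than by eye. A sketch of preparing the request body (the `packet_data` and `metadata` field names come from the discover example above):

```python
import base64, json

def build_discover_request(raw_bytes: bytes, threshold: float = 0.5) -> str:
    """Base64-encode captured traffic and build a discover request body."""
    body = {
        "packet_data": base64.b64encode(raw_bytes).decode("ascii"),
        "metadata": {"confidence_threshold": threshold},
    }
    return json.dumps(body)

# Round-trip check: decoding must return the original bytes.
raw = b"\x16\x03\x01\x00\xa5"  # example: first bytes of a TLS record
body = json.loads(build_discover_request(raw))
assert base64.b64decode(body["packet_data"]) == raw
print(body["metadata"])  # → {'confidence_threshold': 0.5}
```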

TLS/mTLS Connection Errors

Symptoms: "certificate verify failed" or "handshake failure" errors.

Resolution:

# Check certificate expiration
openssl x509 -in /path/to/cert.pem -noout -dates

# Verify CA chain
openssl verify -CAfile ca.pem server-cert.pem

# Regenerate certificates
./scripts/generate-webhook-certs.sh
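To turn the `openssl ... -dates` output into an actionable number, a small parser for the `notAfter` line (the date format is the one `openssl x509 -noout -dates` prints):

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: str, now: datetime = None) -> int:
    """Parse an openssl 'notAfter=' line and return whole days until expiry.

    Example input: 'notAfter=Jun  1 12:00:00 2026 GMT'
    """
    value = not_after.split("=", 1)[-1].strip()
    expires = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

# Pinned reference time so the example is deterministic.
ref = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(days_until_expiry("notAfter=Jun  1 12:00:00 2026 GMT", now=ref))  # → 31
```

A negative result means the certificate has already expired and should be regenerated.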

Admission Webhook Rejecting Deployments

Symptoms: Kubernetes deployments fail with policy violation messages.

Resolution:

# Check webhook logs for rejection reason
kubectl logs -n qbitel-container-security deployment/admission-webhook

# Temporarily bypass webhook for debugging
kubectl label namespace your-namespace qbitel.ai/webhook-

# Check that pod specs define security contexts and resource limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].securityContext}'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'
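If the webhook enforces security contexts and resource limits, a minimal compliant pod spec looks like the fragment below (illustrative values and a hypothetical image name; the exact fields required depend on your webhook policy):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "1Gi"
```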

High Memory Usage

Symptoms: OOMKilled pods or degraded performance.

Resolution:

  • Increase memory limits in deployment manifests or Helm values
  • Use DEVICE=cpu if GPU is not needed
  • Use a smaller LLM model (e.g., llama3.2:8b instead of 70b)
  • Check for memory leaks in custom protocol parsers

Log Analysis

# Filter logs by level (parses each line once; skips lines that are not JSON)
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh | \
  python -c "
import sys, json
for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue
    if rec.get('level') == 'ERROR':
        print(rec.get('message'))
"

# Search for specific error patterns
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh --since=1h | \
  grep -i "error\|exception\|failed"

Getting Support

  • Check the Monitoring dashboards for anomalies
  • Review the Production Checklist for missing configurations
  • Consult runbooks in ops/monitoring/runbooks/
  • File an issue on the GitHub repository with logs and configuration details