Troubleshooting

Common issues, diagnostic commands, and resolution steps for QBITEL Bridge deployments.

Diagnostic Commands

# Check pod status
kubectl get pods -n qbitel-service-mesh

# View pod logs
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine -f

# Describe a failing pod
kubectl describe pod -n qbitel-service-mesh <pod-name>

# Check service endpoints
kubectl get svc -n qbitel-service-mesh

# Health check
curl http://localhost:8000/health

# Check Kubernetes events
kubectl get events -n qbitel-service-mesh --sort-by='.lastTimestamp'
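If the health endpoint returns JSON with per-component statuses (an assumed schema — adjust the field names to your deployment's actual /health payload), a small helper can flag degraded components from the curl output:

```python
import json

def unhealthy_components(health_json: str) -> dict:
    """Return components whose status is not 'ok' or 'healthy'.

    Assumes a response shape like (hypothetical -- verify against your
    deployment's actual /health payload):
      {"status": "degraded", "components": {"llm": "unavailable", "db": "ok"}}
    """
    payload = json.loads(health_json)
    return {
        name: status
        for name, status in payload.get("components", {}).items()
        if status not in ("ok", "healthy")
    }

# Example: feed it the body returned by `curl http://localhost:8000/health`
sample = '{"status": "degraded", "components": {"llm": "unavailable", "db": "ok"}}'
print(unhealthy_components(sample))  # → {'llm': 'unavailable'}
```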

Common Issues

AI Engine Not Starting

Symptoms: Pod in CrashLoopBackOff or container exits immediately.

Common Causes:

  • Missing or invalid DATABASE_URL environment variable
  • Database not reachable from the pod
  • Insufficient memory (requires 2 GB+ for ML models)
  • Missing Python dependencies

Resolution:

# Check pod logs for the error
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine --previous

# Verify database connectivity (actually opens a connection, not just the import)
kubectl exec -n qbitel-service-mesh deployment/qbitel-engine -- \
  python -c "import os, sqlalchemy; sqlalchemy.create_engine(os.environ['DATABASE_URL']).connect(); print('DB OK')"

# Check memory limits
kubectl describe pod -n qbitel-service-mesh <pod-name> | grep -A5 "Limits"
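Before suspecting the network, it is worth checking that DATABASE_URL is even well-formed. A standard-library sketch that catches the obvious mistakes (it validates the URL's shape only; it does not prove the database accepts connections):

```python
from urllib.parse import urlsplit

def check_database_url(url: str) -> list:
    """Return a list of obvious structural problems with a DATABASE_URL value."""
    problems = []
    parts = urlsplit(url)
    if not parts.scheme:
        problems.append("missing scheme (e.g. postgresql://)")
    if not parts.hostname:
        problems.append("missing hostname")
    if parts.scheme and "://" not in url:
        problems.append("malformed scheme separator")
    return problems

print(check_database_url("postgresql://qbitel:secret@db:5432/qbitel"))  # → []
print(check_database_url("db:5432/qbitel"))  # flags missing hostname/separator
```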

LLM Provider Unavailable

Symptoms: Copilot and zero-touch features return errors; health check shows "llm: unavailable".

Common Causes:

  • Ollama service not running
  • Incorrect OLLAMA_URL configuration
  • Model not downloaded

Resolution:

# Verify Ollama is running
curl http://localhost:11434/api/tags

# Pull the required model
ollama pull llama3.2:8b

# Verify the model is available
ollama list
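The `/api/tags` response lists installed models under a `models` key. A helper like the following can confirm a required model is present from that JSON (a sketch; the field names follow Ollama's documented tags response):

```python
import json

def model_available(tags_json: str, model: str) -> bool:
    """Check whether `model` appears in an Ollama /api/tags response."""
    tags = json.loads(tags_json)
    names = {m.get("name", "") for m in tags.get("models", [])}
    # Accept an exact tag match, or a bare model name matching any installed tag.
    return model in names or any(n.split(":")[0] == model for n in names)

sample = '{"models": [{"name": "llama3.2:8b"}]}'
print(model_available(sample, "llama3.2:8b"))  # → True
print(model_available(sample, "mistral"))      # → False
```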

Protocol Discovery Failures

Symptoms: Discovery requests return low confidence or no results.

Common Causes:

  • Insufficient traffic samples (need 10+ messages for reliable discovery)
  • Traffic data not properly base64-encoded
  • Confidence threshold set too high

Resolution:

# Lower the confidence threshold
curl -X POST http://localhost:8000/api/v1/discover \
  -H "Content-Type: application/json" \
  -d '{"packet_data": "...", "metadata": {"confidence_threshold": 0.5}}'

# Sanity-check that your payload round-trips through base64
echo -n "your data" | base64 | base64 -d
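Encoding errors are easier to catch in code than by eye. A sketch of preparing the request body (the `packet_data` and `metadata` field names come from the discover example above):

```python
import base64, json

def build_discover_request(raw_bytes: bytes, threshold: float = 0.5) -> str:
    """Base64-encode captured traffic and build a discover request body."""
    body = {
        "packet_data": base64.b64encode(raw_bytes).decode("ascii"),
        "metadata": {"confidence_threshold": threshold},
    }
    return json.dumps(body)

# Round-trip check: decoding must return the original bytes.
raw = b"\x16\x03\x01\x00\xa5"  # example: first bytes of a TLS record
body = json.loads(build_discover_request(raw))
assert base64.b64decode(body["packet_data"]) == raw
print(body["metadata"])  # → {'confidence_threshold': 0.5}
```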

TLS/mTLS Connection Errors

Symptoms: "certificate verify failed" or "handshake failure" errors.

Resolution:

# Check certificate expiration
openssl x509 -in /path/to/cert.pem -noout -dates

# Verify CA chain
openssl verify -CAfile ca.pem server-cert.pem

# Regenerate certificates
./scripts/generate-webhook-certs.sh
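To turn the `openssl ... -dates` output into an actionable number, a small parser for the `notAfter` line (the date format is the one `openssl x509 -noout -dates` prints):

```python
from datetime import datetime, timezone

def days_until_expiry(not_after: str, now: datetime = None) -> int:
    """Parse an openssl 'notAfter=' line and return whole days until expiry.

    Example input: 'notAfter=Jun  1 12:00:00 2026 GMT'
    """
    value = not_after.split("=", 1)[-1].strip()
    expires = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

# Pinned reference time so the example is deterministic.
ref = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(days_until_expiry("notAfter=Jun  1 12:00:00 2026 GMT", now=ref))  # → 31
```

A negative result means the certificate has already expired and should be regenerated.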

Admission Webhook Rejecting Deployments

Symptoms: Kubernetes deployments fail with policy violation messages.

Resolution:

# Check webhook logs for rejection reason
kubectl logs -n qbitel-container-security deployment/admission-webhook

# Temporarily bypass webhook for debugging
kubectl label namespace your-namespace qbitel.ai/webhook-

# Check that pod specs define security contexts and resource limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].securityContext}'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources}'
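If the webhook enforces security contexts and resource limits, a minimal compliant pod spec looks like the fragment below (illustrative values and a hypothetical image name; the exact fields required depend on your webhook policy):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      securityContext:
        runAsNonRoot: true
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
      resources:
        requests:
          cpu: "250m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "1Gi"
```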

High Memory Usage

Symptoms: OOMKilled pods or degraded performance.

Resolution:

  • Increase memory limits in deployment manifests or Helm values
  • Use DEVICE=cpu if GPU is not needed
  • Use a smaller LLM model (e.g., llama3.2:8b instead of 70b)
  • Check for memory leaks in custom protocol parsers

Log Analysis

# Filter logs by level (parses each line once; skips lines that are not JSON)
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh | \
  python -c "
import sys, json
for line in sys.stdin:
    try:
        rec = json.loads(line)
    except ValueError:
        continue
    if rec.get('level') == 'ERROR':
        print(rec.get('message'))
"

# Search for specific error patterns
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh --since=1h | \
  grep -i "error\|exception\|failed"

Getting Support

  • Check the Monitoring dashboards for anomalies
  • Review the Production Checklist for missing configurations
  • Consult runbooks in ops/monitoring/runbooks/
  • File an issue on the GitHub repository with logs and configuration details