Troubleshooting
Common issues, diagnostic commands, and resolution steps for QBITEL Bridge deployments.
Diagnostic Commands
# Check pod status
kubectl get pods -n qbitel-service-mesh
# View pod logs
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine -f
# Describe a failing pod
kubectl describe pod -n qbitel-service-mesh <pod-name>
# Check service endpoints
kubectl get svc -n qbitel-service-mesh
# Health check
curl http://localhost:8000/health
# Check Kubernetes events
kubectl get events -n qbitel-service-mesh --sort-by='.lastTimestamp'
Common Issues
AI Engine Not Starting
Symptoms: Pod in CrashLoopBackOff or container exits immediately.
Common Causes:
- Missing or invalid DATABASE_URL environment variable
- Database not reachable from the pod
- Insufficient memory (requires 2 GB+ for ML models)
- Missing Python dependencies
Resolution:
# Check pod logs for the error
kubectl logs -n qbitel-service-mesh deployment/qbitel-engine --previous
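# Verify database connectivity (assumes DATABASE_URL is a synchronous SQLAlchemy URL)
kubectl exec -n qbitel-service-mesh deployment/qbitel-engine -- \
python -c "import os, sqlalchemy; sqlalchemy.create_engine(os.environ['DATABASE_URL']).connect().close(); print('DB OK')"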
# Check memory limits
kubectl describe pod -n qbitel-service-mesh <pod-name> | grep -A5 "Limits"
LLM Provider Unavailable
Symptoms: Copilot and zero-touch features return errors; health check shows "llm: unavailable".
Common Causes:
- Ollama service not running
- Incorrect OLLAMA_URL configuration
- Model not downloaded
Resolution:
# Verify Ollama is running
curl http://localhost:11434/api/tags
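The JSON returned by the `/api/tags` call above lists installed models under a `models` array, each with a `name` field. The sketch below checks whether a given model appears in that response. The function name `model_is_available` and the sample payload are illustrative assumptions, not part of QBITEL Bridge or Ollama.

```python
import json


def model_is_available(tags_json: str, wanted: str) -> bool:
    """Return True if `wanted` appears in an Ollama /api/tags response body."""
    payload = json.loads(tags_json)
    names = {model.get("name", "") for model in payload.get("models", [])}
    # Ollama lists models with an explicit tag; treat a bare name as ":latest".
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names


# Illustrative response body only; real responses carry more fields per model.
SAMPLE_RESPONSE = '{"models": [{"name": "mistral:latest"}, {"name": "llama3:8b"}]}'

if __name__ == "__main__":
    for name in ("mistral", "llama3:8b", "llama3:70b"):
        state = "available" if model_is_available(SAMPLE_RESPONSE, name) else "missing"
        print(f"{name}: {state}")
```

In practice the response body would come from the `curl` command above (or an HTTP client pointed at the configured OLLAMA_URL) rather than from a hard-coded string.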
# Pull the required model
ollama pull llama3.2:8b
# Verify the model is available
ollama list
Protocol Discovery Failures
Symptoms: Discovery requests return low confidence or no results.
Common Causes:
- Insufficient traffic samples (need 10+ messages for reliable discovery)
- Traffic data not properly base64-encoded
- Confidence threshold set too high
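The first two causes above can be caught locally before a discovery request is sent. This is a minimal sketch, assuming the captured messages are held as a list of base64 strings. The names `is_valid_base64` and `check_samples` are hypothetical, and how samples are packaged into the request body is not covered here.

```python
import base64
import binascii

# From the guidance above: 10+ messages for reliable discovery.
MIN_SAMPLES = 10


def is_valid_base64(text: str) -> bool:
    """Return True if `text` is non-empty and strictly valid base64."""
    try:
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return bool(text)


def check_samples(samples: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the batch looks fine."""
    problems = []
    if len(samples) < MIN_SAMPLES:
        problems.append(f"only {len(samples)} samples, need {MIN_SAMPLES}+")
    bad = [i for i, sample in enumerate(samples) if not is_valid_base64(sample)]
    if bad:
        problems.append(f"samples not valid base64 at indexes {bad}")
    return problems


if __name__ == "__main__":
    encoded = base64.b64encode(b"example message").decode("ascii")
    print(check_samples([encoded] * 3 + ["not base64!"]))
```

Strict validation (`validate=True`) is used so that stray characters and missing padding are reported instead of being silently discarded.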
Resolution:
# Lower the confidence threshold
curl -X POST http://localhost:8000/api/v1/discover \
-H "Content-Type: application/json" \
-d '{"packet_data": "...", "metadata": {"confidence_threshold": 0.5}}'
# Verify base64 encoding
echo -n "your data" | base64 | base64 -d
TLS/mTLS Connection Errors
Symptoms: "certificate verify failed" or "handshake failure" errors.
Resolution:
# Check certificate expiration
openssl x509 -in /path/to/cert.pem -noout -dates
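The `notAfter=` line printed by the command above can be turned into a number of days remaining, which is easier to alert on. The sketch below uses only the Python standard library. The function name `days_until_expiry` is hypothetical, and the input is assumed to be the text output of `openssl x509 -noout -dates`.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(openssl_dates: str, now: Optional[float] = None) -> float:
    """Return days left before the notAfter date in `openssl x509 -dates` output.

    A negative result means the certificate has already expired.
    """
    for line in openssl_dates.splitlines():
        if line.startswith("notAfter="):
            # cert_time_to_seconds parses the "Jan  5 09:34:43 2018 GMT" format.
            not_after = ssl.cert_time_to_seconds(line.split("=", 1)[1].strip())
            current = time.time() if now is None else now
            return (not_after - current) / 86400
    raise ValueError("no notAfter= line found")
```

A caller would typically capture the `openssl` output (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass its `stdout` to this function.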
# Verify CA chain
openssl verify -CAfile ca.pem server-cert.pem
# Regenerate certificates
./scripts/generate-webhook-certs.sh
Admission Webhook Rejecting Deployments
Symptoms: Kubernetes deployments fail with policy violation messages.
Resolution:
# Check webhook logs for rejection reason
kubectl logs -n qbitel-container-security deployment/admission-webhook
# Temporarily bypass webhook for debugging
kubectl label namespace your-namespace qbitel.ai/webhook-
# Ensure pods have security contexts and resource limits
High Memory Usage
Symptoms: OOMKilled pods or degraded performance.
Resolution:
- Increase memory limits in deployment manifests or Helm values
- Use DEVICE=cpu if GPU is not needed
- Use a smaller LLM model (e.g., llama3.2:8b instead of 70b)
- Check for memory leaks in custom protocol parsers
Log Analysis
# Filter logs by level (assumes every log line is a JSON object)
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh | \
python -c "import sys,json; [print(r['message']) for r in map(json.loads, sys.stdin) if r.get('level')=='ERROR']"
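The one-liner above stops with a traceback on the first line that is not valid JSON, which can happen when tracebacks or startup banners are mixed into the log stream. The following sketch is a more tolerant filter, assuming the same `level` and `message` keys used by the one-liner. The function name `filter_by_level` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def filter_by_level(lines: Iterable[str], level: str = "ERROR") -> Iterator[str]:
    """Yield the 'message' of each JSON log line whose 'level' matches."""
    for raw in lines:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip plain-text lines such as tracebacks or banners
        if isinstance(record, dict) and record.get("level") == level:
            yield record.get("message", "")
```

Passing `sys.stdin` as `lines` allows the function to sit at the end of the same `kubectl logs ... |` pipeline.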
# Search for specific error patterns
kubectl logs deployment/qbitel-engine -n qbitel-service-mesh --since=1h | \
grep -i "error\|exception\|failed"
Getting Support
- Check the Monitoring dashboards for anomalies
- Review the Production Checklist for missing configurations
- Consult runbooks in ops/monitoring/runbooks/
- File an issue on the GitHub repository with logs and configuration details