ci: docker_tag=ext - Update IA_agents documentation with Loki configuration fixes
- Update diagnostic-loki-unhealthy.md with root cause analysis
- Add loki-configuration-guide.md with complete configuration guide
- Update best-practices-deployment.md with critical Loki configuration rules
- Update quick-reference-monitoring.md with Loki network configuration note
- Update README.md with Loki configuration requirements
- Document the solution for Loki unhealthy issue (0.0.0.0 vs 127.0.0.1)
- Add comprehensive Loki configuration with proper network settings
- Include validation tests and troubleshooting guide for Loki
parent a11ecc80de
commit db94a1afaf
@@ -19,6 +19,7 @@ This directory contains all the documentation needed by the AI agents…
### **3. Architecture Documents (NEW)**
- [`deployment-architecture.md`](deployment-architecture.md) - Phased deployment architecture
- [`best-practices-deployment.md`](best-practices-deployment.md) - Best practices and prohibited actions
- [`loki-configuration-guide.md`](loki-configuration-guide.md) - Mandatory Loki configuration

### **4. Deployment Documents**
- [`prompts/prompt-deploy.md`](prompts/prompt-deploy.md) - Complete deployment prompt
@@ -150,6 +151,12 @@ Phase 5: Utility Services
- All scripts use `--env-file .env.master`
- No scattered `.env` files

### **Loki Configuration (CRITICAL)**
- **MANDATORY**: `http_listen_address: 0.0.0.0` (not 127.0.0.1)
- **MANDATORY**: `instance_addr: 0.0.0.0` (not 127.0.0.1)
- **File**: `/conf/loki/loki-config.yaml` with the complete configuration
- **Healthcheck**: Use `wget` (not `curl`)

## 🚨 Warning Signs

### **If you see these commands, it is an ERROR:**

243
IA_agents/loki-configuration-guide.md
Normal file
@@ -0,0 +1,243 @@
# Loki Configuration Guide - LeCoffre Node

## 🎯 Mandatory Configuration

### **Problem Solved**
Loki was reported as "unhealthy" because of an incorrect network configuration (`127.0.0.1` where `0.0.0.0` is required). The fix is now documented and applied.

## 📁 Configuration File

### **Location**
```
/conf/loki/loki-config.yaml
```

### **Complete Configuration**
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  http_listen_address: 0.0.0.0  # ← CRITICAL: listen on all interfaces
  grpc_listen_address: 0.0.0.0

common:
  instance_addr: 0.0.0.0  # ← CRITICAL: address on all interfaces
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# Ingester configuration - ONLY the crucial parameter
ingester:
  lifecycler:
    min_ready_duration: 5s  # Lowers the delay from 15s to 5s

# Limits configuration
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  split_queries_by_interval: 15m
  max_query_parallelism: 32
  max_streams_per_user: 0
  max_line_size: 256000
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  per_stream_rate_limit: 3MB
  per_stream_rate_limit_burst: 15MB
  max_entries_limit_per_query: 5000
  max_query_series: 500
  max_query_length: 721h
  cardinality_limit: 100000
  max_streams_matchers_per_query: 1000
  max_concurrent_tail_requests: 10

# Storage configuration
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
  filesystem:
    directory: /loki/chunks

# Compactor configuration
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: false
  delete_request_store: filesystem

# Analytics disabled
analytics:
  reporting_enabled: false
```

## 🐳 Docker Compose Configuration

### **Loki Service**
```yaml
loki:
  image: grafana/loki:latest
  container_name: loki
  ports:
    - "0.0.0.0:3100:3100"
  volumes:
    - loki_data:/loki
    - ./conf/loki/loki-config.yaml:/etc/loki/loki-config.yaml:ro
  command: -config.file=/etc/loki/loki-config.yaml
  networks:
    btcnet:
      aliases:
        - loki
  healthcheck:
    test: ["CMD", "wget", "-q", "--spider", "http://localhost:3100/ready"]
    interval: 30s
    timeout: 15s
    retries: 3
    start_period: 120s
  restart: unless-stopped
```

## 🔧 Critical Points

### **1. Network Configuration (MANDATORY)**
```yaml
# ❌ INCORRECT - Causes the "unhealthy" problem
server:
  http_listen_port: 3100
common:
  instance_addr: 127.0.0.1

# ✅ CORRECT - Fixes the problem
server:
  http_listen_address: 0.0.0.0
  grpc_listen_address: 0.0.0.0
common:
  instance_addr: 0.0.0.0
```

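The network rule above can be checked mechanically before a deployment. The following is a minimal sketch and not part of the repository: `check_loki_addresses` is a hypothetical helper name, and it assumes each key appears at most once, on its own line, as in the configuration shown in this guide. It treats a missing key as a failure because the guide asks for these addresses to be set explicitly.

```bash
#!/bin/sh
# Hypothetical helper (an assumption, not an existing project script):
# verifies that the three address keys are explicitly set to 0.0.0.0.
check_loki_addresses() {
    config_file="$1"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 2
    fi
    status=0
    for key in http_listen_address grpc_listen_address instance_addr; do
        # Extract the value of the first matching key, ignoring
        # indentation and any trailing "# comment".
        value=$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$config_file" | head -n 1)
        if [ "$value" = "0.0.0.0" ]; then
            echo "OK   ${key}=${value}"
        else
            echo "FAIL ${key}=${value:-<unset>} (expected 0.0.0.0)"
            status=1
        fi
    done
    return "$status"
}
```

A possible call is `check_loki_addresses ./conf/loki/loki-config.yaml`, which returns a non-zero status if any of the three keys is missing or set to another address.
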
### **2. Docker Healthcheck**
```yaml
# ✅ Recommended configuration
healthcheck:
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:3100/ready"]
  interval: 30s
  timeout: 15s
  retries: 3
  start_period: 120s
```

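The same probe can be repeated by hand, for example from a deployment script that has to wait for Loki. The sketch below is an illustration under assumptions: `wait_until_ready` is a hypothetical function name, and it only approximates the `retries` and `interval` settings above (it does not model `start_period` or `timeout`).

```bash
#!/bin/sh
# Hypothetical helper: runs a probe command up to RETRIES times,
# sleeping INTERVAL seconds between attempts.
# Usage: wait_until_ready RETRIES INTERVAL COMMAND [ARGS...]
wait_until_ready() {
    retries="$1"
    interval="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # The probe's own output is discarded; only its exit status counts.
        if "$@" >/dev/null 2>&1; then
            echo "ready after ${attempt} attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        # Do not sleep after the final failed attempt.
        if [ "$attempt" -le "$retries" ]; then
            sleep "$interval"
        fi
    done
    echo "not ready after ${retries} attempt(s)" >&2
    return 1
}
```

With the probe used in this guide, a call could look like `wait_until_ready 3 30 docker exec loki wget -q --spider http://localhost:3100/ready`.
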
### **3. Ingester Configuration**
```yaml
# ✅ Shortens the startup delay
ingester:
  lifecycler:
    min_ready_duration: 5s
```

## 🧪 Validation Tests

### **Test 1: Network Listening Check**
```bash
docker exec loki netstat -tlnp | grep 3100
# Expected result: tcp 0 0 :::3100 :::* LISTEN
```

### **Test 2: External Connectivity Test**
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:3100/ready
# Expected result: 200
```

### **Test 3: Internal Connectivity Test**
```bash
docker exec loki wget -q --spider http://localhost:3100/ready && echo "SUCCESS"
# Expected result: SUCCESS
```

### **Test 4: Docker Status**
```bash
docker compose --env-file .env.master ps loki
# Expected result: Up X seconds (healthy)
```

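Test 4 can also be evaluated by a script instead of by eye. The sketch below classifies the status text read on standard input; `compose_health` is a hypothetical name, and the code assumes the status markers `(healthy)`, `(unhealthy)` and `(health: starting)` that Docker prints in its status column.

```bash
#!/bin/sh
# Hypothetical helper: reads `docker compose ps` output on stdin and
# prints one word describing the health state it found.
compose_health() {
    output=$(cat)
    case "$output" in
        *"(healthy)"*)          echo "healthy" ;;
        *"(unhealthy)"*)        echo "unhealthy" ;;
        *"(health: starting)"*) echo "starting" ;;
        *)                      echo "unknown" ;;
    esac
}
```

A possible use is `docker compose --env-file .env.master ps loki | compose_health`, which lets a calling script branch on the single word returned.
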
## 🚨 Common Errors

### **Error 1: "Loki unhealthy"**
**Cause**: Incorrect network configuration
**Solution**: Set `http_listen_address: 0.0.0.0`

### **Error 2: "Healthcheck failed"**
**Cause**: The healthcheck uses `curl` instead of `wget`
**Solution**: Use `wget` in the healthcheck

### **Error 3: "Config validation failed"**
**Cause**: Incorrect compactor configuration
**Solution**: Disable `retention_enabled` or configure `delete_request_store`

### **Error 4: "Connection refused"**
**Cause**: Loki listens on 127.0.0.1 only
**Solution**: Set `http_listen_address: 0.0.0.0` and `instance_addr: 0.0.0.0`

## 📊 Monitoring

### **Logs to Watch**
```bash
# Startup logs
docker logs loki --tail 20

# Error logs (2>&1 so that lines the container wrote to stderr also reach grep)
docker logs loki 2>&1 | grep -i error

# Readiness logs
docker logs loki 2>&1 | grep -i ready
```

### **Metrics to Watch**
- **Startup time**: < 2 minutes
- **Healthcheck status**: "healthy"
- **`/ready` endpoint**: HTTP 200
- **Memory usage**: < 1GB

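For regular log monitoring, the error search above can be reduced to a single number that a script can compare against a threshold. This is a minimal sketch; `count_log_errors` is a hypothetical name, and matching the word "error" case-insensitively is an assumption about what counts as an error line.

```bash
#!/bin/sh
# Hypothetical helper: counts the lines on stdin that contain "error"
# (case-insensitive). grep exits with status 1 when nothing matches,
# so "|| true" keeps the function's exit status at 0 in that case.
count_log_errors() {
    grep -ci 'error' || true
}
```

A possible use is `docker logs loki 2>&1 | count_log_errors`, where `2>&1` merges both output streams before counting.
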
## 🎯 Best Practices

### **1. Configuration**
- Always use the complete configuration
- Check the network settings
- Test the configuration before deployment

### **2. Deployment**
- Use the `start-monitoring.sh` script
- Watch the logs during startup
- Validate the healthcheck status

### **3. Maintenance**
- Review the logs regularly
- Check disk space
- Keep the configuration up to date

---

**Document created on 2025-09-21**
**Version**: 1.0
**Status**: ✅ Configuration validated and working