**Motivations:**
- SSH multiplexing (ControlMaster) was causing connection reset errors
- Complex socket management was creating more problems than it solved
- The script makes sequential SSH calls, so multiplexing is not necessary
- Simpler is better - remove unnecessary complexity
**Root causes:**
- ControlMaster multiplexing was added to avoid MaxStartups errors
- But it caused connection reset, socket invalidation, and cleanup issues
- Root cause: unnecessary complexity that created more problems than it solved
- Sequential SSH calls don't need multiplexing anyway
**Fixes:**
- Removed all ControlMaster/ControlPath configuration
- Removed cleanup_dead_ssh() function
- Removed cleanup_ssh() trap
- Removed SSH_CONTROL_DIR and SSH_CONTROL_PATH variables
- Simplified ssh_exec() to basic SSH call without multiplexing
- Kept only ConnectTimeout for basic connection management
**Evolutions:**
- Much simpler code without socket management complexity
- No more connection reset errors from multiplexing
- No more socket cleanup issues
- Each SSH call is independent and reliable
**Affected pages:**
- deploy.sh: Removed all ControlMaster multiplexing code, simplified to basic SSH
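
The simplified `ssh_exec()` described in this entry might look roughly like the sketch below. The host, the timeout value, and the `SSH_BIN` override are illustrative assumptions, not values taken from deploy.sh.

```shell
# Hypothetical defaults: the real host and timeout are not given in this changelog.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_CONNECT_TIMEOUT="${SSH_CONNECT_TIMEOUT:-10}"
# SSH_BIN is an illustrative hook so the function can be exercised without a server.
SSH_BIN="${SSH_BIN:-ssh}"

# Run one remote command over a fresh, independent SSH connection.
# No ControlMaster/ControlPath options: each call opens and closes its own session.
ssh_exec() {
    "$SSH_BIN" -o ConnectTimeout="$SSH_CONNECT_TIMEOUT" "$SSH_HOST" "$@"
}
```

Because every call is independent, there is no socket state to validate or clean up between commands.
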
**Motivations:**
- check_ssh_connection() was causing connection reset errors
- Trying to use a potentially dead ControlMaster connection for checking interferes with multiplexing
- SSH ControlMaster=auto can handle invalid sockets automatically
**Root causes:**
- check_ssh_connection() uses ssh with ControlPath to test connection
- If connection is dead, this causes 'Connection reset by peer' errors
- This check interferes with ControlMaster multiplexing
- Root cause: unnecessary pre-check that causes more problems than it solves
**Fixes:**
- Removed check_ssh_connection() function completely
- Removed pre-execution connection validation from ssh_exec()
- Let SSH ControlMaster=auto handle invalid sockets automatically
- SSH will detect invalid socket and create new connection automatically
- Removed unnecessary keepalive options (not needed if no firewall/NAT issues)
- Simplified to basic ControlMaster configuration
**Evolutions:**
- Simpler code without interference from connection checks
- SSH handles connection management automatically
- No more connection reset errors from check_ssh_connection()
**Affected pages:**
- deploy.sh: Removed check_ssh_connection() and simplified ssh_exec()
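
A minimal sketch of the "basic ControlMaster configuration" this entry refers to, with no pre-execution check and no keepalive options. The control path, the `ControlPersist` and `ConnectTimeout` values, the host, and `SSH_BIN` are assumptions made for illustration.

```shell
# Hypothetical values; the changelog names SSH_CONTROL_PATH but not its value.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_CONTROL_PATH="${SSH_CONTROL_PATH:-${HOME:-/tmp}/.ssh/deploy-%r@%h:%p}"
# SSH_BIN is an illustrative hook so the function can be exercised without a server.
SSH_BIN="${SSH_BIN:-ssh}"

# Run a remote command and leave all socket handling to ssh itself:
# there is no separate connection check before the call.
ssh_exec() {
    "$SSH_BIN" \
        -o ControlMaster=auto \
        -o ControlPath="$SSH_CONTROL_PATH" \
        -o ControlPersist=60 \
        -o ConnectTimeout=10 \
        "$SSH_HOST" "$@"
}
```
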
**Motivations:**
- SSH connections were being dropped by firewalls/NAT due to insufficient keepalives
- Connection reset by peer errors were occurring because connections were idle too long
- Need to prevent connection drops at the source, not just handle errors after
**Root causes:**
- ServerAliveInterval=60 was too long - many firewalls/NAT close idle connections after 30-45 seconds
- No TCPKeepAlive enabled - missing system-level keepalives
- ControlPersist=300 kept dead connections too long
- ServerAliveCountMax=3 was too lenient for detecting dead connections
- Root cause: connections were being closed by network equipment due to inactivity
**Fixes:**
- Reduced ServerAliveInterval from 60 to 15 seconds to send keepalives more frequently
- Added TCPKeepAlive=yes to use system-level TCP keepalives
- Reduced ServerAliveCountMax from 3 to 2 to detect dead connections faster
- Reduced ControlPersist from 300 to 60 seconds to avoid keeping dead sockets
- Added Compression=no to reduce overhead
- Removed error detection/retry logic - fixing root cause instead of handling symptoms
- Updated check_ssh_connection() to use same keepalive options for consistency
**Evolutions:**
- Connections now stay alive through firewalls/NAT with frequent keepalives
- Dead connections detected and cleaned up faster
- No more connection reset errors due to proper keepalive configuration
**Affected pages:**
- deploy.sh: Fixed SSH keepalive configuration to prevent connection drops
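
The option values listed in this entry could be passed as in the sketch below. The helper `dead_peer_timeout` is a hypothetical addition that shows the arithmetic: ssh gives up on an unresponsive server after roughly `ServerAliveInterval × ServerAliveCountMax` seconds, so the change moves that window from about 180 seconds (60 × 3) to about 30 seconds (15 × 2). The host and `SSH_BIN` are assumptions.

```shell
# Hypothetical host; SSH_BIN is an illustrative hook for exercising the function offline.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"

# Values taken from this changelog entry.
SERVER_ALIVE_INTERVAL=15
SERVER_ALIVE_COUNT_MAX=2

# Run a remote command with the keepalive settings from this entry.
# ControlPersist only matters alongside the script's ControlMaster/ControlPath
# options, which are omitted here for brevity.
ssh_exec() {
    "$SSH_BIN" \
        -o ServerAliveInterval="$SERVER_ALIVE_INTERVAL" \
        -o ServerAliveCountMax="$SERVER_ALIVE_COUNT_MAX" \
        -o TCPKeepAlive=yes \
        -o ControlPersist=60 \
        -o Compression=no \
        "$SSH_HOST" "$@"
}

# Approximate seconds before ssh drops an unresponsive server:
# interval ($1) multiplied by the allowed number of missed replies ($2).
dead_peer_timeout() {
    echo $(( $1 * $2 ))
}
```
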
**Motivations:**
- ControlSocket errors still occurring during command execution
- Socket can become invalid DURING execution of long commands
- Need to detect socket errors in command output and clean up immediately
**Root causes:**
- Socket validation before command execution cannot detect if socket dies during execution
- Long commands (npm install, build) can cause connection to die mid-execution
- Next command finds invalid socket and SSH disables multiplexing but leaves socket
- Previous cleanup only happened before execution, not after detecting errors
**Fixes:**
- Capture SSH command output to detect 'ControlSocket already exists' errors
- If socket error detected, immediately cleanup and retry once
- This is a specific retry for socket errors only, not a general retry mechanism
- Ensures dead sockets are cleaned up even if they die during command execution
**Evolutions:**
- Better handling of socket invalidation during long-running commands
- Automatic recovery from socket errors detected during execution
**Affected pages:**
- deploy.sh: Enhanced ssh_exec() to detect and handle socket errors in output
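
The detect-and-retry-once behaviour described in this entry can be sketched as follows. The control directory location, the host, `SSH_BIN`, the helper `ssh_run_once`, and the one-line stand-in for `cleanup_dead_ssh()` are all assumptions; only the matched error text and the single retry come from the entry.

```shell
# Hypothetical defaults, not taken from deploy.sh.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"
SSH_CONTROL_DIR="${SSH_CONTROL_DIR:-${TMPDIR:-/tmp}/deploy-ssh-control}"

# Stand-in for the script's cleanup function: drop and recreate the socket directory.
cleanup_dead_ssh() {
    rm -rf "$SSH_CONTROL_DIR"
    mkdir -p "$SSH_CONTROL_DIR"
}

# One SSH attempt, with stderr merged into stdout so warnings can be inspected.
ssh_run_once() {
    "$SSH_BIN" -o ControlMaster=auto \
        -o ControlPath="$SSH_CONTROL_DIR/%r@%h:%p" \
        "$SSH_HOST" "$@" 2>&1
}

# Run a command; if the output shows the stale-socket error, clean up and retry once.
ssh_exec() {
    local output status
    status=0
    output=$(ssh_run_once "$@") || status=$?
    case "$output" in
        *"ControlSocket"*"already exists"*)
            cleanup_dead_ssh
            status=0
            output=$(ssh_run_once "$@") || status=$?
            ;;
    esac
    printf '%s\n' "$output"
    return "$status"
}
```

One caveat with this approach: the first attempt may already have run the remote command before the retry, so the sketch assumes commands are safe to run twice.
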
**Motivations:**
- SSH errors 'ControlSocket already exists, disabling multiplexing' were occurring
- Socket could become invalid between check and execution
- ControlMaster=auto detects invalid socket but doesn't remove it, just disables multiplexing
- Dead socket remains and causes issues for subsequent commands
**Root causes:**
- ControlMaster socket can exist but connection can be dead
- Socket can become invalid between check_ssh_connection() and ssh_exec() execution
- ControlMaster=auto detects invalid socket but doesn't remove it, leaving dead socket
- Dead socket causes 'ControlSocket already exists' errors on subsequent commands
- Previous cleanup was not aggressive enough to remove dead sockets
**Fixes:**
- Improved check_ssh_connection() to test actual connection with 'true' command instead of just 'ssh -O check'
- Enhanced cleanup_dead_ssh() to remove entire directory instead of just socket file
- This ensures socket is truly removed even if process is still holding it
- Removed complex multi-step cleanup, using simple directory removal and recreation
- Socket validation now tests actual connection usability, not just socket existence
**Evolutions:**
- More robust socket cleanup that guarantees removal of dead sockets
- Better detection of invalid sockets before command execution
**Affected pages:**
- deploy.sh: Improved cleanup_dead_ssh() and check_ssh_connection() functions
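
A rough sketch of the two functions as this entry describes them: the check runs a real remote `true` instead of only querying the socket, and the cleanup removes and recreates the whole directory. The paths, the timeout, `BatchMode=yes`, `SSH_BIN`, and the wrapper `ensure_ssh_connection` are assumptions added for illustration.

```shell
# Hypothetical defaults, not taken from deploy.sh.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"
SSH_CONTROL_DIR="${SSH_CONTROL_DIR:-${TMPDIR:-/tmp}/deploy-ssh-control}"
SSH_CONTROL_PATH="${SSH_CONTROL_PATH:-$SSH_CONTROL_DIR/%r@%h:%p}"

# Succeeds only if a remote command can actually run through the socket,
# which is stricter than checking that the socket file exists.
check_ssh_connection() {
    "$SSH_BIN" -o ControlPath="$SSH_CONTROL_PATH" \
        -o ConnectTimeout=5 -o BatchMode=yes \
        "$SSH_HOST" true >/dev/null 2>&1
}

# Remove the whole control directory and recreate it empty,
# rather than deleting individual socket files.
cleanup_dead_ssh() {
    rm -rf "$SSH_CONTROL_DIR"
    mkdir -p "$SSH_CONTROL_DIR"
    chmod 700 "$SSH_CONTROL_DIR"
}

# Hypothetical wrapper: clean up only when the usability check fails.
ensure_ssh_connection() {
    check_ssh_connection || cleanup_dead_ssh
}
```
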
**Motivations:**
- Retries are useless: the connection must work on the first attempt or fail
- Simplifying code by removing unnecessary retry logic
- Reducing complexity and potential for multiple connection attempts
**Root causes:**
- Retry mechanism was adding complexity without real benefit
- If the SSH connection fails, retrying won't help; it should fail immediately
**Fixes:**
- Removed all retry logic from ssh_exec() function
- Simplified to single attempt execution
- Kept socket validation and cleanup before execution
- Removed retry loop and counter variables
**Evolutions:**
- Simpler and more straightforward SSH connection handling
- Faster failure detection when connection issues occur
**Affected pages:**
- deploy.sh: Simplified ssh_exec() function, removed retry mechanism
**Motivations:**
- Too many SSH connection attempts were being made during deployment
- ControlMaster socket cleanup was not aggressive enough
- Multiple SSH calls at step 5 created excessive connection attempts
**Root causes:**
- ControlMaster socket could remain but be invalid, causing SSH to disable multiplexing
- Each ssh_exec had up to 3 retries, and step 5 made 4-5 ssh_exec calls
- Socket cleanup was not forceful enough to remove invalid sockets
- Complex retry logic at step 5 created unnecessary SSH calls
**Fixes:**
- Improved cleanup_dead_ssh() to forcefully remove socket and wait for proper closure
- Reduced max retries in ssh_exec from 3 to 2
- Simplified step 5 logic to reduce SSH calls from 4-5 to 2 (one check, one init if needed)
- Combined Git initialization commands into single SSH call to reduce connections
- Added sleep after socket cleanup to ensure proper closure
**Evolutions:**
- More efficient SSH connection management
- Reduced deployment time by minimizing connection attempts
**Affected pages:**
- deploy.sh: Improved cleanup_dead_ssh(), reduced retries, simplified step 5 logic
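
The "one check, one init if needed" shape of step 5 could be sketched like this. The remote directory, the exact Git commands that were combined, the function name `ensure_remote_git_repo`, and the bare `ssh_exec` stand-in are all assumptions; the changelog only states that the initialization commands were merged into a single SSH call.

```shell
# Hypothetical defaults, not taken from deploy.sh.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"
REMOTE_DIR="${REMOTE_DIR:-/var/www/app}"

# Bare stand-in for the script's ssh_exec(); SSH options omitted for brevity.
ssh_exec() {
    "$SSH_BIN" "$SSH_HOST" "$@"
}

# At most two SSH calls: one to check for a repository,
# one to create it with all init commands chained remotely.
ensure_remote_git_repo() {
    if ssh_exec "test -d '$REMOTE_DIR/.git'"; then
        return 0
    fi
    ssh_exec "mkdir -p '$REMOTE_DIR' && cd '$REMOTE_DIR' && git init -q"
}
```

Chaining the commands with `&&` on the remote side keeps them in one session, so a failure in any step stops the rest without extra connections.
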
**Motivations:**
- Deployment script was blocking at step 5 (Git repository verification)
- Command substitution with SSH could hang indefinitely
- Need to avoid blocking on SSH connection failures
**Root causes:**
- Command substitution with ssh_exec could block if the SSH connection hangs
- Complex logic with grep on command output was fragile
- No proper timeout handling in the verification step
**Fixes:**
- Simplified Git repository verification logic
- Removed command substitution that could block
- Use direct exit code checking instead of parsing output
- Improved error handling with explicit SSH exit code checking
- Added cleanup and retry mechanism for connection failures
**Evolutions:**
- More robust Git repository verification that doesn't block
**Affected pages:**
- deploy.sh: Simplified Git repository verification step
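
Checking the exit code directly, rather than capturing and grepping output, might look like the sketch below. It relies on ssh returning 255 for its own errors and otherwise passing through the remote command's status. The function name `remote_git_repo_status`, its 0/1/2 return convention, the remote directory, the option values, and `SSH_BIN` are assumptions.

```shell
# Hypothetical defaults, not taken from deploy.sh.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"
REMOTE_DIR="${REMOTE_DIR:-/var/www/app}"

# Stand-in for ssh_exec(): ConnectTimeout bounds the connection phase and
# BatchMode=yes prevents ssh from waiting on an interactive prompt.
ssh_exec() {
    "$SSH_BIN" -o ConnectTimeout=10 -o BatchMode=yes "$SSH_HOST" "$@"
}

# Returns 0 if the repository exists, 1 if it is missing,
# and 2 if SSH itself failed (exit status 255).
remote_git_repo_status() {
    local rc
    rc=0
    ssh_exec "test -d '$REMOTE_DIR/.git'" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   return 0 ;;
        255) return 2 ;;
        *)   return 1 ;;
    esac
}
```

Separating "repository missing" from "connection failed" lets the caller decide whether to initialize the repository or abort the deployment.
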
**Motivations:**
- SSH ControlMaster connection errors were causing deployment failures
- Connection reset errors were not handled properly
- No retry mechanism for failed SSH connections
**Root causes:**
- SSH ControlMaster socket could become stale or be closed prematurely
- No validation of connection before use
- No cleanup of dead connections
- Silent failures in conditional checks
**Fixes:**
- Added connection validation before each SSH command
- Implemented automatic cleanup of dead SSH connections
- Added retry mechanism (up to 3 attempts) with connection cleanup
- Enhanced SSH options for better connection stability (ConnectTimeout, ServerAliveInterval, ServerAliveCountMax)
- Improved error handling in Git repository verification step with explicit error detection and recovery
**Evolutions:**
- Enhanced SSH connection management with robust error handling
- Better error messages to distinguish connection errors from other failures
**Affected pages:**
- deploy.sh: Enhanced ssh_exec() function, added helper functions, improved error handling
- fixKnowledge/ssh-connection-errors-deployment.md: Documentation of the problem, root cause, and solution
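
The retry mechanism introduced in this entry (up to 3 attempts, with cleanup between them) could be sketched as follows. The `ServerAliveInterval` and `ServerAliveCountMax` values are the ones a later entry reports as the originals; the `ConnectTimeout` value, the socket path, the cleanup stand-in, the error message wording, and `SSH_BIN` are assumptions.

```shell
# Hypothetical defaults, not taken from deploy.sh.
SSH_HOST="${SSH_HOST:-deploy@example.com}"
SSH_BIN="${SSH_BIN:-ssh}"
SSH_CONTROL_PATH="${SSH_CONTROL_PATH:-${TMPDIR:-/tmp}/deploy-ssh.sock}"
SSH_MAX_ATTEMPTS=3

# Stand-in cleanup: remove the (possibly stale) control socket.
cleanup_dead_ssh() {
    rm -f "$SSH_CONTROL_PATH"
}

# Retry only on ssh's own failure status (255); any other status belongs
# to the remote command and is returned to the caller unchanged.
ssh_exec() {
    local attempt rc
    attempt=1
    rc=0
    while [ "$attempt" -le "$SSH_MAX_ATTEMPTS" ]; do
        rc=0
        "$SSH_BIN" -o ConnectTimeout=10 \
            -o ServerAliveInterval=60 -o ServerAliveCountMax=3 \
            "$SSH_HOST" "$@" || rc=$?
        if [ "$rc" -ne 255 ]; then
            return "$rc"
        fi
        echo "SSH connection error (attempt $attempt/$SSH_MAX_ATTEMPTS)" >&2
        cleanup_dead_ssh
        attempt=$((attempt + 1))
    done
    return "$rc"
}
```
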