179 lines
5.5 KiB
Markdown
179 lines
5.5 KiB
Markdown
# SSH Connection Errors During Deployment
|
|
|
|
**Date**: 2024-12-19
|
|
**Auteur**: Équipe 4NK
|
|
|
|
## Problem Description
|
|
|
|
During deployment, SSH connection errors occurred when verifying the Git repository on the server. The errors were:
|
|
|
|
```
|
|
mux_client_request_session: read from master failed: Connection reset by peer
|
|
Failed to connect to new control master
|
|
mm_send_fd: sendmsg(2): Broken pipe
|
|
mux_client_request_session: send fds failed
|
|
```
|
|
|
|
These errors appeared at step 5 of the deployment script when checking if Git is initialized on the server.
|
|
|
|
## Impact
|
|
|
|
- **Severity**: Medium
|
|
- **Scope**: Deployment script reliability
|
|
- **User Impact**: Deployment could fail or continue with errors, potentially leaving the server in an inconsistent state
|
|
- **Frequency**: Intermittent, occurring when SSH ControlMaster connection is interrupted
|
|
|
|
## Root Cause
|
|
|
|
The SSH ControlMaster multiplexing connection was being closed prematurely or becoming stale, causing subsequent SSH commands to fail. The original `ssh_exec` function did not handle connection failures robustly:
|
|
|
|
1. **No connection validation**: The function did not check if the ControlMaster socket was still valid before use
|
|
2. **No retry mechanism**: Failed connections were not retried after cleanup
|
|
3. **No dead connection cleanup**: Stale connections were not detected and cleaned up before reuse
|
|
4. **Silent failures**: Connection errors in conditional checks could be misinterpreted as command failures
|
|
|
|
## Root Cause Analysis
|
|
|
|
The SSH ControlMaster feature creates a persistent connection to avoid multiple SSH handshakes. However:
|
|
|
|
- Network interruptions can close the master connection
|
|
- The ControlMaster socket file can become stale if the connection dies
|
|
- The script did not detect or handle these cases, leading to cascading failures
|
|
|
|
## Corrections Applied
|
|
|
|
### 1. Enhanced SSH Connection Management
|
|
|
|
**File**: `deploy.sh`
|
|
|
|
**Changes**:
|
|
- Added `cleanup_dead_ssh()` function to properly clean up dead SSH connections
|
|
- Added `check_ssh_connection()` function to validate ControlMaster connection before use
|
|
- Enhanced `ssh_exec()` function with:
|
|
- Connection validation before each command
|
|
- Automatic cleanup of dead connections
|
|
- Retry mechanism (up to 3 attempts)
|
|
- Additional SSH options for better connection stability:
|
|
- `ConnectTimeout=10`: Fail fast if connection cannot be established
|
|
- `ServerAliveInterval=60`: Keep connection alive
|
|
- `ServerAliveCountMax=3`: Detect dead connections quickly
|
|
|
|
### 2. Improved Error Handling at Step 5
|
|
|
|
**File**: `deploy.sh`
|
|
|
|
**Changes**:
|
|
- Enhanced Git repository verification to properly handle SSH connection errors
|
|
- Added explicit error detection and recovery mechanism
|
|
- Added automatic retry after connection cleanup
|
|
- Better error messages to distinguish between connection errors and Git initialization needs
|
|
|
|
## Modifications
|
|
|
|
### Files Modified
|
|
|
|
- `deploy.sh`:
|
|
- Enhanced `ssh_exec()` function with retry logic and connection validation
|
|
- Added `cleanup_dead_ssh()` and `check_ssh_connection()` helper functions
|
|
- Improved error handling in Git repository verification step
|
|
|
|
### Code Changes
|
|
|
|
**Before**:
|
|
```bash
|
|
ssh_exec() {
|
|
ssh -o ControlMaster=auto -o ControlPath="${SSH_CONTROL_PATH}" -o ControlPersist=300 ${SERVER} "$@"
|
|
}
|
|
```
|
|
|
|
**After**:
|
|
```bash
|
|
ssh_exec() {
|
|
# Validates connection, cleans up dead connections, retries on failure
|
|
# Includes connection stability options
|
|
}
|
|
```
|
|
|
|
## Deployment Procedures
|
|
|
|
### Automatic Deployment
|
|
|
|
The fix is automatically applied when using the deployment script:
|
|
|
|
```bash
|
|
./deploy.sh "commit message"
|
|
```
|
|
|
|
No manual intervention required. The script now handles SSH connection errors automatically.
|
|
|
|
### Verification
|
|
|
|
After deployment, verify that SSH connections are stable:
|
|
|
|
1. Check that deployment completes without SSH errors
|
|
2. Monitor for connection errors in subsequent deployments
|
|
3. Verify that retry mechanism works correctly
|
|
|
|
## Analysis Procedures
|
|
|
|
### Monitoring SSH Connection Issues
|
|
|
|
1. **Check deployment logs** for SSH connection errors:
|
|
```bash
|
|
# Review recent deployment output
|
|
```
|
|
|
|
2. **Verify SSH ControlMaster socket**:
|
|
```bash
|
|
# On the deployment machine
|
|
ls -la /tmp/ssh_control_*/
|
|
```
|
|
|
|
3. **Test SSH connection manually**:
|
|
```bash
|
|
ssh -O check -o ControlPath="/tmp/ssh_control_*/debian_92.243.27.35_22" debian@92.243.27.35
|
|
```
|
|
|
|
### Debugging Steps
|
|
|
|
If SSH connection errors persist:
|
|
|
|
1. Check network connectivity to the server
|
|
2. Verify SSH server configuration allows ControlMaster
|
|
3. Check for firewall or network issues
|
|
4. Review SSH server logs on the remote machine
|
|
5. Verify SSH key authentication is working
|
|
|
|
### Logs to Review
|
|
|
|
- Deployment script output (stdout/stderr)
|
|
- SSH client logs (if verbose mode enabled)
|
|
- Remote SSH server logs: `/var/log/auth.log` or similar
|
|
|
|
## Prevention
|
|
|
|
### Best Practices
|
|
|
|
1. **Connection validation**: Always validate SSH connections before use
|
|
2. **Retry logic**: Implement retry mechanisms for network operations
|
|
3. **Cleanup**: Properly clean up stale connections
|
|
4. **Error handling**: Distinguish between different types of failures
|
|
5. **Monitoring**: Monitor connection stability over time
|
|
|
|
### Future Improvements
|
|
|
|
- Add connection health metrics
|
|
- Implement exponential backoff for retries
|
|
- Add connection pooling if needed
|
|
- Consider alternative connection methods if ControlMaster proves unreliable
|
|
|
|
## Related Issues
|
|
|
|
None identified at this time.
|
|
|
|
## References
|
|
|
|
- SSH ControlMaster documentation
|
|
- Deployment script: `deploy.sh`
|
|
- Related documentation: `docs/deployment.md`
|