Failover
pgagroal can fail over a PostgreSQL instance when clients can no longer write to it.
Configuration
In pgagroal.conf define:
failover = on
failover_script = /path/to/myscript.sh
The script will be run as the same user as the pgagroal process, so proper permissions (access and execution) must be in place.
Failover Script
The following information is passed to the script as parameters:
- Old primary host
- Old primary port
- New primary host
- New primary port
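Because the script receives these four values as positional parameters, it can help to validate them before acting on them. The following is a minimal sketch, not part of pgagroal: `is_port` and `validate_args` are hypothetical helper names, and the accepted port range is an assumption.

```sh
#!/bin/bash
# Sketch: validate the four positional parameters before using them.
# is_port and validate_args are hypothetical helpers, not pgagroal features.

is_port() {
    # Accept only non-empty, all-digit strings in the range 1..65535.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

validate_args() {
    # Expected order: old host, old port, new host, new port.
    if [ "$#" -ne 4 ]; then
        echo "usage: <old_host> <old_port> <new_host> <new_port>" >&2
        return 1
    fi
    if [ -z "$1" ] || [ -z "$3" ]; then
        echo "host arguments must not be empty" >&2
        return 1
    fi
    if ! is_port "$2" || ! is_port "$4"; then
        echo "port arguments must be integers in 1..65535" >&2
        return 1
    fi
    return 0
}

# Typical use at the top of a failover script:
#   validate_args "$@" || exit 1
```

Exiting with a non-zero code on bad input keeps a misconfigured call from being treated as a successful failover.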
Example Script
A basic failover script could look like:
sh
#!/bin/bash
OLD_PRIMARY_HOST=$1
OLD_PRIMARY_PORT=$2
NEW_PRIMARY_HOST=$3
NEW_PRIMARY_PORT=$4
# Promote the new primary; exit non-zero if the promotion fails
if ! ssh -tt -o StrictHostKeyChecking=no "postgres@${NEW_PRIMARY_HOST}" pg_ctl promote -D /mnt/pgdata; then
    exit 1
fi
exit 0
Script Requirements
- The script is considered successful if it exits with code 0
- Any other exit code causes both servers to be recorded as failed
- The script should handle promotion of the new primary server
- Consider implementing proper error handling and logging
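The error handling and logging requirement could be met with a small wrapper like the one below. This is a sketch under stated assumptions: `FAILOVER_LOG`, `log_msg` and `run_step` are hypothetical names, and the default log path is only a placeholder.

```sh
#!/bin/bash
# Sketch: timestamped logging plus a wrapper that records each step's outcome.
# FAILOVER_LOG, log_msg and run_step are hypothetical names for illustration.

FAILOVER_LOG="${FAILOVER_LOG:-/tmp/pgagroal_failover.log}"

log_msg() {
    # Append one line: UTC timestamp, level, message.
    printf '%s [%s] %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "$2" >> "$FAILOVER_LOG"
}

run_step() {
    # $1 is a description; the remaining arguments are the command to run.
    local desc="$1"
    local rc
    shift
    log_msg INFO "starting: $desc"
    # Capture the command's own output in the log as well.
    if "$@" >> "$FAILOVER_LOG" 2>&1; then
        rc=0
    else
        rc=$?
    fi
    if [ "$rc" -ne 0 ]; then
        log_msg ERROR "FAILED ($rc): $desc"
    else
        log_msg INFO "done: $desc"
    fi
    return "$rc"
}

# Possible use inside a failover script:
#   run_step "promote new primary" \
#       ssh "postgres@${NEW_PRIMARY_HOST}" pg_ctl promote -D /mnt/pgdata || exit 1
```

Returning the wrapped command's exit code lets the script pass a failure straight back to pgagroal while leaving a record of which step failed.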
Advanced Failover Scenarios
Multiple Replica Configuration
When multiple replicas are available, the failover script can implement logic to:
- Check replica lag to select the best candidate
- Ensure proper promotion sequence
- Update DNS or load balancer configuration
- Notify monitoring systems
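As an illustration of candidate selection, the sketch below picks the replica that has received the most WAL, which is one possible proxy for "least lag". The function names, the `host port bytes` input format, and the `postgres` user and database are assumptions. `pg_last_wal_receive_lsn()` and `pg_wal_lsn_diff()` are standard PostgreSQL functions (version 10 and later).

```sh
#!/bin/bash
# Sketch: choose the replica that has received the most WAL.
# Function names and the "host port bytes" format are assumptions.

replica_received_bytes() {
    # Print the WAL position received by the replica at $1:$2 as a byte count.
    # Prints an empty line if the replica has not received any WAL via streaming.
    psql -h "$1" -p "$2" -U postgres -d postgres -At \
        -c "SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/0')::bigint"
}

pick_best_candidate() {
    # Read "host port bytes" lines on stdin and print "host port"
    # for the line with the largest byte count.
    sort -k3,3n | tail -n 1 | awk '{ print $1, $2 }'
}

# Possible use (replica names are placeholders):
#   for r in "replica1 5432" "replica2 5432"; do
#       set -- $r
#       bytes=$(replica_received_bytes "$1" "$2")
#       [ -n "$bytes" ] && echo "$1 $2 $bytes"
#   done | pick_best_candidate
```

Keeping the selection step separate from the query makes it possible to test the choice logic without a running cluster.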
Automatic Failback
Consider implementing automatic failback when the original primary becomes available:
sh
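#!/bin/bash
# Assumes the original primary's host and port are passed as the first two arguments
OLD_PRIMARY_HOST=$1
OLD_PRIMARY_PORT=$2
# Check if original primary is healthy
if pg_isready -h "$OLD_PRIMARY_HOST" -p "$OLD_PRIMARY_PORT"; then
    # Implement failback logic
    echo "Original primary is healthy, considering failback"
fi
Monitoring Failover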
Monitor failover events through:
- Log files: Check pgagroal logs for failover events
- Prometheus metrics: Monitor server status changes
- External monitoring: Implement alerts for failover events
Best Practices
- Test failover scripts regularly in non-production environments
- Monitor replica lag to ensure replicas are suitable for promotion
- Implement proper logging in failover scripts for troubleshooting
- Consider network partitions and split-brain scenarios
- Document failover procedures for operational teams
- Use configuration management to ensure consistent failover scripts across environments
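For the split-brain point above, one possible guard is to refuse promotion while the old primary still answers. The sketch below assumes a hypothetical helper named `old_primary_is_down` and a 5 second timeout. It only reflects reachability from the host running the script, so it is not a substitute for real fencing.

```sh
#!/bin/bash
# Sketch: refuse to promote while the old primary still answers.
# old_primary_is_down is a hypothetical helper, not a pgagroal feature.

old_primary_is_down() {
    # pg_isready exit codes: 0 accepting connections, 1 rejecting,
    # 2 no response, 3 no attempt made (for example, bad parameters).
    local rc
    if pg_isready -q -h "$1" -p "$2" -t 5; then
        rc=0
    else
        rc=$?
    fi
    # Only "no response" counts as down; anything else may still be alive.
    [ "$rc" -eq 2 ]
}

# Possible use before the promotion step:
#   old_primary_is_down "$OLD_PRIMARY_HOST" "$OLD_PRIMARY_PORT" || exit 1
```

Treating every result other than "no response" as possibly alive errs on the side of not promoting, which is the safer failure mode when two writable primaries are the risk.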