Step-by-Step Guide: Setting up a High Availability PostgreSQL Cluster with Load Balancing and Automatic Fail-over on AWS EC2 using Pgpool-II and Watchdog
Created By | Daniyal Khawaja |
---|---|
Type | Walk-Through |
Created |
Welcome to this step-by-step guide on setting up a high-availability Postgres cluster with load balancing and automatic failover on AWS EC2 using Pgpool and Watchdog. This guide walks you through the process of creating a highly available database cluster on AWS that provides improved reliability, automatic failover, and load balancing for improved performance.
Before we begin, note that this guide assumes that you have some basic knowledge of AWS, Postgres, Pgpool, and Watchdog. If you are new to any of these technologies, it is recommended that you review the documentation and tutorials provided by their respective vendors before proceeding.
With that said, let's get started!
Step 1: Setting up the Ec2 instances
- Log in to the AWS Management Console and navigate to the EC2 dashboard.
- Click the "Launch Instance" button to begin creating a new EC2 instance.
- Choose the Ubuntu Server AMI (Amazon Machine Image) from the list of available images.
- Select the instance type for your EC2 instances. For this setup, it's recommended to use at least medium instances with 2 vCPUs, 4 GB RAM, and 10 GB storage.
- Configure the network settings. Make sure to select the same sub-net for all three instances to ensure that they are on the same network.
- Review the instance details, and then click "Launch".
- Create a new key pair or select an existing one. This will allow you to access your instances using SSH.
- Launch your instances and wait for them to become available.
- Once your instances are up and running, use SSH to connect to each instance.
Step 2: Installing PostgreSQL and Pgpool-II.
Once you're logged in to each instance, run the following command to update and upgrade the system packages:
sudo apt-get update && sudo apt-get upgrade
Next, you'll need to download and install Postgres. Start by adding the Postgres repository to the sources list:
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
Import the Postgres signing key to trust the packages from the repository:
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
Update the package list again to include the Postgres packages:
sudo apt-get update
Finally, install Postgres by running the following command:
sudo apt-get -y install postgresql
Pgpool-II
Download Pgpool version 4.4.2 by running the following command
wget https://www.pgpool.net/mediawiki/download.php?f=pgpool-II-4.4.2.tar.gz
Install the required packages for building Pgpool by running the following command:
sudo apt-get install libpq-dev bison build-essential
Once the packages are installed, navigate to the directory where you downloaded Pgpool and extract the files by running:
tar -zxvf pgpool-II-4.4.2.tar.gz
Change to the extracted directory by running:
cd pgpool-II-4.4.2
Run the configure
script to configure the build process:
./configure
After the configuration is complete, run the make
command to build Pgpool:
make
Once the build is complete, install Pgpool by running:
sudo make install
PostgreSQL setup
To set up PostgreSQL on the main server , follow these steps:
Initialize the PostgreSQL data directory by running the following command:
sudo mkdir /var/db && sudo chmod a+rw /var/db && mkdir \
/var/db/postgres && \
/usr/lib/postgresql/15/bin/initdb -E UTF-8 /var/db/postgres/data15
Open the postgresql.conf
file for editing by running the following command:
cd /var/db/postgres/data15
nano postgresql.conf
In the postgresql.conf
file, make the following changes:
listen_addresses = '*'
port = 5432
shared_buffers = 1GB
work_mem = 2621kB
maintenance_work_mem = 256MB
dynamic_shared_memory_type = posix
effective_io_concurrency = 200
# WAL settings
wal_level = replica
wal_compression = on
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 3GB
min_wal_size = 1GB
# Archive settings
archive_mode = on
archive_command = 'cp "%p" "/usr/lib/postgresql/archivedir/%f"'
restore_command = 'scp ubuntu@<ip-address-of-main-server>:/usr/lib/postgresql/archivedir/%f "%p"'
max_wal_senders = 10
max_replication_slots = 10
wal_keep_size = '1GB'
synchronous_standby_names = '*'
wal_receiver_create_temp_slot = on
wal_receiver_status_interval = 5s
hot_standby_feedback = on
random_page_cost = 1.1
effective_cache_size = 3GB
default_statistics_target = 100
These changes include enabling the PostgreSQL server to listen on all network interfaces by setting listen_addresses
to '*'
and setting the port
to 5432
.
The shared_buffers
parameter has been set to 1GB, which controls the amount of memory that PostgreSQL uses for caching data. The work_mem
parameter has been set to 2621kB, which controls the amount of memory used for sorting and other operations. The maintenance_work_mem
parameter has been set to 256MB, which controls the amount of memory used for maintenance operations such as VACUUM and REINDEX.
The wal_level
has been set to replica
to enable replication. The archive_mode
parameter has been set to on
to enable archiving of WAL files. The archive_command
and restore_command
parameters specify how the archived WAL files are copied to and restored from the archive directory.
The max_wal_senders
, max_replication_slots
, wal_keep_size
, synchronous_standby_names
, wal_receiver_create_temp_slot
, and wal_receiver_status_interval
parameters are all related to synchronous replication and are set according to the specific needs of the cluster.
Finally, various other parameters such as random_page_cost
, effective_cache_size
, and default_statistics_target
have been set to improve performance.
To start the primary server, you need to first stop the PostgreSQL service if it is already running. This can be done with the command:
sudo systemctl stop postgresql.service
Next, you need to make sure that the /var/run/postgresql/
directory has the necessary permissions. You can do this with the command:
sudo chmod a+rw /var/run/postgresql/
After that, create a directory for storing the archived WAL files and give it the necessary permissions. You can use the following commands for this:
sudo mkdir /usr/lib/postgresql/archivedir/
sudo chmod a+rw /usr/lib/postgresql/archivedir
To start the primary server, run the following command:
/usr/lib/postgresql/15/bin/pg_ctl -D data15/ start -l logfile
Once the server is running, you can connect to it using the psql
command-line tool. Use the following command to create a new user with replication privileges:
psql -h localhost -U postgres
####after entering psql, run these commands
CREATE USER <username> WITH PASSWORD '<password>';
ALTER USER <username> WITH REPLICATION;
ALTER USER <username> WITH SUPERUSER;
To confirm that the user was created successfully, you can run the following command:
SELECT usename, usecreatedb, usesuper, userepl, passwd IS NOT NULL AS haspassword FROM pg_user;
Next, you need to create replication slots for the replica databases. To create a physical replication slot, run the following command:
SELECT pg_create_physical_replication_slot('<slot_name>');#(per replica)
To view the replication slots, you can run the following command:
SELECT slot_name, slot_type, active, wal_status FROM pg_replication_slots;
By following these steps, you should now have a primary PostgreSQL server running with the necessary permissions, a user with replication privileges, and replication slots created for the replica databases.
After creating the replication slots for replica databases, the next step is to add the replication servers to the main server's pg_hba.conf
file.
Suppose the replication servers have IP addresses ip1
and ip2
. To add them to the pg_hba.conf
file, run the following commands:
sudo nano /var/db/postgres/data15/pg_hba.conf
Then, add the following lines to the file:
host replication username ip1/32 trust
host postgres ubuntu ip1/32 trust
host postgres username ip1/32 trust
host replication username ip2/32 trust
host postgres ubuntu ip2/32 trust
host postgres username ip2/32 trust
host replication username <main-server-ip>/32 trust
host postgres ubuntu <main-server-ip>/32 trust
host postgres username <main-server-ip>/32 trust
Save and close the file. After that, restart the PostgreSQL server using the following command:
sudo /usr/lib/postgresql/15/bin/pg_ctl -D /var/db/postgres/data15/ restart -l logfile
PostgreSQL setup - On standby servers
Run the pg_basebackup
command to create a base backup of the primary server on the standby server:
sudo -u postgres /usr/lib/postgresql/15/bin/pg_basebackup \
--pgdata=/var/db/postgres/data15 --format=p --write-recovery-conf \
--checkpoint=fast --label=mffb --progress --host=<primary-ip> \
--port=5432 --username=<username>
Edit the postgresql.conf
file to add the following parameters:
primary_conninfo = 'user=<username> port=5432 host=172.31.15.104 application_name=172.31.6.95_username'
primary_slot_name = '<slot_name>'
hot_standby = on
Now execute these commands.
sudo skill postgres
sudo chmod a+rw /var/run/postgresql/
sudo mkdir /usr/lib/postgresql/archivedir/
sudo chmod a+rw /usr/lib/postgresql/archivedir
Now start the standby servers using the following command:
/usr/lib/postgresql/15/bin/pg_ctl -D data15/ start -l logfile
Running the commandpsql -h <standby-ip> -U <username>
on the standby server and then executingSELECT * FROM pg_stat_wal_receiver;
will display the status of the WAL receiver process. This will confirm whether the standby server is successfully receiving the WAL data from the primary server and is ready to serve as a standby server.
Configuring Pgpool-II with Watchdog and Load Balancing on the Main Server.
After setting up streaming replication, the next step is to configure pgpool and enable high availability and load balancing. The following steps should be taken on the main server:
Change into the “/usr/local/etc” directory
Move the sample files by running the following commands:
sudo mv pgpool.conf.sample pgpool.conf
sudo mv escalation.sh.sample escalation.sh
sudo mv follow_primary.sh.sample follow_primary.sh
sudo mv failover.sh.sample failover.sh
sudo chmod +x escalation.sh follow_primary.sh failover.sh
sudo chmod a+rw follow_primary.sh escalation.sh failover.sh
Open pgpool.conf with sudo nano pgpool.conf
and make the following changes:
backend_clustering_mode = 'streaming_replication'
listen_addresses = ''
port = 9999
pcp_listen_addresses = ''
pcp_port = 9898
backend_hostname0 = '<ip-of-primary-server>' (primary ip)
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/var/db/postgres/data15'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'primary_server'
backend_hostname1 = '<ip-of-standby-server-1>'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/db/postgres/data15'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'replica_server1'
backend_hostname2 = '<ip-of-standby-server-2>'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/db/postgres/data15'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = 'replica_server2'
pool_passwd = 'pool_passwd'
max_pool = 10
log_standby_delay = 'always'
logging_collector = on
log_directory = '/home/ubuntu/pgpool_logs'
log_filename = 'pgpool-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
load_balance_mode = on
ignore_leading_white_space = on
read_only_function_list = 'get_.,select_.'
write_function_list = 'nextval,setval,set_.,update_.,delete_.,insert_.'
database_redirect_preference_list = 'postgres:0(0.5),postgres:1(0.5),postgres:2(0.5)'
sr_check_period = 10
sr_check_user = 'username'
sr_check_password = ''
sr_check_database = 'postgres'
delay_threshold = 20
prefer_lower_delay_standby = on
follow_primary_command = '/usr/local/etc/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'
#Configure the health check settings:
health_check_period = 5
health_check_timeout = 20
health_check_user = 'username'
health_check_password = 'password'
health_check_database = 'postgres'
health_check_max_retries = 5
health_check_retry_delay = 5
failover_command = '/usr/local/etc/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
failover_on_backend_shutdown = on
detach_false_primary = on
search_primary_node_timeout = 10
recovery_user = 'username'
auto_failback = on
auto_failback_interval = 1min
# Watchdog settings
use_watchdog = on
trusted_servers = 'server1,server2,server3'
trusted_server_command = '/bin/ping -q -c3 %h'
hostname0 = 'server1'
wd_port0 = 9000
pgpool_port0 = 9999
hostname1 = 'server2'
wd_port1 = 9000
pgpool_port1 = 9999
hostname2 = 'server3'
wd_port2 = 9000
pgpool_port2 = 9999
wd_priority = 1
delegate_ip = '<leave blank for a sec>'
if_cmd_path = '/sbin'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_path = '/usr/sbin'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
ping_path = '/bin'
wd_escalation_command = '/usr/local/etc/escalation.sh'
failover_when_quorum_exists = on
failover_require_consensus = on
wd_remove_shutdown_nodes = on
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
heartbeat_hostname0 = 'server1'
heartbeat_port0 = 9694
heartbeat_device0 = ''
heartbeat_hostname1 = 'server2'
heartbeat_port1 = 9694
heartbeat_device1 = ''
heartbeat_hostname2 = 'server3'
heartbeat_port2 = 9694
heartbeat_device2 = ''
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
To configure Watchdog for Virtual IP, first allocate an Elastic IP to the primary server through AWS. Then, use that IP in the delegate IP configuration .
delegate_ip = 'elastic IP'
now, change the following in follow_primary.sh:
PGHOME=/lib/postgresql/15
ARCHIVEDIR=/usr/lib/postgresql/archivedir
REPLUSER=daniyal
PCP_USER=daniyal
PGPOOL_PATH=/usr/local/bin/
POSTGRESQL_STARTUP_USER=ubuntu
SSH_KEY_FILE=id_rsa
and change the following in failover.sh:
PGHOME=/lib/postgresql/15
POSTGRESQL_STARTUP_USER=ubuntu
SSH_KEY_FILE=id_rsa
change the following in escalation.sh
POSTGRESQL_STARTUP_USER=ubuntu
SSH_KEY_FILE=id_rsa
SSH_OPTIONS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ~/.ssh/${SSH_KEY_FILE}"
PGPOOLS=(server1 server2 server3)
VIP=<delegate-ip>
DEVICE=eth0
Start pgpool on the primary node first.
Pgpool -D
Your output should be in the log file in the following format:
After completing the installation and configuration of the PostgreSQL cluster with pgpool and watchdog on AWS, you can check the replication statistics of the primary server using the following command:
psql -h <primary-ip> -p 9999 -U <username> -c "SELECT * FROM pg_stat_replication;"
You can also check the replication slots of the primary using the following command:
psql -h <primary-ip> -p 9999 -U <username> -c "SELECT * FROM pg_replication_slots;"
psql -h <primary-ip> -p 9999 -U <username> -c “show pool_nodes;”
To test if streaming replication is working fine, you can create a table from the primary and test if you can add or delete any value from it from the standby nodes.
To test for load balancing, you can use pgbench. Please refer to the documentation or Google for more information.
Remember to keep your system up to date with the latest security patches and to regularly back up your data to avoid any data loss. If you need further assistance or want to explore more advanced configurations, feel free to consult the official PostgreSQL documentation or seek help from the community.