OpenSQL_Technical Guide

[OpenSQL] repmgr

2023. 2. 7

repmgr

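What is repmgr?

"repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations." - repmgr official documentation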

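Required parameters

PostgreSQL parameters (postgresql.conf) that must be set to use repmgr

Parameter	Description
hot_standby	Always set to on, because repmgr must be able to connect to every server it manages
wal_level	Set to replica or logical
max_wal_senders	Set to 2 or higher
max_replication_slots	Set to a non-zero value
wal_log_hints	Set to on so that pg_rewind can be used
archive_mode	Set to on
archive_command	Command that copies WAL archive files to the archive location
wal_keep_size	Amount of WAL segments retained on the database server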
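
The settings in the table can be collected into one fragment and pulled into postgresql.conf with an `include` directive. The sketch below is only an illustration: the output path, the archive directory, and every numeric value are assumptions to be replaced with values that suit the actual servers.

```shell
#!/bin/sh
# Sketch: generate a postgresql.conf fragment with the settings repmgr needs.
# The output path, the archive directory and all numeric values are
# illustrative assumptions, not values from this guide.
fragment="${TMPDIR:-/tmp}/repmgr_postgresql.conf"

cat > "$fragment" <<'EOF'
hot_standby = on
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_log_hints = on
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'   # hypothetical archive directory
wal_keep_size = 1GB
EOF

echo "wrote $fragment"
```

Several of these settings only take effect after a PostgreSQL restart, so they are best applied before the primary is registered.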

Configuration file format (repmgr.conf)

Format of the conf file used to configure repmgr, with a sample

-- Example

repmgr.conf

node_id=1
node_name=node1
conninfo='host=node1 dbname=repmgr user=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

Parameters

Parameter	Description
node_id	Unique integer identifier of the node within repmgr
node_name	Name of the node used by repmgr and repmgrd
conninfo	Connection string used to connect to this node's PostgreSQL database
data_directory	Path to the PostgreSQL data directory
config_directory	Path to the directory that holds the PostgreSQL configuration files
replication_user	PostgreSQL user used for streaming replication connections
replication_type	Replication type repmgr uses; normally physical
location	Arbitrary string identifying where the node is located within the cluster repmgr manages
use_replication_slots	Whether to use replication slots
witness_sync_interval	Interval, in seconds, at which node records are synchronized to the witness node
log_level	Log level
log_facility	Logging facility (output destination) for log messages
log_file	Path to the repmgrd log file
log_status_interval	Interval, in seconds, at which status information is written to the log file
event_notification_command	Command executed to handle events raised by repmgrd
event_notifications	List of events for which event_notification_command is executed
pg_bindir	Path to the PostgreSQL binaries
repmgr_bindir	Path to the repmgr binaries
use_primary_conninfo_password	Whether to write the password into primary_conninfo
passfile	Path to the password file
pg_ctl_options	Options passed to pg_ctl
pg_basebackup_options	Options passed to pg_basebackup
rsync_options	Options passed to rsync
ssh_options	Options used for ssh connections
tablespace_mapping	Maps a tablespace to a different file system path
restore_command	Command executed to retrieve WAL files during recovery
archive_cleanup_command	Command executed to clean up the WAL archive
recovery_min_apply_delay	Minimum delay before the standby node applies received WAL
failover	Failover mode: automatic or manual
priority	Priority of the node when a promotion candidate is elected; a value of 0 means the node is never promoted
connection_check_type	Method used to check node status
reconnect_attempts	Maximum number of reconnection attempts
reconnect_interval	Interval between reconnection attempts
promote_command	Command executed to promote the node
follow_command	Command executed to make the node follow a new primary
primary_notification_timeout	Time, in seconds, to wait for a notification from the primary node
repmgrd_standby_startup_timeout	Maximum time, in seconds, repmgrd waits for the standby node to start
monitoring_history	Whether to record the monitoring data used to track standby node status
monitor_interval_secs	Interval, in seconds, at which standby node status is checked
degraded_monitoring_timeout	Time, in seconds, repmgrd keeps waiting after a monitored node stops working normally; a negative value means it waits indefinitely
async_query_timeout	Time, in seconds, to wait for a response to an asynchronous query
repmgrd_pid_file	Path to the repmgrd PID file
repmgrd_exit_on_inactive_node	Whether repmgrd exits when it finds that the node is marked inactive: true exits, false does not
standby_disconnect_on_failover	Whether standby nodes disconnect their WAL receivers during a failover, before a new primary is elected: true disconnects, false does not
sibling_nodes_disconnect_timeout	Maximum time, in seconds, to wait for the other standby nodes to disconnect when a standby is promoted
primary_visibility_consensus	Whether the standby nodes must reach agreement on the visibility of the primary node before failover proceeds
always_promote	Whether a node is always promoted, even when its repmgr metadata is out of date: true promotes, false does not
failover_validation_command	Command executed before promotion; can be used to verify that the promotion is valid
election_rerun_interval	Interval, in seconds, before the election is rerun
child_nodes_check_interval	Interval, in seconds, at which the status of child nodes is checked
child_nodes_connected_min_count	Minimum number of child nodes that must remain connected
child_nodes_disconnect_min_count	Minimum number of child nodes that must be disconnected
child_nodes_disconnect_timeout	Maximum time, in seconds, to wait after child nodes disconnect
child_nodes_disconnect_command	Command executed after child nodes disconnect
archive_ready_warning	Warning threshold for WAL files that are waiting to be archived; a warning is raised when the number of pending files exceeds this value. The value is a file count, not a time: 16 means 16 files
archive_ready_critical	Critical threshold for WAL files that are waiting to be archived; a critical alert is raised when the number of pending files exceeds this value
replication_lag_warning	Warning threshold for replication lag, in seconds; a warning is raised when the lag exceeds this value
replication_lag_critical	Critical threshold for replication lag, in seconds; a critical alert is raised when the lag exceeds this value
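
To show how these parameters fit together, the sketch below writes a sample repmgr.conf for node 1 with repmgrd automatic failover enabled, then checks that the four mandatory settings (node_id, node_name, conninfo, data_directory) are present. The addresses and paths follow the ones used later in this guide. The output location, the location of the repmgr binary under /usr/pgsql-14/bin, and the retry values are assumptions, and `missing_required_keys` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: write a sample repmgr.conf for node 1 and check its required keys.
# Paths and addresses follow this guide; the output location, the repmgr
# binary path and the retry values are illustrative assumptions.
conf="${TMPDIR:-/tmp}/repmgr_node1_sample.conf"

cat > "$conf" <<'EOF'
node_id=1
node_name='192.168.245.172'
conninfo='host=192.168.245.172 dbname=repmgr user=repmgr connect_timeout=2'
data_directory='/opensql/pg/14/data'
pg_bindir='/usr/pgsql-14/bin'
log_file='/opensql/pg/14/log/repmgr_log/repmgr.log'

# repmgrd automatic failover
failover=automatic
promote_command='/usr/pgsql-14/bin/repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-14/bin/repmgr standby follow -f /etc/repmgr/14/repmgr.conf --log-to-file --upstream-node-id=%n'
reconnect_attempts=6
reconnect_interval=10
monitoring_history=yes
EOF

# Print the name of every mandatory setting that is missing from the file.
missing_required_keys() {
    for key in node_id node_name conninfo data_directory; do
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$1" || echo "$key"
    done
}

missing=$(missing_required_keys "$conf")
if [ -z "$missing" ]; then
    echo "required settings present in $conf"
else
    echo "missing settings: $missing" >&2
fi
```

Each node needs its own node_id, node_name, and conninfo; the remaining settings can usually be shared between nodes.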

Installation

Prerequisites
- PostgreSQL 14.x must already be installed
- repmgr is installed from RPM packages

-- node 1, node 2
# yum install -y make gcc gcc-c++ tar zlib zlib-devel readline readline-devel gettext gettext-devel git libxslt libicu
# yum install -y epel-release
# yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# yum install -y repmgr_14*

postgresql.conf

…
wal_log_hints = on
shared_preload_libraries = 'repmgr'
…

Change directory ownership and create the log directory

-- node 1, node 2

# chown -R opensql:tmax /etc/repmgr
# mkdir /opensql/pg/14/log/repmgr_log
# touch /opensql/pg/14/log/repmgr_log/repmgr.log

SSH communication

-- node 1, node 2

As the root user
mkdir ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa

ssh-copy-id -i id_rsa.pub root@192.168.245.171
ssh-copy-id -i id_rsa.pub opensql@192.168.245.171

As the opensql user
mkdir ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa

ssh-copy-id -i id_rsa.pub root@192.168.245.171
ssh-copy-id -i id_rsa.pub opensql@192.168.245.171
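
After the keys are copied, it is worth confirming that login really works without a password prompt, since repmgr operations such as switchover rely on it. The following is a minimal sketch; `check_passwordless_ssh` is a hypothetical helper, and `SSH_CMD` is an assumed override hook that only exists so the function can be exercised without a real host.

```shell
#!/bin/sh
# Sketch: confirm key-based SSH login works without a password prompt.
# SSH_CMD is a hypothetical override hook so the check can be dry-run.
check_passwordless_ssh() {
    target="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password.
    if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null; then
        echo "OK   $target"
    else
        echo "FAIL $target"
        return 1
    fi
}

# Example (run as both root and opensql on each node):
# check_passwordless_ssh root@192.168.245.171
# check_passwordless_ssh opensql@192.168.245.171
```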

Create the repmgr user and database

-- primary

$ psql
create user repmgr with superuser replication createdb;
alter user repmgr password 'repmgr';
create database repmgr owner repmgr;

-- If a witness node is configured

witness
$ psql
create user repmgr with superuser replication createdb;
alter user repmgr password 'repmgr';
create database repmgr owner repmgr;

Register the primary with repmgr

$ repmgr primary register
$ repmgr daemon start
$ repmgr daemon status
$ repmgr cluster show

Clone and register the standby

$ repmgr -h 192.168.245.172 -U repmgr -d repmgr standby clone
$ pg_ctl start
$ repmgr standby register
$ repmgr daemon start
$ repmgr daemon status
$ repmgr cluster show

Register the witness with repmgr

$ repmgr witness register -h 192.168.245.168 -U repmgr
$ repmgr daemon start
$ repmgr daemon status
$ repmgr cluster show

Replication check

-- primary

$ psql --pset expanded=auto -c "select * from pg_stat_replication;"

-- standby

$ psql --pset expanded=auto -c "select * from pg_stat_wal_receiver;"

Recovery

The repmgrd daemon must be running on both node 1 and node 2.

-- node 1
[opensql@localhost:repmgr_log]$ repmgr daemon status
 ID | Name            | Role    | Status    | Upstream        | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------+---------+-----------+-----------------+---------+--------+---------+--------------------
 1  | 192.168.245.172 | primary | * running |                 | running | 125775 | no      | n/a
 2  | 192.168.245.171 | standby |   running | 192.168.245.172 | running | 66303  | no      | 0 second(s) ago
-- node 2
[opensql@localhost:14]$ repmgr daemon status
 ID | Name            | Role    | Status    | Upstream        | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------+---------+-----------+-----------------+---------+--------+---------+--------------------
 1  | 192.168.245.172 | primary | * running |                 | running | 125775 | no      | n/a
 2  | 192.168.245.171 | standby |   running | 192.168.245.172 | running | 66303  | no      | 1 second(s) ago

Node 1 is the primary and node 2 is the standby.

-- Example: automatic failover when the primary (node 1) goes down

Primary down

node 1
[opensql@localhost:pg_log]$ pg_ctl stop
waiting for server to shut down....[2023-05-16 22:36:10.406 KST] [13176] app: [] user: [] database: []DEBUG: logger shutting down
done
server stopped
[opensql@localhost:pg_log]$ repmgr daemon status
ERROR: connection to database failed
DETAIL:
connection to server at "192.168.245.172", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
user=repmgr dbname=repmgr host=192.168.245.172 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=

-- Failover

node 2
[opensql@localhost:~]$ repmgr cluster show
 ID | Name            | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+-----------------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------
 1  | 192.168.245.172 | primary | ? unreachable        | ?        | default  | 100      |          | host=192.168.245.172 dbname=repmgr user=repmgr
 2  | 192.168.245.171 | standby | ! running as primary |          | default  | 100      | 2        | host=192.168.245.171 dbname=repmgr user=repmgr

WARNING: following issues were detected
  - unable to connect to node "192.168.245.172" (ID: 1)
  - node "192.168.245.172" (ID: 1) is registered as an active primary but is unreachable
  - node "192.168.245.171" (ID: 2) is registered as standby but running as primary

HINT: execute with --verbose option to see connection error messages
[opensql@localhost:~]$ repmgr cluster show
 ID | Name            | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-----------------+---------+-----------+----------+----------+----------+----------+------------------------------------------------
 1  | 192.168.245.172 | primary | - failed  | ?        | default  | 100      |          | host=192.168.245.172 dbname=repmgr user=repmgr
 2  | 192.168.245.171 | primary | * running |          | default  | 100      | 2        | host=192.168.245.171 dbname=repmgr user=repmgr
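
The Status column in the listings above is what signals trouble: "? unreachable", "! running as primary", and "- failed" all differ from a plain "running". As a rough sketch, a small filter can pick those rows out of the `repmgr cluster show` table. `unhealthy_nodes` is a hypothetical helper, and it assumes the nine-column, pipe-separated layout shown in this guide.

```shell
#!/bin/sh
# Sketch: list nodes whose Status column is anything other than "running".
# Reads the table printed by `repmgr cluster show` on standard input.
unhealthy_nodes() {
    awk -F'|' '
        # Data rows have nine columns and a numeric node ID in the first one.
        NF >= 9 && $1 ~ /^ *[0-9]+ *$/ {
            id = $1; name = $2; status = $4
            gsub(/^ +| +$/, "", id)
            gsub(/^ +| +$/, "", name)
            gsub(/^ +| +$/, "", status)
            if (status != "running" && status != "* running")
                print id, name, status
        }'
}

# Usage on a live node: repmgr cluster show | unhealthy_nodes
```

An empty result means every registered node reports "running"; any printed line names a node that needs attention, for example a failed former primary that should be rejoined.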

Failback

-- Run the failback on the former primary (node 1)
[opensql@localhost:repmgr_log]$ repmgr daemon status
ERROR: connection to database failed
DETAIL:
connection to server at "192.168.245.172", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
user=repmgr dbname=repmgr host=192.168.245.172 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
[opensql@localhost:repmgr_log]$ repmgr cluster show
ERROR: connection to database failed
DETAIL:
connection to server at "192.168.245.172", port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?

DETAIL: attempted to connect using:
user=repmgr dbname=repmgr host=192.168.245.172 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=

[opensql@localhost:repmgr_log]$ repmgr node rejoin -f /etc/repmgr/14/repmgr.conf -d 'host=192.168.245.171 dbname=repmgr user=repmgr'
NOTICE: rejoin target is node "192.168.245.171" (ID: 2)
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/14000028; rejoin target node's fork point: 0/140000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.245.172 dbname=repmgr user=repmgr"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/usr/pgsql-14/bin/pg_ctl -w -D '/opensql/pg/14/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[opensql@localhost:repmgr_log]$ repmgr cluster show
 ID | Name            | Role    | Status    | Upstream        | Location | Priority | Timeline | Connection string
----+-----------------+---------+-----------+-----------------+----------+----------+----------+------------------------------------------------
 1  | 192.168.245.172 | standby |   running | 192.168.245.171 | default  | 100      | 7        | host=192.168.245.172 dbname=repmgr user=repmgr
 2  | 192.168.245.171 | primary | * running |                 | default  | 100      | 8        | host=192.168.245.171 dbname=repmgr user=repmgr
[opensql@localhost:repmgr_log]$ repmgr daemon status
 ID | Name            | Role    | Status    | Upstream        | repmgrd | PID    | Paused? | Upstream last seen
----+-----------------+---------+-----------+-----------------+---------+--------+---------+--------------------
 1  | 192.168.245.172 | standby |   running | 192.168.245.171 | running | 125775 | no      | 0 second(s) ago
 2  | 192.168.245.171 | primary | * running |                 | running | 66303  | no      | n/a
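
The rejoin output above compares two WAL positions (LSNs): the local node's recovery point, 0/14000028, and the rejoin target's fork point, 0/140000A0. An LSN is written as two hexadecimal numbers, with the part before the slash holding the upper 32 bits. The sketch below converts both to byte offsets to show how far apart they are; `lsn_to_bytes` is a hypothetical helper written for illustration.

```shell
#!/bin/sh
# Sketch: convert a PostgreSQL LSN ("hi/lo" in hexadecimal) to a byte offset.
lsn_to_bytes() {
    hi=${1%%/*}
    lo=${1##*/}
    # The part before the slash is the upper 32 bits of the 64-bit position.
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

recovery_point=$(lsn_to_bytes 0/14000028)   # local node's recovery point
fork_point=$(lsn_to_bytes 0/140000A0)       # rejoin target's fork point
echo "gap: $(( fork_point - recovery_point )) bytes"
# prints "gap: 120 bytes"
```

On a running server the same subtraction is available in SQL through `pg_wal_lsn_diff()`.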

Switchover

-- Run on the standby
[opensql@localhost:repmgr_log]$ repmgr standby switchover -f /etc/repmgr/14/repmgr.conf --log-to-file
[2023-05-17 00:51:27] [NOTICE] redirecting logging output to "/opensql/pg/14/log/repmgr_log/repmgr.log"

[opensql@localhost:repmgr_log]$ repmgr cluster show
 ID | Name            | Role    | Status    | Upstream        | Location | Priority | Timeline | Connection string
----+-----------------+---------+-----------+-----------------+----------+----------+----------+------------------------------------------------
 1  | 192.168.245.172 | primary | * running |                 | default  | 100      | 9        | host=192.168.245.172 dbname=repmgr user=repmgr
 2  | 192.168.245.171 | standby |   running | 192.168.245.172 | default  | 100      | 8        | host=192.168.245.171 dbname=repmgr user=repmgr
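
Because a switchover stops the current primary, it can help to rehearse it first. `repmgr standby switchover` accepts a `--dry-run` option that checks the prerequisites without changing anything. The sketch below runs the rehearsal and performs the real switchover only if it passes; `safe_switchover` is a hypothetical wrapper, and `REPMGR` is an assumed override hook used for testing.

```shell
#!/bin/sh
# Sketch: rehearse a switchover with --dry-run before performing it.
# REPMGR is a hypothetical override hook; CONF is the path used in this guide.
REPMGR="${REPMGR:-repmgr}"
CONF="/etc/repmgr/14/repmgr.conf"

safe_switchover() {
    # --dry-run checks the prerequisites without changing anything.
    if ! "$REPMGR" standby switchover -f "$CONF" --dry-run; then
        echo "dry run failed; switchover not attempted" >&2
        return 1
    fi
    "$REPMGR" standby switchover -f "$CONF" --log-to-file
}

# Run on the standby that should become the new primary:
# safe_switchover && repmgr cluster show
```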

If rejoin fails
- Force a standby clone (failback)

-- Run the failback on the former primary (node 1)
[opensql@localhost:pg_log]$ repmgr -h 192.168.245.171 -U repmgr -d repmgr standby clone -F
WARNING: following problems with command line parameters detected:
"config_directory" set in repmgr.conf, but --copy-external-config-files not provided
NOTICE: destination directory "/opensql/pg/14/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.245.171 user=repmgr dbname=repmgr
DETAIL: current installation size is 34 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/opensql/pg/14/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/opensql/pg/14/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/pgsql-14/bin/pg_basebackup -l "repmgr base backup" -D /opensql/pg/14/data -h 192.168.245.171 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /opensql/pg/14/data start
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[opensql@localhost:pg_log]$ pg_ctl start
waiting for server to start....2023-05-16 22:40:36.123 KST [124480] LOG: redirecting log output to logging collector process
2023-05-16 22:40:36.123 KST [124480] HINT: Future log output will appear in directory "/opensql/pg/14/log/pg_log".
done
server started
[opensql@localhost:pg_log]$ repmgr standby register --force
INFO: connecting to local node "192.168.245.172" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "192.168.245.172" (ID: 1) successfully registered
[opensql@localhost:pg_log]$ repmgr daemon status
 ID | Name            | Role    | Status    | Upstream        | repmgrd | PID   | Paused? | Upstream last seen
----+-----------------+---------+-----------+-----------------+---------+-------+---------+--------------------
 1  | 192.168.245.172 | standby |   running | 192.168.245.171 | running | 13200 | no      | 0 second(s) ago
 2  | 192.168.245.171 | primary | * running |                 | running | 12534 | no      | n/a

This concludes our look at repmgr for PostgreSQL.

Continue with the next article, on psql for PostgreSQL.
