How to do point in time recovery with PostgreSQL 9.2 : PITR

Standard

This post describes how to do point in time recovery with PostgreSQL 9.2 : PITR. Point in time recovery (PITR) simply means restoring data upto certain point in time. This method uses low level API backups, if you want to implement PITR using pg_basebackup utility then read my other post here. Before you proceed any further I would assume that you already have a server ready to implement PITR. If you do not have a PostgreSQL database server then I would suggest to read my other post about installing and configuring PostgreSQL 9.2 database server. Now we have a database server running, before we jump into implementation I want you to see your postgres data directory structure, which should look like the following:

[root@opensourcedbms_pg_master ~]# ls -lrt /var/lib/pgsql/9.2/data/
total 96
-rw-------  1 postgres postgres    72 Jan 14 23:20 postmaster.pid
-rw-------. 1 postgres postgres    71 Jan 14 23:20 postmaster.opts
-rw-------. 1 postgres postgres 19585 Jan 14 23:20 postgresql.conf
drwx------. 3 postgres postgres  4096 Jan 14 23:20 pg_xlog
-rw-------. 1 postgres postgres     4 Jan 14 23:20 PG_VERSION
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_twophase
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_tblspc
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_subtrans
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_stat_tmp
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_snapshots
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_serial
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_notify
drwx------. 4 postgres postgres  4096 Jan 14 23:20 pg_multixact
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_log
-rw-------. 1 postgres postgres  1636 Jan 14 23:20 pg_ident.conf
-rw-------  1 postgres postgres  4292 Jan 14 23:20 pg_hba.conf
drwx------. 2 postgres postgres  4096 Jan 14 23:20 pg_clog
drwx------. 2 postgres postgres  4096 Jan 14 23:20 global
drwx------. 5 postgres postgres  4096 Jan 14 23:20 base

pg xlog

Notice that there is a directory named pg_xlog. At all times, PostgreSQL maintains a write ahead log (WAL) in this pg_xlog/ subdirectory of postgres data directory[typically /var/lib/pgsql/9.2/data]. These log files records every change made to the database’s data files. This log exists primarily for crash-safety purposes and if the system crashes, the database can be restored to consistency by "replaying" the log entries made since the last checkpoint. Which we are now going to use as a highly reliable backup recovery technique for postgres databases.

Continuous archiving and PITR is three step process. I will now go into describing these three steps in detail and explain how we can achieve PITR in PostgreSQL painlessly:

1. Setup Write Ahead Log (WAL) Archiving

To enable continuous WAL archiving we first need to edit postgresql.conf file located under postgres data directory. If your data directory data directory path is different from default, please edit conf file from that location.

vi /var/lib/pgsql/9.2/data/postgresql.conf

Move to WRITE AHEAD LOG section of the config file and update wal_level / archive_mode / archive_command variable values as following:

wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /var/lib/pgsql/pg_log_archive/%f && cp %p /var/lib/pgsql/pg_log_archive/%f'

Image 002

Now you are done making configuration changes. Go ahead and write above changes and quit from configuration file. Before you restart your database engine, we need to create above said archive directory [Ideally this should be network mount or location where your main/file level backups are stored]. For now lets create a local directory and to do that simply run the following:

mkdir /var/lib/pgsql/pg_log_archive
chown -R postgres.postgres /var/lib/pgsql/pg_log_archive

Image 003

We are now set to restart our database engine to start continuously archive logs and move them to /var/lib/pgsql/pg_log_archive directory as soon as they reach the check point size which is usually at 16MB. This check point size can be tweaked from configuration file. Now go ahead and restart database engine.

service postgresql-9.2 restart

At this point you are done setting continuous archiving on your database server.

2. Making file system backup [Base backup]

Taking a file system level backup of database directory is very simple and here are the steps to do it:

1. Connect to database as a super user and issue the following command to start backup process:

SELECT pg_start_backup('varlibpgsql', true);

"varlibpgsql" can be any string, this label identifies this backup operation uniquely.

2. Perform file system back of postgresql data directory

tar cfP /var/lib/pgsql/db_file_backup.tar /var/lib/pgsql/9.2/data

Image 009

3. Now again connect to database server as a super user and issue the following command to stop backup process:

SELECT pg_stop_backup();

At this point we are done taking a file system level database backup. You will notice that there are log files flowing into your pg_log_archive directory that you created earlier. These archive log files along with your base backup will let us recover data to last check point or any point in time after base back happened and to the point of upto where we have our archive logs.

Image 006

3. Recovering in the event of disaster

Alright now we have a situation of some sort and we have the worst thing happened. We now have to recover our database server until last transaction. To do that simply follow the steps below:

1. Stop the database server, if it’s running and if its not in running state you are fine.

service postgresql-9.2 stop

2. If you have the space to do so, copy the existing data directory[/var/lib/pgsql/9.2/data] and any tablespaces to a temporary location in case you need them later. If you do not have enough space, you should at least save the contents of the pg_xlog sub directory, as it might contain logs which were not archived before the system went down.

mv /var/lib/pgsql/9.2/data /tmp

3. Restore the database files from your file system backup. Be sure that they are restored with the right ownership (the database system user – postgres, and not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.

tar xvfP /var/lib/pgsql/db_file_backup.tar

Image 010

4. Remove any files present in pg_xlog/; these came from the file system backup and are therefore probably obsolete rather than current. If you didn’t archive pg_xlog/ at all, then recreate it with proper permissions, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.

rm -rf /var/lib/pgsql/9.2/data/pg_xlog/*.*

5. If you have unarchived WAL segment files that you saved in step 2, copy them into pg_xlog/. (It is best to copy them, not move them, so you still have the unmodified files if a problem occurs and you have to start over.) To check for unarchived WAL segment files simply compare both /var/lib/pgsql/pg_xlog_archive and /tmp/data/pg_xlog directories and copy over new log files found in /tmp/data/pg_xlog into /var/lib/pgsql/9.2/data/pg_xlog. In our case we have 000000010000000000000006 and 000000010000000000000007 files which don’t exist in our continuous archival directory. so copy them to /var/lib/pgsql/9.2/data/pg_xlog/ directory.

cp /tmp/data/pg_xlog/000000010000000000000006 /var/lib/pgsql/9.2/data/pg_xlog
cp /tmp/data/pg_xlog/000000010000000000000007 /var/lib/pgsql/9.2/data/pg_xlog

Image 013

6. At this point you are done setting back all the data directories etc. Now create a recovery command file recovery.conf in the data directory. You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery was successful.

A sample recovery file is provided with installation of the database server which should be located in share directory under the installation directory of postgres [ex: /usr/pgsql-9.2/share] or you can create one under /var/lib/pgsql/9.2/data directory with the following contents:

# -------------------------------
# PostgreSQL recovery config file
# -------------------------------
#
# Edit this file to provide the parameters that PostgreSQL needs to
# perform an archive recovery of a database, or to act as a replication
# standby.
#
# If "recovery.conf" is present in the PostgreSQL data directory, it is
# read on postmaster startup.  After successful recovery, it is renamed
# to "recovery.done" to ensure that we do not accidentally re-enter
# archive recovery or standby mode.
#
# This file consists of lines of the form:
#
#   name = value
#
# Comments are introduced with '#'.
#
# The complete list of option names and allowed values can be found
# in the PostgreSQL documentation.
#
#---------------------------------------------------------------------------
# ARCHIVE RECOVERY PARAMETERS
#---------------------------------------------------------------------------
#
# restore_command
#
# specifies the shell command that is executed to copy log files
# back from archival storage.  The command string may contain %f,
# which is replaced by the name of the desired log file, and %p,
# which is replaced by the absolute path to copy the log file to.
#
# This parameter is *required* for an archive recovery, but optional
# for streaming replication.
#
# It is important that the command return nonzero exit status on failure.
# The command *will* be asked for log files that are not present in the
# archive; it must return nonzero when so asked.
#
# NOTE that the basename of %p will be different from %f; do not
# expect them to be interchangeable.
#
#restore_command = ''           # e.g. 'cp /mnt/server/archivedir/%f %p'
#
#
# archive_cleanup_command
#
# specifies an optional shell command to execute at every restartpoint.
# This can be useful for cleaning up the archive of a standby server.
#
#archive_cleanup_command = ''
#
# recovery_end_command
#
# specifies an optional shell command to execute at completion of recovery.
# This can be useful for cleaning up after the restore_command.
#
#recovery_end_command = ''
#
#---------------------------------------------------------------------------
# RECOVERY TARGET PARAMETERS
#---------------------------------------------------------------------------
#
# By default, recovery will rollforward to the end of the WAL log.
# If you want to stop rollforward at a specific point, you
# must set a recovery target.
#
# You may set a recovery target either by transactionId, by name,
# or by timestamp. Recovery may either include or exclude the
# transaction(s) with the recovery target value (ie, stop either
# just after or just before the given target, respectively).
#
#
#recovery_target_name = ''      # e.g. 'daily backup 2011-01-26'
#
#recovery_target_time = ''      # e.g. '2004-07-14 22:39:00 EST'
#
#recovery_target_xid = ''
#
#recovery_target_inclusive = true
#
#
# If you want to recover into a timeline other than the "main line" shown in
# pg_control, specify the timeline number here, or write 'latest' to get
# the latest branch for which there's a history file.
#
#recovery_target_timeline = 'latest'
#
#
# If pause_at_recovery_target is enabled, recovery will pause when
# the recovery target is reached. The pause state will continue until
# pg_xlog_replay_resume() is called. This setting has no effect if
# hot standby is not enabled, or if no recovery target is set.
#
#pause_at_recovery_target = true
#
#---------------------------------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------------------------------
#
# standby_mode
#
# When standby_mode is enabled, the PostgreSQL server will work as a
# standby. It will continuously wait for the additional XLOG records, using
# restore_command and/or primary_conninfo.
#
#standby_mode = off
#
# primary_conninfo
#
# If set, the PostgreSQL server will try to connect to the primary using this
# connection string and receive XLOG records continuously.
#
#primary_conninfo = ''          # e.g. 'host=localhost port=5432'
#
#
# By default, a standby server keeps restoring XLOG records from the
# primary indefinitely. If you want to stop the standby mode, finish recovery
# and open the system in read/write mode, specify path to a trigger file.
# The server will poll the trigger file path periodically and start as a
# primary server when it's found.
#
#trigger_file = ''
#
#---------------------------------------------------------------------------
# HOT STANDBY PARAMETERS
#---------------------------------------------------------------------------
#
# Hot Standby related parameters are listed in postgresql.conf
#
#---------------------------------------------------------------------------

One important modification that we need to make in this configuration file is we need to mention restore_command. In our can now that would be:

restore_command = 'cp /var/lib/pgsql/pg_log_archive/%f %p'

Please update this variable in the above recovery configuration file created. Write changes and exit out of recovery.conf file and make sure that this config file permissions are properly set:

chown postgres.postgres recovery.conf

Also Make sure that all files under /var/lib/pgsql/9.2/data/pg_xlog/ have correct permissions/ownershipts:

chown -R postgres.postgres /var/lib/pgsql/9.2/data/pg_xlog/

Image 013

7. Now you are ready to start the server.

service postgresql-9.2 start

The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode later) and then commence normal database operations.

Image 014

8. If restore is successful and everything looks good, allow your users to connect by restoring pg_hba.conf to normal.

Thats its folks you have successfully did a point in time recovery. There are several options that can be tweaked in recovery.conf file for timelines etc., Just check it out once to get familiarity.

Facebooktwittergoogle_plusredditpinterestlinkedinmail

About Prashanth Goriparthi

Sr. Database Administrator / Architect

14 Comments

  1. Pingback: Point in time recovery: PITR using pg_basebackup with PostgreSQL 9.2

  2. Pingback: Setup replication with Postgres 9.2 on CentOS 6/Redhat EL6/Fedora

  3. Hello Prashanth,

    excellent mate..
    you saved my day…
    the way you’ve presented is great.
    looking forward to more knowledge sharing from your end to the opensource community..

    cheers!!
    Rammi

  4. Thanks, Prashanth, this helped me verify my recover process.

    I do have one question, when copying the xlog files from your tmp directory to the recovery db, you copy …1…6 and …1…7 ( see commands below) but not …1…8, what’s the logic there? Thanks!

    cp /tmp/data/pg_xlog/000000010000000000000006 /var/lib/pgsql/9.2/data/pg_xlog
    cp /tmp/data/pg_xlog/000000010000000000000007 /var/lib/pgsql/9.2/data/pg_xlog

  5. Hi Prasanth, Thanks for sharing this wonderful article.
    I need to migrate data from 32bit 9.0 postgreSql to 64bit 9.0 PostgreSQL. Could you help me providing some pointers/refernces?

    I am left with 2 options- 1. dump and restore 2.Replicating DB with tools like Slony etc.

    As dump and restore needs more down time due to the large data(approx 200GB)-it will get less priority.

    Any sort of help is appreciated.

  6. Hi Prashanth

    I was trying to restore the backup into a new machine. Version:Postgresql 9.3
    Followed the below steps:
    > archiving enabled
    > select pg_start_backup(‘/pasth/bkp/’,true); in source machine
    > Used tar command to take the backup of data directory in source machine.
    > select pg_stop_backup(); in source machine
    > some archives got generated
    > Moved the tar file plus archive files generated in the source machine to destination machine.
    > Stopped the database in destination machine.
    > Moved the data directory of destination machine to a different mountpoint.
    > Extracted the tar file to the /opt/PostgresPlus/9.3AS and kept the data directory
    > Copied the archive files generated in source machine to archive directory in destination machine.
    > Edited the recovery.conf copied to the data directory with the parameter : restore_command = ‘cp /path/arch_dest/%f %p’
    > Executed ./pg_ctl start -D /opt/PostgresPlus/9.3AS — but failed to start.

    Where did I go wrong?

    Please help

    Thanks & Regards
    Rijo.S.Roy

  7. Hi Prasanth,
    thanks for the article..
    I was following your tutorial for setting up a PITR, but unfortunately I am stuck with an error which occurs when I ran
    select pg_stop_backup();

    please see the below error

    Notice: pg_stop_backup cleanup done, waiting for reqiored WAL segments to be archived

    Warning: pg_stop_backup waiting still for required WAL segments to be archived(60 second elapsed).

    Appreciate your help.

    Thanks
    saj

  8. Its looks me a confusing ,

    why the need to copy the /data of source to /tmp , assuming the disk crashed we are not going to get /data of source , and later you were using /tmp/data while recovery . where you will find /tmp/data of source PITR on other server .

  9. Pingback: What does pg_basebackup does? - PKT

Leave a Reply

* Captcha [Please fill/solve below] : * Time limit is exhausted. Please reload the CAPTCHA.