PostgreSQL WAL vs. Oracle Redo Log

Write-Ahead Logging (WAL) in PostgreSQL and Redo Logs in Oracle have some things in common. Here I want to compare both with respect to database configuration.

WAL and Redo Logs both have the same purpose: to guarantee data integrity the database management system must write changes to some kind of stable storage. But instead of writing data directly to the location where it belongs and from where it will be retrieved later on, records describing the changes (in Oracle sometimes called change vectors) are written to dedicated log files. After the log has been flushed to disk the DBMS has enough information available to recreate the original data changes after a crash. The data eventually will be written to the correct location by an operation called checkpoint. When the checkpoint is completed the log is no longer needed for crash recovery (but it could still be valuable for log based replication or disaster recovery).

The invention of these logs is probably the consequence of plain old rotational disks being the prevailing kind of stable storage for most of the Information Age. The throughput of writing sequentially to a disk is much larger than writing single blocks to different spots on the disk. Therefore it will in general be much faster to write the log with a change description than to write the changed data itself to the database files.

The most visible difference between WAL and redo logs is their configuration. An Oracle DBA has to decide on the number of redo logs (two or more; to simplify things I will ignore mirrored redo logs here) and their size. So the disk space used by the redo logs is fixed and will only change if the DBA modifies the configuration. For PostgreSQL on the other hand the size of a WAL file is always 16 megabytes (unless the server software has been recompiled with a different value). The exact number of WAL files depends on a couple of configuration parameters (checkpoint_completion_target, checkpoint_segments & wal_keep_segments) and the current load profile.

So what does this mean for the database operation? Both DBMS will reuse a log when the contained data is no longer needed for recovery. That implies that the checkpoint for the relevant data blocks must have finished. The log must also have been archived successfully if archiving is enabled.

Oracle can only use the pre-configured logs and will emit warnings or error messages when a log cannot be reused. The message “checkpoint not complete” in the alert log indicates that Oracle has to postpone database changes because switching to the next redo log is not yet possible due to the still running checkpoint. Obviously this reduces the performance of the application. It gets worse when the archiver is unable to archive a log. In this case the error message “ORA-00257: archiver error. Connect internal only, until freed” comes up and the database freezes until the issue is resolved by the DBA.

PostgreSQL on the other hand will start to add more WAL files to the directory pg_xlog when it can not reuse an existing log. This prevents the database from stalling if checkpoints do not finish in time or the archiver process can’t keep up with the work. Of course an adequate amount of disk space must be reserved for PostgreSQL to make this possible without failures.

Both DBMS offer parameters to control the frequency of checkpoints. In PostgreSQL the parameter checkpoint_timeout is used to define the maximum time interval between two checkpoints. Oracle has a similar parameter called log_checkpoint_timeout. But nowadays the checkpoint frequency is often automatically adjusted to satisfy the crash recovery time. The parameter fast_start_mttr_target sets the target for the duration of a crash recovery and Oracle triggers checkpoints to achieve that goal. So in this case the configuration of Oracle uses a more business aligned number.

In both systems the amount of changes done also triggers a checkpoint. PostgreSQL has the parameter checkpoint_segments to set the number of segments written to the WAL files before a checkpoint happens. As said above, the segments normally have a size of 16 megabytes. So this gives at least 16 megabytes written to WAL until this type of checkpoint is triggered. For Oracle the parameter log_checkpoint_interval specifies the number of OS blocks (containing 512 bytes on most platforms) between checkpoints. A checkpoint is also triggered by a redo log switch and therefore the amount of changes is also defined by the size of the redo logs. A DBA can create redo logs much smaller than 16 megabytes so this could lead to very frequent checkpoints in Oracle. In this case the DBA will have to increase the size of the redo logs to reduce the frequency of these checkpoints. In PostgreSQL the same can by achieved by increasing the parameter checkpoint_segments.

The first step when looking into checkpoint activity is to write the relevant details to a logfile. PostgreSQL uses the parameter log_checkpoints to write details of every checkpoint into the server log. The log even shows if the checkpoint has been triggered by the timeout (indicated by checkpoint starting: time) or by the amount of changes (indicated by checkpoint starting: xlog). Oracle uses the parameter log_checkpoints_to_alert to write details into the alert log.

For more details you can check the official PostgreSQL documentation or the slides WAL Internals Of PostgreSQL from PGCon 2012 by Amit Kapila.