Before this commit, we excluded 1664_providers from being rewritten
during a backup, since it is one of the files that the user has to
copy to the destination before the backup starts. This was done so
that the user could use different global providers and keys to encrypt
the backup's WAL than the source server does. However, this raised
several issues when the server creates new providers or modifies
existing ones during the backup: those changes would be lost in the
backup, data related to such providers might become unreadable, and
redo might struggle to perform a rotation.
This commit treats 1664_providers like the rest of the files and makes
provider changes safe during the backup.
We still don't rewrite wal_keys, as we generate a unique WAL key for
the backup, and the server can't generate new WAL keys during the
backup.
Fixes PG-1895
This structure isn't really a key but metadata for a range of WAL using
a given key. Rework these structures to better represent this.
Also decouple the structure from the file format and use InternalKey for
the actual decrypted key data.
There is still plenty of cleanup to do in relation to this, but I
believe this is at least easier to understand than what we currently
have. I wish the end of the ranges were handled better, but that is a
bit out of scope for this commit.
It gets confusing when this structure is used for both encrypted and
decrypted key data. With this change it's obvious that any allocation of
InternalKey can potentially leak key material to swap files.
Also be more explicit with the empty alignment padding in the
TDEMapEntry struct, since using multiple levels of anonymous structs to
maintain alignment would get a bit messy.
Exposing and setting the environment variables everywhere makes it
harder to refactor and to understand what is going on. Instead we can
just write a couple of helper functions and use Perl scalar variables.
This only affects EXEC_BACKEND/Windows builds which we currently do not
support, but we fix this anyway to make the code more consistent and
easier to understand since we try to care about this in other places.
In the future we may want to add CI and proper support for EXEC_BACKEND
builds.
The issue was originally found by Zsolt Parragi.
Archiving being enabled during the setup of the SMGR environment caused
one of the test suites for WAL archiving to fail, so we disable it while
running queries in single-user mode.
Previously we ignored the extra arguments to initdb when initializing
the pg_tde directory and just copied the directory from a database
initialized without the extra arguments, or, if available, from the
cache. Now we make sure that when extra arguments are supplied we do
not use the cache and that we copy the pg_tde directory from a database
initialized with the extra arguments.
As far as I know this is only relevant to the --allow-group-access flag
but we may as well make the solution generic.
This is currently only added for meson, but could also be added for
make.
This has to be set up before the actual TAP tests are run, as they run
in parallel and as such would all try to set up the template at the same
time if we let them use the same folder for it without it being
pre-generated.
This seems to shorten the test suite run-time by ~25% on my laptop, so
it seems worth doing.
This enables table encryption by default in TAP tests when TDE_MODE=1.
Use TDE_MODE_SMGR=0 to turn off table encryption when running with
pg_tde loaded.
The setup for running regress with TDE turned on has been slightly
modified to match what is done for TAP tests, to let tests that run the
regress suite under TAP work.
This enables WAL encryption by default when the TAP tests are run with
TDE_MODE=1. Use TDE_MODE_WAL=0 to disable WAL encryption while still
having pg_tde enabled.
This makes sure pg_tde is loaded and keys are set up when running the
PostgreSQL TAP suite. No TDE features are enabled at this point.
Single user mode is used to generate a template of pg_tde setup files
which are then copied to each created cluster's data directory.
- add as known issue in release notes
- fix a broken link in features.md (not related to issue...)
- add a warning to global key providers about using a keyring provider
  with WAL encryption
- add new subtopic in Backup WAL about key rotations during backups for
file-based key providers
Based on PG-1895 description.
Before this commit, we XLogged the provider ID (keyringId) of the old
key. Yet we then attempt to fetch the new key from the old provider
during redo, which obviously fails and crashes the recovery.
So the following steps lead to a recovery stalemate:
- Create new provider (with new destination - mount_path, url etc).
- Create new server/global key.
- Rotate key.
- <Crash!>
This commit fixes it by XLogging the new key's provider ID.
For: PG-1895
There is no reason to do durable_unlink before durable_rename: rename
can handle an existing file. But with this sequence, the cluster may
end up in an unrecoverable state should the server crash between these
two operations, as there would be no "_keys" file at all.
The current sequence may also cause an issue with backups:
<durable_unlink>, <pg_basebackup gets a file list>, <durable_rename>,
leaving no "_keys" file in the backup as a result.
- add new topic called # Backing up with WAL encryption enabled
- add two subtopics for other WAL methods and restoring a backup created with WAL encryption
- reword to short form option flags
- remove (tech preview)
- remove mentions of WAL being BETA and warning notes
- add WAL tool support to limitations, improve flow, add button to setup
- add limitation regarding WAL shipping standby not supported with WAL encryption
- add mention of open source and enterprise editions being supported for pg_tde
- add none method to basebackup and link to topic
- add Example Patroni configuration for Patroni tool
- improve supported vs unsupported tools section in Limitations
Continued from #523
- add pg_tde archive and restore commands
- update cli-tools.md with paragraphs explaining New and extended tools
- update pg-tde-restore-encrypt tool with new information and better descriptions for clarity
- update the Features topic button for better clarity
We used to assume that the only errors which could happen were ones
which set errno, but that is not the case. We also want to give nice
output on non-zero return values and when the process was killed by a
signal.
Before, pg_basebackup would encrypt streamed WAL according to the keys
in pg_tde/wal_keys in the destination dir.
This commit introduces a number of changes:
pg_basebackup encrypts WAL only if the "-E --encrypt-wal" flag is
provided. In that case, it extracts the principal key, truncates
pg_tde/wal_keys and encrypts WAL with a newly generated WAL key. We
still expect pg_tde/wal_keys and pg_tde/1664_providers in the
destination dir. If these files are not provided but "-E" is
specified, it fails with an error.
We also throw a warning if pg_basebackup runs without -E but there is
a wal_keys file on the source, as the WAL might be compromised and the
backup broken.
For PG-1603, PG-1857
There was a race condition in the WAL archiving tests: if the
end-of-recovery checkpoint had completed, the tests for the WAL contents
were nonsensical and racy. Solve this by explicitly promoting the
server only after we have looked at the WAL contents, while still
making sure to wait until all WAL has been replayed.
Additionally improve the tests by actually making sure the replica
starts in a good state where all WAL is encrypted and testing both
the plaintext and the encrypted scenarios.
Unfortunately the logic for generating a new key to protect the stream
cipher used to encrypt the WAL stream in our restore command was based
on totally incorrect assumptions due to how the recovery is implemented.
Recovery is a state machine which can go back and forth between one
mode where it streams from a primary and another where it first tries to
fetch WAL from the archive and, if that fails, from the pg_wal
directory; in the pg_wal directory we may have files which are encrypted
with whatever keys were there originally.
To handle all the possible scenarios we remove the ability of
pg_tde_restore_encrypt to generate new keys and just have it use
whatever keys there are in the key file. This unfortunately means we
open ourselves to some attacks on the stream cipher if the system is
tricked into encrypting a different WAL stream at the same TLI and LSN
as we have already encrypted. As far as I know this should be rare
under normal operations, since normally the WAL in the archive should
be the same as the one in pg_wal or the one we receive through
streaming.
Ideally we would want to fix this but for now it is better to have WAL
encryption with this weakness than to not have it at all.
This also incidentally fixes a bug we discovered caused by generating a
new key only invalidating one key rather than all keys which should have
become invalid, since we no longer generate a new key.
It seems like there are cases when the postmaster has "restarted"
after a backend crash where the WAL cache inherited from the postmaster
is wrong.
I'm not at all sure exactly how and why this happens, but this patch
fixes a bug with this and allows recovery/013_crash_restart to pass with
WAL encryption enabled.
According to the documentation, each backend is supposed to hold
AddinShmemInitLock when calling ShmemInitStruct. We only did that for
half of our calls before this patch.
There is at least one corner case scenario where we have to load the
last record into the cache during a write:
* replica crashes, receives last segment from primary
* replica replays last segment, reaches end
* replica activates new key
* replica replays prepared transaction, has to use old keys again
* old key write function sees that we generated a new key, tries to load
  it
In this scenario we could get away with detecting that we are in a
write and asserting if we tried to use the last key.
But in a release build assertions do not fire, and we would end up
writing some non-encrypted data to disk, and recovery would later fail
if we had to run it.
It could be a FATAL, but that would still crash the server, and the next
startup would crash again and again...
Instead, to properly avoid this situation we preallocate memory for one
more key in the cache during initialization. Since we can only add one
extra key to the cache during the server's run, this means we no longer
try to allocate in the critical section in any corner case.
While this is not the nicest solution, it is simple and keeps the
current cache and decrypt/encrypt logic the same as before. Any other
solution would be more complex, and even more of a hack, as it would
require dealing with a possibly out of date cache.
To not break recovery when we replay encrypted WAL but WAL encryption
is disabled, the simplest way is to treat disabled WAL encryption just
like enabled WAL encryption. The issue is not big in practice, since it
should only hit users who disable WAL encryption and then crash the
database, but treating both cases the same way keeps the code simple to
understand.
Previously we simply set the LSN for the new key to the first write
location.
This is however not correct, as there are many corner cases around this:
* recovery / replication might write old LSNs
* we can't handle multiple keys with the same TLI/LSN, which can happen
with quick restarts without writes
To support this in this commit we modify the following:
* We only activate new keys outside crash recovery, or immediately if
encryption is turned off
* We also take the already existing last key into account (if exists),
and only activate a new key if we progressed past its start location
The remaining changes are just support infrastructure for this:
* Since we might rewrite old records, we use the already existing keys
for those writes, not the active last keys
* We prefetch existing keys during initialization, so it doesn't
accidentally happen in the critical section during a write
There is a remaining bug with stopping wal encryption, also mentioned in
a TODO message in the code. This will be addressed in a later PR as this
fix already took too long.
The min/max comparisons of LSNs assumed that everything is in the same
timeline. In practice, with replication + recovery combinations, it is
possible that keys span at least 3 timelines, which means that the
timeline has to be included in both comparisons, as in other timelines
the restrictions are less strict.
Use a single argument for the wrapped command in the archiving
wrappers.
Instead of giving all of the arguments of the command separately and
trying to figure out which one should be replaced by the path to the
unencrypted WAL segment, we take a single argument and do % parameter
replacement similar to what postgres does with archive_command and
restore_command.
This also means that we can simplify by using system() instead of exec().
We also clean up usage instructions and make the two wrappers more
symmetrical by requiring the same parameters.
Co-authored-by: Andreas Karlsson <andreas.karlsson@percona.com>