- add as known issue in release notes
- fix a broken link in features.md (not related to issue...)
- add to global key providers a warning about the keyring provider with WAL encryption
- add new subtopic in Backup WAL about key rotations during backups for
file-based key providers
Based on the PG-1895 description.
Before this commit, we XLogged the provider ID (keyringId) of the old
key, yet during redo we then attempted to fetch the new key from the
old provider, which obviously fails and crashes the recovery.
So the following steps lead to a recovery stalemate:
- Create a new provider (with a new destination: mount_path, url, etc.).
- Create a new server/global key.
- Rotate the key.
- <Crash!>
This commit fixes it by XLogging the new key's provider ID.
For: PG-1895
There is no reason to do durable_unlink before durable_rename: rename
can handle an existing file. But with this sequence, the cluster may
end up in an unrecoverable state should the server crash in between
these two ops, as there would be no "_keys" file at all.
The current sequence may also cause an issue with backups:
<durable_unlink>, <pg_basebackup gets a file list>, <durable_rename>.
As a result, there is no "_keys" file in the backup.
- add new topic called # Backing up with WAL encryption enabled
- add two subtopics for other WAL methods and restoring a backup created with WAL encryption
- reword to short form option flags
- remove (tech preview)
- remove mentions of WAL being BETA and warning notes
- add WAL tool support to limitations, improve flow, add button to setup
- add limitation regarding WAL shipping standby not supported with WAL encryption
- add mention of Open Source and Enterprise editions being supported for pg_tde
- add none method to basebackup and link to topic
- add Example Patroni configuration for Patroni tool
- improve supported vs unsupported tools section in Limitations
Continued from #523
- add pg_tde archive and restore commands
- update cli-tools.md with paragraphs explaining New and extended tools
- update pg-tde-restore-encrypt tool with new information and better descriptions for clarity
- update the Features topic button for better clarity
We used to assume that the only errors which could happen were ones
which set errno, but that is not the case. We also want to give nice
output on non-zero return values and when the process was killed by a
signal.
Before, pg_basebackup would encrypt streamed WAL according to the keys
in pg_tde/wal_keys in the destination dir.
This commit introduces a number of changes:
pg_basebackup encrypts WAL only if the "-E --encrypt-wal" flag is
provided. In that case, it extracts the principal key, truncates
pg_tde/wal_keys and encrypts WAL with a newly generated WAL key. We
still expect pg_tde/wal_keys and pg_tde/1664_providers in the
destination dir. If these files are not provided but "-E" is
specified, it fails with an error.
We also throw a warning if pg_basebackup runs w/o -E but there is a
wal_keys file on the source, as WAL might be compromised and the
backup broken.
For PG-1603, PG-1857
There was a race condition in the WAL archiving tests: if the
end-of-recovery checkpoint had completed, the tests for the WAL contents
were nonsensical and racy. Solve this by explicitly promoting the
server only after we have looked at the WAL contents, while still making
sure to wait until all WAL has been replayed.
Additionally improve the tests by actually making sure the replica
starts in a good state where all WAL is encrypted and testing both
the plaintext and the encrypted scenarios.
Unfortunately the logic for generating a new key to protect the stream
cipher used to encrypt the WAL stream in our restore command was based
on totally incorrect assumptions due to how the recovery is implemented.
Recovery is a state machine which can go back and forward between one
mode where it streams from a primary and another where it first tries to
fetch WAL from the archive and if that fails from the pg_wal directory,
and in the pg_wal directory we may have files which are encrypted with
whatever keys were there originally.
To handle all the possible scenarios we remove the ability of
pg_tde_restore_encrypt to generate new keys and just have it use
whatever keys there are in the key file. This unfortunately means we open
ourselves to some attacks on the stream cipher if the system is tricked
into encrypting a different WAL stream at the same TLI and LSN as we
already have encrypted. As far as I know this should be rare under
normal operations since normally e.g. the WAL should be the same in the
archive as the one in pg_wal or which we receive through streaming.
Ideally we would want to fix this but for now it is better to have WAL
encryption with this weakness than to not have it at all.
This also incidentally fixes a bug we discovered where generating a
new key invalidated only one key rather than all the keys which should
have become invalid; since we no longer generate a new key, the bug can
no longer occur.
It seems like there are cases when the postmaster has "restarted"
after a backend crash where the WAL cache inherited from the postmaster
is wrong.
I'm not at all sure exactly how and why this happens, but this patch
fixes a bug with this and allows recovery/013_crash_restart to pass with
WAL encryption enabled.
According to the documentation, each backend is supposed to hold
AddinShmemInitLock when calling ShmemInitStruct. We only did that for
half of our calls before this patch.
There is at least one corner case scenario where we have to load the
last record into the cache during a write:
* replica crashes, receives last segment from primary
* replica replays last segment, reaches end
* replica activates new key
* replica replays prepared transaction, has to use old keys again
* old key write function sees that we generated a new key, tries to load
it
In this scenario we could get away with detecting that we are in a
write and asserting if we tried to use the last key.
But in a release build assertions do not fire, and we would end up
writing some unencrypted data to disk and later failing if we have to
run recovery.
It could be a FATAL, but that would still crash the server, and the next
startup would crash again and again...
Instead, to properly avoid this situation we preallocate memory for one
more key in the cache during initialization. Since we can only add one
extra key to the cache during the server's run, this means we no longer
try to allocate in the critical section in any corner case.
While this is not the nicest solution, it is simple and keeps the
current cache and decrypt/encrypt logic the same as before. Any other
solution would be more complex, and even more of a hack, as it would
require dealing with a possibly out of date cache.
To not break recovery when we replay encrypted WAL while WAL encryption
is disabled, the simplest way is to treat disabled WAL encryption just
like enabled WAL encryption. The issue is not big in practice, since it
should only hit users who disable WAL encryption and then crash the
database, but treating both cases the same way keeps the code simple to
understand.
Previously we simply set the LSN for the new key to the first write
location.
This is however not correct, as there are many corner cases around this:
* recovery / replication might write old LSNs
* we can't handle multiple keys with the same TLI/LSN, which can happen
with quick restarts without writes
To support this in this commit we modify the following:
* We only activate new keys outside crash recovery, or immediately if
encryption is turned off
* We also take the already existing last key into account (if it
exists), and only activate a new key if we progressed past its start
location
The remaining changes are just support infrastructure for this:
* Since we might rewrite old records, we use the already existing keys
for those writes, not the active last keys
* We prefetch existing keys during initialization, so it doesn't
accidentally happen in the critical section during a write
There is a remaining bug with stopping WAL encryption, also mentioned in
a TODO message in the code. This will be addressed in a later PR as this
fix already took too long.
The min/max comparisons of LSNs assumed that everything is in the same
timeline. In practice, with replication + recovery combinations, it is
possible that keys span at least 3 timelines, which means the timeline
has to be included in both comparisons, as in other timelines the
restrictions are less strict.
Use a single argument for the wrapped command in the archiving
wrappers.
Instead of giving all of the arguments of the command separately and
trying to figure out which one should be replaced by the path to the
unencrypted WAL segment, we take a single argument and do % parameter
replacement similar to what postgres does with archive_command and
restore_command.
This also means that we can simplify by using system() instead of exec().
We also clean up usage instructions and make the two wrappers more
symmetrical by requiring the same parameters.
Co-authored-by: Andreas Karlsson <andreas.karlsson@percona.com>
Instead of first deleting any leftover key and then writing the new key
we do a single pass through the file where we replace any old key that
we find. To make this happen on redo too we need to stop generating a
separate WAL record for the key deletion for encrypted tables and only
generate that record for unencrypted tables where we still need a key
deletion record.
We expect this optimization to primarily be visible on WAL replay where
only a single backend is used to replay everything, but it also speeds
up table creation in general on workloads with many tables.
We forgot to have a check against trying to delete leftover SMGR keys
for temporary tables, which is a useless operation since they are stored
in memory.
Additionally we forgot to prevent WAL from being written when creating
or removing a key in smgrcreate() for temporary tables.
- Fix whitespace
- Make sure to use the right languages
- Do not wrap short SQL queries unnecessarily
- Add missing end of code block
- Add missing semicolon to SQL query
Instead of having the WAL key code include the headers for the SMGR keys
we move the shared code into a separate header file. Additionally we
clean up some minor header issues.
Just logging that the function was called at DEBUG2 is not very helpful
to anyone and is presumably just a leftover from someone's attempt at
debugging a particular issue they had at some point.
Breaking these particular snippets out as separate functions did not
improve readability and was only done because they used to be called
from multiple locations.
This change has already been done in the WAL key code.
When WAL is streamed during the backup (default mode), it comes in
unencrypted. But we need keys to encrypt it. For now, we expect that
the user would put `pg_tde` dir containing the `1664_key` and
`1664_providers` into the destination directory before starting the
backup. We encrypt the streamed WAL according to internal keys. No
`pg_tde` dir means no streamed WAL encryption.
Also rename enum variants for consistency, plus renumber the types for
the WAL keys, which is fine since this file is newly introduced and
breaking backwards compatibility is therefore not an issue.
Before this commit, WAL keys didn't take the TLI into account at all.
But after pg_rewind, for example, pg_wal/ may contain segments from two
timelines, and the WAL reader choosing the key may pick the wrong one
because LSNs of different TLIs may overlap. There was also another bug:
there is a key with the start LSN 0/30000 in TLI 1, and after the start
in TLI 2, the WAL writer creates a new key with the LSN 0/30000, but in
TLI 2. The reader wouldn't fetch the latest key because w/o TLI, these
are the same.
This commit adds TLI to the internal keys and makes use of it along
with LSN for key comparisons.