There was a mistake in the new cap calculation during cache extension. It surfaced only when the new cache size was a multiple of the cache record size (every 256 records), which led to memory being used beyond the allocated size. This commit fixes it, along with the mlock size for reallocated pages.
Also fixed a typo in a variable name.
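A minimal sketch of this kind of boundary bug (hypothetical names, not the actual pg_tde code): an extension check using `>` instead of `>=` misses exactly the case where the next record index equals the current capacity, i.e. every time the cache size is a multiple of the per-allocation record count, and the write then lands past the allocation.

```c
#include <assert.h>
#include <stddef.h>

#define RECORDS_PER_PAGE 256

/* Buggy check: misses idx == cap, which occurs exactly when the cache
 * size is a multiple of RECORDS_PER_PAGE, so the record is written one
 * slot beyond the allocated area. */
static int needs_extension_buggy(size_t idx, size_t cap)
{
    return idx > cap;
}

/* Fixed check: extend before writing to the slot equal to capacity. */
static int needs_extension(size_t idx, size_t cap)
{
    return idx >= cap;
}
```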
Fixes PG-1248
This commit:
1. Removes autoconf builds for pg_tde so it can be built together with Postgres using make (now users have to go to contrib/pg_tde and build it explicitly after building Postgres). Some pg_tde builds are still left in the CI tests since this PR depends on https://github.com/percona/postgres/pull/20; those leftovers will be removed after the PG PR is merged.
2. Adds necessary changes regarding new code (like kmip) so frontend tools (pg_waldump et al) can be compiled with pg_tde
3. Gets rid of realpath as it has issues with optimised builds
For: PG-1003, PG-1005
Also includes some refactoring because libkmip and postgres headers
are not compatible. To avoid compilation errors, keyring_kmip.c does
not include postgres headers, and keyring_kmip_ereport.c does not
include libkmip headers.
We use tablespaceId as a part of the IV for internal key encryption,
which doesn't add any security because dbId (used as well) is already
unique.
But having tablespaceId really complicates things: the principal key is
created for the entire database, yet different relations in this
database can be located in different tablespaces...
So it is better not to use the tablespace with the principal key
(database level) as it belongs to the relation level.
When a relation is moved to a new location, its relfilenode id changes.
Hence we must re-encrypt and store its internal key with the new id.
Also, we have to store the changed internal key in the new physical location,
and copy the principal key info and keyring data there.
Fixes https://perconadev.atlassian.net/browse/PG-1038
* PG-1058 Fix MergeJoin issue
Resolved an issue in MergeJoin by ensuring the decrypted buffer contents are
also copied from the source to the destination tuple slot during
slot copy operations.
Co-authored-by: Andrew Pogrebnoy <absourd.noise@gmail.com>
Co-authored-by: Artem Gavrilov <artem.gavrilov@percona.com>
* Create table always checked the principal key and tried to create
it, even when we weren't creating a tde_heap table.
* Alter table wasn't handled, and because of this, changing a table
to the tde_heap access method didn't result in an encrypted table.
* default_table_access_method wasn't handled, and because of this,
creating a table using it also resulted in a non-encrypted
table.
* PG-1056 Add failing test
* PG-1056 Use proper AM in test
* Fix UPDATE SET ... RETURNING processing for encrypted tuples
If `get_heap_tuple` is NULL, the core uses `copy_heap_tuple` instead. The former returns a pointer to a tuple in the slot, and the latter makes a copy of that tuple. For UPDATE SET, the core uses the slot for INSERT and later for RETURNING processing. If we copy the tuple, the following happens:
1. The core creates a slot with the generic tuple.
2. It is passed to `pg_tdeam_tuple_update()`, which gets a copy of the tuple here [6d4f7e5b7b/src17/access/pg_tdeam_handler.c (L336)].
3. This generic tuple is filled with the proper data and used for the update here [6d4f7e5b7b/src17/access/pg_tdeam_handler.c (L343)].
4. Later on, RETURNING processing uses the slot's tuple, but it is still the generic, unmodified one because of the copy.
5. That results in wrong RETURNING data.
To avoid this, we should return a pointer to the slot's tuple instead of copying it.
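A toy model of why the copy broke RETURNING (plain structs, not the real table AM callback signatures): updating through a copied tuple leaves the slot's own tuple untouched, while updating through a returned pointer modifies what RETURNING later reads.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int value; } ToyTuple;
typedef struct { ToyTuple tuple; } ToySlot;

/* Analogue of get_heap_tuple: a pointer into the slot itself. */
static ToyTuple *toy_get_tuple(ToySlot *slot)
{
    return &slot->tuple;
}

/* Analogue of copy_heap_tuple: a detached copy of the slot's tuple. */
static ToyTuple *toy_copy_tuple(ToySlot *slot)
{
    ToyTuple *copy = malloc(sizeof(ToyTuple));
    *copy = slot->tuple;
    return copy;
}
```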
Fixes PG-1056
* PG-1056 Split 'update' testcase for tde_heap and tde_heap_basic
---------
Co-authored-by: Andrew Pogrebnoy <absourd.noise@gmail.com>
* Make related code compilable with frontend
This commit makes the code around keyring, principal keys and WAL
encryption compilable with frontend tools. Namely:
- Hide everything that isn't compatible and of no use behind
'#ifndef FRONTEND'
- Redefine code that is needed in both versions but should have
different code. E.g. error handling, file descriptors and locks
- Make use of frontend lists instead of backend ones where needed.
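The '#ifndef FRONTEND' pattern can be sketched like this (the function name is illustrative, not an actual pg_tde symbol): one translation unit, two build targets, with backend-only pieces hidden from frontend tools.

```c
#include <assert.h>
#include <string.h>

#ifndef FRONTEND
/* Backend build: shared memory, LWLocks, ereport()-style errors live here. */
static const char *keyring_build_target(void)
{
    return "backend";
}
#else
/* Frontend build (pg_waldump et al.): stderr errors, simple list types. */
static const char *keyring_build_target(void)
{
    return "frontend";
}
#endif
```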
For https://perconadev.atlassian.net/browse/PG-857
* PG-853: Access control of pg_tde SQL functions
Add SQL interfaces for granting and revoking access to key management and viewer
functions. This commit introduces four new SQL functions to manage access to
key-related functionalities in the `pg_tde` extension:
- `pg_tde_grant_key_management_to_role`: Grants execute permissions on key
management functions to the specified user or role.
- `pg_tde_revoke_key_management_from_role`: Revokes execute permissions on
key management functions from the specified user or role.
- `pg_tde_grant_key_viewer_to_role`: Grants execute permissions on key
viewer functions to the specified user or role.
- `pg_tde_revoke_key_viewer_from_role`: Revokes execute permissions on
key viewer functions from the specified user or role.
Additionally, upon creating the extension, all execute permissions are revoked
from the `PUBLIC` role. Therefore, a superuser must explicitly grant the
necessary permissions to non-superusers to access these functions after the
extension is created.
These additions provide a more controlled and secure way to manage permissions
for key management and viewer functionalities within the extension.
This commit replaces dependency of the keyring code on JSON backend
functions with common JSON API.
Usage of the backend JSON funcs prevents the code from being used by
frontend tools like pg_waldump. Besides, it requires extra conversions
to Datums and DirectFunctionCalls.
For: https://perconadev.atlassian.net/browse/PG-857
Recent commits in the PG17 code added additional API changes,
making the "single src directory with ifdefs" approach impractical.
This commit adds a new Python-based script (documented with comments
in the file) to help with version-specific merges, where the copied
heap files reside in srcXX directories, where XX is the version.
- Rename database key rotation functions to make room for the global space ones.
- Now, during the first start, we create a default temporary key provider for the global space. A user can (and should) create their own key provider afterwards. This allows using the same code path and internal interfaces for keyring management across databases and the global space.
- No need to cache the principal key for the global space as we use it only at server start to decrypt the internal key. The internal key then persists in the memory cache.
Fixes https://perconadev.atlassian.net/browse/PG-835, https://perconadev.atlassian.net/browse/PG-833
* Check and create an internal key for XLog during server start.
If the key already exists (i.e., not the first start with EncryptWAL),
load it into the cache. We can't read the key from files while writing
the XLog to disk, as that happens in a critical section where no palloc
is allowed.
* Create a custom cache for the global catalog external key as we can't
use PG's hashmap there (again, no pallocs in a critical section).
* TDE TupleTableSlot for storing decrypted tuple along with the buffer tuple
Tuple data in the shared buffer is encrypted. To store the tuple in the
TupleTableSlot, the tuple data is decrypted into allocated memory. This memory
needs to be properly cleaned up. However, with the existing
BufferHeapTupleTableSlot, there is no way to free this memory until the end of
the current query executor cycle.
To address this, the commit introduces TDEBufferHeapTupleTableSlot, a clone of
BufferHeapTupleTableSlot that keeps a reference to the allocated decrypted tuple
and frees it when the tuple slot is cleared. Most of the code is borrowed from
the BufferHeapTupleTableSlot implementation, ensuring that
TDEBufferHeapTupleTableSlot can be cast to BufferHeapTupleTableSlot.
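A toy model of the ownership idea (plain malloc/free instead of the real TupleTableSlot callbacks; names are illustrative): the slot keeps the pointer to the decrypted copy and releases it when the slot is cleared, rather than waiting for the end of the executor cycle.

```c
#include <assert.h>
#include <stdlib.h>

/* The slot owns the decrypted copy of the buffer tuple. */
typedef struct
{
    void *decrypted;   /* decrypted tuple data, owned by the slot */
} ToyTdeSlot;

static void toy_slot_store(ToyTdeSlot *slot, size_t len)
{
    slot->decrypted = malloc(len);   /* destination of the decryption */
}

static void toy_slot_clear(ToyTdeSlot *slot)
{
    free(slot->decrypted);           /* freed on clear, not at query end */
    slot->decrypted = NULL;
}
```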
Apart from the above, a workaround to clear the decrypted tuple pointer
is added to the TDEBufferHeapTupleTableSlot for cases when the
slot is reused while the previously decrypted tuple was cleared out by
MemoryContext deletion, instead of through the slot cleanup callback.
This commit implements ddl-start and ddl-end event triggers to identify index
creation operations on encrypted tables. Upon creating an index on an encrypted
table, the trigger function updates the global state, which can be accessed by
the storage manager (smgr) to decide whether smgr_create needs to do encryption or not.
The start-ddl function analyzes CREATE TABLE and CREATE INDEX statements
and identifies whether the table uses the pg_tde access method. When the table
being created, or the table on which the index is being created, uses the
pg_tde access method, the start-ddl trigger function populates relevant
information about the encrypted table into a global structure.
This structure can be accessed using the GetCurrentTdeCreateEvent() function.
After the execution of the current DDL command finishes, the end-ddl
function clears out this structure.
* Fix issue-153: Server crash and database corruption
We can't use the Tuple CID as an IV because it changes when the tuple is deleted.
If we have a trigger function that needs the deleted tuple, it will get the
wrong IV when decrypting. This happens because the CID used to encrypt the tuple
(during INSERT/UPDATE) is different from the CID passed to the decryption
function (during delete).
To fix this, we need to stop using the CID for IV calculation.
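A toy illustration (XOR keystream, not the actual AES code) of the failure mode: decryption only inverts encryption when the IV inputs match, so an IV derived from a CID that changed between INSERT/UPDATE and DELETE yields garbage.

```c
#include <assert.h>

/* Derive a one-byte "keystream" from the IV input and XOR it in; the same
 * call both encrypts and decrypts, but only for the same IV input. */
static unsigned char toy_crypt(unsigned char byte, unsigned int iv_input)
{
    return byte ^ (unsigned char)(iv_input * 0x9d);
}
```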
* Update test case to produce same result on all environment
* Add pg_tde_version function to get the module version of the extension
To make the version number configurable, the configure script writes
PACKAGE_VERSION to the config.h file, and that value is returned by the SQL interface.
To bump the version, you need to update the version number passed to the
AC_INIT macro in configure.ac and pg_tde_version define in meson.build file
* Implementing remote and file-based parameter support for keyrings
With this change, instead of storing possibly sensitive information in the
catalog, TDE keyring parameters can be configured using external storage.
Currently, this means either an http(s) request or an additional file
accessible by the server process.
To configure a keyring with external parameter values, use a json_object
as the argument instead of the string parameter, for example:
SELECT pg_tde_add_key_provider_file(
'file-provider',
json_object(
'type' VALUE 'remote',
'url' VALUE 'http://localhost:8888/provider-location'));
* Fix internal key XLogging for keyring changes
There is no functional relation cache during XLog replay therefore
`GetMasterKey()` would fail looking for the keyring namespace. Actually,
it would fail earlier on `get_master_key_info_path()` as XLog replay
has its own `MyDatabaseId` and `MyDatabaseTableSpace`. The latter could
be fixed by providing `RelFileLocator` to `GetMasterKey()` but the former
can't be fixed easily (if it could be fixed at all) as XLog replay deals
with the `Buffer` and shouldn't have access to the relations. To solve this,
we should pass the master key name in the XLog so there is no need to call
`GetMasterKey()` during XLog redo.
It also doesn't make sense to add an internal key to the cache during XLog
replay as this cache is per-process and it won't be available to any user
connections.
* Remove master_key info file on cleanup
Currently, tde_master_key.info is truncated on cleanup, which leads to the following issue:
If the file exists, `CREATE EXTENSION pg_tde;` truncates it. Then `pg_tde_set_master_key()` will create a new master key info but fail to write it to the file: although the pre-check treats an empty file as OK, the actual write opens it with `O_EXCL` (fail if the file exists). Everything works fine as the master key info is already in a cache, until Postgres is restarted and the cache gets wiped.
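The failure can be reproduced with plain POSIX calls (illustrative path, not the real key info file): O_CREAT|O_EXCL fails as soon as the file exists, even when it is empty.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mimics the "actual write" open: fail if the file already exists. */
static int create_exclusive(const char *path)
{
    return open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
}
```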
This commit removes tde_master_key.info on cleanup.
* Cleanup master key info on standby
* Disallow deletion of keyring used by Master key.
The commit adds a before-delete trigger on the keyring catalog to ensure that
the key provider used by the master key cannot be deleted.
* Introducing catalog table for managing key providers
This commit introduces a user catalog table, percona_tde.pg_tde_key_provider,
within the percona_tde schema, as part of the pg_tde extension. The purpose of
this table is to store essential provider information. The catalog accommodates
various key providers, present and future, utilizing a JSON type
options field to capture provider-specific details.
To facilitate the creation of key providers,
the commit introduces new SQL interfaces:
- pg_tde_add_key_provider(provider_type VARCHAR(10),
provider_name VARCHAR(128), options JSON)
- pg_tde_add_key_provider_file(provider_name VARCHAR(128),
file_path TEXT)
- pg_tde_add_key_provider_vault_v2(provider_name VARCHAR(128),
vault_token TEXT, vault_url TEXT,
vault_mount_path TEXT, vault_ca_path TEXT)
Additionally, the commit implements the C interface for catalog
interaction, detailed in the 'tde_keyring.h' file.
These changes lay the foundation for implementing multi-tenancy in pg_tde by
eliminating the necessity of a 'keyring.json' file for configuring a
cluster-wide key provider. With this enhancement, each database can have its
dedicated key provider, added via the SQL interface, removing the need
for DBA intervention in TDE setup.
* Establishing a Framework for Master Key and Shared Cache Management
Up until now, pg_tde relied on a hard-coded master key name, primarily for
proof-of-concept purposes. This commit introduces a more robust infrastructure
for configuring the master key and managing a dynamic shared memory-based
master-key cache to enhance accessibility.
For user interaction, a new SQL interface is provided:
- pg_tde_set_master_key(master_key_name VARCHAR(255), provider_name VARCHAR(255));
This interface enables users to set a master key for a specific database, paving
the way for further enhancements toward implementing multi-tenancy.
In addition to the public SQL interface, the commit optimizes the internal
master-key API. It introduces straightforward Get and Set functions,
handling locking, retrieval, caching, and seamlessly assigning a master key for
a database.
The commit also introduces a unified internal interface for requesting and
utilizing shared memory, contributing to a more cohesive and efficient
master key and cache management system.
* Revamping the Keyring API Interface and Integrating Master Key
This commit unifies the master-key and key-provider modules with the core of
pg_tde, marking a significant evolution in the architecture.
As part of this integration, the keyring API undergoes substantial changes
to enhance flexibility and remove unnecessary components such as the key cache.
As a result of the keyring refactoring, the file keyring is also rewritten,
offering a template for implementing additional key providers for the extension.
The modifications make the keyring API more pluggable, streamlining
interactions and paving the way for future enhancements.
* An Interface for Informing the Shared Memory Manager about Lock Requirements
This commit addresses PostgreSQL core's requirement for upfront information
regarding the number of locks the extension needs. Given the connection
between locks and the shared memory interface, a new callback routine
is introduced. This routine allows modules to specify
the number of locks they require.
In addition to this functionality, the commit includes code cleanups
and adjustments to nomenclature for improved clarity and consistency.
* Adjusting test cases
* Extension Initialization and Cleanup Mechanism
This commit enhances the extension by adding a new mechanism to facilitate
cleanup or setup procedures when the extension is installed in a database.
The core addition is a function "pg_tde_extension_initialize" invoked upon
executing the database's 'CREATE EXTENSION' command.
The commit introduces a callback registration mechanism to streamline
future development and ensure extensibility. This enables any module
to specify a callback function (registered using on_ext_install()) to be
invoked during extension creation.
As of this commit, the callback functionality is explicitly utilized by the
master key module to handle the cleanup of the master key information file.
This file might persist in the database directory if the extension had been
previously deleted in the same database.
This enhancement paves the way for a more modular and maintainable extension
architecture, allowing individual modules to manage their specific
setup and cleanup tasks seamlessly.
* Adjusting Vault-V2 key provider to use new keyring architecture
* The patch implements an on-disk "key map and data" structure. It replaces
the old "tde" fork architecture.
The new architecture uses a pair of files:
(1) Map File
(2) Key Data File
Both files contain a header with the name of the master key that was used
to encrypt the data keys, and a file version. The file version is set
to PG_TDE_FILEMAGIC at the moment, and it can be used to differentiate between
different file format versions in case we change the structure later on.
The map file is a list of relNumber, flags and key index.
- relNumber is the Oid of the associated relation.
- Flags define if the map entry is free or in use.
- Key index points to the starting position of the key in the key data file.
The flags play a pivotal role in preventing the file from growing infinitely.
When a relation is deleted or a transaction is aborted, the map entry
is marked as MAP_ENTRY_FREE. The next transaction that needs to store its
relation key will pick the first entry with the flag set to MAP_ENTRY_FREE.
The key data file is simply a list of keys. No flags are needed as validity
is identified by the map file. Writing to the file is performed using the
FileWrite function. This avoids any locking in the key data file.
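The two-file layout described above can be sketched like this (field and type names are illustrative, not the actual on-disk structs):

```c
#include <assert.h>
#include <stdint.h>

#define MAP_ENTRY_FREE   0
#define MAP_ENTRY_IN_USE 1

/* One map file entry: which relation, whether it's live, and where its
 * key starts in the key data file. */
typedef struct
{
    uint32_t relNumber; /* Oid of the associated relation */
    uint32_t flags;     /* MAP_ENTRY_FREE or MAP_ENTRY_IN_USE */
    uint32_t key_index; /* start position of the key in the key data file */
} ToyMapEntry;

/* Reuse the first free entry instead of growing the file forever. */
static int find_free_entry(const ToyMapEntry *map, int n)
{
    for (int i = 0; i < n; i++)
        if (map[i].flags == MAP_ENTRY_FREE)
            return i;
    return -1;
}
```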
Pending:
- Implementation of key rotation
- Locking of file during key rotation or map entry
- Review of fflush calls
- Review of the WAL
* Refactoring based on Zsolt's comments on the PR.
Moving the key encryption/decryption functions to the enc_tuple file and
renaming the files according to the functionality.
* Adding the XLOG handling for the internal key during relation creation
and when redo-ing the log.
Also, updated the handling of the master key to accommodate versioning.
* Updating the comment as it is no longer valid.
* Updated:
- getMasterKey function argument types from int to bool
- Before xlog redo, the decrypted key is added to the key cache.
* Using ctid as IV base instead of offset calculation
This commit modifies the IV calculation of normal tuples to just
use the already existing ItemPointer as the IV base instead of
the actual offset addresses, as the ItemPointer doesn't change during
moves and is also easier to use for replication.
As part of this, the structure of the IV is also changed: instead of
using the offset as the base number, and incrementing it sequentially,
we now insert the "base" ItemPointer at the high part of the IV, and
start the counter at the other end, at the low part of the IV.
This means that we are no longer using plain AES-CTR but instead rely on a
custom AES-based scheme. This is also required for TOAST, since there
we can't rely on the uniqueness of the address in the entire
data range.
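The layout change can be sketched as follows (byte order and field widths are illustrative, not the exact pg_tde format): the ItemPointer-derived base occupies the high half of the 16-byte IV and the block counter runs from the low end, so the two parts cannot collide.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a 16-byte IV: base (from the ItemPointer) in the high 8 bytes,
 * per-block counter in the low 8 bytes, both big-endian. */
static void build_iv(unsigned char iv[16], uint64_t base, uint64_t counter)
{
    for (int i = 0; i < 8; i++)
    {
        iv[i]     = (unsigned char)(base    >> (56 - 8 * i));
        iv[8 + i] = (unsigned char)(counter >> (56 - 8 * i));
    }
}
```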
Old encryption tests are also deleted, as they no longer work with
these changes.
* Fixing review comments
* Added comment about iv calculation
This commit implements support for storing keys on a vault server
instead of locally. The current implementation only supports the KV
v2 engine, which is the default secrets engine in recent vault
versions.
To use vault for key storage, the following settings have to be
used in the keyring configuration file:
* `provider` set to `vault-v2`
* `url` set to the URL of the vault server
* `mountPath` is set to the mount point where the keyring should
store the keys
* `token` is an access token with read and write access to the
above mount point
* [optional] `caPath` is the path of the CA file used for SSL
verification
Multiple servers can use the same vault server, with the following
restrictions:
* Servers in the same replication group should use the same
'pg_tde.keyringKeyPrefix` to ensure that they see the same keys
* Unrelated servers should use different `pg_tde.keyringKeyPrefix`
values to ensure that they use different keys without conflicts
The source also contains a sample keyring configuration file,
`keyring-vault.json`. This configuration matches the settings of
the vault development server (`vault server -dev`); only the
ROOT_TOKEN has to be replaced with the token of the actual server
process.
If TOAST data gets compressed, it has an extended header containing
compression info. We used to encrypt this header along with the actual
data which in turn caused a crash as PG needs this data in later
stages. So it should be taken into account while encrypting data during
externalisation.
Then, during detoasting, we should not decrypt this compression header
as it is extracted with the data in the first TOAST chunk. So we
copy the first N bytes (currently 4 bytes) of the first chunk as-is
and decrypt the rest of the data.
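A sketch of the fix (toy XOR in place of AES; the header size matches the 4 bytes mentioned above): encryption starts after the compression header, and detoasting leaves those bytes as-is while decrypting the rest.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_COMPRESS_HDRSZ 4  /* leading compression-info bytes, untouched */

/* The same routine encrypts and decrypts (XOR); it never touches the
 * compression header, which PG needs intact in later stages. */
static void toy_crypt_toast(unsigned char *data, size_t len)
{
    for (size_t i = TOY_COMPRESS_HDRSZ; i < len; i++)
        data[i] ^= 0x5a;
}
```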
Fixes https://github.com/Percona-Lab/postgres-tde-ext/issues/63
Encryption/decryption of the same data should be exact as long as the
offset is the same. But as we encode in 16-byte blocks, the size of
`encKey` is always a multiple of 16. We start from the `aes_block_no`-th
index of encKey[], so the N-th byte will be encrypted with the same encKey
byte regardless of the start_offset `pg_tde_crypt()` was called with.
For example `start_offset = 10; MAX_AES_ENC_BATCH_KEY_SIZE = 6`:
```
data: [10 11 12 13 14 15 16]
encKeys: [...][0 1 2 3 4 5][0 1 2 3 4 5]
```
so the 10th data byte is encoded with the 4th byte of the 2nd encKey,
etc. We need this shift so each byte is encoded the same regardless of
the initial offset.
Let's see the same data but sent to the func starting from the offset 0:
```
data: [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
encKeys: [0 1 2 3 4 5][0 1 2 3 4 5][ 0 1 2 3 4 5]
```
again, the 10th data byte is encoded with the 4th byte of the 2nd
`encKey` etc.
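The invariant both examples demonstrate can be written down directly (BATCH stands in for the MAX_AES_ENC_BATCH_KEY_SIZE = 6 of the example; names are illustrative): the keystream byte for a data byte depends only on its absolute offset, never on the start_offset of the call.

```c
#include <assert.h>

#define BATCH 6  /* MAX_AES_ENC_BATCH_KEY_SIZE in the example above */

/* Which encKey byte encrypts the data byte at this absolute offset;
 * note there is no start_offset parameter, which is the whole point. */
static int keystream_index(unsigned long abs_offset)
{
    return (int)(abs_offset % BATCH);
}
```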
The issue in `pg_tde_crypt()` was that, along with shifting encKeys in
the first batch, we started skipping `aes_block_no` bytes in `data`
with every batch after the first one. That led to:
1. Some bytes remained unencrypted.
2. If data was encrypted and decrypted in different batches, these
skipped bytes would differ, hence the decrypted data would be wrong.
TOASTed data, for example, is encrypted as one chunk but decrypted in
`TOAST_MAX_CHUNK_SIZE` chunks.
The issue with pg_tde_move_encrypted_data() was that encryption and
decryption were happening in the same loop, but `encKeys` may have had
different shifting, as the in and out data may have different start_offsets.
It wasn't an issue if the data was less than `DATA_BYTES_PER_AES_BATCH`.
However, it resulted in data corruption while moving the larger tuples.
Plus, it makes sense to reuse `pg_tde_crypt()` instead of copying its
tricky code.
Also, encKey had a maximum size of 1632 bytes, but at most 1600 bytes
could have been used. We don't need that extra 32-byte buffer anymore.
The same goes for the `dataLen` in `Aes128EncryptedZeroBlocks2()`: I don't
see why we need that extra block.
Fixes https://github.com/Percona-Lab/postgres-tde-ext/issues/72
Issue: the code cleanup introduced in PR #52 modified the original
tuple in the update method instead of decrypting the tuple data
into a copy. This caused data corruption crashes in some tests.
Fix: reintroduce the missing palloc in the update method.
Fixes #61, #62, #64
1. Inserts and updates are now encrypted in WAL.
We encrypt new tuples directly in the Buffer after they are inserted there. To
pass the data to XLog we could memcpy Buffer data into the tuple, but later the
tuple has to be unencrypted for index insertions etc. So we pass the data
directly from the Buffer into XLog.
2. Log into WAL and replicate *.tde forks creation.
3. Added docker-compose for the streaming replication test setup.
(not perfect - needs two `up -d` in a row to start the secondary)
4. Added tests for multi inserts. Need tests for replications though.
* Few enhancements and code cleanup around tuple encryption/decryption
The commit contains the following noteworthy changes:
-- Getting rid of VLAs from the code base
-- Adding an interface to move encrypted data from one location to another
-- Making encryption and decryption happen in batches to eliminate the
requirement of dynamic allocation in crypt functions