Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
*
|
|
|
|
|
* expandeddatum.h
|
|
|
|
|
* Declarations for access to "expanded" value representations.
|
|
|
|
|
*
|
|
|
|
|
* Complex data types, particularly container types such as arrays and
|
|
|
|
|
* records, usually have on-disk representations that are compact but not
|
|
|
|
|
* especially convenient to modify. What's more, when we do modify them,
|
|
|
|
|
* having to recopy all the rest of the value can be extremely inefficient.
|
|
|
|
|
* Therefore, we provide a notion of an "expanded" representation that is used
|
|
|
|
|
* only in memory and is optimized more for computation than storage.
|
|
|
|
|
* The format appearing on disk is called the data type's "flattened"
|
|
|
|
|
* representation, since it is required to be a contiguous blob of bytes --
|
|
|
|
|
* but the type can have an expanded representation that is not. Data types
|
|
|
|
|
* must provide means to translate an expanded representation back to
|
|
|
|
|
* flattened form.
|
|
|
|
|
*
|
|
|
|
|
* An expanded object is meant to survive across multiple operations, but
|
|
|
|
|
* not to be enormously long-lived; for example it might be a local variable
|
|
|
|
|
* in a PL/pgSQL procedure. So its extra bulk compared to the on-disk format
|
|
|
|
|
* is a worthwhile trade-off.
|
|
|
|
|
*
|
|
|
|
|
* References to expanded objects are a type of TOAST pointer.
|
|
|
|
|
* Because of longstanding conventions in Postgres, this means that the
|
|
|
|
|
* flattened form of such an object must always be a varlena object.
|
|
|
|
|
* Fortunately that's no restriction in practice.
|
|
|
|
|
*
|
|
|
|
|
* There are actually two kinds of TOAST pointers for expanded objects:
|
|
|
|
|
* read-only and read-write pointers. Possession of one of the latter
|
|
|
|
|
* authorizes a function to modify the value in-place rather than copying it
|
|
|
|
|
* as would normally be required. Functions should always return a read-write
|
|
|
|
|
* pointer to any new expanded object they create. Functions that modify an
|
|
|
|
|
* argument value in-place must take care that they do not corrupt the old
|
|
|
|
|
* value if they fail partway through.
|
|
|
|
|
*
|
|
|
|
|
*
|
|
|
|
|
* Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
|
*
|
|
|
|
|
* src/include/utils/expandeddatum.h
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
#ifndef EXPANDEDDATUM_H
|
|
|
|
|
#define EXPANDEDDATUM_H
|
|
|
|
|
|
|
|
|
|
/* Size of an EXTERNAL datum that contains a pointer to an expanded object */
|
|
|
|
|
#define EXPANDED_POINTER_SIZE (VARHDRSZ_EXTERNAL + sizeof(varatt_expanded))
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* "Methods" that must be provided for any expanded object.
|
|
|
|
|
*
|
|
|
|
|
* get_flat_size: compute space needed for flattened representation (total,
|
|
|
|
|
* including header).
|
|
|
|
|
*
|
|
|
|
|
* flatten_into: construct flattened representation in the caller-allocated
|
|
|
|
|
* space at *result, of size allocated_size (which will always be the result
|
|
|
|
|
* of a preceding get_flat_size call; it's passed for cross-checking).
|
|
|
|
|
*
|
|
|
|
|
* The flattened representation must be a valid in-line, non-compressed,
|
|
|
|
|
* 4-byte-header varlena object.
|
|
|
|
|
*
|
|
|
|
|
* Note: construction of a heap tuple from an expanded datum calls
|
|
|
|
|
* get_flat_size twice, so it's worthwhile to make sure that that doesn't
|
|
|
|
|
* incur too much overhead.
|
|
|
|
|
*/
|
|
|
|
|
typedef Size (*EOM_get_flat_size_method) (ExpandedObjectHeader *eohptr);
|
|
|
|
|
typedef void (*EOM_flatten_into_method) (ExpandedObjectHeader *eohptr,
|
|
|
|
|
void *result, Size allocated_size);
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
|
|
|
|
|
/* Struct of function pointers for an expanded object's methods */
|
|
|
|
|
typedef struct ExpandedObjectMethods
|
|
|
|
|
{
|
|
|
|
|
EOM_get_flat_size_method get_flat_size;
|
|
|
|
|
EOM_flatten_into_method flatten_into;
|
|
|
|
|
} ExpandedObjectMethods;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Every expanded object must contain this header; typically the header
|
|
|
|
|
* is embedded in some larger struct that adds type-specific fields.
|
|
|
|
|
*
|
|
|
|
|
* It is presumed that the header object and all subsidiary data are stored
|
|
|
|
|
* in eoh_context, so that the object can be freed by deleting that context,
|
|
|
|
|
* or its storage lifespan can be altered by reparenting the context.
|
|
|
|
|
* (In principle the object could own additional resources, such as malloc'd
|
|
|
|
|
* storage, and use a memory context reset callback to free them upon reset or
|
|
|
|
|
* deletion of eoh_context.)
|
|
|
|
|
*
|
|
|
|
|
* We set up two TOAST pointers within the standard header, one read-write
|
|
|
|
|
* and one read-only. This allows functions to return either kind of pointer
|
|
|
|
|
* without making an additional allocation, and in particular without worrying
|
|
|
|
|
* whether a separately palloc'd object would have sufficient lifespan.
|
|
|
|
|
* But note that these pointers are just a convenience; a pointer object
|
|
|
|
|
* appearing somewhere else would still be legal.
|
|
|
|
|
*
|
|
|
|
|
* The typedef declaration for this appears in postgres.h.
|
|
|
|
|
*/
|
|
|
|
|
struct ExpandedObjectHeader
|
|
|
|
|
{
|
|
|
|
|
/* Phony varlena header */
|
|
|
|
|
int32 vl_len_; /* always EOH_HEADER_MAGIC, see below */
|
|
|
|
|
|
|
|
|
|
/* Pointer to methods required for object type */
|
|
|
|
|
const ExpandedObjectMethods *eoh_methods;
|
|
|
|
|
|
|
|
|
|
/* Memory context containing this header and subsidiary data */
|
|
|
|
|
MemoryContext eoh_context;
|
|
|
|
|
|
|
|
|
|
/* Standard R/W TOAST pointer for this object is kept here */
|
|
|
|
|
char eoh_rw_ptr[EXPANDED_POINTER_SIZE];
|
|
|
|
|
|
|
|
|
|
/* Standard R/O TOAST pointer for this object is kept here */
|
|
|
|
|
char eoh_ro_ptr[EXPANDED_POINTER_SIZE];
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Particularly for read-only functions, it is handy to be able to work with
|
|
|
|
|
* either regular "flat" varlena inputs or expanded inputs of the same data
|
|
|
|
|
* type. To allow determining which case an argument-fetching function has
|
|
|
|
|
* returned, the first int32 of an ExpandedObjectHeader always contains -1
|
|
|
|
|
* (EOH_HEADER_MAGIC to the code). This works since no 4-byte-header varlena
|
|
|
|
|
* could have that as its first 4 bytes. Caution: we could not reliably tell
|
|
|
|
|
* the difference between an ExpandedObjectHeader and a short-header object
|
|
|
|
|
* with this trick. However, it works fine if the argument fetching code
|
|
|
|
|
* always returns either a 4-byte-header flat object or an expanded object.
|
|
|
|
|
*/
|
|
|
|
|
#define EOH_HEADER_MAGIC (-1)
|
|
|
|
|
#define VARATT_IS_EXPANDED_HEADER(PTR) \
|
|
|
|
|
(((varattrib_4b *) (PTR))->va_4byte.va_header == (uint32) EOH_HEADER_MAGIC)
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Generic support functions for expanded objects.
|
|
|
|
|
* (More of these might be worth inlining later.)
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
static inline Datum
|
|
|
|
|
EOHPGetRWDatum(const struct ExpandedObjectHeader *eohptr)
|
|
|
|
|
{
|
|
|
|
|
return PointerGetDatum(eohptr->eoh_rw_ptr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static inline Datum
|
|
|
|
|
EOHPGetRODatum(const struct ExpandedObjectHeader *eohptr)
|
|
|
|
|
{
|
|
|
|
|
return PointerGetDatum(eohptr->eoh_ro_ptr);
|
|
|
|
|
}
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
|
|
|
|
|
/* Does the Datum represent a writable expanded object? */
|
|
|
|
|
#define DatumIsReadWriteExpandedObject(d, isnull, typlen) \
|
|
|
|
|
(((isnull) || (typlen) != -1) ? false : \
|
|
|
|
|
VARATT_IS_EXTERNAL_EXPANDED_RW(DatumGetPointer(d)))
|
|
|
|
|
|
|
|
|
|
#define MakeExpandedObjectReadOnly(d, isnull, typlen) \
|
|
|
|
|
(((isnull) || (typlen) != -1) ? (d) : \
|
|
|
|
|
MakeExpandedObjectReadOnlyInternal(d))
|
|
|
|
|
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
extern ExpandedObjectHeader *DatumGetEOHP(Datum d);
|
|
|
|
|
extern void EOH_init_header(ExpandedObjectHeader *eohptr,
|
|
|
|
|
const ExpandedObjectMethods *methods,
|
|
|
|
|
MemoryContext obj_context);
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
extern Size EOH_get_flat_size(ExpandedObjectHeader *eohptr);
|
|
|
|
|
extern void EOH_flatten_into(ExpandedObjectHeader *eohptr,
|
|
|
|
|
void *result, Size allocated_size);
|
|
|
|
|
extern Datum MakeExpandedObjectReadOnlyInternal(Datum d);
|
Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation. Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype. There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.
In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables. We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work. (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)
Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad. In any case most applications should see a net win.
Tom Lane, reviewed by Andres Freund
11 years ago
|
|
|
extern Datum TransferExpandedObject(Datum d, MemoryContext new_parent);
|
|
|
|
|
extern void DeleteExpandedObject(Datum d);
|
|
|
|
|
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
#endif /* EXPANDEDDATUM_H */
|