Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
/*-------------------------------------------------------------------------
|
|
|
|
|
*
|
|
|
|
|
* tqueue.c
|
|
|
|
|
* Use shm_mq to send & receive tuples between parallel backends
|
|
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Most of the complexity in this module arises from transient RECORD types,
|
|
|
|
|
* which all have type RECORDOID and are distinguished by typmod numbers
|
|
|
|
|
* that are managed per-backend (see src/backend/utils/cache/typcache.c).
|
|
|
|
|
* The sender's set of RECORD typmod assignments probably doesn't match the
|
|
|
|
|
* receiver's. To deal with this, we make the sender send a description
|
|
|
|
|
* of each transient RECORD type appearing in the data it sends. The
|
|
|
|
|
* receiver finds or creates a matching type in its own typcache, and then
|
|
|
|
|
* maps the sender's typmod for that type to its own typmod.
|
|
|
|
|
*
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
* A DestReceiver of type DestTupleQueue, which is a TQueueDestReceiver
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* under the hood, writes tuples from the executor to a shm_mq. If
|
|
|
|
|
* necessary, it also writes control messages describing transient
|
|
|
|
|
* record types used within the tuple.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* A TupleQueueReader reads tuples, and control messages if any are sent,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* from a shm_mq and returns the tuples. If transient record types are
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* in use, it registers those types locally based on the control messages
|
|
|
|
|
* and rewrites the typmods sent by the remote side to the corresponding
|
|
|
|
|
* local record typmods.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*
|
|
|
|
|
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
|
|
|
*
|
|
|
|
|
* IDENTIFICATION
|
|
|
|
|
* src/backend/executor/tqueue.c
|
|
|
|
|
*
|
|
|
|
|
*-------------------------------------------------------------------------
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
#include "postgres.h"
|
|
|
|
|
|
|
|
|
|
#include "access/htup_details.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
#include "catalog/pg_type.h"
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
#include "executor/tqueue.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
#include "funcapi.h"
|
|
|
|
|
#include "lib/stringinfo.h"
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
#include "miscadmin.h"
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
#include "utils/array.h"
|
|
|
|
|
#include "utils/lsyscache.h"
|
|
|
|
|
#include "utils/memutils.h"
|
|
|
|
|
#include "utils/rangetypes.h"
|
|
|
|
|
#include "utils/syscache.h"
|
|
|
|
|
#include "utils/typcache.h"
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* The data transferred through the shm_mq is divided into messages.
|
|
|
|
|
* One-byte messages are mode-switch messages, telling the receiver to switch
|
|
|
|
|
* between "control" and "data" modes. (We always start up in "data" mode.)
|
|
|
|
|
* Otherwise, when in "data" mode, each message is a tuple. When in "control"
|
|
|
|
|
* mode, each message defines one transient-typmod-to-tupledesc mapping to
|
|
|
|
|
* let us interpret future tuples. Both of those cases certainly require
|
|
|
|
|
* more than one byte, so no confusion is possible.
|
|
|
|
|
*/
|
|
|
|
|
#define TUPLE_QUEUE_MODE_CONTROL 'c' /* mode-switch message contents */
|
|
|
|
|
#define TUPLE_QUEUE_MODE_DATA 'd'
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Both the sender and receiver build trees of TupleRemapInfo nodes to help
|
|
|
|
|
* them identify which (sub) fields of transmitted tuples are composite and
|
|
|
|
|
* may thus need remap processing. We might need to look within arrays and
|
|
|
|
|
* ranges, not only composites, to find composite sub-fields. A NULL
|
|
|
|
|
* TupleRemapInfo pointer indicates that it is known that the described field
|
|
|
|
|
* is not composite and has no composite substructure.
|
|
|
|
|
*
|
|
|
|
|
* Note that we currently have to look at each composite field at runtime,
|
|
|
|
|
* even if we believe it's of a named composite type (i.e., not RECORD).
|
|
|
|
|
* This is because we allow the actual value to be a compatible transient
|
|
|
|
|
* RECORD type. That's grossly inefficient, and it would be good to get
|
|
|
|
|
* rid of the requirement, but it's not clear what would need to change.
|
|
|
|
|
*
|
|
|
|
|
* Also, we allow the top-level tuple structure, as well as the actual
|
|
|
|
|
* structure of composite subfields, to change from one tuple to the next
|
|
|
|
|
* at runtime. This may well be entirely historical, but it's mostly free
|
|
|
|
|
* to support given the previous requirement; and other places in the system
|
|
|
|
|
* also permit this, so it's not entirely clear if we could drop it.
|
|
|
|
|
*/
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
typedef enum
|
|
|
|
|
{
|
|
|
|
|
TQUEUE_REMAP_ARRAY, /* array */
|
|
|
|
|
TQUEUE_REMAP_RANGE, /* range */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQUEUE_REMAP_RECORD /* composite type, named or transient */
|
|
|
|
|
} TupleRemapClass;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typedef struct TupleRemapInfo TupleRemapInfo;
|
|
|
|
|
|
|
|
|
|
typedef struct ArrayRemapInfo
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int16 typlen; /* array element type's storage properties */
|
|
|
|
|
bool typbyval;
|
|
|
|
|
char typalign;
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
TupleRemapInfo *element_remap; /* array element type's remap info */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
} ArrayRemapInfo;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typedef struct RangeRemapInfo
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TypeCacheEntry *typcache; /* range type's typcache entry */
|
|
|
|
|
TupleRemapInfo *bound_remap; /* range bound type's remap info */
|
|
|
|
|
} RangeRemapInfo;
|
|
|
|
|
|
|
|
|
|
typedef struct RecordRemapInfo
|
|
|
|
|
{
|
|
|
|
|
/* Original (remote) type ID info last seen for this composite field */
|
|
|
|
|
Oid rectypid;
|
|
|
|
|
int32 rectypmod;
|
|
|
|
|
/* Local RECORD typmod, or -1 if unset; not used on sender side */
|
|
|
|
|
int32 localtypmod;
|
|
|
|
|
/* If no fields of the record require remapping, these are NULL: */
|
|
|
|
|
TupleDesc tupledesc; /* copy of record's tupdesc */
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
TupleRemapInfo **field_remap; /* each field's remap info */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
} RecordRemapInfo;
|
|
|
|
|
|
|
|
|
|
struct TupleRemapInfo
|
|
|
|
|
{
|
|
|
|
|
TupleRemapClass remapclass;
|
|
|
|
|
union
|
|
|
|
|
{
|
|
|
|
|
ArrayRemapInfo arr;
|
|
|
|
|
RangeRemapInfo rng;
|
|
|
|
|
RecordRemapInfo rec;
|
|
|
|
|
} u;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* DestReceiver object's private contents
|
|
|
|
|
*
|
|
|
|
|
* queue and tupledesc are pointers to data supplied by DestReceiver's caller.
|
|
|
|
|
* The recordhtab and remap info are owned by the DestReceiver and are kept
|
|
|
|
|
* in mycontext. tmpcontext is a tuple-lifespan context to hold cruft
|
|
|
|
|
* created while traversing each tuple to find record subfields.
|
|
|
|
|
*/
|
|
|
|
|
typedef struct TQueueDestReceiver
|
|
|
|
|
{
|
|
|
|
|
DestReceiver pub; /* public fields */
|
|
|
|
|
shm_mq_handle *queue; /* shm_mq to send to */
|
|
|
|
|
MemoryContext mycontext; /* context containing TQueueDestReceiver */
|
|
|
|
|
MemoryContext tmpcontext; /* per-tuple context, if needed */
|
|
|
|
|
HTAB *recordhtab; /* table of transmitted typmods, if needed */
|
|
|
|
|
char mode; /* current message mode */
|
|
|
|
|
TupleDesc tupledesc; /* current top-level tuple descriptor */
|
|
|
|
|
TupleRemapInfo **field_remapinfo; /* current top-level remap info */
|
|
|
|
|
} TQueueDestReceiver;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* Hash table entries for mapping remote to local typmods.
|
|
|
|
|
*/
|
|
|
|
|
typedef struct RecordTypmodMap
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int32 remotetypmod; /* hash key (must be first!) */
|
|
|
|
|
int32 localtypmod;
|
|
|
|
|
} RecordTypmodMap;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* TupleQueueReader object's private contents
|
|
|
|
|
*
|
|
|
|
|
* queue and tupledesc are pointers to data supplied by reader's caller.
|
|
|
|
|
* The typmodmap and remap info are owned by the TupleQueueReader and
|
|
|
|
|
* are kept in mycontext.
|
|
|
|
|
*
|
|
|
|
|
* "typedef struct TupleQueueReader TupleQueueReader" is in tqueue.h
|
|
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
struct TupleQueueReader
|
|
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
shm_mq_handle *queue; /* shm_mq to receive from */
|
|
|
|
|
MemoryContext mycontext; /* context containing TupleQueueReader */
|
|
|
|
|
HTAB *typmodmap; /* RecordTypmodMap hashtable, if needed */
|
|
|
|
|
char mode; /* current message mode */
|
|
|
|
|
TupleDesc tupledesc; /* current top-level tuple descriptor */
|
|
|
|
|
TupleRemapInfo **field_remapinfo; /* current top-level remap info */
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
};
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Local function prototypes */
|
|
|
|
|
static void TQExamine(TQueueDestReceiver *tqueue,
|
|
|
|
|
TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineArray(TQueueDestReceiver *tqueue,
|
|
|
|
|
ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineRange(TQueueDestReceiver *tqueue,
|
|
|
|
|
RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQExamineRecord(TQueueDestReceiver *tqueue,
|
|
|
|
|
RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value);
|
|
|
|
|
static void TQSendRecordInfo(TQueueDestReceiver *tqueue, int32 typmod,
|
|
|
|
|
TupleDesc tupledesc);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
static void TupleQueueHandleControlMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes, char *data);
|
|
|
|
|
static HeapTuple TupleQueueHandleDataMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes, HeapTupleHeader data);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
static HeapTuple TQRemapTuple(TupleQueueReader *reader,
|
|
|
|
|
TupleDesc tupledesc,
|
|
|
|
|
TupleRemapInfo **field_remapinfo,
|
|
|
|
|
HeapTuple tuple);
|
|
|
|
|
static Datum TQRemap(TupleQueueReader *reader, TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapArray(TupleQueueReader *reader, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapRange(TupleQueueReader *reader, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static Datum TQRemapRecord(TupleQueueReader *reader, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed);
|
|
|
|
|
static TupleRemapInfo *BuildTupleRemapInfo(Oid typid, MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo *BuildArrayRemapInfo(Oid elemtypid,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo *BuildRangeRemapInfo(Oid rngtypid,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
static TupleRemapInfo **BuildFieldRemapInfo(TupleDesc tupledesc,
|
|
|
|
|
MemoryContext mycontext);
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Receive a tuple from a query, and send it to the designated shm_mq.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Returns TRUE if successful, FALSE if shm_mq has been detached.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*/
|
|
|
|
|
static bool
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
tqueueReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
|
|
|
|
|
{
|
|
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TupleDesc tupledesc = slot->tts_tupleDescriptor;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
HeapTuple tuple;
|
|
|
|
|
shm_mq_result result;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* If first time through, compute remapping info for the top-level fields.
|
|
|
|
|
* On later calls, if the tupledesc has changed, set up for the new
|
|
|
|
|
* tupledesc. (This is a strange test both because the executor really
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* shouldn't change the tupledesc, and also because it would be unsafe if
|
|
|
|
|
* the old tupledesc could be freed and a new one allocated at the same
|
|
|
|
|
* address. But since some very old code in printtup.c uses a similar
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* approach, we adopt it here as well.)
|
|
|
|
|
*
|
|
|
|
|
* Here and elsewhere in this module, when replacing remapping info we
|
|
|
|
|
* pfree the top-level object because that's easy, but we don't bother to
|
|
|
|
|
* recursively free any substructure. This would lead to query-lifespan
|
|
|
|
|
* memory leaks if the mapping info actually changed frequently, but since
|
|
|
|
|
* we don't expect that to happen, it doesn't seem worth expending code to
|
|
|
|
|
* prevent it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
if (tqueue->tupledesc != tupledesc)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (tqueue->field_remapinfo != NULL)
|
|
|
|
|
pfree(tqueue->field_remapinfo);
|
|
|
|
|
tqueue->field_remapinfo = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
tqueue->mycontext);
|
|
|
|
|
tqueue->tupledesc = tupledesc;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* When, because of the types being transmitted, no record typmod mapping
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* can be needed, we can skip a good deal of work.
|
|
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (tqueue->field_remapinfo != NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TupleRemapInfo **remapinfo = tqueue->field_remapinfo;
|
|
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
MemoryContext oldcontext = NULL;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Deform the tuple so we can examine fields, if not done already. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
slot_getallattrs(slot);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Iterate over each attribute and search it for transient typmods. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
/* Ignore nulls and types that don't need special handling. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (slot->tts_isnull[i] || remapinfo[i] == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
continue;
|
|
|
|
|
|
|
|
|
|
/* Switch to temporary memory context to avoid leaking. */
|
|
|
|
|
if (oldcontext == NULL)
|
|
|
|
|
{
|
|
|
|
|
if (tqueue->tmpcontext == NULL)
|
|
|
|
|
tqueue->tmpcontext =
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
AllocSetContextCreate(tqueue->mycontext,
|
|
|
|
|
"tqueue sender temp context",
|
Add macros to make AllocSetContextCreate() calls simpler and safer.
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
10 years ago
|
|
|
ALLOCSET_DEFAULT_SIZES);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
oldcontext = MemoryContextSwitchTo(tqueue->tmpcontext);
|
|
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Examine the value. */
|
|
|
|
|
TQExamine(tqueue, remapinfo[i], slot->tts_values[i]);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* If we used the temp context, reset it and restore prior context. */
|
|
|
|
|
if (oldcontext != NULL)
|
|
|
|
|
{
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
MemoryContextReset(tqueue->tmpcontext);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* If we entered control mode, switch back to data mode. */
|
|
|
|
|
if (tqueue->mode != TUPLE_QUEUE_MODE_DATA)
|
|
|
|
|
{
|
|
|
|
|
tqueue->mode = TUPLE_QUEUE_MODE_DATA;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
shm_mq_send(tqueue->queue, sizeof(char), &tqueue->mode, false);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Send the tuple itself. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
tuple = ExecMaterializeSlot(slot);
|
|
|
|
|
result = shm_mq_send(tqueue->queue, tuple->t_len, tuple->t_data, false);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Check for failure. */
|
|
|
|
|
if (result == SHM_MQ_DETACHED)
|
|
|
|
|
return false;
|
|
|
|
|
else if (result != SHM_MQ_SUCCESS)
|
|
|
|
|
ereport(ERROR,
|
|
|
|
|
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
errmsg("could not send tuple to shared-memory queue")));
|
|
|
|
|
|
|
|
|
|
return true;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Examine the given datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
|
|
|
|
*
|
|
|
|
|
* remapinfo is previously-computed remapping info about the datum's type.
|
|
|
|
|
*
|
|
|
|
|
* This function just dispatches based on the remap class.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamine(TQueueDestReceiver *tqueue, TupleRemapInfo *remapinfo, Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
check_stack_depth();
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
switch (remapinfo->remapclass)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
case TQUEUE_REMAP_ARRAY:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineArray(tqueue, &remapinfo->u.arr, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
break;
|
|
|
|
|
case TQUEUE_REMAP_RANGE:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineRange(tqueue, &remapinfo->u.rng, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
break;
|
|
|
|
|
case TQUEUE_REMAP_RECORD:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineRecord(tqueue, &remapinfo->u.rec, value);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
break;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Examine a record datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineRecord(TQueueDestReceiver *tqueue, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
HeapTupleHeader tup;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Oid typid;
|
|
|
|
|
int32 typmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TupleDesc tupledesc;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Extract type OID and typmod from tuple. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
tup = DatumGetHeapTupleHeader(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typid = HeapTupleHeaderGetTypeId(tup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
typmod = HeapTupleHeaderGetTypMod(tup);
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* If first time through, or if this isn't the same composite type as last
|
|
|
|
|
* time, consider sending a control message, and then look up the
|
|
|
|
|
* necessary information for examining the fields.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (typid != remapinfo->rectypid || typmod != remapinfo->rectypmod)
|
|
|
|
|
{
|
|
|
|
|
/* Free any old data. */
|
|
|
|
|
if (remapinfo->tupledesc != NULL)
|
|
|
|
|
FreeTupleDesc(remapinfo->tupledesc);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
pfree(remapinfo->field_remap);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Look up tuple descriptor in typcache. */
|
|
|
|
|
tupledesc = lookup_rowtype_tupdesc(typid, typmod);
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If this is a transient record type, send the tupledesc in a control
|
|
|
|
|
* message. (TQSendRecordInfo is smart enough to do this only once
|
|
|
|
|
* per typmod.)
|
|
|
|
|
*/
|
|
|
|
|
if (typid == RECORDOID)
|
|
|
|
|
TQSendRecordInfo(tqueue, typmod, tupledesc);
|
|
|
|
|
|
|
|
|
|
/* Figure out whether fields need recursive processing. */
|
|
|
|
|
remapinfo->field_remap = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
tqueue->mycontext);
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* We need to inspect the record contents, so save a copy of the
|
|
|
|
|
* tupdesc. (We could possibly just reference the typcache's
|
|
|
|
|
* copy, but then it's problematic when to release the refcount.)
|
|
|
|
|
*/
|
|
|
|
|
MemoryContext oldcontext = MemoryContextSwitchTo(tqueue->mycontext);
|
|
|
|
|
|
|
|
|
|
remapinfo->tupledesc = CreateTupleDescCopy(tupledesc);
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
/* No fields of the record require remapping. */
|
|
|
|
|
remapinfo->tupledesc = NULL;
|
|
|
|
|
}
|
|
|
|
|
remapinfo->rectypid = typid;
|
|
|
|
|
remapinfo->rectypmod = typmod;
|
|
|
|
|
|
|
|
|
|
/* Release reference count acquired by lookup_rowtype_tupdesc. */
|
|
|
|
|
DecrTupleDescRefCount(tupledesc);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* If field remapping is required, deform the tuple and examine each
|
|
|
|
|
* field.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (remapinfo->field_remap != NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
Datum *values;
|
|
|
|
|
bool *isnull;
|
|
|
|
|
HeapTupleData tdata;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Deform the tuple so we can check each column within. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
tupledesc = remapinfo->tupledesc;
|
|
|
|
|
values = (Datum *) palloc(tupledesc->natts * sizeof(Datum));
|
|
|
|
|
isnull = (bool *) palloc(tupledesc->natts * sizeof(bool));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
tdata.t_len = HeapTupleHeaderGetDatumLength(tup);
|
|
|
|
|
ItemPointerSetInvalid(&(tdata.t_self));
|
|
|
|
|
tdata.t_tableOid = InvalidOid;
|
|
|
|
|
tdata.t_data = tup;
|
|
|
|
|
heap_deform_tuple(&tdata, tupledesc, values, isnull);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Recursively check each interesting non-NULL attribute. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
|
|
|
|
{
|
|
|
|
|
if (!isnull[i] && remapinfo->field_remap[i])
|
|
|
|
|
TQExamine(tqueue, remapinfo->field_remap[i], values[i]);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Need not clean up, since we're in a short-lived context. */
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Examine an array datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineArray(TQueueDestReceiver *tqueue, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
ArrayType *arr = DatumGetArrayTypeP(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Oid typid = ARR_ELEMTYPE(arr);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
Datum *elem_values;
|
|
|
|
|
bool *elem_nulls;
|
|
|
|
|
int num_elems;
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/* Deconstruct the array. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
deconstruct_array(arr, typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
&elem_values, &elem_nulls, &num_elems);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Examine each element. */
|
|
|
|
|
for (i = 0; i < num_elems; i++)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (!elem_nulls[i])
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamine(tqueue, remapinfo->element_remap, elem_values[i]);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Examine a range datum and send any necessary control messages for
|
|
|
|
|
* transient record types contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamineRange(TQueueDestReceiver *tqueue, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
RangeType *range = DatumGetRangeType(value);
|
|
|
|
|
RangeBound lower;
|
|
|
|
|
RangeBound upper;
|
|
|
|
|
bool empty;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Extract the lower and upper bounds. */
|
|
|
|
|
range_deserialize(remapinfo->typcache, range, &lower, &upper, &empty);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Nothing to do for an empty range. */
|
|
|
|
|
if (empty)
|
|
|
|
|
return;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Examine each bound, if present. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (!upper.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamine(tqueue, remapinfo->bound_remap, upper.val);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (!lower.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQExamine(tqueue, remapinfo->bound_remap, lower.val);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Send tuple descriptor information for a transient typmod, unless we've
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* already done so previously.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQSendRecordInfo(TQueueDestReceiver *tqueue, int32 typmod, TupleDesc tupledesc)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
StringInfoData buf;
|
|
|
|
|
bool found;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Initialize hash table if not done yet. */
|
|
|
|
|
if (tqueue->recordhtab == NULL)
|
|
|
|
|
{
|
|
|
|
|
HASHCTL ctl;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
MemSet(&ctl, 0, sizeof(ctl));
|
|
|
|
|
/* Hash table entries are just typmods */
|
|
|
|
|
ctl.keysize = sizeof(int32);
|
|
|
|
|
ctl.entrysize = sizeof(int32);
|
|
|
|
|
ctl.hcxt = tqueue->mycontext;
|
|
|
|
|
tqueue->recordhtab = hash_create("tqueue sender record type hashtable",
|
|
|
|
|
100, &ctl,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Have we already seen this record type? If not, must report it. */
|
|
|
|
|
hash_search(tqueue->recordhtab, &typmod, HASH_ENTER, &found);
|
|
|
|
|
if (found)
|
|
|
|
|
return;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
elog(DEBUG3, "sending tqueue control message for record typmod %d", typmod);
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/* If message queue is in data mode, switch to control mode. */
|
|
|
|
|
if (tqueue->mode != TUPLE_QUEUE_MODE_CONTROL)
|
|
|
|
|
{
|
|
|
|
|
tqueue->mode = TUPLE_QUEUE_MODE_CONTROL;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
shm_mq_send(tqueue->queue, sizeof(char), &tqueue->mode, false);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Assemble a control message. */
|
|
|
|
|
initStringInfo(&buf);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
appendBinaryStringInfo(&buf, (char *) &typmod, sizeof(int32));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
appendBinaryStringInfo(&buf, (char *) &tupledesc->natts, sizeof(int));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
appendBinaryStringInfo(&buf, (char *) &tupledesc->tdhasoid, sizeof(bool));
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
|
|
|
|
{
|
|
|
|
|
appendBinaryStringInfo(&buf, (char *) TupleDescAttr(tupledesc, i),
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
sizeof(FormData_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Send control message. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
shm_mq_send(tqueue->queue, buf.len, buf.data, false);
|
|
|
|
|
|
|
|
|
|
/* We assume it's OK to leak buf because we're in a short-lived context. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
/*
|
|
|
|
|
* Prepare to receive tuples from executor.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueStartupReceiver(DestReceiver *self, int operation, TupleDesc typeinfo)
|
|
|
|
|
{
|
|
|
|
|
/* do nothing */
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Clean up at end of an executor run
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueShutdownReceiver(DestReceiver *self)
|
|
|
|
|
{
|
|
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
|
|
|
|
|
Clean up shm_mq cleanup.
The logic around shm_mq_detach was a few bricks shy of a load, because
(contrary to the comments for shm_mq_attach) all it did was update the
shared shm_mq state. That left us leaking a bit of process-local
memory, but much worse, the on_dsm_detach callback for shm_mq_detach
was still armed. That means that whenever we ultimately detach from
the DSM segment, we'd run shm_mq_detach again for already-detached,
possibly long-dead queues. This accidentally fails to fail today,
because we only ever re-use a shm_mq's memory for another shm_mq, and
multiple detach attempts on the last such shm_mq are fairly harmless.
But it's gonna bite us someday, so let's clean it up.
To do that, change shm_mq_detach's API so it takes a shm_mq_handle
not the underlying shm_mq. This makes the callers simpler in most
cases anyway. Also fix a few places in parallel.c that were just
pfree'ing the handle structs rather than doing proper cleanup.
Back-patch to v10 because of the risk that the revenant shm_mq_detach
callbacks would cause a live bug sometime. Since this is an API
change, it's too late to do it in 9.6. (We could make a variant
patch that preserves API, but I'm not excited enough to do that.)
Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
8 years ago
|
|
|
if (tqueue->queue != NULL)
|
|
|
|
|
shm_mq_detach(tqueue->queue);
|
|
|
|
|
tqueue->queue = NULL;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Destroy receiver when done with it
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
tqueueDestroyReceiver(DestReceiver *self)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TQueueDestReceiver *tqueue = (TQueueDestReceiver *) self;
|
|
|
|
|
|
Clean up shm_mq cleanup.
The logic around shm_mq_detach was a few bricks shy of a load, because
(contrary to the comments for shm_mq_attach) all it did was update the
shared shm_mq state. That left us leaking a bit of process-local
memory, but much worse, the on_dsm_detach callback for shm_mq_detach
was still armed. That means that whenever we ultimately detach from
the DSM segment, we'd run shm_mq_detach again for already-detached,
possibly long-dead queues. This accidentally fails to fail today,
because we only ever re-use a shm_mq's memory for another shm_mq, and
multiple detach attempts on the last such shm_mq are fairly harmless.
But it's gonna bite us someday, so let's clean it up.
To do that, change shm_mq_detach's API so it takes a shm_mq_handle
not the underlying shm_mq. This makes the callers simpler in most
cases anyway. Also fix a few places in parallel.c that were just
pfree'ing the handle structs rather than doing proper cleanup.
Back-patch to v10 because of the risk that the revenant shm_mq_detach
callbacks would cause a live bug sometime. Since this is an API
change, it's too late to do it in 9.6. (We could make a variant
patch that preserves API, but I'm not excited enough to do that.)
Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
8 years ago
|
|
|
/* We probably already detached from queue, but let's be sure */
|
|
|
|
|
if (tqueue->queue != NULL)
|
|
|
|
|
shm_mq_detach(tqueue->queue);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (tqueue->tmpcontext != NULL)
|
|
|
|
|
MemoryContextDelete(tqueue->tmpcontext);
|
|
|
|
|
if (tqueue->recordhtab != NULL)
|
|
|
|
|
hash_destroy(tqueue->recordhtab);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (tqueue->field_remapinfo != NULL)
|
|
|
|
|
pfree(tqueue->field_remapinfo);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
pfree(self);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Create a DestReceiver that writes tuples to a tuple queue.
|
|
|
|
|
*/
|
|
|
|
|
DestReceiver *
|
|
|
|
|
CreateTupleQueueDestReceiver(shm_mq_handle *handle)
|
|
|
|
|
{
|
|
|
|
|
TQueueDestReceiver *self;
|
|
|
|
|
|
|
|
|
|
self = (TQueueDestReceiver *) palloc0(sizeof(TQueueDestReceiver));
|
|
|
|
|
|
|
|
|
|
self->pub.receiveSlot = tqueueReceiveSlot;
|
|
|
|
|
self->pub.rStartup = tqueueStartupReceiver;
|
|
|
|
|
self->pub.rShutdown = tqueueShutdownReceiver;
|
|
|
|
|
self->pub.rDestroy = tqueueDestroyReceiver;
|
|
|
|
|
self->pub.mydest = DestTupleQueue;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
self->queue = handle;
|
|
|
|
|
self->mycontext = CurrentMemoryContext;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
self->tmpcontext = NULL;
|
|
|
|
|
self->recordhtab = NULL;
|
|
|
|
|
self->mode = TUPLE_QUEUE_MODE_DATA;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Top-level tupledesc is not known yet */
|
|
|
|
|
self->tupledesc = NULL;
|
|
|
|
|
self->field_remapinfo = NULL;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
|
|
|
|
return (DestReceiver *) self;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* Create a tuple queue reader.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TupleQueueReader *
|
|
|
|
|
CreateTupleQueueReader(shm_mq_handle *handle, TupleDesc tupledesc)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TupleQueueReader *reader = palloc0(sizeof(TupleQueueReader));
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
reader->queue = handle;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
reader->mycontext = CurrentMemoryContext;
|
|
|
|
|
reader->typmodmap = NULL;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
reader->mode = TUPLE_QUEUE_MODE_DATA;
|
|
|
|
|
reader->tupledesc = tupledesc;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
reader->field_remapinfo = BuildFieldRemapInfo(tupledesc, reader->mycontext);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
return reader;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* Destroy a tuple queue reader.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*/
|
|
|
|
|
void
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
DestroyTupleQueueReader(TupleQueueReader *reader)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Clean up shm_mq cleanup.
The logic around shm_mq_detach was a few bricks shy of a load, because
(contrary to the comments for shm_mq_attach) all it did was update the
shared shm_mq state. That left us leaking a bit of process-local
memory, but much worse, the on_dsm_detach callback for shm_mq_detach
was still armed. That means that whenever we ultimately detach from
the DSM segment, we'd run shm_mq_detach again for already-detached,
possibly long-dead queues. This accidentally fails to fail today,
because we only ever re-use a shm_mq's memory for another shm_mq, and
multiple detach attempts on the last such shm_mq are fairly harmless.
But it's gonna bite us someday, so let's clean it up.
To do that, change shm_mq_detach's API so it takes a shm_mq_handle
not the underlying shm_mq. This makes the callers simpler in most
cases anyway. Also fix a few places in parallel.c that were just
pfree'ing the handle structs rather than doing proper cleanup.
Back-patch to v10 because of the risk that the revenant shm_mq_detach
callbacks would cause a live bug sometime. Since this is an API
change, it's too late to do it in 9.6. (We could make a variant
patch that preserves API, but I'm not excited enough to do that.)
Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
8 years ago
|
|
|
shm_mq_detach(reader->queue);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (reader->typmodmap != NULL)
|
|
|
|
|
hash_destroy(reader->typmodmap);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (reader->field_remapinfo != NULL)
|
|
|
|
|
pfree(reader->field_remapinfo);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
pfree(reader);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Fetch a tuple from a tuple queue reader.
|
|
|
|
|
*
|
|
|
|
|
* The return value is NULL if there are no remaining tuples or if
|
|
|
|
|
* nowait = true and no tuple is ready to return. *done, if not NULL,
|
|
|
|
|
* is set to true when there are no remaining tuples and otherwise to false.
|
|
|
|
|
*
|
|
|
|
|
* The returned tuple, if any, is allocated in CurrentMemoryContext.
|
|
|
|
|
* That should be a short-lived (tuple-lifespan) context, because we are
|
|
|
|
|
* pretty cavalier about leaking memory in that context if we have to do
|
|
|
|
|
* tuple remapping.
|
|
|
|
|
*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* Even when shm_mq_receive() returns SHM_MQ_WOULD_BLOCK, this can still
|
|
|
|
|
* accumulate bytes from a partially-read message, so it's useful to call
|
|
|
|
|
* this with nowait = true even if nothing is returned.
|
|
|
|
|
*/
|
|
|
|
|
HeapTuple
|
|
|
|
|
TupleQueueReaderNext(TupleQueueReader *reader, bool nowait, bool *done)
|
|
|
|
|
{
|
|
|
|
|
shm_mq_result result;
|
|
|
|
|
|
|
|
|
|
if (done != NULL)
|
|
|
|
|
*done = false;
|
|
|
|
|
|
|
|
|
|
for (;;)
|
|
|
|
|
{
|
|
|
|
|
Size nbytes;
|
|
|
|
|
void *data;
|
|
|
|
|
|
|
|
|
|
/* Attempt to read a message. */
|
|
|
|
|
result = shm_mq_receive(reader->queue, &nbytes, &data, nowait);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* If queue is detached, set *done and return NULL. */
|
|
|
|
|
if (result == SHM_MQ_DETACHED)
|
|
|
|
|
{
|
|
|
|
|
if (done != NULL)
|
|
|
|
|
*done = true;
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* In non-blocking mode, bail out if no message ready yet. */
|
|
|
|
|
if (result == SHM_MQ_WOULD_BLOCK)
|
|
|
|
|
return NULL;
|
|
|
|
|
Assert(result == SHM_MQ_SUCCESS);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* We got a message (see message spec at top of file). Process it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
if (nbytes == 1)
|
|
|
|
|
{
|
|
|
|
|
/* Mode switch message. */
|
|
|
|
|
reader->mode = ((char *) data)[0];
|
|
|
|
|
}
|
|
|
|
|
else if (reader->mode == TUPLE_QUEUE_MODE_DATA)
|
|
|
|
|
{
|
|
|
|
|
/* Tuple data. */
|
|
|
|
|
return TupleQueueHandleDataMessage(reader, nbytes, data);
|
|
|
|
|
}
|
|
|
|
|
else if (reader->mode == TUPLE_QUEUE_MODE_CONTROL)
|
|
|
|
|
{
|
|
|
|
|
/* Control message, describing a transient record type. */
|
|
|
|
|
TupleQueueHandleControlMessage(reader, nbytes, data);
|
|
|
|
|
}
|
|
|
|
|
else
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
elog(ERROR, "unrecognized tqueue mode: %d", (int) reader->mode);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
* Handle a data message - that is, a tuple - from the remote side.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
static HeapTuple
|
|
|
|
|
TupleQueueHandleDataMessage(TupleQueueReader *reader,
|
|
|
|
|
Size nbytes,
|
|
|
|
|
HeapTupleHeader data)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
HeapTupleData htup;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* Set up a dummy HeapTupleData pointing to the data from the shm_mq
|
|
|
|
|
* (which had better be sufficiently aligned).
|
|
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
ItemPointerSetInvalid(&htup.t_self);
|
|
|
|
|
htup.t_tableOid = InvalidOid;
|
|
|
|
|
htup.t_len = nbytes;
|
|
|
|
|
htup.t_data = data;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* Either just copy the data into a regular palloc'd tuple, or remap it,
|
|
|
|
|
* as required.
|
|
|
|
|
*/
|
|
|
|
|
return TQRemapTuple(reader,
|
|
|
|
|
reader->tupledesc,
|
|
|
|
|
reader->field_remapinfo,
|
|
|
|
|
&htup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Copy the given tuple, remapping any transient typmods contained in it.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static HeapTuple
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQRemapTuple(TupleQueueReader *reader,
|
|
|
|
|
TupleDesc tupledesc,
|
|
|
|
|
TupleRemapInfo **field_remapinfo,
|
|
|
|
|
HeapTuple tuple)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
Datum *values;
|
|
|
|
|
bool *isnull;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
bool changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* If no remapping is necessary, just copy the tuple into a single
|
|
|
|
|
* palloc'd chunk, as caller will expect.
|
|
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (field_remapinfo == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
return heap_copytuple(tuple);
|
|
|
|
|
|
|
|
|
|
/* Deform tuple so we can remap record typmods for individual attrs. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
values = (Datum *) palloc(tupledesc->natts * sizeof(Datum));
|
|
|
|
|
isnull = (bool *) palloc(tupledesc->natts * sizeof(bool));
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
heap_deform_tuple(tuple, tupledesc, values, isnull);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Recursively process each interesting non-NULL attribute. */
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (isnull[i] || field_remapinfo[i] == NULL)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
continue;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
values[i] = TQRemap(reader, field_remapinfo[i], values[i], &changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Reconstruct the modified tuple, if anything was modified. */
|
|
|
|
|
if (changed)
|
|
|
|
|
return heap_form_tuple(tupledesc, values, isnull);
|
|
|
|
|
else
|
|
|
|
|
return heap_copytuple(tuple);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Process the given datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
|
|
|
|
*
|
|
|
|
|
* remapinfo is previously-computed remapping info about the datum's type.
|
|
|
|
|
*
|
|
|
|
|
* This function just dispatches based on the remap class.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQRemap(TupleQueueReader *reader, TupleRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
check_stack_depth();
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
switch (remapinfo->remapclass)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
case TQUEUE_REMAP_ARRAY:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
return TQRemapArray(reader, &remapinfo->u.arr, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
case TQUEUE_REMAP_RANGE:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
return TQRemapRange(reader, &remapinfo->u.rng, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
case TQUEUE_REMAP_RECORD:
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
return TQRemapRecord(reader, &remapinfo->u.rec, value, changed);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
elog(ERROR, "unrecognized tqueue remap class: %d",
|
|
|
|
|
(int) remapinfo->remapclass);
|
|
|
|
|
return (Datum) 0;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Process the given array datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQRemapArray(TupleQueueReader *reader, ArrayRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
ArrayType *arr = DatumGetArrayTypeP(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Oid typid = ARR_ELEMTYPE(arr);
|
|
|
|
|
bool element_changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
Datum *elem_values;
|
|
|
|
|
bool *elem_nulls;
|
|
|
|
|
int num_elems;
|
|
|
|
|
int i;
|
|
|
|
|
|
|
|
|
|
/* Deconstruct the array. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
deconstruct_array(arr, typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign,
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
&elem_values, &elem_nulls, &num_elems);
|
|
|
|
|
|
|
|
|
|
/* Remap each element. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
for (i = 0; i < num_elems; i++)
|
|
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (!elem_nulls[i])
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
elem_values[i] = TQRemap(reader,
|
|
|
|
|
remapinfo->element_remap,
|
|
|
|
|
elem_values[i],
|
|
|
|
|
&element_changed);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
if (element_changed)
|
|
|
|
|
{
|
|
|
|
|
/* Reconstruct and return the array. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
arr = construct_md_array(elem_values, elem_nulls,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
ARR_NDIM(arr), ARR_DIMS(arr), ARR_LBOUND(arr),
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typid, remapinfo->typlen,
|
|
|
|
|
remapinfo->typbyval, remapinfo->typalign);
|
|
|
|
|
return PointerGetDatum(arr);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Process the given range datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
*/
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQRemapRange(TupleQueueReader *reader, RangeRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
RangeType *range = DatumGetRangeType(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
bool bound_changed = false;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
RangeBound lower;
|
|
|
|
|
RangeBound upper;
|
|
|
|
|
bool empty;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Extract the lower and upper bounds. */
|
|
|
|
|
range_deserialize(remapinfo->typcache, range, &lower, &upper, &empty);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Nothing to do for an empty range. */
|
|
|
|
|
if (empty)
|
|
|
|
|
return value;
|
|
|
|
|
|
|
|
|
|
/* Remap each bound, if present. */
|
|
|
|
|
if (!upper.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
upper.val = TQRemap(reader, remapinfo->bound_remap,
|
|
|
|
|
upper.val, &bound_changed);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (!lower.infinite)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
lower.val = TQRemap(reader, remapinfo->bound_remap,
|
|
|
|
|
lower.val, &bound_changed);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (bound_changed)
|
|
|
|
|
{
|
|
|
|
|
/* Reserialize. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
range = range_serialize(remapinfo->typcache, &lower, &upper, empty);
|
|
|
|
|
return RangeTypeGetDatum(range);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Process the given record datum and replace any transient record typmods
|
|
|
|
|
* contained in it. Set *changed to TRUE if we actually changed the datum.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
|
|
|
|
static Datum
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TQRemapRecord(TupleQueueReader *reader, RecordRemapInfo *remapinfo,
|
|
|
|
|
Datum value, bool *changed)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
HeapTupleHeader tup;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Oid typid;
|
|
|
|
|
int32 typmod;
|
|
|
|
|
bool changed_typmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
TupleDesc tupledesc;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Extract type OID and typmod from tuple. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
tup = DatumGetHeapTupleHeader(value);
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typid = HeapTupleHeaderGetTypeId(tup);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
typmod = HeapTupleHeaderGetTypMod(tup);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* If first time through, or if this isn't the same composite type as last
|
|
|
|
|
* time, identify the required typmod mapping, and then look up the
|
|
|
|
|
* necessary information for processing the fields.
|
|
|
|
|
*/
|
|
|
|
|
if (typid != remapinfo->rectypid || typmod != remapinfo->rectypmod)
|
|
|
|
|
{
|
|
|
|
|
/* Free any old data. */
|
|
|
|
|
if (remapinfo->tupledesc != NULL)
|
|
|
|
|
FreeTupleDesc(remapinfo->tupledesc);
|
|
|
|
|
/* Is it worth trying to free substructure of the remap tree? */
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
pfree(remapinfo->field_remap);
|
|
|
|
|
|
|
|
|
|
/* If transient record type, look up matching local typmod. */
|
|
|
|
|
if (typid == RECORDOID)
|
|
|
|
|
{
|
|
|
|
|
RecordTypmodMap *mapent;
|
|
|
|
|
|
|
|
|
|
Assert(reader->typmodmap != NULL);
|
|
|
|
|
mapent = hash_search(reader->typmodmap, &typmod,
|
|
|
|
|
HASH_FIND, NULL);
|
|
|
|
|
if (mapent == NULL)
|
|
|
|
|
elog(ERROR, "tqueue received unrecognized remote typmod %d",
|
|
|
|
|
typmod);
|
|
|
|
|
remapinfo->localtypmod = mapent->localtypmod;
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
remapinfo->localtypmod = -1;
|
|
|
|
|
|
|
|
|
|
/* Look up tuple descriptor in typcache. */
|
|
|
|
|
tupledesc = lookup_rowtype_tupdesc(typid, remapinfo->localtypmod);
|
|
|
|
|
|
|
|
|
|
/* Figure out whether fields need recursive processing. */
|
|
|
|
|
remapinfo->field_remap = BuildFieldRemapInfo(tupledesc,
|
|
|
|
|
reader->mycontext);
|
|
|
|
|
if (remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
/*
|
|
|
|
|
* We need to inspect the record contents, so save a copy of the
|
|
|
|
|
* tupdesc. (We could possibly just reference the typcache's
|
|
|
|
|
* copy, but then it's problematic when to release the refcount.)
|
|
|
|
|
*/
|
|
|
|
|
MemoryContext oldcontext = MemoryContextSwitchTo(reader->mycontext);
|
|
|
|
|
|
|
|
|
|
remapinfo->tupledesc = CreateTupleDescCopy(tupledesc);
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
}
|
|
|
|
|
else
|
|
|
|
|
{
|
|
|
|
|
/* No fields of the record require remapping. */
|
|
|
|
|
remapinfo->tupledesc = NULL;
|
|
|
|
|
}
|
|
|
|
|
remapinfo->rectypid = typid;
|
|
|
|
|
remapinfo->rectypmod = typmod;
|
|
|
|
|
|
|
|
|
|
/* Release reference count acquired by lookup_rowtype_tupdesc. */
|
|
|
|
|
DecrTupleDescRefCount(tupledesc);
|
|
|
|
|
}
|
|
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/* If transient record, replace remote typmod with local typmod. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (typid == RECORDOID && typmod != remapinfo->localtypmod)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typmod = remapinfo->localtypmod;
|
|
|
|
|
changed_typmod = true;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
else
|
|
|
|
|
changed_typmod = false;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* If we need to change the typmod, or if there are any potentially
|
|
|
|
|
* remappable fields, replace the tuple.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (changed_typmod || remapinfo->field_remap != NULL)
|
|
|
|
|
{
|
|
|
|
|
HeapTupleData htup;
|
|
|
|
|
HeapTuple atup;
|
|
|
|
|
|
|
|
|
|
/* For now, assume we always need to change the tuple in this case. */
|
|
|
|
|
*changed = true;
|
|
|
|
|
|
|
|
|
|
/* Copy tuple, possibly remapping contained fields. */
|
|
|
|
|
ItemPointerSetInvalid(&htup.t_self);
|
|
|
|
|
htup.t_tableOid = InvalidOid;
|
|
|
|
|
htup.t_len = HeapTupleHeaderGetDatumLength(tup);
|
|
|
|
|
htup.t_data = tup;
|
|
|
|
|
atup = TQRemapTuple(reader,
|
|
|
|
|
remapinfo->tupledesc,
|
|
|
|
|
remapinfo->field_remap,
|
|
|
|
|
&htup);
|
|
|
|
|
|
|
|
|
|
/* Apply the correct labeling for a local Datum. */
|
|
|
|
|
HeapTupleHeaderSetTypeId(atup->t_data, typid);
|
|
|
|
|
HeapTupleHeaderSetTypMod(atup->t_data, typmod);
|
|
|
|
|
HeapTupleHeaderSetDatumLength(atup->t_data, htup.t_len);
|
|
|
|
|
|
|
|
|
|
/* And return the results. */
|
|
|
|
|
return HeapTupleHeaderGetDatum(atup->t_data);
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Else just return the value as-is. */
|
|
|
|
|
return value;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/*
|
|
|
|
|
* Handle a control message from the tuple queue reader.
|
|
|
|
|
*
|
|
|
|
|
* Control messages are sent when the remote side is sending tuples that
|
|
|
|
|
* contain transient record types. We need to arrange to bless those
|
|
|
|
|
* record types locally and translate between remote and local typmods.
|
|
|
|
|
*/
|
|
|
|
|
static void
|
|
|
|
|
TupleQueueHandleControlMessage(TupleQueueReader *reader, Size nbytes,
|
|
|
|
|
char *data)
|
|
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int32 remotetypmod;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
int natts;
|
|
|
|
|
bool hasoid;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Size offset = 0;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
Form_pg_attribute *attrs;
|
|
|
|
|
TupleDesc tupledesc;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
RecordTypmodMap *mapent;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
bool found;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Extract remote typmod. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
memcpy(&remotetypmod, &data[offset], sizeof(int32));
|
|
|
|
|
offset += sizeof(int32);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Extract attribute count. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
memcpy(&natts, &data[offset], sizeof(int));
|
|
|
|
|
offset += sizeof(int);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
|
|
|
|
/* Extract hasoid flag. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
memcpy(&hasoid, &data[offset], sizeof(bool));
|
|
|
|
|
offset += sizeof(bool);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Extract attribute details. The tupledesc made here is just transient. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
attrs = palloc(natts * sizeof(Form_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
for (i = 0; i < natts; i++)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
attrs[i] = palloc(sizeof(FormData_pg_attribute));
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
memcpy(attrs[i], &data[offset], sizeof(FormData_pg_attribute));
|
|
|
|
|
offset += sizeof(FormData_pg_attribute);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
/* We should have read the whole message. */
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
Assert(offset == nbytes);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Construct TupleDesc, and assign a local typmod. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
tupledesc = CreateTupleDesc(natts, hasoid, attrs);
|
|
|
|
|
tupledesc = BlessTupleDesc(tupledesc);
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Create mapping hashtable if it doesn't exist already. */
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
if (reader->typmodmap == NULL)
|
|
|
|
|
{
|
|
|
|
|
HASHCTL ctl;
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
MemSet(&ctl, 0, sizeof(ctl));
|
|
|
|
|
ctl.keysize = sizeof(int32);
|
|
|
|
|
ctl.entrysize = sizeof(RecordTypmodMap);
|
|
|
|
|
ctl.hcxt = reader->mycontext;
|
|
|
|
|
reader->typmodmap = hash_create("tqueue receiver record type hashtable",
|
|
|
|
|
100, &ctl,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
9 years ago
|
|
|
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/* Create map entry. */
|
|
|
|
|
mapent = hash_search(reader->typmodmap, &remotetypmod, HASH_ENTER,
|
|
|
|
|
&found);
|
|
|
|
|
if (found)
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
elog(ERROR, "duplicate tqueue control message for typmod %d",
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
remotetypmod);
|
|
|
|
|
mapent->localtypmod = tupledesc->tdtypmod;
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
|
|
|
|
|
elog(DEBUG3, "tqueue mapping remote typmod %d to local typmod %d",
|
|
|
|
|
remotetypmod, mapent->localtypmod);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Build remap info for the specified data type, storing it in mycontext.
|
|
|
|
|
* Returns NULL if neither the type nor any subtype could require remapping.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildTupleRemapInfo(Oid typid, MemoryContext mycontext)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
HeapTuple tup;
|
|
|
|
|
Form_pg_type typ;
|
|
|
|
|
|
|
|
|
|
/* This is recursive, so it could be driven to stack overflow. */
|
|
|
|
|
check_stack_depth();
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
restart:
|
|
|
|
|
tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
|
|
|
|
|
if (!HeapTupleIsValid(tup))
|
|
|
|
|
elog(ERROR, "cache lookup failed for type %u", typid);
|
|
|
|
|
typ = (Form_pg_type) GETSTRUCT(tup);
|
|
|
|
|
|
|
|
|
|
/* Look through domains to underlying base type. */
|
|
|
|
|
if (typ->typtype == TYPTYPE_DOMAIN)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
typid = typ->typbasetype;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
goto restart;
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* If it's a true array type, deal with it that way. */
|
|
|
|
|
if (OidIsValid(typ->typelem) && typ->typlen == -1)
|
|
|
|
|
{
|
|
|
|
|
typid = typ->typelem;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return BuildArrayRemapInfo(typid, mycontext);
|
|
|
|
|
}
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Similarly, deal with ranges appropriately. */
|
|
|
|
|
if (typ->typtype == TYPTYPE_RANGE)
|
|
|
|
|
{
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return BuildRangeRemapInfo(typid, mycontext);
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/*
|
|
|
|
|
* If it's a composite type (including RECORD), set up for remapping. We
|
|
|
|
|
* don't attempt to determine the status of subfields here, since we do
|
|
|
|
|
* not have enough information yet; just mark everything invalid.
|
|
|
|
|
*/
|
|
|
|
|
if (typ->typtype == TYPTYPE_COMPOSITE || typid == RECORDOID)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_RECORD;
|
|
|
|
|
remapinfo->u.rec.rectypid = InvalidOid;
|
|
|
|
|
remapinfo->u.rec.rectypmod = -1;
|
|
|
|
|
remapinfo->u.rec.localtypmod = -1;
|
|
|
|
|
remapinfo->u.rec.tupledesc = NULL;
|
|
|
|
|
remapinfo->u.rec.field_remap = NULL;
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return remapinfo;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
}
|
|
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Nothing else can possibly need remapping attention. */
|
|
|
|
|
ReleaseSysCache(tup);
|
|
|
|
|
return NULL;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildArrayRemapInfo(Oid elemtypid, MemoryContext mycontext)
|
|
|
|
|
{
|
|
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
TupleRemapInfo *element_remapinfo;
|
|
|
|
|
|
|
|
|
|
/* See if element type requires remapping. */
|
|
|
|
|
element_remapinfo = BuildTupleRemapInfo(elemtypid, mycontext);
|
|
|
|
|
/* If not, the array doesn't either. */
|
|
|
|
|
if (element_remapinfo == NULL)
|
|
|
|
|
return NULL;
|
|
|
|
|
/* OK, set up to remap the array. */
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_ARRAY;
|
|
|
|
|
get_typlenbyvalalign(elemtypid,
|
|
|
|
|
&remapinfo->u.arr.typlen,
|
|
|
|
|
&remapinfo->u.arr.typbyval,
|
|
|
|
|
&remapinfo->u.arr.typalign);
|
|
|
|
|
remapinfo->u.arr.element_remap = element_remapinfo;
|
|
|
|
|
return remapinfo;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
static TupleRemapInfo *
|
|
|
|
|
BuildRangeRemapInfo(Oid rngtypid, MemoryContext mycontext)
|
|
|
|
|
{
|
|
|
|
|
TupleRemapInfo *remapinfo;
|
|
|
|
|
TupleRemapInfo *bound_remapinfo;
|
|
|
|
|
TypeCacheEntry *typcache;
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
|
* Get range info from the typcache. We assume this pointer will stay
|
|
|
|
|
* valid for the duration of the query.
|
|
|
|
|
*/
|
|
|
|
|
typcache = lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO);
|
|
|
|
|
if (typcache->rngelemtype == NULL)
|
|
|
|
|
elog(ERROR, "type %u is not a range type", rngtypid);
|
|
|
|
|
|
|
|
|
|
/* See if range bound type requires remapping. */
|
|
|
|
|
bound_remapinfo = BuildTupleRemapInfo(typcache->rngelemtype->type_id,
|
|
|
|
|
mycontext);
|
|
|
|
|
/* If not, the range doesn't either. */
|
|
|
|
|
if (bound_remapinfo == NULL)
|
|
|
|
|
return NULL;
|
|
|
|
|
/* OK, set up to remap the range. */
|
|
|
|
|
remapinfo = (TupleRemapInfo *)
|
|
|
|
|
MemoryContextAlloc(mycontext, sizeof(TupleRemapInfo));
|
|
|
|
|
remapinfo->remapclass = TQUEUE_REMAP_RANGE;
|
|
|
|
|
remapinfo->u.rng.typcache = typcache;
|
|
|
|
|
remapinfo->u.rng.bound_remap = bound_remapinfo;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
return remapinfo;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/*
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
* Build remap info for fields of the type described by the given tupdesc.
|
|
|
|
|
* Returns an array of TupleRemapInfo pointers, or NULL if no field
|
|
|
|
|
* requires remapping. Data is allocated in mycontext.
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
*/
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
static TupleRemapInfo **
|
|
|
|
|
BuildFieldRemapInfo(TupleDesc tupledesc, MemoryContext mycontext)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
TupleRemapInfo **remapinfo;
|
|
|
|
|
bool noop = true;
|
|
|
|
|
int i;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* Recursively determine the remapping status of each field. */
|
|
|
|
|
remapinfo = (TupleRemapInfo **)
|
|
|
|
|
MemoryContextAlloc(mycontext,
|
|
|
|
|
tupledesc->natts * sizeof(TupleRemapInfo *));
|
|
|
|
|
for (i = 0; i < tupledesc->natts; i++)
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
{
|
|
|
|
|
Form_pg_attribute attr = TupleDescAttr(tupledesc, i);
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
if (attr->attisdropped)
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
{
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
remapinfo[i] = NULL;
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
continue;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
remapinfo[i] = BuildTupleRemapInfo(attr->atttypid, mycontext);
|
|
|
|
|
if (remapinfo[i] != NULL)
|
|
|
|
|
noop = false;
|
|
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
/* If no fields require remapping, report that by returning NULL. */
|
|
|
|
|
if (noop)
|
|
|
|
|
{
|
|
|
|
|
pfree(remapinfo);
|
|
|
|
|
remapinfo = NULL;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|
Modify tqueue infrastructure to support transient record types.
Commit 4a4e6893aa080b9094dadbe0e65f8a75fee41ac6, which introduced this
mechanism, failed to account for the fact that the RECORD pseudo-type
uses transient typmods that are only meaningful within a single
backend. Transferring such tuples without modification between two
cooperating backends does not work. This commit installs a system
for passing the tuple descriptors over the same shm_mq being used to
send the tuples themselves. The two sides might not assign the same
transient typmod to any given tuple descriptor, so we must also
substitute the appropriate receiver-side typmod for the one used by
the sender. That adds some CPU overhead, but still seems better than
being unable to pass records between cooperating parallel processes.
Along the way, move the logic for handling multiple tuple queues from
tqueue.c to nodeGather.c; tqueue.c now provides a TupleQueueReader,
which reads from a single queue, rather than a TupleQueueFunnel, which
potentially reads from multiple queues. This change was suggested
previously as a way to make sure that nodeGather.c rather than tqueue.c
had policy control over the order in which to read from queues, but
it wasn't clear to me until now how good an idea it was. typmod
mapping needs to be performed separately for each queue, and it is
much simpler if the tqueue.c code handles that and leaves multiplexing
multiple queues to higher layers of the stack.
10 years ago
|
|
|
|
Code review for tqueue.c: fix memory leaks, speed it up, other fixes.
When doing record typmod remapping, tqueue.c did fresh catalog lookups
for each tuple it processed, which was pretty horrible performance-wise
(it seemed to about halve the already none-too-quick speed of bulk reads
in parallel mode). Worse, it insisted on putting bits of that data into
TopMemoryContext, from where it never freed them, causing a
session-lifespan memory leak. (I suppose this was coded with the idea
that the sender process would quit after finishing the query ---
but the receiver uses the same code.)
Restructure to avoid repetitive catalog lookups and to keep that data
in a query-lifespan context, in or below the context where the
TQueueDestReceiver or TupleQueueReader itself lives.
Fix some other bugs such as continuing to use a tupledesc after
releasing our refcount on it. Clean up cavalier datatype choices
(typmods are int32, please, not int, and certainly not Oid). Improve
comments and error message wording.
10 years ago
|
|
|
return remapinfo;
|
Glue layer to connect the executor to the shm_mq mechanism.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
10 years ago
|
|
|
}
|