postgres

Commit Graph

Author	SHA1	Message	Date
Robert Haas	e8c435a824	docs: Update TOAST storage docs for configurable compression. Mention that there are multiple TOAST compression methods and that the compression method used is stored in a TOAST pointer along with the other information that was stored there previously. Add a reference to the documentation for default_toast_compression, where the supported methods are listed, instead of duplicating that here. I haven't tried to preserve the text claiming that pglz is "fairly simple and very fast." I have no view on the veracity of the former claim, but LZ4 seems to be faster (and to compress better) so it seems better not to muddy the waters by talking about compression speed as a strong point of PGLZ. Patch by me, reviewed by Justin Pryzby. Discussion: http://postgr.es/m/CA+Tgmoaw_YBwQhOS_hhEPPwFhfAnu+VCLs18EfGr9gQw1z4H-w@mail.gmail.com	4 years ago
Fujii Masao	5fe83adad9	doc: Fix typo in logicaldecoding.sgml. Introduced in commit `0aa8a01d04`. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Ps8rkVHKWyjg09Fb1PaVG5-EhoFTFJ9OZMF4HPzDSXfew@mail.gmail.com	4 years ago
Michael Paquier	b094063cd1	Move log_autovacuum_min_duration into its correct sections This GUC has already been classified as LOGGING_WHAT, but its location in postgresql.conf.sample and the documentation did not reflect that, so fix those inconsistencies. Author: Justin Pryzby Discussion: https://postgr.es/m/20210404012546.GK6592@telsasoft.com	4 years ago
Amit Kapila	15c1a9d9cb	doc: Update information of new messages for logical replication. Updated documentation for new messages added for streaming of in-progress transactions, as well as changes made to the existing messages. It also updates the information of protocol versions supported for logical replication. Author: Ajin Cherian Reviewed-by: Amit Kapila, Peter Smith, Euler Taveira Discussion: https://postgr.es/m/CAFPTHDYHN9m=MZZct-B=BYg_TETvv+kXvL9RD2DpaBS5pGxGYg@mail.gmail.com	4 years ago
Tom Lane	07b76833b1	Doc: update documentation of check_function_bodies. Adjust docs and description string to note that check_function_bodies applies to procedures too. (In hindsight it should have been named check_routine_bodies, but it seems too late for that now.) Daniel Westermann Discussion: https://postgr.es/m/GV0P278MB04834A9EB9A74B036DC7CE49D2739@GV0P278MB0483.CHEP278.PROD.OUTLOOK.COM	4 years ago
Peter Eisentraut	9cae39b8e6	doc: Fix man page whitespace issues Whitespace between tags is significant, and in some cases it creates extra vertical space in man pages. The fix is to remove some newlines in the markup.	4 years ago
Thomas Munro	dc88460c24	Doc: Review for "Optionally prefetch referenced data in recovery." Typos, corrections and language improvements in the docs, and a few in code comments too. Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20210409033703.GP6592%40telsasoft.com	4 years ago
Peter Eisentraut	49fb4e6b24	doc: Additional documentation for date_bin Reported-by: Justin Pryzby <pryzby@telsasoft.com> Author: John Naylor <john.naylor@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/CAFBsxsEEm1nuhZmfVQxvu_i3nDDEuvNJ_WMrDo9whFD_jusp-A@mail.gmail.com	4 years ago
Alvaro Herrera	41badeaba8	Document ANALYZE storage parameters for partitioned tables Commit `0827e8af70` added parameters for autovacuum to support partitioned tables, but didn't add any docs. Add them. Discussion: https://postgr.es/m/20210408213051.GL6592@telsasoft.com	4 years ago
Michael Paquier	609b0652af	Fix typos and grammar in documentation and code comments Comment fixes are applied on HEAD, and documentation improvements are applied on back-branches where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com Backpatch-through: 9.6	4 years ago
Fujii Masao	8ff1c94649	Allow TRUNCATE command to truncate foreign tables. This commit introduces new foreign data wrapper API for TRUNCATE. It extends TRUNCATE command so that it accepts foreign tables as the targets to truncate and invokes that API. Also it extends postgres_fdw so that it can issue TRUNCATE command to foreign servers, by adding new routine for that TRUNCATE API. The information about options specified in TRUNCATE command, e.g., ONLY, CACADE, etc is passed to FDW via API. The list of foreign tables to truncate is also passed to FDW. FDW truncates the foreign data sources that the passed foreign tables specify, based on those information. For example, postgres_fdw constructs TRUNCATE command using them and issues it to the foreign server. For performance, TRUNCATE command invokes the FDW routine for TRUNCATE once per foreign server that foreign tables to truncate belong to. Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com	4 years ago
Thomas Munro	1d257577e0	Optionally prefetch referenced data in recovery. Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts) Reviewed-by: Andres Freund <andres@anarazel.de> (parts) Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com	4 years ago
Magnus Hagander	aaf0432572	Add functions to wait for backend termination This adds a function, pg_wait_for_backend_termination(), and a new timeout argument to pg_terminate_backend(), which will wait for the backend to actually terminate (with or without signaling it to do so depending on which function is called). The default behaviour of pg_terminate_backend() remains being timeout=0 which does not waiting. For pg_wait_for_backend_termination() the default wait is 5 seconds. Author: Bharath Rupireddy Reviewed-By: Fujii Masao, David Johnston, Muhammad Usama, Hou Zhijie, Magnus Hagander Discussion: https://postgr.es/m/CALj2ACUBpunmyhYZw-kXCYs5NM+h6oG_7Df_Tn4mLmmUQifkqA@mail.gmail.com	4 years ago
Peter Eisentraut	fb310f1781	doc: Prefer explicit JOIN syntax over old implicit syntax in tutorial Update src/tutorial/basics.source to match. Author: Jürgen Purtz <juergen@purtz.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/158996922318.7035.10603922579567326239@wrigleys.postgresql.org	4 years ago
Magnus Hagander	6b4d23feef	Track identical top vs nested queries independently in pg_stat_statements Changing pg_stat_statements.track between 'all' and 'top' would control if pg_stat_statements tracked just top level statements or also statements inside functions, but when tracking all it would not differentiate between the two. Being table to differentiate this is useful both to track where the actual query is coming from, and to see if there are differences in executions between the two. To do this, add a boolean to the hash key indicating if the statement was top level or not. Experience from the pg_stat_kcache module shows that in at least some "reasonable worloads" only <5% of the queries show up both top level and nested. Based on this, admittedly small, dataset, this patch does not try to de-duplicate those query texts, and will just store one copy for the top level and one for the nested. Author: Julien Rohaud Reviewed-By: Magnus Hagander, Masahiro Ikeda Discussion: https://postgr.es/m/20201202040516.GA43757@nol	4 years ago
Tom Lane	a3027e1e7f	Allow psql's \df and \do commands to specify argument types. When dealing with overloaded function or operator names, having to look through a long list of matches is tedious. Let's extend these commands to allow specification of (input) argument types to let such results be trimmed down. Each additional argument is treated the same as the pattern argument of \dT and matched against the appropriate argument's type name. While at it, fix \dT (and these new options) to recognize the usual notation of "foo[]" for "the array type over foo", and to handle the special abbreviations allowed by the backend grammar, such as "int" for "integer". Greg Sabino Mullane, revised rather significantly by me Discussion: https://postgr.es/m/CAKAnmmLF9Hhu02N+s7uAyLc5J1xZReg72HQUoiKhNiJV3_jACQ@mail.gmail.com	4 years ago
Bruce Momjian	f57a2f5e03	Add csvlog output for the new query_id value This also adjusts the printf format for query id used by log_line_prefix (%Q). Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20210408005402.GG24239@momjian.us Author: Julien Rouhaud, Bruce Momjian	4 years ago
Peter Eisentraut	e717a9a18b	SQL-standard function body This adds support for writing CREATE FUNCTION and CREATE PROCEDURE statements for language SQL with a function body that conforms to the SQL standard and is portable to other implementations. Instead of the PostgreSQL-specific AS $$ string literal $$ syntax, this allows writing out the SQL statements making up the body unquoted, either as a single statement: CREATE FUNCTION add(a integer, b integer) RETURNS integer LANGUAGE SQL RETURN a + b; or as a block CREATE PROCEDURE insert_data(a integer, b integer) LANGUAGE SQL BEGIN ATOMIC INSERT INTO tbl VALUES (a); INSERT INTO tbl VALUES (b); END; The function body is parsed at function definition time and stored as expression nodes in a new pg_proc column prosqlbody. So at run time, no further parsing is required. However, this form does not support polymorphic arguments, because there is no more parse analysis done at call time. Dependencies between the function and the objects it uses are fully tracked. A new RETURN statement is introduced. This can only be used inside function bodies. Internally, it is treated much like a SELECT statement. psql needs some new intelligence to keep track of function body boundaries so that it doesn't send off statements when it sees semicolons that are inside a function body. Tested-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/1c11f1eb-f00c-43b7-799d-2d44132c02d7@2ndquadrant.com	4 years ago
Peter Geoghegan	1e55e7d175	Add wraparound failsafe to VACUUM. Add a failsafe mechanism that is triggered by VACUUM when it notices that the table's relfrozenxid and/or relminmxid are dangerously far in the past. VACUUM checks the age of the table dynamically, at regular intervals. When the failsafe triggers, VACUUM takes extraordinary measures to finish as quickly as possible so that relfrozenxid and/or relminmxid can be advanced. VACUUM will stop applying any cost-based delay that may be in effect. VACUUM will also bypass any further index vacuuming and heap vacuuming -- it only completes whatever remaining pruning and freezing is required. Bypassing index/heap vacuuming is enabled by commit `8523492d`, which made it possible to dynamically trigger the mechanism already used within VACUUM when it is run with INDEX_CLEANUP off. It is expected that the failsafe will almost always trigger within an autovacuum to prevent wraparound, long after the autovacuum began. However, the failsafe mechanism can trigger in any VACUUM operation. Even in a non-aggressive VACUUM, where we're likely to not advance relfrozenxid, it still seems like a good idea to finish off remaining pruning and freezing. An aggressive/anti-wraparound VACUUM will be launched immediately afterwards. Note that the anti-wraparound VACUUM that follows will itself trigger the failsafe, usually before it even begins its first (and only) pass over the heap. The failsafe is controlled by two new GUCs: vacuum_failsafe_age, and vacuum_multixact_failsafe_age. There are no equivalent reloptions, since that isn't expected to be useful. The GUCs have rather high defaults (both default to 1.6 billion), and are expected to generally only be used to make the failsafe trigger sooner/more frequently. Author: Masahiko Sawada <sawada.mshk@gmail.com> Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzmgH3ySGYeC-m-eOBsa2=sDwa292-CFghV4rESYo39FsQ@mail.gmail.com	4 years ago
Bruce Momjian	4f0b0966c8	Make use of in-core query id added by commit `5fd9dfa5f5` Use the in-core query id computation for pg_stat_activity, log_line_prefix, and EXPLAIN VERBOSE. Similar to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Add a %Q placeholder to include the queryid in log_line_prefix, which will also only expose top level statements. For EXPLAIN VERBOSE, if a query identifier has been computed, either by enabling compute_query_id or using a third-party module, display it. Bump catalog version. Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol Author: Julien Rouhaud Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu	4 years ago
Bruce Momjian	5fd9dfa5f5	Move pg_stat_statements query jumbling to core. Add compute_query_id GUC to control whether a query identifier should be computed by the core (off by default). It's thefore now possible to disable core queryid computation and use pg_stat_statements with a different algorithm to compute the query identifier by using a third-party module. To ensure that a single source of query identifier can be used and is well defined, modules that calculate a query identifier should throw an error if compute_query_id specified to compute a query id and if a query idenfitier was already calculated. Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol Author: Julien Rouhaud Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu	4 years ago
Peter Eisentraut	5c55dc8b47	libpq: Set Server Name Indication (SNI) for SSL connections By default, have libpq set the TLS extension "Server Name Indication" (SNI). This allows an SNI-aware SSL proxy to route connections. (This requires a proxy that is aware of the PostgreSQL protocol, not just any SSL proxy.) In the future, this could also allow the server to use different SSL certificates for different host specifications. (That would require new server functionality. This would be the client-side functionality for that.) Since SNI makes the host name appear in cleartext in the network traffic, this might be undesirable in some cases. Therefore, also add a libpq connection option "sslsni" to turn it off. Discussion: https://www.postgresql.org/message-id/flat/7289d5eb-62a5-a732-c3b9-438cee2cb709%40enterprisedb.com	4 years ago
Peter Eisentraut	4560e0acda	doc: Improve wording Discussion: https://www.postgresql.org/message-id/flat/161626776179.652.11944895442156126506%40wrigleys.postgresql.org	4 years ago
Peter Eisentraut	dd13ad9d39	Fix use of cursor sensitivity terminology Documentation and comments in code and tests have been using the terms sensitive/insensitive cursor incorrectly relative to the SQL standard. (Cursor sensitivity is only relevant for changes made in the same transaction as the cursor, not for concurrent changes in other sessions.) Moreover, some of the behavior of PostgreSQL is incorrect according to the SQL standard, confusing the issue further. (WHERE CURRENT OF changes are not visible in insensitive cursors, but they should be.) This change corrects the terminology and removes the claim that sensitive cursors are supported. It also adds a test case that checks the insensitive behavior in a "correct" way, using a change command not using WHERE CURRENT OF. Finally, it adds the ASENSITIVE cursor option to select the default asensitive behavior, per SQL standard. There are no changes to cursor behavior in this patch. Discussion: https://www.postgresql.org/message-id/flat/96ee8b30-9889-9e1b-b053-90e10c050e85%40enterprisedb.com	4 years ago
Michael Paquier	9afffcb833	Add some information about authenticated identity via log_connections The "authenticated identity" is the string used by an authentication method to identify a particular user. In many common cases, this is the same as the PostgreSQL username, but for some third-party authentication methods, the identifier in use may be shortened or otherwise translated (e.g. through pg_ident user mappings) before the server stores it. To help administrators see who has actually interacted with the system, this commit adds the capability to store the original identity when authentication succeeds within the backend's Port, and generates a log entry when log_connections is enabled. The log entries generated look something like this (where a local user named "foouser" is connecting to the database as the database user called "admin"): LOG: connection received: host=[local] LOG: connection authenticated: identity="foouser" method=peer (/data/pg_hba.conf:88) LOG: connection authorized: user=admin database=postgres application_name=psql Port->authn_id is set according to the authentication method: bsd: the PostgreSQL username (aka the local username) cert: the client's Subject DN gss: the user principal ident: the remote username ldap: the final bind DN pam: the PostgreSQL username (aka PAM username) password (and all pw-challenge methods): the PostgreSQL username peer: the peer's pw_name radius: the PostgreSQL username (aka the RADIUS username) sspi: either the down-level (SAM-compatible) logon name, if compat_realm=1, or the User Principal Name if compat_realm=0 The trust auth method does not set an authenticated identity. Neither does clientcert=verify-full. Port->authn_id could be used for other purposes, like a superuser-only extra column in pg_stat_activity, but this is left as future work. PostgresNode::connect_{ok,fails}() have been modified to let tests check the backend log files for required or prohibited patterns, using the new log_like and log_unlike parameters. This uses a method based on a truncation of the existing server log file, like issues_sql_like(). Tests are added to the ldap, kerberos, authentication and SSL test suites. Author: Jacob Champion Reviewed-by: Stephen Frost, Magnus Hagander, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/c55788dd1773c521c862e8e0dddb367df51222be.camel@vmware.com	4 years ago
Fujii Masao	a3740c48eb	postgres_fdw: Allow partitions specified in LIMIT TO to be imported. Commit `f49bcd4ef3` disallowed postgres_fdw to import table partitions. Because all data can be accessed through the partitioned table which is the root of the partitioning hierarchy, importing only partitioned table should allow access to all the data without creating extra objects. This is a reasonable default when importing a whole schema. But there may be the case where users want to explicitly import one of a partitioned tables' partitions. For that use case, this commit allows postgres_fdw to import tables or foreign tables which are partitions of some other table only when they are explicitly specified in LIMIT TO clause. It doesn't change the behavior that any partitions not specified in LIMIT TO are automatically excluded in IMPORT FOREIGN SCHEMA command. Author: Matthias van de Meent Reviewed-by: Bernd Helmle, Amit Langote, Michael Paquier, Fujii Masao Discussion: https://postgr.es/m/CAEze2Whwg4i=mzApMe+PXxCEfgoZmHGqdqQFW7J4bmj_5p6t1A@mail.gmail.com	4 years ago
Peter Eisentraut	3a51306722	psql: Show all query results by default Previously, psql printed only the last result if a command string returned multiple result sets. Now it prints all of them. The previous behavior can be obtained by setting the psql variable SHOW_ALL_RESULTS to off. Author: Fabien COELHO <coelho@cri.ensmp.fr> Reviewed-by: "Iwata, Aya" <iwata.aya@jp.fujitsu.com> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1904132231510.8961@lancre	4 years ago
Fujii Masao	9de9294b0c	Stop archive recovery if WAL generated with wal_level=minimal is found. Previously if hot standby was enabled, archive recovery exited with an error when it found WAL generated with wal_level=minimal. But if hot standby was disabled, it just reported a warning and continued in that case. Which could lead to data loss or errors during normal operation. A warning was emitted, but users could easily miss that and not notice this serious situation until they encountered the actual errors. To improve this situation, this commit changes archive recovery so that it exits with FATAL error when it finds WAL generated with wal_level=minimal whatever the setting of hot standby. This enables users to notice the serious situation soon. The FATAL error is thrown if archive recovery starts from a base backup taken before wal_level is changed to minimal. When archive recovery exits with the error, if users have a base backup taken after setting wal_level to higher than minimal, they can recover the database by starting archive recovery from that newer backup. But note that if such backup doesn't exist, there is no easy way to complete archive recovery, which may make the database server unstartable and users may lose whole database. The commit adds the note about this risk into the document. Even in the case of unstartable database server, previously by just disabling hot standby users could avoid the error during archive recovery, forcibly start up the server and salvage data from it. But note that this commit makes this procedure unavailable at all. Author: Takamichi Osumi Reviewed-by: Laurenz Albe, Kyotaro Horiguchi, David Steele, Fujii Masao Discussion: https://postgr.es/m/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90@OSBPR01MB4888.jpnprd01.prod.outlook.com	4 years ago
Dean Rasheed	6b258e3d68	pgbench: Function to generate random permutations. This adds a new function, permute(), that generates pseudorandom permutations of arbitrary sizes. This can be used to randomly shuffle a set of values to remove unwanted correlations. For example, permuting the output from a non-uniform random distribution so that all the most common values aren't collocated, allowing more realistic tests to be performed. Formerly, hash() was recommended for this purpose, but that suffers from collisions that might alter the distribution, so recommend permute() for this purpose instead. Fabien Coelho and Hironobu Suzuki, with additional hacking be me. Reviewed by Thomas Munro, Alvaro Herrera and Muhammad Usama. Discussion: https://postgr.es/m/alpine.DEB.2.21.1807280944370.5142@lancre	4 years ago
Peter Eisentraut	82ed7748b7	ALTER SUBSCRIPTION ... ADD/DROP PUBLICATION At present, if we want to update publications in a subscription, we can use SET PUBLICATION. However, it requires supplying all publications that exists and the new publications. If we want to add new publications, it's inconvenient. The new syntax only supplies the new publications. When the refresh is true, it only refreshes the new publications. Author: Japin Li <japinli@hotmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/MEYP282MB166939D0D6C480B7FBE7EFFBB6BC0@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM	4 years ago
Peter Eisentraut	a2da77cdb4	Change return type of EXTRACT to numeric The previous implementation of EXTRACT mapped internally to date_part(), which returned type double precision (since it was implemented long before the numeric type existed). This can lead to imprecise output in some cases, so returning numeric would be preferrable. Changing the return type of an existing function is a bit risky, so instead we do the following: We implement a new set of functions, which are now called "extract", in parallel to the existing date_part functions. They work the same way internally but use numeric instead of float8. The EXTRACT construct is now mapped by the parser to these new extract functions. That way, dumps of views etc. from old versions (which would use date_part) continue to work unchanged, but new uses will map to the new extract functions. Additionally, the reverse compilation of EXTRACT now reproduces the original syntax, using the new mechanism introduced in `40c24bfef9`. The following minor changes of behavior result from the new implementation: - The column name from an isolated EXTRACT call is now "extract" instead of "date_part". - Extract from date now rejects inappropriate field names such as HOUR. It was previously mapped internally to extract from timestamp, so it would silently accept everything appropriate for timestamp. - Return values when extracting fields with possibly fractional values, such as second and epoch, now have the full scale that the value has internally (so, for example, '1.000000' instead of just '1'). Reported-by: Petr Fedorov <petr.fedorov@phystech.edu> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu	4 years ago
Fujii Masao	43620e3286	Add function to log the memory contexts of specified backend process. Commit `3e98c0bafb` added pg_backend_memory_contexts view to display the memory contexts of the backend process. However its target process is limited to the backend that is accessing to the view. So this is not so convenient when investigating the local memory bloat of other backend process. To improve this situation, this commit adds pg_log_backend_memory_contexts() function that requests to log the memory contexts of the specified backend process. This information can be also collected by calling MemoryContextStats(TopMemoryContext) via a debugger. But this technique cannot be used in some environments because no debugger is available there. So, pg_log_backend_memory_contexts() allows us to see the memory contexts of specified backend more easily. Only superusers are allowed to request to log the memory contexts because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. On receipt of the request, at the next CHECK_FOR_INTERRUPTS(), the target backend logs its memory contexts at LOG_SERVER_ONLY level, so that these memory contexts will appear in the server log but not be sent to the client. It logs one message per memory context. Because if it buffers all memory contexts into StringInfo to log them as one message, which may require the buffer to be enlarged very much and lead to OOM error since there can be a large number of memory contexts in a backend. When a backend process is consuming huge memory, logging all its memory contexts might overrun available disk space. To prevent this, now this patch limits the number of child contexts to log per parent to 100. As with MemoryContextStats(), it supposes that practical cases where the log gets long will typically be huge numbers of siblings under the same parent context; while the additional debugging value from seeing details about individual siblings beyond 100 will not be large. There was another proposed patch to add the function to return the memory contexts of specified backend as the result sets, instead of logging them, in the discussion. However that patch is not included in this commit because it had several issues to address. Thanks to Tatsuhito Kasahara, Andres Freund, Tom Lane, Tomas Vondra, Michael Paquier, Kyotaro Horiguchi and Zhihong Yu for the discussion. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Kyotaro Horiguchi, Zhihong Yu, Fujii Masao Discussion: https://postgr.es/m/0271f440ac77f2a4180e0e56ebd944d1@oss.nttdata.com	4 years ago
Amit Kapila	ac4645c015	Allow pgoutput to send logical decoding messages. The output plugin accepts a new parameter (messages) that controls if logical decoding messages are written into the replication stream. It is useful for those clients that use pgoutput as an output plugin and needs to process messages that were written by pg_logical_emit_message(). Although logical streaming replication protocol supports logical decoding messages now, logical replication does not use this feature yet. Author: David Pirotte, Euler Taveira Reviewed-by: Euler Taveira, Andres Freund, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/CADK3HHJ-+9SO7KuRLH=9Wa1rAo60Yreq1GFNkH_kd0=CdaWM+A@mail.gmail.com	4 years ago
Tom Lane	09c1c6ab4b	Support INCLUDE'd columns in SP-GiST. Not much to say here: does what it says on the tin. We steal a previously-always-zero bit from the nextOffset field of leaf index tuples in order to track whether there is a nulls bitmap. Otherwise it works about like included columns in other index types. Pavel Borisov, reviewed by Andrey Borodin and Anastasia Lubennikova, and rather heavily editorialized on by me Discussion: https://postgr.es/m/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com	4 years ago
Stephen Frost	6c3ffd697e	Add pg_read_all_data and pg_write_all_data roles A commonly requested use-case is to have a role who can run an unfettered pg_dump without having to explicitly GRANT that user access to all tables, schemas, et al, without that role being a superuser. This address that by adding a "pg_read_all_data" role which implicitly gives any member of this role SELECT rights on all tables, views and sequences, and USAGE rights on all schemas. As there may be cases where it's also useful to have a role who has write access to all objects, pg_write_all_data is also introduced and gives users implicit INSERT, UPDATE and DELETE rights on all tables, views and sequences. These roles can not be logged into directly but instead should be GRANT'd to a role which is able to log in. As noted in the documentation, if RLS is being used then an administrator may (or may not) wish to set BYPASSRLS on the login role which these predefined roles are GRANT'd to. Reviewed-by: Georgios Kokolatos Discussion: https://postgr.es/m/20200828003023.GU29590@tamriel.snowman.net	5 years ago
Alvaro Herrera	6734e80695	Align some terms in arch-dev.sgml to glossary This mostly adds links to the glossary to the existing text, instead of using <firstterm>. Heikki left this out of `29ad6595ef` out of stylistic concerns; these have since been addressed. Author: Jürgen Purtz <juergen@purtz.de> Discussion: https://postgr.es/m/67d7240f-8596-83fc-5e15-af06c128a0f5@purtz.de	5 years ago
Tom Lane	ac9099fc1d	Fix confusion in SP-GiST between attribute type and leaf storage type. According to the documentation, the attType passed to the opclass config function (and also relied on by the core code) is the type of the heap column or expression being indexed. But what was actually being passed was the type stored for the index column. This made no difference for user-defined SP-GiST opclasses, because we weren't allowing the STORAGE clause of CREATE OPCLASS to be used, so the two types would be the same. But it's silly not to allow that, seeing that the built-in poly_ops opclass has a different value for opckeytype than opcintype, and that if you want to do lossy storage then the types must really be different. (Thus, user-defined opclasses doing lossy storage had to lie about what type is in the index.) Hence, remove the restriction, and make sure that we use the input column type not opckeytype where relevant. For reasons of backwards compatibility with existing user-defined opclasses, we can't quite insist that the specified leafType match the STORAGE clause; instead just add an amvalidate() warning if they don't match. Also fix some bugs that would only manifest when trying to return index entries when attType is different from attLeafType. It's not too surprising that these have not been reported, because the only usual reason for such a difference is to store the leaf value lossily, rendering index-only scans impossible. Add a src/test/modules module to exercise cases where attType is different from attLeafType and yet index-only scan is supported. Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us	5 years ago
Tom Lane	55873a00e3	Improve psql's behavior when the editor is exited without saving. When editing the previous query buffer, if the editor is exited without modifying the temp file then clear the query buffer, rather than re-loading (and probably re-executing) the previous query buffer. This reduces the probability of accidentally re-executing something you didn't intend to. Similarly, in "\e file", if the file isn't actually modified then don't load it into the query buffer. And in "\ef" and "\ev", if no changes are made then clear the query buffer instead of loading the function or view definition into it. Cases where we fail to invoke the editor at all, or it returns a nonzero status, are treated like the no-file-modification case. Laurenz Albe, reviewed by Jacob Champion Discussion: https://postgr.es/m/0ba3f2a658bac6546d9934ab6ba63a805d46a49b.camel@cybertec.at	5 years ago
Thomas Munro	c30f54ad73	Detect POLLHUP/POLLRDHUP while running queries. Provide a new GUC check_client_connection_interval that can be used to check whether the client connection has gone away, while running very long queries. It is disabled by default. For now this uses a non-standard Linux extension (also adopted by at least one other OS). POLLRDHUP is not defined by POSIX, and other OSes don't have a reliable way to know if a connection was closed without actually trying to read or write. In future we might consider trying to send a no-op/heartbeat message instead, but that could require protocol changes. Author: Sergey Cherkashin <s.cherkashin@postgrespro.ru> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Maksim Milyutin <milyutinma@gmail.com> Reviewed-by: Tsunakawa, Takayuki/綱川貴之 <tsunakawa.takay@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (much earlier version) Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru	5 years ago
Joe Conway	174edbe9f9	Clarify documentation of RESET ROLE Command-line options, or previous "ALTER (ROLE\|DATABASE) ... SET ROLE ..." commands, can change the value of the default role for a session. In the presence of one of these, RESET ROLE will change the current user identifier to the default role rather than the session user identifier. Fix the documentation to reflect this reality. Backpatch to all supported versions. Author: Nathan Bossart Reviewed-By: Laurenz Albe, David G. Johnston, Joe Conway Reported by: Nathan Bossart Discussion: https://postgr.es/m/flat/925134DB-8212-4F60-8AB1-B1231D750CB4%40amazon.com Backpatch-through: 9.6	5 years ago
Fujii Masao	b1be3074ac	postgres_fdw: Add option to control whether to keep connections open. This commit adds a new option keep_connections that controls whether postgres_fdw keeps the connections to the foreign server open so that the subsequent queries can re-use them. This option can only be specified for a foreign server. The default is on. If set to off, all connections to the foreign server will be discarded at the end of transaction. Closed connections will be re-established when they are necessary by future queries using a foreign table. This option is useful, for example, when users want to prevent the connections from eating up the foreign servers connections capacity. Author: Bharath Rupireddy Reviewed-by: Alexey Kondratov, Vignesh C, Fujii Masao Discussion: https://postgr.es/m/CALj2ACVvrp5=AVp2PupEm+nAC8S4buqR3fJMmaCoc7ftT0aD2A@mail.gmail.com	5 years ago
Michael Paquier	6fb66c268d	doc: Clarify how to generate backup files with non-exclusive backups The current instructions describing how to write the backup_label and tablespace_map files are confusing. For example, opening a file in text mode on Windows and copy-pasting the file's contents would result in a failure at recovery because of the extra CRLF characters generated. The documentation was not stating that clearly, and per discussion this is not considered as a supported scenario. This commit extends a bit the documentation to mention that it may be required to open the file in binary mode before writing its data. Reported-by: Wang Shenhao Author: David Steele Reviewed-by: Andrew Dunstan, Magnus Hagander Discussion: https://postgr.es/m/8373f61426074f2cb6be92e02f838389@G08CNEXMBPEKD06.g08.fujitsu.local Backpatch-through: 9.6	5 years ago
Bruce Momjian	2bda93f813	doc: mention that intervening major releases can be skipped Also mention that you should read the intervening major releases notes. This change was also applied to the website. Discussion: https://postgr.es/m/20210330144949.GA8259@momjian.us Backpatch-through: 9.6	5 years ago
David Rowley	9eacee2e62	Add Result Cache executor node (take 2) Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	5 years ago
Stephen Frost	c9c41c7a33	Rename Default Roles to Predefined Roles The term 'default roles' wasn't quite apt as these roles aren't able to be modified or removed after installation, so rename them to be 'Predefined Roles' instead, adding an entry into the newly added Obsolete Appendix to help users of current releases find the new documentation. Bruce Momjian and Stephen Frost Discussion: https://postgr.es/m/157742545062.1149.11052653770497832538%40wrigleys.postgresql.org and https://www.postgresql.org/message-id/20201120211304.GG16415@tamriel.snowman.net	5 years ago
Heikki Linnakangas	ea1b99a661	Add 'noError' argument to encoding conversion functions. With the 'noError' argument, you can try to convert a buffer without knowing the character boundaries beforehand. The functions now need to return the number of input bytes successfully converted. This is is a backwards-incompatible change, if you have created a custom encoding conversion with CREATE CONVERSION. This adds a check to pg_upgrade for that, refusing the upgrade if there are any user-defined encoding conversions. Custom conversions are very rare, there are no commonly used extensions that I know of that uses that feature. No other objects can depend on conversions, so if you do have one, you can fairly easily drop it before upgrading, and recreate it after the upgrade with an updated version. Add regression tests for built-in encoding conversions. This doesn't cover every conversion, but it covers all the internal functions in conv.c that are used to implement the conversions. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi	5 years ago
Michael Paquier	ffd3391ea9	doc: Clarify use of ACCESS EXCLUSIVE lock in various sections Some sections of the documentation used "exclusive lock" to describe that an ACCESS EXCLUSIVE lock is taken during a given operation. This can be confusing to the reader as ACCESS SHARE is allowed with an EXCLUSIVE lock is used, but that would not be the case with what is described on those parts of the documentation. Author: Greg Rychlewski Discussion: https://postgr.es/m/CAKemG7VptD=7fNWckFMsMVZL_zzvgDO6v2yVmQ+ZiBfc_06kCQ@mail.gmail.com Backpatch-through: 9.6	5 years ago
Amit Kapila	4778826532	Ensure to send a prepare after we detect concurrent abort during decoding. It is possible that while decoding a prepared transaction, it gets aborted concurrently via a ROLLBACK PREPARED command. In that case, we were skipping all the changes and directly sending Rollback Prepared when we find the same in WAL. However, the downstream has no idea of the GID of such a transaction. So, ensure to send prepare even when a concurrent abort is detected. Author: Ajin Cherian Reviewed-by: Markus Wanner, Amit Kapila Discussion: https://postgr.es/m/f82133c6-6055-b400-7922-97dae9f2b50b@enterprisedb.com	5 years ago
David Rowley	28b3e3905c	Revert `b6002a796` This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com	5 years ago
David Rowley	b6002a796d	Add Result Cache executor node Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com	5 years ago

1 2 3 4 5 ...

15435 Commits (60f1f09ff44308667ef6c72fbafd68235e55ae27)