Coverity complained that we weren't doing that, and it's right.
This fix just makes fmtIdEnc() honor the general convention that OOM
causes a PQExpBuffer to become marked "broken", without any immediate
error. In the pretty-unlikely case that we actually did hit OOM here,
the end result would be to return an empty string to the caller,
probably resulting in invalid SQL syntax in an issued command (if
nothing else went wrong, which is even more unlikely). It's tempting
to throw an "out of memory" error if the buffer becomes broken, but
there's not a lot of point in doing that only here and not in hundreds
of other PQExpBuffer-using places in pg_dump and similar callers.
The whole issue could do with some non-time-crunched redesign, perhaps.
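For reference, the convention in question looks like this from a caller's
perspective (an illustrative sketch, not code from this commit):

#include <stdio.h>
#include "pqexpbuffer.h"

/*
 * Sketch: on OOM the buffer is marked "broken" and ends up holding an
 * empty string; it is up to the caller to notice (or not).
 */
static void
example(void)
{
    PQExpBuffer buf = createPQExpBuffer();

    appendPQExpBufferStr(buf, "some text");    /* silently a no-op once broken */
    if (PQExpBufferBroken(buf))                /* true if any allocation failed */
        fprintf(stderr, "out of memory\n");
    destroyPQExpBuffer(buf);
}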
This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.
Instead of dropping the trailing byte(s) of an invalid or incomplete
multibyte character, replace only the first byte with a known-invalid
sequence, and process the rest normally. This seems less likely to
confuse incautious callers than the behavior adopted in 5dc1e42b4.
While we're at it, adjust PQescapeStringInternal to produce at most
one bleat about invalid multibyte characters per string. This
matches the behavior of PQescapeInternal, and avoids the risk of
producing tons of repetitive junk if a long string is simply given
in the wrong encoding.
This is a followup to the fixes for CVE-2025-1094, and should be
included if cherry-picking those fixes.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Jeff Davis <pgsql@j-davis.com>
Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com
Backpatch-through: 13
We need to tell fmtId() what encoding to assume, but this function
doesn't know that. Fortunately we can fix that without changing the
function's API, because we can just use SQL_ASCII. That's because
database names in connection requests are effectively binary not text:
no encoding-aware processing will happen on them.
This fixes XversionUpgrade failures seen in the buildfarm. The
alternative of having pg_upgrade use setFmtEncoding() is unappetizing,
given that it's connecting to multiple databases that may have
different encodings.
Andres Freund, Noah Misch, Tom Lane
Security: CVE-2025-1094
Previously, invalidly encoded input to various escaping functions could lead
to the escaped string being incorrectly parsed by psql. To be safe, escaping
functions need to ensure that neither invalid nor incomplete multi-byte
characters can be used to "escape" from being quoted.
Functions which can report errors now return an error in more cases than
before. Functions that cannot report errors now replace invalid input bytes
with a byte sequence that cannot be used to escape the quotes and that is
guaranteed to error out when a query is sent to the server.
The following functions are fixed by this commit:
- PQescapeLiteral()
- PQescapeIdentifier()
- PQescapeString()
- PQescapeStringConn()
- fmtId()
- appendStringLiteral()
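For callers of the error-reporting variants, the practical upshot is that a
NULL result must be treated as failure, e.g. (illustrative sketch only):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"

/* Sketch: PQescapeLiteral() returns NULL on failure, which now includes
 * invalidly encoded input; callers must check before using the result. */
static char *
quote_user_input(PGconn *conn, const char *raw)
{
    char       *lit = PQescapeLiteral(conn, raw, strlen(raw));

    if (lit == NULL)
    {
        fprintf(stderr, "escaping failed: %s", PQerrorMessage(conn));
        exit(1);
    }
    return lit;                 /* caller must PQfreemem() this */
}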
Reported-by: Stephen Fewer <stephen_fewer@rapid7.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Security: CVE-2025-1094
This commit adds fmtIdEnc() and fmtQualifiedIdEnc(), which allow the encoding
to be specified as an explicit argument. Additionally, setFmtEncoding() is
provided, which defines the encoding to use when no explicit encoding is
given, to avoid breaking all code using fmtId().
All users of fmtId()/fmtQualifiedId() are either converted to the explicit
version or a call to setFmtEncoding() has been added.
This commit does not yet utilize the now well-defined encoding; that will
happen in a subsequent commit.
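As a hedged illustration (not code from this commit; the exact declarations
live in fe_utils/string_utils.h), the new entry points are meant to be used
along these lines:

#include "fe_utils/string_utils.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"

/*
 * Sketch only: quote an identifier using the encoding-aware variant,
 * or tell plain fmtId() which encoding to assume.
 */
static void
append_drop_table(PQExpBuffer query, PGconn *conn, const char *tablename)
{
    /* explicit-encoding variant */
    appendPQExpBuffer(query, "DROP TABLE %s;\n",
                      fmtIdEnc(tablename, PQclientEncoding(conn)));

    /* or: set the default encoding once, then keep using fmtId() */
    setFmtEncoding(PQclientEncoding(conn));
    appendPQExpBuffer(query, "DROP TABLE %s;\n", fmtId(tablename));
}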
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Security: CVE-2025-1094
Since commit 2549f0661, we reject an identifier immediately following
a numeric literal (without separating whitespace), because that risks
ambiguity with hex/octal/binary integers. However, that patch used
token patterns like "{integer}{ident_start}", which is problematic
because {ident_start} matches only a single byte. If the first
character after the integer is a multibyte character, this ends up
with flex reporting an error message that includes a partial multibyte
character. That can cause assorted bad-encoding problems downstream,
both in the report to the client and in the postmaster log file.
To fix, use {identifier} not {ident_start} in the "junk" token
patterns, so that they will match complete multibyte characters.
This seems like a generally better user experience, quite aside from the
encoding problem: for "123abc" the error message will now say that
the error appeared at or near "123abc" instead of "123a".
While at it, add some commentary about why these patterns exist
and how they work.
Report and patch by Karina Litskevich; review by Pavel Borisov.
Back-patch to v15 where the problem came in.
Discussion: https://postgr.es/m/CACiT8iZ_diop=0zJ7zuY3BXegJpkKK1Av-PU7xh0EDYHsa5+=g@mail.gmail.com
Commit faff8f8e47 allowed integer literals to contain underscores, but
failed to update the lexer's "numericfail" rule. As a result, a
decimal integer literal containing underscores would fail to parse, if
used in an integer range with no whitespace after the first number,
such as "1_001..1_003" in a PL/pgSQL FOR loop.
Fix and backpatch to v16, where support for underscores in integer
literals was added.
Report and patch by Erik Wienhold.
Discussion: https://postgr.es/m/808ce947-46ec-4628-85fa-3dd600b2c154%40ewie.name
Underscores were added to numeric literals in faff8f8e47. This change
also affected the positional parameters (e.g., $1) rule, which uses
the same production for its digits. But this did not actually work,
because the digits for parameters are processed using atol(), which
does not handle underscores and ignores whatever it cannot parse.
The underscores notation is probably not useful for positional
parameters, so for simplicity revert that rule to its old form that
only accepts digits 0-9.
Author: Erik Wienhold <ewie@ewie.name>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/5d216d1c-91f6-4cbe-95e2-b4cbd930520c%40ewie.name
Starting on 2023-08-03, this intermittently terminated a "pgbench -C"
test in CI. It could affect a high-client-count "pgbench" without "-C".
While parallel reindexdb and vacuumdb reach the same problematic check,
sufficient client count and/or connection turnover is less plausible for
them. Given the lack of examples from the buildfarm or from manual
builds, reproducing this must entail rare operating system
configurations. Also correct the associated error message, which was
wrong for non-Windows. Back-patch to v12, where the pgbench check first
appeared. While v11 vacuumdb has the problematic check, reaching it
with typical vacuumdb usage is implausible.
Reviewed by Thomas Munro.
Discussion: https://postgr.es/m/CA+hUKG+JwvTNdcyJTriy9BbtzF1veSRQ=9M_ZKFn9_LqE7Kp7Q@mail.gmail.com
Run pgindent, pgperltidy, and reformat-dat-files.
This set of diffs is a bit larger than typical. We've updated to
pg_bsd_indent 2.1.2, which properly indents variable declarations that
have multi-line initialization expressions (the continuation lines are
now indented one tab stop). We've also updated to perltidy version
20230309 and changed some of its settings, which reduces its desire to
add whitespace to lines to make assignments etc. line up. Going
forward, that should make for fewer random-seeming changes to existing
code.
Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
This reverts commits 24c35ec and 57169ad. PreRestoreCommand() and
PostRestoreCommand() need to be put closer to the system() call calling
a restore_command, as they enable in_restore_command for the startup
process which would in turn trigger an immediate proc_exit() in the
SIGTERM handler. Perhaps we could get rid of this behavior entirely,
but 24c35ec has made the window where the flag is enabled much larger
than it was, and any Postgres-like actions (palloc, etc.) taken by code
paths while the flag is enabled could lead to more severe issues in the
shutdown processing.
Note that curculio has shown that there are many more problems in this
area, unrelated to this change, hence those issues had better be
addressed first. Keeping the code of HEAD in line with the stable
branches should make that a bit easier.
Per discussion with Andres Freund and Nathan Bossart.
Discussion: https://postgr.es/m/Y979NR3U5VnWrTwB@paquier.xyz
This allows underscores to be used in integer and numeric literals,
and their corresponding type input functions, for visual grouping.
For example:
1_500_000_000
3.14159_26535_89793
0xffff_ffff
0b_1001_0001
A single underscore is allowed between any 2 digits, or immediately
after the base prefix indicator of non-decimal integers, per SQL:202x
draft.
Peter Eisentraut and Dean Rasheed
Discussion: https://postgr.es/m/84aae844-dc55-a4be-86d9-4f0fa405cc97%40enterprisedb.com
Presently, restore_command uses a different code path than
archive_cleanup_command and recovery_end_command. These code paths
are similar and can be easily combined, as long as it is possible to
identify whether a command should:
- Issue a FATAL on signal.
- Exit immediately on SIGTERM.
While at it, this removes src/common/archive.c and its associated
header. Since the introduction of c96de2c, BuildRestoreCommand() has
become a simple wrapper of replace_percent_placeholders() able to call
make_native_path(). This simplifies shell_restore.c as long as
RestoreArchivedFile() includes a call to make_native_path().
Author: Nathan Bossart
Reviewed-by: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/20221227192449.GA3672473@nathanxps13
There are a number of places where a shell command is constructed with
percent-placeholders (like %x). It's cumbersome to have to open-code
this several times. This factors out this logic into a separate
function. This also allows us to ensure consistency for and document
some subtle behaviors, such as what to do with unrecognized
placeholders.
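For illustration only, here is a self-contained sketch of the kind of logic
being centralized; the function name and shape are made up, not the actual
in-tree API:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified sketch of percent-placeholder expansion with the stricter
 * handling described just below: unknown placeholders are an error
 * instead of being skipped.
 */
static bool
expand_placeholders(const char *cmd, const char *path, char *out, size_t outlen)
{
    size_t      n = 0;

    for (const char *p = cmd; *p; p++)
    {
        if (n + 1 >= outlen)
            return false;               /* result buffer too small */
        if (*p != '%')
        {
            out[n++] = *p;
            continue;
        }
        p++;
        if (*p == 'p')                  /* known placeholder: file path */
            n += snprintf(out + n, outlen - n, "%s", path);
        else if (*p == '%')             /* literal percent sign */
            out[n++] = '%';
        else
        {
            /* unknown (or trailing) placeholder: now an error */
            fprintf(stderr, "invalid placeholder in command: \"%s\"\n", cmd);
            return false;
        }
    }
    if (n >= outlen)
        return false;
    out[n] = '\0';
    return true;
}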
The unified handling is now that incorrect and unknown placeholders
are an error, where previously in most cases they were skipped or
ignored. This affects the following settings:
- archive_cleanup_command
- archive_command
- recovery_end_command
- restore_command
- ssl_passphrase_command
The following settings are part of this refactoring but already had
stricter error handling and should be unchanged in their behavior:
- basebackup_to_shell.command
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5238bbed-0b01-83a6-d4b2-7eb0562a054e%40enterprisedb.com
VACUUM normally ends by running vac_update_datfrozenxid(), which
requires a scan of pg_class. Therefore, if one attempts to vacuum a
database one table at a time --- as vacuumdb has done since v12 ---
we will spend O(N^2) time in vac_update_datfrozenxid(). That causes
serious performance problems in databases with tens of thousands of
tables, and indeed the effect is measurable with only a few hundred.
To add insult to injury, only one process per database can run
vac_update_datfrozenxid() at a time, so this behavior
largely defeats vacuumdb's -j option.
Hence, invent options SKIP_DATABASE_STATS and ONLY_DATABASE_STATS
to allow applications to postpone vac_update_datfrozenxid() until the
end of a series of VACUUM requests, and teach vacuumdb to use them.
Per bug #17717 from Gunnar L. Sadly, this answer doesn't seem
like something we'd consider back-patching, so the performance
problem will remain in v12-v15.
Tom Lane and Nathan Bossart
Discussion: https://postgr.es/m/17717-6c50eb1c7d23a886@postgresql.org
Add support for hexadecimal, octal, and binary integer literals:
0x42F
0o273
0b100101
per SQL:202x draft.
This adds support in the lexer as well as in the integer type input
functions.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
This substantially speeds up building for Windows, due to the vast number of
headers included via windows.h. A cross build from Linux targeting MinGW goes
from
994.11user 136.43system 0:31.58elapsed 3579%CPU
to
422.41user 89.05system 0:14.35elapsed 3562%CPU
The wins on Windows are similar-ish (but I don't have a system at hand just
now for actual numbers). Targeting other operating systems the wins are far
smaller (tested Linux, macOS, FreeBSD).
For now precompiled headers are disabled by default, as it's not clear how
well they work on all platforms. E.g. on FreeBSD gcc doesn't seem to have
working support, but clang does.
When doing a full build, precompiled headers are only beneficial for targets
with multiple .c files, as meson builds a separate precompiled header for each
target (so that different compilation options take effect). This commit
therefore only changes targets with at least two .c files to use precompiled
headers.
Because this commit adds b_pch=false to the default_options, new build
directories will have precompiled headers disabled by default; however,
existing build directories will continue to use the default value of b_pch,
which is true.
Note that using precompiled headers with ccache requires setting
CCACHE_SLOPPINESS=pch_defines,time_macros to get hits.
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CA+hUKG+50eOUbN++ocDc0Qnp9Pvmou23DSXu=ZA6fepOcftKqA@mail.gmail.com
Discussion: https://postgr.es/m/c5736f70-bb6d-8d25-e35c-e3d886e4e905@enterprisedb.com
Discussion: https://postgr.es/m/20190826054000.GE7005%40paquier.xyz
When using precompiled headers, we cannot pre-define macros for the system
headers from within .c files, as headers are already processed before
the #define in the C file is reached. But we can pre-define using
-DFD_SETSIZE, as long as that's also used when building the precompiled header.
A few files #define FD_SETSIZE 1024 on Windows, as the default is only 64. I
am hesitant to change FD_SETSIZE globally on Windows, due to
src/backend/port/win32/socket.c using it to size on-stack arrays. Instead, add
-DFD_SETSIZE=1024 when building the specific targets needing it.
We likely should move away from using select() in those places, but that's a
larger change.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20221005190829.lda7ttalh4mzrvf4@awork3.anarazel.de
Discussion: https://postgr.es/m/CA+hUKG+50eOUbN++ocDc0Qnp9Pvmou23DSXu=ZA6fepOcftKqA@mail.gmail.com
Discussion: https://postgr.es/m/20190826054000.GE7005%40paquier.xyz
There are still some alignment-related failures in the buildfarm,
which might or might not be able to be fixed quickly, but I've also
just realized that it increased the size of many WAL records by 4 bytes
because a block reference contains a RelFileLocator. The effect of that
hasn't been studied or discussed, so revert for now.
RelFileNumbers are now assigned using a separate counter, instead of
being assigned from the OID counter. This counter never wraps around:
if all 2^56 possible RelFileNumbers are used, an internal error
occurs. As the cluster is limited to 2^64 total bytes of WAL, this
limitation should not cause a problem in practice.
If the counter were 64 bits wide rather than 56 bits wide, we would
need to increase the width of the BufferTag, which might adversely
impact buffer lookup performance. Also, this lets us use bigint for
pg_class.relfilenode and other places where these values are exposed
at the SQL level without worrying about overflow.
This should remove the need to keep "tombstone" files around until
the next checkpoint when relations are removed. We do that to keep
RelFileNumbers from being recycled, but now that won't happen
anyway. However, this patch doesn't actually change anything in
this area; it just makes it possible for a future patch to do so.
Dilip Kumar, based on an idea from Andres Freund, who also reviewed
some earlier versions of the patch. Further review and some
wordsmithing by me. Also reviewed at various points by Ashutosh
Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Autoconf is showing its age; fewer and fewer contributors know how to wrangle
it. Recursive make has a lot of hard-to-resolve dependency issues and slow
incremental rebuilds. Our home-grown MSVC build system is hard to maintain for
developers not using Windows and runs tests serially. While these and other
issues could individually be addressed with incremental improvements, together
they seem best addressed by moving to a more modern build system.
After evaluating different build system choices, we chose to use meson, to a
good degree based on the adoption by other open source projects.
We decided that it's more realistic to commit a relatively early version of
the new build system and mature it in tree.
This commit adds an initial version of a meson-based build system. It supports
building postgres on at least AIX, FreeBSD, Linux, macOS, NetBSD, OpenBSD,
Solaris and Windows (however, only gcc is supported on AIX and Solaris). For
Windows/MSVC, postgres can now be built with ninja (faster, particularly for
incremental builds) and msbuild (supporting the Visual Studio GUI, but
building slower).
Several aspects (e.g. Windows rc file generation, PGXS compatibility, LLVM
bitcode generation, documentation adjustments) are done in subsequent commits
requiring further review. Other aspects (e.g. not installing test-only
extensions) are not yet addressed.
When building on Windows with msbuild, builds are slower when using a Visual
Studio version older than 2019, because those versions do not support
MultiToolTask, required by meson for intra-target parallelism.
The plan is to remove the MSVC specific build system in src/tools/msvc soon
after reaching feature parity. However, we're not planning to remove the
autoconf/make build system in the near future. Likely we're going to keep at
least the parts required to keep PGXS working until all supported
versions build with meson.
Some initial help for postgres developers is at
https://wiki.postgresql.org/wiki/Meson
With contributions from Thomas Munro, John Naylor, Stone Tickle and others.
Author: Andres Freund <andres@anarazel.de>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-By: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/20211012083721.hvixq4pnh2pixr3j@alap3.anarazel.de
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in optimizer, parser,
utility, libpq, and "commands" code, as well as in remaining library
code. Do the same for all code related to frontend programs (with the
exception of pg_dump/pg_dumpall related code).
Like other recent commits that cleaned up function parameter names, this
commit was written with help from clang-tidy. Later commits will handle
ecpg and pg_dump/pg_dumpall.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).
Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls. While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order. Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.
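The rule is cheap to follow; a minimal sketch:

#include <stdio.h>
#include <stdlib.h>

/* Flush all open output streams before handing control to a sub-process,
 * so our buffered output cannot appear after (or interleaved with) the
 * child's output. */
static int
run_shell_command(const char *cmd)
{
    fflush(NULL);
    return system(cmd);
}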
Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us
While waiting for slots to become available in wait_on_slots() in
parallel_slot.c, the cancellation always relied on the first connection
in the set to do the job. This could cause problems when that slot's
socket is gone, as PQgetCancel() then returns NULL. Rather
than always using the first connection, this changes the logic to use
the first valid connection for the cancellation.
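A sketch of the revised approach (illustrative only, not the exact
parallel_slot.c code):

#include <stdio.h>
#include "libpq-fe.h"

/* Walk the connections and send the cancel request from the first one
 * for which PQgetCancel() still yields a usable PGcancel object. */
static void
cancel_on_first_valid(PGconn **conns, int nconns)
{
    for (int i = 0; i < nconns; i++)
    {
        PGcancel   *cancel = PQgetCancel(conns[i]);
        char        errbuf[256];

        if (cancel == NULL)
            continue;           /* e.g. this connection's socket is gone */
        if (!PQcancel(cancel, errbuf, (int) sizeof(errbuf)))
            fprintf(stderr, "could not send cancel request: %s", errbuf);
        PQfreeCancel(cancel);
        return;
    }
}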
Author: Ranier Vilela
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAEudQAokk1h_pUwGXsYS4oVOuf35s1O2o3TXGHpV8=AWikvgHA@mail.gmail.com
Backpatch-through: 14
The setting controls the maximum length of the header line in expanded
format output. Possible settings are full, column, page, or an integer.
The default is full, the current behavior, in which case the header
line is the length of the widest line of output. column causes the
header to be truncated to the width of the first column, page causes it
to be truncated to the width of the terminal page, and an integer causes
it to be truncated to that value. If the full value is less than the
page or integer value, no truncation occurs. If given without an
argument, this option prints its current setting.
Platon Pronko, somewhat modified by me.
Discussion: https://postgr.es/m/f03d38a3-db96-a56e-d1bc-dbbc80bbde4d@gmail.com
psql, pg_dump, and pg_amcheck share code to process object name
patterns like 'foo*.bar*' to match all tables with names starting with
'bar' that are in schemas starting with 'foo'. Before v14, any number
of extra name parts were silently ignored, so a command line '\d
foo.bar.baz.bletch.quux' was interpreted as '\d bletch.quux'. In v14,
as a result of commit 2c8726c4b0, we
instead treated this as a request for table quux in a schema named
'foo.bar.baz.bletch'. That caused problems for people like Justin
Pryzby who were accustomed to copying strings of the form
db.schema.table from messages generated by PostgreSQL itself and using
them as arguments to \d.
Accordingly, revise things so that if an object name pattern contains
more parts than we're expecting, we throw an error, unless there's
exactly one extra part and it matches the current database name.
That way, thisdb.myschema.mytable is accepted as meaning just
myschema.mytable, but otherdb.myschema.mytable is an error, and so
is some.random.garbage.myschema.mytable.
Mark Dilger, per report from Justin Pryzby and discussion among
various people.
Discussion: https://www.postgresql.org/message-id/20211013165426.GD27491%40telsasoft.com
Get rid of the separate "FATAL" log level, as it was applied
so inconsistently as to be meaningless. This mostly involves
s/pg_log_fatal/pg_log_error/g.
Create a macro pg_fatal() to handle the common use-case of
pg_log_error() immediately followed by exit(1). Various
modules had already invented either this or equivalent macros;
standardize on pg_fatal() and apply it where possible.
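Conceptually (a sketch of the idea only; the real definition in
common/logging.h is more elaborate), the new macro amounts to:

/* Sketch: log an error via the frontend logging API, then bail out. */
#define pg_fatal(...) \
    do { \
        pg_log_error(__VA_ARGS__); \
        exit(1); \
    } while (0)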
Invent the ability to add "detail" and "hint" messages to a
frontend message, much as we have long had in the backend.
Except where rewording was needed to convert existing coding
to detail/hint style, I have (mostly) resisted the temptation
to change existing message wording.
Patch by me. Design and patch reviewed at various stages by
Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and
Daniel Gustafsson.
Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
Plain \dconfig is basically equivalent to SHOW except that you can
give it a pattern with wildcards, either to match multiple GUCs or
because you don't exactly remember the name you want.
\dconfig+ adds type, context, and access-privilege information,
mainly because every other kind of object privilege has a psql command
to show it, so GUC privileges should too. (A form of this command was
in some versions of the patch series leading up to commit a0ffa885e.
We pulled it out then because of doubts that the design and code were
up to snuff, but I think subsequent work has resolved that.)
In passing, fix incorrect completion of GUC names in GRANT/REVOKE
ON PARAMETER: a0ffa885e neglected to use the VERBATIM form of
COMPLETE_WITH_QUERY, so it misbehaved for custom (qualified) GUC
names.
Mark Dilger and Tom Lane
Discussion: https://postgr.es/m/3118455.1649267333@sss.pgh.pa.us
When serialization or deadlock errors are reported by the backend, allow
the transaction to be retried so the benchmark can continue. For this
purpose, the new options "--max-tries", "--failures-detailed" and
"--verbose-errors" are added.
Transactions with serialization errors or deadlock errors will be
repeated after rollbacks until they complete successfully or reach the
maximum number of tries (specified by the --max-tries option), or the
maximum time of tries (specified by the --latency-limit option).
These options can be specified at the same time. It is not possible to
use an unlimited number of tries (--max-tries=0) without the
--latency-limit option or the --time option. By default the option
--max-tries is set to 1, which means transactions with
serialization/deadlock errors are not retried. If the last try fails,
this transaction will be reported as failed, and the client variables
will be set as they were before the first run of this transaction.
Statistics on retries and failures are printed in the progress output, in
the transaction/aggregation logs, and at the end with the other results
(overall and per script). Retries and failures are also printed per
command, along with average latency, when the --report-per-command (-r)
option is used.
Option --failures-detailed reports failures grouped by basic type
(serialization failures / deadlock failures).
Option --verbose-errors prints distinct reports on errors and failures
(errors without retrying) by type, with detailed information such as which
retry limit was violated and by how much for serialization/deadlock
failures.
The patch was originally written by Marina Polyakova; Yugo Nagata then
inherited the discussion and heavily modified the patch to make it
committable.
Authors: Yugo Nagata, Marina Polyakova
Reviewed-by: Fabien Coelho, Tatsuo Ishii, Alvaro Herrera, Kevin Grittner, Andres Freund, Arthur Zakirov, Alexander Korotkov, Teodor Sigaev, Ildus Kurbangaliev
Discussion: https://postgr.es/m/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru
Most of these are cases where we could call memcpy() or other libc
functions with a NULL pointer and a zero count, which is forbidden
by POSIX even though every production version of libc allows it.
We've fixed such things before in a piecemeal way, but apparently
never made an effort to try to get them all. I don't claim that
this patch does so either, but it gets every failure I observe in
check-world, using clang 12.0.1 on current RHEL8.
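The typical shape of these fixes (a sketch, not any specific hunk):

#include <string.h>

/* Skip the libc call entirely when the count is zero, so a NULL source
 * pointer is never actually passed to memcpy(). */
static void
copy_maybe_empty(void *dst, const void *src, size_t len)
{
    if (len > 0)
        memcpy(dst, src, len);
}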
numeric.c has a different issue that the sanitizer doesn't like:
"ln(-1.0)" will compute log10(0) and then try to assign the
resulting -Inf to an integer variable. We don't actually use the
result in such a case, so there's no live bug.
Back-patch to all supported branches, with the idea that we might
start running a buildfarm member that tests this case. This includes
back-patching c1132aae3 (Check the size in COPY_POINTER_FIELD),
which previously silenced some of these issues in copyfuncs.c.
Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com
After this, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which were previously scanned as
two tokens: 123 and abc. That behavior was undocumented and surprising, and
it might also interfere with some extended numeric literal syntax being
contemplated for the future.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
It seems highly unlikely that gettext() can be relied on to be
async-signal-safe. psql used to understand that, but someone got
it wrong long ago in the src/bin/scripts/ version of handle_sigint,
and then the bad idea was perpetuated when those two versions were
unified into src/fe_utils/cancel.c.
I'm unsure why there have not been field complaints about this
... maybe gettext() is signal-safe once it's translated at least
one message? But we have no business assuming any such thing.
In cancel.c (v13 and up), I preserved our ability to localize
"Cancel request sent" messages by invoking gettext() before
the signal handler is set up. In earlier branches I just made
src/bin/scripts/ not localize those messages, as psql did then.
(Just for extra unsafety, the src/bin/scripts/ version was
invoking fprintf() from a signal handler. Sigh.)
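A sketch of that arrangement (illustrative, not the exact cancel.c code):

#include <libintl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Translate once, in normal context, before the handler is installed;
 * the handler itself then uses only async-signal-safe calls. */
static const char *cancel_sent_msg = "Cancel request sent\n";
static size_t cancel_sent_len = sizeof("Cancel request sent\n") - 1;

static void
handle_sigint(int signum)
{
    (void) signum;
    (void) write(STDERR_FILENO, cancel_sent_msg, cancel_sent_len);
}

static void
setup_cancel_handler(void)
{
    cancel_sent_msg = gettext("Cancel request sent\n");
    cancel_sent_len = strlen(cancel_sent_msg);
    signal(SIGINT, handle_sigint);
}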
Noted while fixing signal-safety issues in PQcancel() itself.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
Throw away most of the existing logic for this, as it was very
inefficient thanks to expensive sub-selects executed to collect
ACL data that we very possibly would have no interest in dumping.
Reduce the ACL handling in the initial per-object-type queries
to be just collection of the catalog ACL fields, as it was
originally. Fetch pg_init_privs data separately in a single
scan of that catalog, and do the merging calculations on the
client side. Remove the separate code path used for pre-9.6
source servers; there is no good reason to treat them differently
from newer servers that happen to have empty pg_init_privs.
Discussion: https://postgr.es/m/2273648.1634764485@sss.pgh.pa.us
Discussion: https://postgr.es/m/7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc
psql's lexer has historically deleted dash-dash (single-line) comments
from what's collected and sent to the server. This is inconsistent
with what it does for slash-star comments, and people have complained
before that they wish such comments would be captured in the server log.
Undoing the decision completely seems like too big a behavioral change,
however. In particular, comments on lines preceding the start of a
query are generally not thought of as being part of that query.
What we can do to improve the situation is to capture comments that
are clearly *within* a query, that is after the first non-whitespace,
non-comment token but before the query's ending semicolon or backslash
command. This is a nearly trivial code change, and it affects only a
few regression test results.
(It is tempting to try to apply the same rule to slash-star comments.
But it's hard to see how to do that without getting strange history
behavior for comments that cross lines, especially if the user then
starts a new query on the same line as the star-slash. In view of
the lack of complaints, let's leave that case alone.)
Discussion: https://postgr.es/m/CAJcOf-cAdMVr7azeYR7nWKsNp7qhORzc84rV6d7m7knG5Hrtsw@mail.gmail.com
The Unicode characters, while in comments and not code, caused MSVC
to emit compiler warning C4819:
The file contains a character that cannot be represented in the
current code page (number). Save the file in Unicode format to
prevent data loss.
Fix by replacing the characters in print.c with descriptive comments
containing the codepoints and symbol names, and remove the character
in brin_bloom.c, which was a footnote reference copied from the paper
citation.
Per report from hamerkop in the buildfarm.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/340E4118-0D0C-4E85-8141-8C40EB22DA3A@yesql.se
strtoint(), via strtol(), would skip leading whitespace, but the same
rule was not applied to trailing whitespace, leading to
inconsistent behavior. Some tests are changed to cover this area more.
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/YP5Pv0d13Ct+03ve@paquier.xyz
Most of the integer options for command-line binaries now make use of a
single routine able to do the job, fixing issues with the detection of
sloppy values caused, for example, by the use of atoi(), which silently
accepts strings that begin with digits but are followed by trailing junk
characters.
This commit cuts down the number of strings requiring translation by 26
per my count, switching the code to have two error types for invalid and
out-of-range values instead.
Much more could be done here, with float or even int64 options, but
int32 was the most appealing case as it is possible to rely on strtol()
to do the job reliably. Note that there are some exceptions for now,
like pg_ctl or pg_upgrade that use their own logging logic. A couple of
negative TAP tests required some adjustments for the new errors
generated.
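A sketch of the kind of check such a routine performs (names here are
illustrative, not the actual in-tree API):

#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Distinguish "not a valid integer" (e.g. "123abc") from "out of range",
 * which is what allows the two distinct error messages. */
static bool
parse_int_option(const char *str, int min, int max, int *result)
{
    char       *endptr;
    long        val;

    errno = 0;
    val = strtol(str, &endptr, 10);

    if (endptr == str || *endptr != '\0')
        return false;           /* invalid value */
    if (errno == ERANGE || val < min || val > max)
        return false;           /* out of range */

    *result = (int) val;
    return true;
}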
pg_dump and pg_restore tracked the maximum number of parallel jobs
within the option parsing. The code is refactored a bit to track that
in the code dedicated to parallelism instead.
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: David Rowley, Álvaro Herrera
Discussion: https://postgr.es/m/CALj2ACXqdG9WhqVoJ9zYf-iZt7sgK7Szv5USs=he6NnWQ2ofTA@mail.gmail.com
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.
This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them. However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.
Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.) Those functions were already
made proof against this class of problem, cf CVE-2006-2313.
To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it. (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem. So we just want a
version for those callers that don't have any better way of handling
this issue.)
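For reference, the bounded variant amounts to clamping PQmblen()'s answer
to the number of bytes actually remaining, roughly:

/* Sketch of the macro form used in the back branches: never report more
 * bytes than remain before the terminating NUL (strnlen from <string.h>,
 * PQmblen from libpq-fe.h). */
#define PQmblenBounded(s, e)  strnlen(s, PQmblen(s, e))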
Per private report from houjingyi. Back-patch to all supported branches.