This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
are listed in <xref linkend="functions-orderedset-table"> and
<xref linkend="functions-hypothetical-table">.
The special syntax considerations for aggregate
functions are explained in <xref linkend="syntax-aggregates">.
Consult <xref linkend="tutorial-agg"> for additional introductory
@ -12307,6 +12310,290 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
</tgroup>
</table>
<para>
<xref linkend="functions-orderedset-table"> shows some
aggregate functions that use the <firstterm>ordered-set aggregate</>
syntax. These functions are sometimes referred to as <quote>inverse
distribution</> functions.
</para>
<indexterm>
<primary>ordered-set aggregate</primary>
<secondary>built-in</secondary>
</indexterm>
<indexterm>
<primary>inverse distribution</primary>
</indexterm>
<table id="functions-orderedset-table">
<title>Ordered-Set Aggregate Functions</title>
<tgroup cols="5">
<thead>
<row>
<entry>Function</entry>
<entry>Direct Argument Type(s)</entry>
<entry>Aggregated Argument Type(s)</entry>
<entry>Return Type</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<indexterm>
<primary>mode</primary>
<secondary>statistical</secondary>
</indexterm>
<function>mode() WITHIN GROUP (ORDER BY <replaceable class="parameter">sort_expression</replaceable>)</function>
</entry>
<entry>
</entry>
<entry>
any sortable type
</entry>
<entry>
same as sort expression
</entry>
<entry>
returns the most frequent input value (arbitrarily choosing the first
one if there are multiple equally-frequent results)
</entry>
</row>
<row>
<entry>
<indexterm>
<primary>percentile</primary>
<secondary>continuous</secondary>
</indexterm>
<indexterm>
<primary>median</primary>
</indexterm>
<function>percentile_cont(<replaceable class="parameter">fraction</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sort_expression</replaceable>)</function>
</entry>
<entry>
<type>double precision</type>
</entry>
<entry>
<type>double precision</type> or <type>interval</type>
</entry>
<entry>
same as sort expression
</entry>
<entry>
continuous percentile: returns a value corresponding to the specified
fraction in the ordering, interpolating between adjacent input items if
needed
</entry>
</row>
<row>
<entry>
<function>percentile_cont(<replaceable class="parameter">fractions</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sort_expression</replaceable>)</function>
</entry>
<entry>
<type>double precision[]</type>
</entry>
<entry>
<type>double precision</type> or <type>interval</type>
</entry>
<entry>
array of sort expression's type
</entry>
<entry>
multiple continuous percentile: returns an array of results matching
the shape of the <literal>fractions</literal> parameter, with each
non-null element replaced by the value corresponding to that percentile
</entry>
</row>
<row>
<entry>
<indexterm>
<primary>percentile</primary>
<secondary>discrete</secondary>
</indexterm>
<function>percentile_disc(<replaceable class="parameter">fraction</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sort_expression</replaceable>)</function>
</entry>
<entry>
<type>double precision</type>
</entry>
<entry>
any sortable type
</entry>
<entry>
same as sort expression
</entry>
<entry>
discrete percentile: returns the first input value whose position in
the ordering equals or exceeds the specified fraction
</entry>
</row>
<row>
<entry>
<function>percentile_disc(<replaceable class="parameter">fractions</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sort_expression</replaceable>)</function>
</entry>
<entry>
<type>double precision[]</type>
</entry>
<entry>
any sortable type
</entry>
<entry>
array of sort expression's type
</entry>
<entry>
multiple discrete percentile: returns an array of results matching the
shape of the <literal>fractions</literal> parameter, with each non-null
element replaced by the input value corresponding to that percentile
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
All the aggregates listed in <xref linkend="functions-orderedset-table">
ignore null values in their sorted input. For those that take
a <replaceable>fraction</replaceable> parameter, the fraction value must be
between 0 and 1; an error is thrown if not. However, a null fraction value
simply produces a null result.
</para>
<para>
Each of the aggregates listed in
<xref linkend="functions-hypothetical-table"> is associated with a
window function of the same name defined in
<xref linkend="functions-window">. In each case, the aggregate result
is the value that the associated window function would have
returned for the <quote>hypothetical</> row constructed from
<replaceable>args</replaceable>, if such a row had been added to the sorted
group of rows computed from the <replaceable>sorted_args</replaceable>.
<function>rank(<replaceable class="parameter">args</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sorted_args</replaceable>)</function>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<type>bigint</type>
</entry>
<entry>
rank of the hypothetical row, with gaps for duplicate rows
</entry>
</row>
<row>
<entry>
<indexterm>
<primary>dense_rank</primary>
<secondary>hypothetical</secondary>
</indexterm>
<function>dense_rank(<replaceable class="parameter">args</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sorted_args</replaceable>)</function>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<type>bigint</type>
</entry>
<entry>
rank of the hypothetical row, without gaps
</entry>
</row>
<row>
<entry>
<indexterm>
<primary>percent_rank</primary>
<secondary>hypothetical</secondary>
</indexterm>
<function>percent_rank(<replaceable class="parameter">args</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sorted_args</replaceable>)</function>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<type>double precision</type>
</entry>
<entry>
relative rank of the hypothetical row, ranging from 0 to 1
</entry>
</row>
<row>
<entry>
<indexterm>
<primary>cume_dist</primary>
<secondary>hypothetical</secondary>
</indexterm>
<function>cume_dist(<replaceable class="parameter">args</replaceable>) WITHIN GROUP (ORDER BY <replaceable class="parameter">sorted_args</replaceable>)</function>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<literal>VARIADIC</> <type>"any"</type>
</entry>
<entry>
<type>double precision</type>
</entry>
<entry>
relative rank of the hypothetical row, ranging from
1/<replaceable>N</> to 1
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For each of these hypothetical-set aggregates, the list of direct arguments
given in <replaceable>args</replaceable> must match the number and types of
the aggregated arguments given in <replaceable>sorted_args</replaceable>.
Unlike most built-in aggregates, these aggregates are not strict, that is
they do not drop input rows containing nulls. Null values sort according
to the rule specified in the <literal>ORDER BY</> clause.
</para>
</sect1>
<sect1 id="functions-window">
@ -12332,9 +12619,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
</para>
<para>
In addition to these functions, any built-in or user-defined aggregate
function can be used as a window function (see
<xref linkend="functions-aggregate"> for a list of the built-in aggregates).
In addition to these functions, any built-in or user-defined normal
aggregate function (but not ordered-set or hypothetical-set aggregates)
can be used as a window function; see
<xref linkend="functions-aggregate"> for a list of the built-in aggregates.
Aggregate functions act as window functions only when an <literal>OVER</>
clause follows the call; otherwise they act as regular aggregates.
errhint("There is an ordered-set aggregate %s, but it requires %d direct arguments, not %d.",
NameListToString(funcname),
catDirectArgs,numDirectArgs),
parser_errposition(pstate,location)));
}
else
{
/*
*Ifit'svariadic,wehavetwocasesdependingonwhether
*theaggwas"... ORDER BY VARIADIC"or"..., VARIADIC ORDER
*BYVARIADIC". It's the latter if catDirectArgs equals
*pronargs;tosaveacataloglookup,wereverse-engineer
*pronargsfromtheinfowegotfromfunc_get_detail.
*/
intpronargs;
pronargs=nargs;
if(nvargs>1)
pronargs-=nvargs-1;
if(catDirectArgs<pronargs)
{
/* VARIADIC isn't part of direct args, so still easy */
if(numDirectArgs!=catDirectArgs)
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("function %s does not exist",
func_signature_string(funcname,nargs,
argnames,
actual_arg_types)),
errhint("There is an ordered-set aggregate %s, but it requires %d direct arguments, not %d.",
NameListToString(funcname),
catDirectArgs,numDirectArgs),
parser_errposition(pstate,location)));
}
else
{
/*
*Bothdirectandaggregatedargsweredeclaredvariadic.
*Forastandardordered-setaggregate,it'sokayaslong
*astherearen'ttoofewdirectargs.Fora
*hypothetical-setaggregate,weassumethatthe
*hypotheticalargumentsarethosethatmatchedthe
*variadicparameter;theremustbejustasmanyofthem
*asthereareaggregatedarguments.
*/
if(aggkind==AGGKIND_HYPOTHETICAL)
{
if(nvargs!=2*numAggregatedArgs)
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("function %s does not exist",
func_signature_string(funcname,nargs,
argnames,
actual_arg_types)),
errhint("To use the hypothetical-set aggregate %s, the number of hypothetical direct arguments (here %d) must match the number of ordering columns (here %d).",
NameListToString(funcname),
nvargs-numAggregatedArgs,numAggregatedArgs),
parser_errposition(pstate,location)));
}
else
{
if(nvargs<=numAggregatedArgs)
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("function %s does not exist",
func_signature_string(funcname,nargs,
argnames,
actual_arg_types)),
errhint("There is an ordered-set aggregate %s, but it requires at least %d direct arguments.",
NameListToString(funcname),
catDirectArgs),
parser_errposition(pstate,location)));
}
}
}
/* Check type matching of hypothetical arguments */
ERROR: WITHIN GROUP is required for ordered-set aggregate percentile_cont
LINE 1: select p, percentile_cont(p,p)
^
select percentile_cont(0.5) within group (order by b) from aggtest;
percentile_cont
------------------
53.4485001564026
(1 row)
select percentile_cont(0.5) within group (order by b), sum(b) from aggtest;
percentile_cont | sum
------------------+---------
53.4485001564026 | 431.773
(1 row)
select percentile_cont(0.5) within group (order by thousand) from tenk1;
percentile_cont
-----------------
499.5
(1 row)
select percentile_disc(0.5) within group (order by thousand) from tenk1;
percentile_disc
-----------------
499
(1 row)
select rank(3) within group (order by x)
from (values (1),(1),(2),(2),(3),(3),(4)) v(x);
rank
------
5
(1 row)
select cume_dist(3) within group (order by x)
from (values (1),(1),(2),(2),(3),(3),(4)) v(x);
cume_dist
-----------
0.875
(1 row)
select percent_rank(3) within group (order by x)
from (values (1),(1),(2),(2),(3),(3),(4),(5)) v(x);
percent_rank
--------------
0.5
(1 row)
select dense_rank(3) within group (order by x)
from (values (1),(1),(2),(2),(3),(3),(4)) v(x);
dense_rank
------------
3
(1 row)
select percentile_disc(array[0,0.1,0.25,0.5,0.75,0.9,1]) within group (order by thousand)
from tenk1;
percentile_disc
----------------------------
{0,99,249,499,749,899,999}
(1 row)
select percentile_cont(array[0,0.25,0.5,0.75,1]) within group (order by thousand)
from tenk1;
percentile_cont
-----------------------------
{0,249.75,499.5,749.25,999}
(1 row)
select percentile_disc(array[[null,1,0.5],[0.75,0.25,null]]) within group (order by thousand)
from tenk1;
percentile_disc
---------------------------------
{{NULL,999,499},{749,249,NULL}}
(1 row)
select percentile_cont(array[0,1,0.25,0.75,0.5,1]) within group (order by x)
from generate_series(1,6) x;
percentile_cont
-----------------------
{1,6,2.25,4.75,3.5,6}
(1 row)
select ten, mode() within group (order by string4) from tenk1 group by ten;
ten | mode
-----+--------
0 | HHHHxx
1 | OOOOxx
2 | VVVVxx
3 | OOOOxx
4 | HHHHxx
5 | HHHHxx
6 | OOOOxx
7 | AAAAxx
8 | VVVVxx
9 | VVVVxx
(10 rows)
select percentile_disc(array[0.25,0.5,0.75]) within group (order by x)
from unnest('{fred,jim,fred,jack,jill,fred,jill,jim,jim,sheila,jim,sheila}'::text[]) u(x);
percentile_disc
-----------------
{fred,jill,jim}
(1 row)
-- check collation propagates up in suitable cases:
select pg_collation_for(percentile_disc(1) within group (order by x collate "POSIX"))
from (values ('fred'),('jim')) v(x);
pg_collation_for
------------------
"POSIX"
(1 row)
-- ordered-set aggs created with CREATE AGGREGATE
select test_rank(3) within group (order by x)
from (values (1),(1),(2),(2),(3),(3),(4)) v(x);
test_rank
-----------
5
(1 row)
select test_percentile_disc(0.5) within group (order by thousand) from tenk1;
test_percentile_disc
----------------------
499
(1 row)
-- ordered-set aggs can't use ungrouped vars in direct args:
select rank(x) within group (order by x) from generate_series(1,5) x;
ERROR: column "x.x" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select rank(x) within group (order by x) from generate_serie...
^
DETAIL: Direct arguments of an ordered-set aggregate must use only grouped columns.
-- outer-level agg can't use a grouped arg of a lower level, either:
select array(select percentile_disc(a) within group (order by x)
from (values (0.3),(0.7)) v(a) group by a)
from generate_series(1,5) g(x);
ERROR: outer-level aggregate cannot contain a lower-level variable in its direct arguments
LINE 1: select array(select percentile_disc(a) within group (order b...
^
-- agg in the direct args is a grouping violation, too:
select rank(sum(x)) within group (order by x) from generate_series(1,5) x;
ERROR: aggregate function calls cannot be nested
LINE 1: select rank(sum(x)) within group (order by x) from generate_...
^
-- hypothetical-set type unification and argument-count failures:
select rank(3) within group (order by x) from (values ('fred'),('jim')) v(x);
ERROR: WITHIN GROUP types text and integer cannot be matched
LINE 1: select rank(3) within group (order by x) from (values ('fred...
^
select rank(3) within group (order by stringu1,stringu2) from tenk1;
ERROR: function rank(integer, name, name) does not exist
LINE 1: select rank(3) within group (order by stringu1,stringu2) fro...
^
HINT: To use the hypothetical-set aggregate rank, the number of hypothetical direct arguments (here 1) must match the number of ordering columns (here 2).
select rank('fred') within group (order by x) from generate_series(1,5) x;
ERROR: invalid input syntax for integer: "fred"
LINE 1: select rank('fred') within group (order by x) from generate_...
^
select rank('adam'::text collate "C") within group (order by x collate "POSIX")
from (values ('fred'),('jim')) v(x);
ERROR: collation mismatch between explicit collations "C" and "POSIX"
LINE 1: ...adam'::text collate "C") within group (order by x collate "P...
^
-- hypothetical-set type unification successes:
select rank('adam'::varchar) within group (order by x) from (values ('fred'),('jim')) v(x);
rank
------
1
(1 row)
select rank('3') within group (order by x) from generate_series(1,5) x;
rank
------
3
(1 row)
-- divide by zero check
select percent_rank(0) within group (order by x) from generate_series(1,0) x;
percent_rank
--------------
0
(1 row)
-- deparse and multiple features:
create view aggordview1 as
select ten,
percentile_disc(0.5) within group (order by thousand) as p50,
percentile_disc(0.5) within group (order by thousand) filter (where hundred=1) as px,
rank(5,'AZZZZ',50) within group (order by hundred, string4 desc, hundred)