Improve documentation's description of JOIN clauses.

In bug #12000, Andreas Kunert complained that the documentation was misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2". That's correct as far as it goes, but the equivalence doesn't hold when you consider three or more tables, since JOIN binds more tightly than comma. I added a <note> to explain this, and ended up rearranging some of the existing text so that the note would make sense in context.

In passing, rewrite the description of JOIN USING, which was unnecessarily vague, and hadn't been helped any by somebody's reliance on markup as a substitute for clear writing. (Mostly this involved reintroducing a concrete example that was unaccountably removed by commit 032f3b7e166cfa28.)

Back-patch to all supported branches.
11 years ago · 400a4c3d66
parent a855c90a72
commit 400a4c3d66
1 changed files with 98 additions and 56 deletions
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
 </synopsis>

    A table reference can be a table name (possibly schema-qualified),
-    or a derived table such as a subquery, a table join, or complex
-    combinations of these.  If more than one table reference is listed
-    in the <literal>FROM</> clause they are cross-joined (see below)
-    to form the intermediate virtual table that can then be subject to
+    or a derived table such as a subquery, a <literal>JOIN</> construct, or
+    complex combinations of these.  If more than one table reference is
+    listed in the <literal>FROM</> clause, the tables are cross-joined
+    (that is, the Cartesian product of their rows is formed; see below).
+    The result of the <literal>FROM</> list is an intermediate virtual
+    table that can then be subject to
    transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
    and <literal>HAVING</> clauses and is finally the result of the
    overall table expression.
@@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
     A joined table is a table derived from two other (real or
     derived) tables according to the rules of the particular join
     type.  Inner, outer, and cross-joins are available.
+     The general syntax of a joined table is
+<synopsis>
+<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
+</synopsis>
+     Joins of all types can be chained together, or nested: either or
+     both <replaceable>T1</replaceable> and
+     <replaceable>T2</replaceable> can be joined tables.  Parentheses
+     can be used around <literal>JOIN</> clauses to control the join
+     order.  In the absence of parentheses, <literal>JOIN</> clauses
+     nest left-to-right.
    </para>

    <variablelist>
@@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
       <para>
        <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
        <replaceable>T2</replaceable></literal> is equivalent to
-        <literal>FROM <replaceable>T1</replaceable>,
-        <replaceable>T2</replaceable></literal>.  It is also equivalent to
        <literal>FROM <replaceable>T1</replaceable> INNER JOIN
        <replaceable>T2</replaceable> ON TRUE</literal> (see below).
+        It is also equivalent to
+        <literal>FROM <replaceable>T1</replaceable>,
+        <replaceable>T2</replaceable></literal>.
+        <note>
+        <para>
+         This latter equivalence does not hold exactly when more than two
+         tables appear, because <literal>JOIN</> binds more tightly than
+         comma.  For example
+         <literal>FROM <replaceable>T1</replaceable> CROSS JOIN
+         <replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+         ON <replaceable>condition</replaceable></literal>
+         is not the same as
+         <literal>FROM <replaceable>T1</replaceable>,
+         <replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
+         ON <replaceable>condition</replaceable></literal>
+         because the <replaceable>condition</replaceable> can
+         reference <replaceable>T1</replaceable> in the first case but not
+         the second.
+        </para>
+        </note>
       </para>
      </listitem>
     </varlistentry>
@@ -240,47 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
        <quote>match</quote>, as explained in detail below.
       </para>

-       <para>
-        The <literal>ON</> clause is the most general kind of join
-        condition: it takes a Boolean value expression of the same
-        kind as is used in a <literal>WHERE</> clause.  A pair of rows
-        from <replaceable>T1</> and <replaceable>T2</> match if the
-        <literal>ON</> expression evaluates to true for them.
-       </para>
-
-       <para>
-        <literal>USING</> is a shorthand notation: it takes a
-        comma-separated list of column names, which the joined tables
-        must have in common, and forms a join condition specifying
-        equality of each of these pairs of columns.  Furthermore, the
-        output of <literal>JOIN USING</> has one column for each of
-        the equated pairs of input columns, followed by the
-        remaining columns from each table.  Thus, <literal>USING (a, b,
-        c)</literal> is equivalent to <literal>ON (t1.a = t2.a AND
-        t1.b = t2.b AND t1.c = t2.c)</literal> with the exception that
-        if <literal>ON</> is used there will be two columns
-        <literal>a</>, <literal>b</>, and <literal>c</> in the result,
-        whereas with <literal>USING</> there will be only one of each
-        (and they will appear first if <command>SELECT *</> is used).
-       </para>
-
-       <para>
-        <indexterm>
-         <primary>join</primary>
-         <secondary>natural</secondary>
-        </indexterm>
-        <indexterm>
-         <primary>natural join</primary>
-        </indexterm>
-        Finally, <literal>NATURAL</> is a shorthand form of
-        <literal>USING</>: it forms a <literal>USING</> list
-        consisting of all column names that appear in both
-        input tables.  As with <literal>USING</>, these columns appear
-        only once in the output table.  If there are no common
-        columns, <literal>NATURAL</literal> behaves like
-        <literal>CROSS JOIN</literal>.
-       </para>
-
       <para>
        The possible types of qualified join are:

@@ -358,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
        </varlistentry>
       </variablelist>
       </para>
-      </listitem>
-     </varlistentry>
-    </variablelist>

       <para>
-     Joins of all types can be chained together or nested: either or
-     both <replaceable>T1</replaceable> and
-     <replaceable>T2</replaceable> can be joined tables.  Parentheses
-     can be used around <literal>JOIN</> clauses to control the join
-     order.  In the absence of parentheses, <literal>JOIN</> clauses
-     nest left-to-right.
+        The <literal>ON</> clause is the most general kind of join
+        condition: it takes a Boolean value expression of the same
+        kind as is used in a <literal>WHERE</> clause.  A pair of rows
+        from <replaceable>T1</> and <replaceable>T2</> match if the
+        <literal>ON</> expression evaluates to true.
       </para>

+       <para>
+        The <literal>USING</> clause is a shorthand that allows you to take
+        advantage of the specific situation where both sides of the join use
+        the same name for the joining column(s).  It takes a
+        comma-separated list of the shared column names
+        and forms a join condition that includes an equality comparison
+        for each one.  For example, joining <replaceable>T1</>
+        and <replaceable>T2</> with <literal>USING (a, b)</> produces
+        the join condition <literal>ON <replaceable>T1</>.a
+        = <replaceable>T2</>.a AND <replaceable>T1</>.b
+        = <replaceable>T2</>.b</literal>.
+       </para>
+
+       <para>
+        Furthermore, the output of <literal>JOIN USING</> suppresses
+        redundant columns: there is no need to print both of the matched
+        columns, since they must have equal values.  While <literal>JOIN
+        ON</> produces all columns from <replaceable>T1</> followed by all
+        columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
+        output column for each of the listed column pairs (in the listed
+        order), followed by any remaining columns from <replaceable>T1</>,
+        followed by any remaining columns from <replaceable>T2</>.
+       </para>
+
+       <para>
+        <indexterm>
+         <primary>join</primary>
+         <secondary>natural</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>natural join</primary>
+        </indexterm>
+        Finally, <literal>NATURAL</> is a shorthand form of
+        <literal>USING</>: it forms a <literal>USING</> list
+        consisting of all column names that appear in both
+        input tables.  As with <literal>USING</>, these columns appear
+        only once in the output table.  If there are no common
+        column names, <literal>NATURAL</literal> behaves like
+        <literal>CROSS JOIN</literal>.
+       </para>
+
+       <note>
+        <para>
+         <literal>USING</literal> is reasonably safe from column changes
+         in the joined relations since only the listed columns
+         are combined.  <literal>NATURAL</> is considerably more risky since
+         any schema changes to either relation that cause a new matching
+         column name to be present will cause the join to combine that new
+         column as well.
+        </para>
+       </note>
+      </listitem>
+     </varlistentry>
+    </variablelist>
+
    <para>
     To put this together, assume we have tables <literal>t1</literal>:
 <programlisting>
@@ -487,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
     clause is processed <emphasis>before</> the join, while
     a restriction placed in the <literal>WHERE</> clause is processed
     <emphasis>after</> the join.
+     That does not matter with inner joins, but it matters a lot with outer
+     joins.
    </para>
   </sect3>