mirror of https://github.com/postgres/postgres
Add some chapters on new topics.
Change to referencing OASIS/Docbook v3.1 rather than Davenport/Docbook v3.0.
Grepped for and fixed apparent tag mangling from emacs "Normalize" operation. Should be the last of those.
REL7_0_PATCHES
parent
2cc8e6ac1f
commit
f75bf1877a
@ -1,296 +0,0 @@ |
||||
<Sect1> |
||||
<title>Bug Reporting Guidelines</title> |
||||
|
||||
<para> |
||||
When you encounter a bug in <productname>PostgreSQL</productname> we want to |
||||
hear about it. Your bug reports are an important part of making
||||
<productname>PostgreSQL</productname> more reliable because even the utmost |
||||
care cannot guarantee that every part of PostgreSQL will work on every |
||||
platform under every circumstance. |
||||
</para> |
||||
|
||||
<para> |
||||
The following suggestions are intended to assist you in forming bug reports |
||||
that can be handled in an effective fashion. No one is required to follow |
||||
them but it tends to be to everyone's advantage. |
||||
</para> |
||||
|
||||
<para> |
||||
We cannot promise to fix every bug right away. If the bug is obvious, critical, |
||||
or affects a lot of users, chances are good that someone will look into it. It |
||||
could also happen that we tell you to update to a newer version to see if the |
||||
bug happens there. Or we might decide that the bug |
||||
cannot be fixed before some major rewrite we might be planning is done. Or |
||||
perhaps it's simply too hard and there are more important things on the agenda. |
||||
If you need help immediately, consider obtaining a commercial support contract. |
||||
</para> |
||||
|
||||
<Sect2> |
||||
<title>Identifying Bugs</title> |
||||
|
||||
<para> |
||||
Before you ask <quote>Is this a bug?</quote>, please read and re-read the |
||||
documentation to verify that you can really do whatever it is you are |
||||
trying. If it is not clear from the documentation whether you can do |
||||
something or not, please report that too; it's a bug in the documentation.
||||
If it turns out that the program does something different from what the |
||||
documentation says, that's a bug. That might include, but is not limited to, |
||||
the following circumstances: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
A program terminates with a fatal signal or an operating system
error message that would point to a problem in the program (a
<quote>disk full</quote> message, for example, would not count).
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
A program produces the wrong output for any given input. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
A program refuses to accept valid input. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
A program accepts invalid input without a notice or error message. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
<productname>PostgreSQL</productname> fails to compile, build, or |
||||
install according to the instructions on supported platforms. |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
|
||||
Here <quote>program</quote> refers to any executable, not only the backend server. |
||||
</para> |
||||
|
||||
<para> |
||||
Being slow or resource-hogging is not necessarily a bug. Read the documentation |
||||
or ask on one of the mailing lists for help in tuning your applications. Failing |
||||
to comply with <acronym>SQL</acronym> is not a bug unless compliance for the
||||
specific feature is explicitly claimed. |
||||
</para> |
||||
|
||||
<para> |
||||
Before you continue, check on the TODO list and in the FAQ to see if your bug is |
||||
already known. If you can't decode the information on the TODO list, report your |
||||
problem. The least we can do is make the TODO list clearer. |
||||
</para> |
||||
</Sect2> |
||||
|
||||
<Sect2> |
||||
<title>What to report</title> |
||||
|
||||
<para> |
||||
The most important thing to remember about bug reporting is to state all
the facts and only facts. Do not speculate about what went wrong, what
<quote>it seemed to do</quote>, or which part of the program has a fault.
If you are not familiar with the implementation you would probably guess
wrong and not help us a bit. And even if you are, educated explanations are
a great supplement to but no substitute for facts. If we are going to fix
the bug we still have to see it happen for ourselves first.
Reporting the bare facts
is relatively straightforward (you can probably copy and paste them from the
screen) but all too often important details are left out because someone
thought they did not matter or that the report would <quote>ring a bell</quote>
anyway.
||||
</para> |
||||
|
||||
<para> |
||||
The following items should be contained in every bug report: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
The exact sequence of steps <emphasis>from program startup</emphasis>
necessary to reproduce the problem. This should be self-contained;
it is not enough to send in a bare SELECT statement without the
preceding CREATE TABLE and INSERT statements, if the output should
depend on the data in the tables. We do not have the time
to decode your database schema, and if we are supposed to make up
our own data we would probably miss the problem.
The best format for a test case for
query-language related problems is a file that can be run through the
<application>psql</application> frontend
that shows the problem. (Be sure to not have anything in your
<filename>~/.psqlrc</filename> startup file.) You are encouraged to
minimize the size of your example, but this is not absolutely necessary.
If the bug is reproducible, we'll find it either way.
||||
</para> |
||||
<para> |
||||
If your application uses some other client interface, such as PHP, then |
||||
please try to isolate the offending queries. We probably won't set up a |
||||
web server to reproduce your problem. In any case remember to provide |
||||
the exact input files; do not guess that the problem happens for
||||
<quote>large files</quote> or <quote>mid-size databases</quote>, etc. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
The output you got. Please do not say that it <quote>didn't work</quote> or |
||||
<quote>failed</quote>. If there is an error message, |
||||
show it, even if you don't understand it. If the program terminates with |
||||
an operating system error, say which. If nothing at all happens, say so. |
||||
Even if the result of your test case is a program crash or otherwise obvious,
||||
it might not happen on our platform. The easiest thing is to copy the output |
||||
from the terminal, if possible. |
||||
</para> |
||||
<note> |
||||
<para> |
||||
In case of fatal errors, the error message provided by the client might |
||||
not contain all the information available. In that case, also look at the |
||||
output of the database server. If you do not keep your server output, |
||||
this would be a good time to start doing so. |
||||
</para> |
||||
</note> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
The output you expected is very important to state. If you just write |
||||
<quote>This command gives me that output.</quote> or <quote>This is not |
||||
what I expected.</quote>, we might run it ourselves, scan the output, and |
||||
think it looks okay and is exactly what we expected. We shouldn't have to |
||||
spend the time to decode the exact semantics behind your commands. |
||||
Especially refrain from merely saying that <quote>This is not what SQL says/Oracle |
||||
does.</quote> Digging out the correct behavior from <acronym>SQL</acronym> |
||||
is not a fun undertaking, nor do we all know how all the other relational |
||||
databases out there behave. (If your problem is a program crash you can |
||||
obviously omit this item.) |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Any command line options and other startup options, including relevant
||||
environment variables or configuration files that you changed from the |
||||
default. Again, be exact. If you are using a pre-packaged |
||||
distribution that starts the database server at boot time, you should try |
||||
to find out how that is done. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Anything you did at all differently from the installation instructions. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
The PostgreSQL version. You can run the command |
||||
<literal>SELECT version();</literal> to |
||||
find out. If this function does not exist, say so, then we know that |
||||
your version is old enough. If you can't start up the server or a |
||||
client, look into the README file in the source directory or at the |
||||
name of your distribution file or package name. If your version is older |
||||
than 6.5 we will almost certainly tell you to upgrade. There are tons
of bugs in old versions; that is why we write new ones.
||||
</para> |
||||
<para> |
||||
If you run a pre-packaged version, such as RPMs, say so, including any |
||||
subversion the package may have. If you are talking about a CVS |
||||
snapshot, mention that, including its date and time. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Platform information. This includes the kernel name and version, C library,
processor, and memory information. In most cases it is sufficient to report
the vendor and version, but do not assume everyone knows what exactly
<quote>Debian</quote> contains or that everyone runs on Pentiums. If
you have installation problems, information about compilers, make, etc.
is also necessary.
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
|
||||
Do not be afraid if your bug report becomes rather lengthy. That is a fact of life. |
||||
It's better to report everything the first time than for us to have to squeeze the
||||
facts out of you. On the other hand, if your input files are huge, it is |
||||
fair to ask first whether somebody is interested in looking into it. |
||||
</para> |
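<para>
The platform details asked for above can be gathered mechanically. The
following is only a minimal sketch in Python, which is not part of
<productname>PostgreSQL</productname>; the helper name
<literal>platform_report</literal> is hypothetical, and the C library
lookup may come back empty on some systems. It does not replace the
version, startup option, and memory information requested in the list.

<programlisting>
```python
import platform


def platform_report() -> dict:
    """Collect basic platform facts worth pasting into a bug report."""
    uname = platform.uname()
    # libc_ver() returns empty strings when it cannot identify the C library.
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": f"{uname.system} {uname.release}",
        "machine": uname.machine,
        "processor": uname.processor or "unknown",
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


if __name__ == "__main__":
    for key, value in platform_report().items():
        print(f"{key}: {value}")
```
</programlisting>
</para>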
||||
|
||||
<para> |
||||
Do not spend all your time figuring out which changes in the input make
the problem go away. This will probably not help solve it. If it turns
out that the bug can't be fixed right away, you will still have time to
find and share your workaround. Also, once again, do not waste your time
guessing why the bug exists. We'll find that out soon enough.
||||
</para> |
||||
|
||||
<para> |
||||
When writing a bug report, please choose non-confusing terminology. |
||||
The software package as such is called <quote>PostgreSQL</quote>, |
||||
sometimes <quote>Postgres</quote> for short. (Sometimes |
||||
the abbreviation <quote>Pgsql</quote> is used but don't do that.) When you |
||||
are specifically talking about the backend server, mention that; do not
just say <quote>Postgres crashes</quote>. The interactive frontend is called
<quote>psql</quote> and is for all intents and purposes completely separate
from the backend.
||||
</para> |
||||
</Sect2> |
||||
|
||||
<Sect2> |
||||
<title>Where to report bugs</title> |
||||
|
||||
<para> |
||||
In general, send bug reports to <pgsql-bugs@postgresql.org>. You are |
||||
invited to find a descriptive subject for your email message, perhaps parts |
||||
of the error message. |
||||
</para> |
||||
|
||||
<para> |
||||
Do not send bug reports to any of the user mailing lists, such as |
||||
pgsql-sql or pgsql-general. These mailing lists are for answering |
||||
user questions, and their subscribers normally do not wish to receive
||||
bug reports. More importantly, they are unlikely to fix them. |
||||
</para> |
||||
|
||||
<para> |
||||
Also, please do <emphasis>not</emphasis> send reports to |
||||
<pgsql-hackers@postgresql.org>. This list is for discussing the |
||||
development of <productname>PostgreSQL</productname>; it would be nice
if we could keep the bug reports separate. We might choose to take up a
discussion
about your bug report on it, if the bug needs more review.
||||
</para> |
||||
|
||||
<para> |
||||
If you have a problem with the documentation, send email to |
||||
<pgsql-docs@postgresql.org>. Refer to the document, chapter, and sections. |
||||
</para> |
||||
|
||||
<para> |
||||
If your bug is a portability problem on a non-supported platform, send |
||||
mail to <pgsql-ports@postgresql.org>, so we (and you) can work on |
||||
porting <productname>PostgreSQL</productname> to your platform. |
||||
</para> |
||||
|
||||
<note> |
||||
<para> |
||||
Due to the unfortunate amount of spam going around, all of the above |
||||
email addresses are closed mailing lists. That is, you need to be |
||||
subscribed to them in order to be allowed to post. If you simply |
||||
want to send mail but do not want to receive list traffic, you can |
||||
subscribe to the special pgsql-loophole <quote>list</quote>, which |
||||
allows you to post to all <productname>PostgreSQL</productname> |
||||
mailing lists without receiving any messages. Send email to |
||||
<pgsql-loophole-request@postgresql.org> to subscribe. |
||||
</para> |
||||
</note> |
||||
</Sect2> |
||||
</Sect1> |
||||
@ -0,0 +1,263 @@ |
||||
<chapter> |
||||
<title>Understanding Performance</title> |
||||
|
||||
<para> |
||||
Query performance can be affected by many things. Some of these can |
||||
be manipulated by the user, while others are fundamental to the underlying |
||||
design of the system. |
||||
</para> |
||||
|
||||
<para> |
||||
Some performance issues, such as index creation and bulk data |
||||
loading, are covered elsewhere. This chapter will discuss the |
||||
<command>EXPLAIN</command> command, and will show how the details |
||||
of a query can affect the query plan, and hence overall |
||||
performance. |
||||
</para> |
||||
|
||||
<sect1> |
||||
<title>Using <command>EXPLAIN</command></title> |
||||
|
||||
<note> |
||||
<title>Author</title> |
||||
<para> |
||||
Written by Tom Lane, from e-mail dated 2000-03-27. |
||||
</para> |
||||
</note> |
||||
|
||||
<para> |
||||
Plan-reading is an art that deserves a tutorial, and I haven't |
||||
had time to write one. Here is a quick and dirty explanation.
||||
</para> |
||||
|
||||
<para> |
||||
The numbers that are currently quoted by EXPLAIN are: |
||||
|
||||
<itemizedlist> |
||||
<listitem> |
||||
<para> |
||||
Estimated startup cost (time expended before output scan can start, |
||||
e.g., time to do the sorting in a SORT node).
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Estimated total cost (if all tuples are retrieved, which they may not |
||||
be --- LIMIT will stop short of paying the total cost, for |
||||
example). |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Estimated number of rows output by this plan node. |
||||
</para> |
||||
</listitem> |
||||
|
||||
<listitem> |
||||
<para> |
||||
Estimated average width (in bytes) of rows output by this plan |
||||
node. |
||||
</para> |
||||
</listitem> |
||||
</itemizedlist> |
||||
</para> |
||||
|
||||
<para> |
||||
The costs are measured in units of disk page fetches. (There are some |
||||
fairly bogus fudge-factors for converting CPU effort estimates into |
||||
disk-fetch units; see the SET ref page if you want to play with these.) |
||||
It's important to note that the cost of an upper-level node includes |
||||
the cost of all its child nodes. It's also important to realize that |
||||
the cost only reflects things that the planner/optimizer cares about. |
||||
In particular, the cost does not consider the time spent transmitting |
||||
result tuples to the frontend --- which could be a pretty dominant |
||||
factor in the true elapsed time, but the planner ignores it because |
||||
it cannot change it by altering the plan. (Every correct plan will |
||||
output the same tuple set, we trust.) |
||||
</para> |
||||
|
||||
<para> |
||||
Rows output is a little tricky because it is <emphasis>not</emphasis> the number of rows
||||
processed/scanned by the query --- it is usually less, reflecting the |
||||
estimated selectivity of any WHERE-clause constraints that are being |
||||
applied at this node. |
||||
</para> |
||||
|
||||
<para> |
||||
Average width is pretty bogus because the planner really doesn't have
||||
any idea of the average length of variable-length columns. I'm thinking |
||||
about improving that in the future, but it may not be worth the trouble, |
||||
because the width isn't used for very much. |
||||
</para> |
||||
|
||||
<para> |
||||
Here are some examples (using the regress test database after a |
||||
vacuum analyze, and current sources): |
||||
|
||||
<programlisting> |
||||
regression=# explain select * from tenk1; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=148) |
||||
</programlisting> |
||||
</para> |
||||
|
||||
<para> |
||||
About as straightforward as it gets. If you do |
||||
|
||||
<programlisting> |
||||
select * from pg_class where relname = 'tenk1'; |
||||
</programlisting> |
||||
|
||||
you'll find out that tenk1 has 233 disk |
||||
pages and 10000 tuples. So the cost is estimated at 233 block |
||||
reads, defined as 1.0 apiece, plus 10000 * cpu_tuple_cost which is |
||||
currently 0.01 (try <command>show cpu_tuple_cost</command>). |
||||
</para> |
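<para>
The arithmetic above can be checked with a few lines of Python. This is
only a sketch of the formula as stated in the text, not planner code; the
function name <literal>seq_scan_cost</literal> is made up for illustration,
and the page and tuple counts are the ones quoted for tenk1.

<programlisting>
```python
def seq_scan_cost(pages, tuples, page_cost=1.0, cpu_tuple_cost=0.01):
    """Sequential scan estimate: one unit per page read plus a CPU charge per tuple."""
    return pages * page_cost + tuples * cpu_tuple_cost


# tenk1: 233 disk pages and 10000 tuples, as reported by pg_class.
print(f"{seq_scan_cost(233, 10000):.2f}")  # prints 333.00
```
</programlisting>

The result matches the total cost of 333.00 shown by
<command>EXPLAIN</command> for the unqualified scan.
</para>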
||||
|
||||
<para> |
||||
Now let's modify the query to add a qualification clause: |
||||
|
||||
<programlisting> |
||||
regression=# explain select * from tenk1 where unique1 < 1000; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Seq Scan on tenk1 (cost=0.00..358.00 rows=1000 width=148) |
||||
</programlisting> |
||||
|
||||
Estimated output rows has gone down because of the WHERE clause. |
||||
(The uncannily accurate estimate is just because tenk1 is a particularly |
||||
simple case --- the unique1 column has 10000 distinct values ranging |
||||
from 0 to 9999, so the estimator's linear interpolation between min and |
||||
max column values is dead-on.) However, the scan will still have to |
||||
visit all 10000 rows, so the cost hasn't decreased; in fact it has gone |
||||
up a bit to reflect the extra CPU time spent checking the WHERE |
||||
condition. |
||||
</para> |
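<para>
Both effects can be reproduced by hand. The Python sketch below is an
illustration under stated assumptions, not the estimator's actual code:
the function names are hypothetical, the interpolation is the simple
min/max form described above, and the per-tuple charge of 0.0025 is
inferred from the 25-unit difference between the two plans (358.00 versus
333.00 over 10000 tuples) rather than taken from the text.

<programlisting>
```python
def estimate_rows_less_than(bound, col_min, col_max, total_rows):
    """Row estimate for 'column < bound' by linear interpolation between min and max."""
    fraction = (bound - col_min) / (col_max - col_min)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to a valid selectivity
    return round(total_rows * fraction)


def filtered_seq_scan_cost(pages, tuples, per_tuple_qual_cost,
                           page_cost=1.0, cpu_tuple_cost=0.01):
    """Sequential scan cost when every tuple is also checked against a WHERE clause."""
    return pages * page_cost + tuples * (cpu_tuple_cost + per_tuple_qual_cost)


# Assumption: charge implied by the plans above, (358.00 - 333.00) / 10000.
assumed_qual_cost = 0.0025

print(estimate_rows_less_than(1000, 0, 9999, 10000))
print(f"{filtered_seq_scan_cost(233, 10000, assumed_qual_cost):.2f}")
```
</programlisting>

The scan still reads every page and every tuple, which is why the cost
rises slightly even though the row estimate drops.
</para>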
||||
|
||||
<para> |
||||
Modify the query to restrict the qualification even more: |
||||
|
||||
<programlisting> |
||||
regression=# explain select * from tenk1 where unique1 < 100; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.35 rows=100 width=148) |
||||
</programlisting> |
||||
|
||||
and you will see that if we make the WHERE condition selective |
||||
enough, the planner will |
||||
eventually decide that an indexscan is cheaper than a sequential scan. |
||||
This plan will only have to visit 100 tuples because of the index, |
||||
so it wins despite the fact that each individual fetch is expensive. |
||||
</para> |
||||
|
||||
<para> |
||||
Add another condition to the qualification: |
||||
|
||||
<programlisting> |
||||
regression=# explain select * from tenk1 where unique1 < 100 and |
||||
regression-# stringu1 = 'xxx'; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Index Scan using tenk1_unique1 on tenk1 (cost=0.00..89.60 rows=1 width=148) |
||||
</programlisting> |
||||
|
||||
The added clause "stringu1 = 'xxx'" reduces the output-rows estimate, |
||||
but not the cost because we still have to visit the same set of tuples. |
||||
</para> |
||||
|
||||
<para> |
||||
Let's try joining two tables, using the fields we have been discussing: |
||||
|
||||
<programlisting> |
||||
regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100 |
||||
regression-# and t1.unique2 = t2.unique2; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Nested Loop (cost=0.00..144.07 rows=100 width=296) |
||||
-> Index Scan using tenk1_unique1 on tenk1 t1 |
||||
(cost=0.00..89.35 rows=100 width=148) |
||||
-> Index Scan using tenk2_unique2 on tenk2 t2 |
||||
(cost=0.00..0.53 rows=1 width=148) |
||||
</programlisting> |
||||
</para> |
||||
|
||||
<para> |
||||
In this nested-loop join, the outer scan is the same indexscan we had |
||||
in the example before last, and the cost and row count are the same |
||||
because we are applying the "unique1 < 100" WHERE clause at this node. |
||||
The "t1.unique2 = t2.unique2" clause isn't relevant yet, so it doesn't |
||||
affect the row count. For the inner scan, we assume that the current |
||||
outer-scan tuple's unique2 value is plugged into the inner indexscan |
||||
to produce an indexqual like |
||||
"t2.unique2 = <replaceable>constant</replaceable>". So we get the |
||||
same inner-scan plan and costs that we'd get from, say, "explain select |
||||
* from tenk2 where unique2 = 42". The loop node's costs are then set |
||||
on the basis of the outer scan's cost, plus one repetition of the |
||||
inner scan for each outer tuple (100 * 0.53, here), plus a little CPU |
||||
time for join processing. |
||||
</para> |
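<para>
The nested-loop figure can be decomposed the same way. This Python sketch
only restates the arithmetic from the paragraph above with the numbers
from the plan; the function name is hypothetical, and treating the whole
remainder as join CPU time is an assumption.

<programlisting>
```python
def nested_loop_scan_cost(outer_total, outer_rows, inner_total):
    """Outer scan cost plus one repetition of the inner scan per outer tuple."""
    return outer_total + outer_rows * inner_total


scans = nested_loop_scan_cost(outer_total=89.35, outer_rows=100, inner_total=0.53)
join_total = 144.07  # total cost reported for the Nested Loop node

print(f"scan costs:     {scans:.2f}")
print(f"left for joins: {join_total - scans:.2f}")
```
</programlisting>

The two scans account for nearly all of the reported total; the small
remainder corresponds to the <quote>little CPU time for join
processing</quote>.
</para>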
||||
|
||||
<para> |
||||
In this example the loop's output row count is the same as the product |
||||
of the two scans' row counts, but that's not true in general, because |
||||
in general you can have WHERE clauses that mention both relations and |
||||
so can only be applied at the join point, not to either input scan. |
||||
For example, if we added "WHERE ... AND t1.hundred < t2.hundred", |
||||
that'd decrease the output row count of the join node, but not change |
||||
either input scan. |
||||
</para> |
||||
|
||||
<para> |
||||
We can look at variant plans by forcing the planner to disregard |
||||
whatever strategy it thought was the winner (a pretty crude tool, |
||||
but it's what we've got at the moment): |
||||
|
||||
<programlisting> |
||||
regression=# set enable_nestloop = 'off'; |
||||
SET VARIABLE |
||||
regression=# explain select * from tenk1 t1, tenk2 t2 where t1.unique1 < 100 |
||||
regression-# and t1.unique2 = t2.unique2; |
||||
NOTICE: QUERY PLAN: |
||||
|
||||
Hash Join (cost=89.60..574.10 rows=100 width=296) |
||||
-> Seq Scan on tenk2 t2 |
||||
(cost=0.00..333.00 rows=10000 width=148) |
||||
-> Hash (cost=89.35..89.35 rows=100 width=148) |
||||
-> Index Scan using tenk1_unique1 on tenk1 t1 |
||||
(cost=0.00..89.35 rows=100 width=148) |
||||
</programlisting> |
||||
|
||||
This plan proposes to extract the 100 interesting rows of tenk1 |
||||
using ye same olde indexscan, stash them into an in-memory hash table, |
||||
and then do a sequential scan of tenk2, probing into the hash table |
||||
for possible matches of "t1.unique2 = t2.unique2" at each tenk2 tuple. |
||||
The cost to read tenk1 and set up the hash table is entirely startup |
||||
cost for the hash join, since we won't get any tuples out until we can |
||||
start reading tenk2. The total time estimate for the join also |
||||
includes a pretty hefty charge for CPU time to probe the hash table |
||||
10000 times. Note, however, that we are NOT charging 10000 times 89.35; |
||||
the hash table setup is only done once in this plan type. |
||||
</para> |
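<para>
A rough breakdown of the hash join total makes the last point concrete.
The Python sketch below uses only the numbers from the plan above; the
function name is hypothetical, and lumping everything beyond the two input
scans into a single <quote>probe and join CPU</quote> figure is a
simplifying assumption.

<programlisting>
```python
def hash_join_breakdown(join_total, build_cost, probe_scan_cost, probe_rows):
    """Split a hash join total into its input scans and the remaining CPU charge."""
    inputs = build_cost + probe_scan_cost
    return {
        "inputs": inputs,
        "probe_and_join_cpu": join_total - inputs,
        # What the cost would be if the build side were redone for every probe row.
        "naive_rebuild_cost": probe_rows * build_cost,
    }


parts = hash_join_breakdown(join_total=574.10, build_cost=89.35,
                            probe_scan_cost=333.00, probe_rows=10000)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```
</programlisting>

The build side is charged once (89.35), not once per tenk2 tuple, so the
total stays far below the rebuild-every-time figure.
</para>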
||||
</sect1> |
||||
</chapter> |
||||
|
||||
<!-- Keep this comment at the end of the file |
||||
Local variables: |
||||
mode:sgml |
||||
sgml-omittag:nil |
||||
sgml-shorttag:t |
||||
sgml-minimize-attributes:nil |
||||
sgml-always-quote-attributes:t |
||||
sgml-indent-step:1 |
||||
sgml-indent-data:t |
||||
sgml-parent-document:nil |
||||
sgml-default-dtd-file:"./reference.ced" |
||||
sgml-exposed-tags:nil |
||||
sgml-local-catalogs:("/usr/lib/sgml/CATALOG") |
||||
sgml-local-ecat-files:nil |
||||
End: |
||||
--> |
||||
@ -1 +1 @@ |
||||
-<!doctype refentry PUBLIC "-//Davenport//DTD DocBook V3.0//EN">
+<!doctype refentry PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
||||
|
||||