mirror of https://github.com/postgres/postgres
parent
29138eeb3c
commit
c9ead90ea3
@ -0,0 +1,160 @@ |
||||
|
||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* |
||||
Genetic Query Optimization in Database Systems |
||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* |
||||
|
||||
Martin Utesch |
||||
|
||||
<utesch@aut.tu-freiberg.de> |
||||
|
||||
Institute of Automatic Control |
||||
University of Mining and Technology |
||||
Freiberg, Germany |
||||
|
||||
02/10/1997 |
||||
|
||||
|
||||
1.) Query Handling as a Complex Optimization Problem |
||||
==================================================== |
||||
|
||||
Among all relational operators the most difficult one to process and |
||||
optimize is the JOIN. The number of alternative plans to answer a query |
||||
grows exponentially with the number of JOINs included in it. Further |
||||
optimization effort is caused by the support of a variety of *JOIN |
||||
methods* (e.g., nested loop, index scan, merge join in Postgres) to |
||||
process individual JOINs and a diversity of *indices* (e.g., r-tree, |
||||
b-tree, hash in Postgres) as access paths for relations. |
||||
|
||||
The current Postgres optimizer implementation performs a *near- |
||||
exhaustive search* over the space of alternative strategies. This query |
||||
optimization technique is inadequate to support database application |
||||
domains that evolve the need for extensive queries, such as artifcial |
||||
intelligence. |
||||
|
||||
The Institute of Automatic Control at the University of Mining and |
||||
Technology Freiberg, Germany encountered the described problems as its |
||||
folks wanted to take the Postgres DBMS as the backend for a decision |
||||
support knowledge based system for the maintenance of an electrical |
||||
power grid. The DBMS needed to handle large JOIN queries for the |
||||
inference machine of the knowledge based system. |
||||
|
||||
Performance difficulties within exploring the space of possible query |
||||
plans arose the demand for a new optimization technique being developed. |
||||
|
||||
In the following we propose the implementation of a *Genetic |
||||
Algorithm* as an option for the database query optimization problem. |
||||
|
||||
|
||||
2.) Genetic Algorithms (GA) |
||||
=========================== |
||||
|
||||
The GA is a heuristic optimization method which operates through |
||||
determined, randomized search. The set of possible solutions for the |
||||
optimization problem is considered as a *population* of *individuals*. |
||||
The degree of adaption of an individual to its environment is specified |
||||
by its *fitness*. |
||||
|
||||
The coordinates of an individual in the search space are represented |
||||
by *chromosomes*, in essence a set of character strings. A *gene* is a |
||||
subsection of a chromosome which encodes the value of a single parameter |
||||
being optimized. Typical encodings for a gene could be *binary* or |
||||
*integer*. |
||||
|
||||
Through simulation of the evolutionary operations *recombination*, |
||||
*mutation*, and *selection* new generations of search points are found |
||||
that show a higher average fitness than their ancestors. |
||||
|
||||
According to the "comp.ai.genetic" FAQ it cannot be stressed too |
||||
strongly that a GA is not a pure random search for a solution to a |
||||
problem. A GA uses stochastic processes, but the result is distinctly |
||||
non-random (better than random). |
||||
|
||||
Structured Diagram of a GA: |
||||
--------------------------- |
||||
|
||||
P(t) generation of ancestors at a time t |
||||
P''(t) generation of descendants at a time t |
||||
|
||||
+=========================================+ |
||||
|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<| |
||||
+=========================================+ |
||||
| INITIALIZE t := 0 | |
||||
+=========================================+ |
||||
| INITIALIZE P(t) | |
||||
+=========================================+ |
||||
| evalute FITNESS of P(t) | |
||||
+=========================================+ |
||||
| while not STOPPING CRITERION do | |
||||
| +-------------------------------------+ |
||||
| | P'(t) := RECOMBINATION{P(t)} | |
||||
| +-------------------------------------+ |
||||
| | P''(t) := MUTATION{P'(t)} | |
||||
| +-------------------------------------+ |
||||
| | P(t+1) := SELECTION{P''(t) + P(t)} | |
||||
| +-------------------------------------+ |
||||
| | evalute FITNESS of P''(t) | |
||||
| +-------------------------------------+ |
||||
| | t := t + 1 | |
||||
+===+=====================================+ |
||||
|
||||
|
||||
3.) Genetic Query Optimization (GEQO) in PostgreSQL |
||||
=================================================== |
||||
|
||||
The GEQO module is intended for the solution of the query |
||||
optimization problem similar to a traveling salesman problem (TSP). |
||||
Possible query plans are encoded as integer strings. Each string |
||||
represents the JOIN order from one relation of the query to the next. |
||||
E. g., the query tree /\ |
||||
/\ 2 |
||||
/\ 3 |
||||
4 1 is encoded by the integer string '4-1-3-2', |
||||
which means, first join relation '4' and '1', then '3', and |
||||
then '2', where 1, 2, 3, 4 are relids in PostgreSQL. |
||||
|
||||
Parts of the GEQO module are adapted from D. Whitley's Genitor |
||||
algorithm. |
||||
|
||||
Specific characteristics of the GEQO implementation in PostgreSQL |
||||
are: |
||||
|
||||
o usage of a *steady state* GA (replacement of the least fit |
||||
individuals in a population, not whole-generational replacement) |
||||
allows fast convergence towards improved query plans. This is |
||||
essential for query handling with reasonable time; |
||||
|
||||
o usage of *edge recombination crossover* which is especially suited |
||||
to keep edge losses low for the solution of the TSP by means of a GA; |
||||
|
||||
o mutation as genetic operator is deprecated so that no repair |
||||
mechanisms are needed to generate legal TSP tours. |
||||
|
||||
The GEQO module gives the following benefits to the PostgreSQL DBMS |
||||
compared to the Postgres query optimizer implementation: |
||||
|
||||
o handling of large JOIN queries through non-exhaustive search; |
||||
|
||||
o improved cost size approximation of query plans since no longer |
||||
plan merging is needed (the GEQO module evaluates the cost for a |
||||
query plan as an individual). |
||||
|
||||
|
||||
References |
||||
========== |
||||
|
||||
J. Heitk"otter, D. Beasley: |
||||
--------------------------- |
||||
"The Hitch-Hicker's Guide to Evolutionary Computation", |
||||
FAQ in 'comp.ai.genetic', |
||||
'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html' |
||||
|
||||
Z. Fong: |
||||
-------- |
||||
"The Design and Implementation of the Postgres Query Optimizer", |
||||
file 'planner/Report.ps' in the 'postgres-papers' distribution |
||||
|
||||
R. Elmasri, S. Navathe: |
||||
----------------------- |
||||
"Fundamentals of Database Systems", |
||||
The Benjamin/Cummings Pub., Inc. |
||||
|
Loading…
Reference in new issue