Search This Blog

Wednesday 6 February 2013

Oracle SQL Tunning OR Query Tunning

ORACLE SQL TUNING

This document is meant to explain the points to be considered, while
formulating your SQLs. Some of this is meant at the DBA level, but most of
these can be noted, for programmer level.

Before tuning your SQL you should understand the phases of SQL query
processing. There are 5 phases of processing the SQL query :
1) Open
2) Parse
3) Fetch
4) Execute
5) Close
In Open phase, one cursor area (context area) will be opened for your query
In Parse phase, the following processes will take place
i) SQL statement is verified, whether valid or not (Syntax checking).
ii) The tables, columns and privileges are verified (Symantec checking)
iii) A Parse tree is prepared, based on the execution plan
iv) The parse tree into SQL area (reuse to identical query).
In Fetch phase, rows are selected from the corresponding data files and put
into the data block buffer, by server process.
In Execute phase, user process sends the records to the corresponding users.
In Close phase the opened cursor will be closed and temporary memory will be
released.

The following points are to be noted, when you write SQL statements.
# When you write conditions that compare columns with constants, wherever
possible, don't use expressions.
Eg:
Sal>(24000/12) - Not optimized
Sal >2000 - Optimized

# While using LIKE operator, if you are not using wildcards, then don't use
LIKE. Use equality operator instead.
Eg.
desig Like 'DBA' - Not optimized
desig='DBA' - Optimized
desig like 'DB%' - Correct. But if you know the full string, better to use
it.

# When the number of condition in IN operator is less, use OR and =
operator
Eg.
WHERE dno IN (10,20,30) - Not optimized
dno=10 OR dno=20 OR dno=30 - Optimized.

#When you use ANY or ALL or BETWEEN operators, the query optimizer expands
the condition .So, it's better that you expand the condition as far as
possible using OR operator .
Eg.1
Where Sal=ANY(5000,6000) - Not optimized
Where sal=5000 OR sal=6000 - Optimized
Eg2.
Where sal >ALL(5000,6000) - Not optimized
Where sal >5000 AND sal>6000 - Optimized
Eg3.
Where sal BETWEEN 2000 AND 5000 - Not optimized
Where sal >=2000 AND sal<=5000 - Optimized

# Try to avoid using the NOT logical operator as far as possible. Use <> or
!= or ^= relational operators instead.
Eg.
NOT dno = (SELECT dno FROM emp WHERE ename = 'TAYLOR') - Not optimized
deptno <> (SELECT deptno FROM emp WHERE ename = 'TAYLOR') - Optimized

# If your WHERE conditions have more OR operators, then you have to rewrite
the query with SET operators because SET operators execute more efficiently
then OR (Relational Operators).
Eg.
SELECT * FROM emp WHERE job = 'CLERK' OR deptno = 10; - Not optimized

SELECT * FROM emp WHERE job = 'CLERK'
UNION ALL SELECT * FROM emp WHERE deptno = 10 AND job <> 'CLERK';
(Make sure you are having INDEX) - Optimized

# Complex queries have to be changed to join.
Eg.
SELECT * FROM accounts WHERE custno
IN (SELECT custno FROM customers); - Not optimized
SELECT accounts.* FROM accounts, customers
WHERE accounts.custno = customers.custno; - Optimized
(Make sure 'custno' column is the primary key or indexed.)

Optimizer Modes
There are three types of optimizer modes.
1. CHOOSE (Default).
2. COST (Using Resource cost)
3. RULE (Using Access path)

Default optimizer mode is CHOOSE, now Oracle check recourse to set cost
based mode if you are not enabled resource_limit It will set rule based
optimizer mode.

When you set COST based optimizer mode, Oracle considers only CPU resource.
Eg.
Logical reads per call
Logical reads per session
CPU per session
And etc. But you have to set recourse_limit=true in init parameter file to
set cost based optimizer mode.

When you set RULE based optimizer mode, Oracle will never worry about COST
(means resource or I/O cost). It will search the records through the Access
paths. Oracle maintains 15 access paths for each search. Each of these paths
are explained below with the corresponding rank associated with it. Among
these paths, Oracle tries to apply the least rank of the path as possible.
Your queries should apply the least rank of path.


Note:
If you are not able to understand the access path properly, send mail to me.
Why, because unless you are clear with access path, you can't tune SQL
queries those are using RULE based optimizer mode.

ACCESS PATHS (Each number below, is the rank associated with the access
path)

1. Single row returned by ROWID

Eg : SELECT * FROM emp WHERE ROWID = '00000DC5.0000.0001';

2. Single row returned by CLUSTER join
To use this access path, your join tables should be stored in the same
cluster and WHERE clause should be return one only row. To store join tables
in same cluster you have to create cluster for those tables first.

3. Single row returned by hash cluster join, with unique or primary key
To use this access path you have to create a HASH CLUSTER using HASH
function.

4. Single row returned by unique or primary key
To use this access path, the WHERE clause should have all columns in the
unique or primary key with equality condition.

5. Cluster join
To use this access path the join tables should be stored in the same
cluster and WHERE clause should contain conditions that equate each column
of the cluster key in one table with corresponding column in the other
table.

6. Hash cluster key
To use this access path, the WHERE clause should use all columns of hash
cluster key with equality condition.

7. Indexed cluster key
To use this access path, the WHERE clause should use all columns of
indexed cluster key with equality condition.

8. Composite index
To use this access path, the WHERE clause should use all columns of a
composite index in equality conditions combined with AND operators.

9. Single-column index
To use this access path, the WHERE clause should use the columns of one
or more single- column indexes in equality conditions.

10. Bound range search on indexed column
To use this access path, the WHERE clause should contains a condition
that uses either the column of a single-column index or one or more columns
that make up a leading portion of a composite index (using AND)

11. Unbound range search on indexed column.
To use this access path, the WHERE clause should contain one of the
conditions that use either the column of a single-column index or one or
more columns of the leading portion of a composite ndex (Using OR).

12. Sort merge join
To use this access path, the join tables should not be stored together
in a cluster, if the statement's WHERE clause uses columns from each table
in equality conditions.

13. MAX or MIN of indexed column
To use this access path the following conditions must be true.
i) The query uses the MAX or MIN function to select the maximum or minimum
value of either the column of a single-column index or the leading column of
a composite index. The index cannot be a cluster index.
ii) The argument to the MAX or MIN function can be any expression involving
the column, a constant, or the addition operator (+), the concatenation
operation (||), or the CONCAT function.
iii) There are no other expressions in the select list.
iv) The statement has no WHERE clause or GROUP BY clause.

14.ORDER BY on indexed column
To use this access path the following conditions must be true.
i) The query contains an ORDER BY clause that uses either the column of a
single-column index or the leading portion of a composite index. The index
cannot be a cluster index.
ii) There must be a PRIMARY KEY or NOT NULL integrity constraint that
guarantees that at least one of the indexed columns listed in the ORDER BY
clause contains no nulls.
iii) The NLS_SORT parameter is set to BINARY in init parameter file or using
statement ALTER SESSION SET NLS_SORT=BINARY;

15.Full table scan.
This access path is available for any SQL statement, regardless of
its WHERE clause conditions.

When you create a composite index for all the columns which are specified in
WHERE clause, Oracle uses the path composite index (rank 8). This will
increase the performance.
Keep in mind that full table scan is the last rank (15). So, at any point of
time try to avoid the full table scan.

Note:
Even though you are having indexes, your query will never use that index
when you write a query with the following conditions
col1 > col2
col1 < col2
col1 >= col2
col1 <= col2
col1 and col2 are in the same table.
Where Col1 is not null
Where Col1 is null
Where Col1 not in
Where col1!=<expr>
Where col1 like '% pattern'

How to make your JOINS more effective?

When you write a query with two-row source, Oracle performs one of the
following joins. I will take two tables like emp and dept to explain the
joins

1. Nested loops
When you write a query like
select * from emp,dept where emp.dno=dept.dno;
Here EMP is the Driving (outer) table and DEPT is Driven (inner) table.
Oracle performs a full table scan of emp and a rowid scan of DEPT.

2. Sort Merge
Oracle first sorts each row of source to be joined if they have not been
already sorted. Then it performs a full table scan of two tables. But sort
merge joins can perform only for equi-joins.

3. Cluster join
Oracle can perform a cluster join only for an equi-join that equates the
cluster key columns of two tables in the same cluster. Then Oracle performs
a full table scan for dept and cluster scan of emp.

4. Hash join
Here also Oracle performs a full table scan for dept and hash cluster
scan of emp.

Hints
1. When you use OUTER JOIN(+) operator.
The table with the outer join operator must come after the other table in
the condition.
2. When you specify more tables in FROM clause.
The first table in the join order has the most highly ranked access path
(eg. rank 5 table first and rank 7 table second and so on).
3. When you set operators UNION or UNION ALL, as per commitment, both are
same except that UNION uses SORT operation to eliminate the duplicate
records. So, if you are not using any WHERE clause in a compound query and
you are not worrying about duplicate records, then UNION ALL is advisable.

Using Plan Table

Note: The following examples are taken from actual data that is available
for the GECFS-CarPool? application.

Before describing how to use the plan table, it is better to know something
about the examples that are to be used in this document.

There is a table 'T8000_TIC_UNIT_HISTORY' that is created with the synonym
'TIC_UNIT_HISTORY'. To get to know the indexes for this table, use the
following SQL script:

select owner, index_name, table_owner, table_name from all_indexes where
table_name='T8000_TIC_UNIT_HISTORY'

This gives the following result:

OWNER INDEX_NAME TABLE_OWNER TABLE_NAME
OPNSUTIL T8000_TIC_UNIT_HISTORY_X OPNSUTIL T8000_TIC_UNIT_HISTORY

To view the columns that are used in this index, use the following script

select index_name, column_name, column_position from all_ind_columns where
index_name='T8000_TIC_UNIT_HISTORY_X'

This gives the following result

INDEX_NAME COLUMN_NAME COLUMN_POSITION
T8000_TIC_UNIT_HISTORY_X CURR_VIN 1

We can use table 'PLAN_TABLE' to explain the SQL queries. By saying
'explaining SQL queries', it is basically analyzing the queries, so as to
return rows that will say, what is I/O cost of processing the queries, which
index was used for each table in the SQL, which access path was used for the
query, etc.

I shall describe in detail about how to use the PLAN_TABLE, based on the
example above.


The structure of the PLAN_TABLE is as follows:

Name Null? Type
STATEMENT_ID VARCHAR2(30)
TIMESTAMP DATE
REMARKS VARCHAR2(80)
OPERATION VARCHAR2(30)
OPTIONS VARCHAR2(30)
OBJECT_NODE VARCHAR2(128)
OBJECT_OWNER VARCHAR2(30)
OBJECT_NAME VARCHAR2(30)
OBJECT_INSTANCE NUMBER
OBJECT_TYPE VARCHAR2(30)
OPTIMIZER VARCHAR2(255)
SEARCH_COLUMNS NUMBER
ID NUMBER
PARENT_ID NUMBER
POSITION NUMBER
OTHER LONG

>From this, the main fields, which will be used for the purpose of this
document, will be the following fields:
STATEMENT_ID, OPERATION, OPTIONS, OBJECT_NAME, OBJECT_TYPE, ID, and POSITION

We use the following SQL script, which is used to explain the query.

explain plan set statement_id = 'pcs' for
select * from tic_unit_history where curr_vin='1FTFF25G9DKA45510'

Let us see what this means:

The statement_id field is the field that is set by us, to identify a unique
operation. We can use any literal to set the field. While analyzing the rows
returned, we will be using this field to identify the required rows.
The query, that is to be explained, is specified after the 'for' keyword.
So, what this means, in a sort of pseudo-code language is:
"Explain the plan for this query and set the statement_id='pcs' for all the
rows that are returned"

To analyze the rows that are returned, use the following script:

SELECT LPAD(' ',2*(LEVEL-1))||operation||' '||options||' '||object_name
||' '||DECODE(id, 0, 'Cost = '||position) "Query Plan"
FROM plan_table
START WITH id = 0 AND statement_id = 'pcs'
CONNECT BY PRIOR id = parent_id AND statement_id ='pcs';

I've highlighted the field names of the PLAN_TABLE in bold. Note that the
statement_id value given in this script is the same as that given in the
previous script (and it is case-sensitive).

This script will return, the following rows:

Query Plan
SELECT STATEMENT Cost = 2
TABLE ACCESS BY ROWID T8000_TIC_UNIT_HISTORY
INDEX UNIQUE SCAN T8000_TIC_UNIT_HISTORY_X

There's only one column that is returned and 3 rows. The "Query Plan" is the
header.

Now I'll explain each row.

Row 1: This row says that the SQL script, which was to be explained, was a
'SELECT' operation. (Remember that SQL script which we had used for
explaining is :
"select * from tic_unit_history where curr_vin='1FTFF25G9DKA45510'")
It also says that 'Cost = 2'. This means that the I/O cost or resource
access was 2. The unit of measure is unimportant. What is important, though,
is that the lessor the cost, the fast the query is going to execute.
Row 2: This row specifies which table was accessed and which access path was
used.
Row 3: This row specifies the index that was used to scan the table.

Note that, in the SQL script that was to be explained, I have used the
column 'curr_vin' in the WHERE clause. Why I used this specific column is
because, there is only one index for the table 'tic_unit_history', and that
index consists of only one column, which is 'curr_vin'. So, in order that
the query be processed faster, I specifically used the column in the WHERE
clause, which was present in the index. That is why the optimizer used the
index. When I executed the explained SQL script, it took 0.604 seconds to
return 1 record.

Let us explain another SQL query.

We have a table T4310_TAX_LOCATIONS with the synonym TAX_LOCATIONS. The
indexes and the columns in the indexes for this table is as follows :

Index Name Column Name Column Position
I4310_PK_TAX_LOCATIONS TAX_LOC_ID 1
I4310_IX2_TAX_LOCATIONS COUNTRY_CODE 1
I4310_IX2_TAX_LOCATIONS STATE_OR_PROV_ABBR 2
I4310_IX2_TAX_LOCATIONS COUNTY_NAME 3
I4310_IX2_TAX_LOCATIONS CITY_NAME 4
I4310_IX2_TAX_LOCATIONS ZIP_OR_POSTAL_CODE 5
I4310_IX3_TAX_LOCATIONS OWNER_TAX_LOC_ID 1

The script for explaining is as follows:
select * from tax_locations where tax_loc_id between 900000 and 900100

So the explain script will be:
explain plan set statement_id = 'pcs' for
select * from tax_locations where tax_loc_id between 900000 and 900100

Analysis script is the same as before. This will return the following:

Query Plan
SELECT STATEMENT Cost = 29
TABLE ACCESS BY ROWID T4310_TAX_LOCATIONS
INDEX RANGE SCAN I4310_PK_TAX_LOCATIONS

When we execute the query, it takes 0.302 seconds to return 101 records.

Suppose we modify the query a bit. Instead of using BETWEEN, we use the
relational operators <= and >=. The query will be as follows:

select * from tax_locations where tax_loc_id >= 900000 and tax_loc_id
<=900100

The explain script will be:
explain plan set statement_id = 'pcs' for
select * from tax_locations where tax_loc_id >= 900000 and tax_loc_id
<=900100

The analysis will be the same as before. But the query executes in 0.152
seconds to return the same 101 records. Note that the time reduction is 50%.

Let's try an example by using the index 'I4310_IX2_TAX_LOCATIONS'. Consider
the following query:
select * from tax_locations where state_or_prov_abbr='IL' and
county_name='ADAMS' and city_name='QUINCY'

The explain script will be:
explain plan set statement_id = 'pcs' for
select * from tax_locations where state_or_prov_abbr='IL' and
county_name='ADAMS' and city_name='QUINCY'

This analysis script will return:
Query Plan
SELECT STATEMENT Cost = 573
TABLE ACCESS FULL T4310_TAX_LOCATIONS

Note that the cost is 573 units and the access path that was used was Full
Table Scan. The execution time for this query was 2.143 seconds to return 5
records

Let's modify the query so as to include the country_code also, in the WHERE
clause. We must note that, all records in this table have country_code='US'.
So we might consider this insignificant, in the normal case. Anyway, we'll
try to explain the query.

select * from tax_locations where country_code='US' and
state_or_prov_abbr='IL' and county_name='ADAMS' and city_name='QUINCY'

The explain script is as follows:
explain plan set statement_id = 'pcs' for
select * from tax_locations where country_code='US' and
state_or_prov_abbr='IL' and county_name='ADAMS' and city_name='QUINCY'

And the analysis script returns:
Query Plan
SELECT STATEMENT Cost = 3
TABLE ACCESS BY ROWID T4310_TAX_LOCATIONS
INDEX RANGE SCAN I4310_IX2_TAX_LOCATIONS

Now, with the 'insignificant' country_code column, we have reduced the cost
from 573 units to 3 units. This is because, the optimizer found all the
columns in the index, except the last one (ZIP_OR_POSTAL_CODE) in the WHERE
clause of the query. So it used the index

No comments:

Post a Comment