Thursday, November 22, 2012

Partitioning: A question of the right strategy

Partitioning: A question of the right strategy

The partitionig option is an essential option of the database for the administration of huge amounts of data. First introduced in Oracle Database 8 the functionality became more comprehensive with every new release. The difficulty of partitioning is not to create partitions but to choose a proper strategy for the design of the partitions. This article shows best practice to find a good partitioning strategy and a reasonable usage of the new partitionig features introduced in Oracle Database 11g.

Partitioning is an option which can be acquired with the enterprise edition of the database. With this option it is possible to split a table into smaller pieces with seperate tablespaces each. This change of the table structure is transparent for the application and can be done online with the usage of the DBMS_REDEFINITION package. But a good application uses partitioning from the scratch. It affects important criteria like managebility, performance, availability and the information lifecycle management, which are planned in the best case in advance. Partitioning can be used in different kinds of applications. Typically online transaction processing (OLTP) systems profit from improvements in managebility and availability while data warehouse systems profit from improvements in managebility and performance. To choose the best partitionig strategy one should decide on the priority of the aims to be achieved in consideration of the following categories:

  • Manageability: Movement, merge, deletetion and exchange of partitions allow a faster execution of tasks than operation based on the whole table. It helps with the loading, deletion and movement of data. The creation and the rebuilding of indexes with partitioning can be done faster than operations affecting the whole table.
  • Performance: The optimizer does not read partitions which do not have relevant data so-called Partition Pruning. Queries can be load balanced over physical devices. The optimizer accelerate queries with Partition-Wise Joins while the execution is optimized for partitions it accesses.
  • Availability: Splitting of a table into smaller tablespaces brings advantages in backup and recovery processes. Fast deletion of data with the drop partition function. Movement of partitions to other physical storage media while the data is still available for select statements.
  • Integration of partitioning in a Information Lifecycle Management (ILM) concept: Together with the Oracle ILM Assistant partitioning helps with its data movement and data deletion operations to store date cost effective. Frequently used data is stored on expensive and fast media while seldom used data resides on slow and cheap media.
Partitioning methods

The partitioning method describes the role that divides a table into partitions. Range partitioning is the most frequently used method which can be based on a time column of the table. For example:
CREATE SEQUENCE SALES_SEQ;
/
CREATE TABLE SALES_PART
PARTITION BY RANGE (TIME_ID)
INTERVAL (INTERVAL '1' YEAR) STORE IN (TAB_1, TAB_2, TAB_3)
(PARTITION P1999 VALUES LESS THAN 
(TO_DATE ('01.01.1999', 'dd.mm.yyyy')) TABLESPACE TAB_1) 
AS
(SELECT sales_seq.nextval SALES_ID, SALES.* FROM SH.SALES SALES 
WHERE TIME_ID > TO_DATE('01.01.1999', 'dd.mm.yyyy'))
/
In the example the sales table has been range partitioned by the time_id (date column). The sequence is used to fill the column sales_id which is used later in this article. The statement only creates the first partition. The other partitions are created automatically with the interval partioning feature introduced in Oracle Database 11g. The tablespaces are chosen with a round robin algorithm by the database system by using the tablespaces stated in the 'STORE IN' clause. After calculating the statistics the view USER_TAB_PARTITIONS shows each partition with the number of rows.
SQL> exec dbms_stats.gather_table_stats(user, 'SALES_PART')

PL/SQL procedure successfully completed.

SELECT PARTITION_NAME, NUM_ROWS, HIGH_VALUE 
FROM USER_TAB_PARTITIONS WHERE TABLE_NAME = 'SALES_PART'
/
PART_NAME NUM_ROWS TABLESP HIGH_VALUE
--------- -------- ------- ----------------------------------------------------------
P1999     0        TAB_1   TO_DATE('1999-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS'...
SYS_P48   247707   TAB_2   TO_DATE('2000-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS'...
SYS_P49   232646   TAB_3   TO_DATE('2001-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS'...
SYS_P50   259418   TAB_1   TO_DATE('2002-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS'...
Another partitioning method is the list partitioning (see example below table sales_list) which has lists of separate values as its partitioning criteria. The lists must be disjoint. While inserting a row the database automatically chooses the matching partition. There is also a possibility to choose a default partition. Without default partition values which do not match to a list of the partitioning table are rejected. Hash partitioning values are assigned a partition by using a hashing algorithm. The advantage is that one can use partitioning even if there is no partitioning criteria in the table.


Image: Range, List and Hash Partitioning
Composite partioning is a combination of two partitioning methods mentioned above. Up to Version 10 R2 a Range-Hash or a Range-List combination was possible. An example for Range-List partitioning are data from states partitioned by month and the name of the state. With the introduction of Oracle Database 11g range and hash partitioning can be used with any combination of a range, list or hash subpartition.

Data load

An advantage of partitioning is the easy load and deletion of data. To stage data from 1999 in the table SALES_PART first you have to load it into a temporary table and then exchange a partition with the table.
SQL> CREATE TABLE TMP_LOAD AS SELECT * FROM SALES_PART WHERE 1=0
/
Table created.

SQL> INSERT INTO TMP_LOAD SELECT sales_seq.nextval, SALES.* FROM 
SH.SALES WHERE TIME_ID < TO_DATE('01.01.1999', 'dd.mm.yyyy')
/
178834 rows created.

SQL> ALTER TABLE SALES_PART EXCHANGE PARTITION P1999 WITH TABLE TMP_LOAD
/
Table altered.

SQL> exec dbms_stats.gather_table_stats(user, 'SALES_PART', 'P1999')

PL/SQL procedure successfully completed.

SELECT PARTITION_NAME, NUM_ROWS FROM USER_TAB_PARTITIONS WHERE TABLE_NAME = 'SALES_PART'
/

PART_NAME NUM_ROWS 
--- ----- -------- 
P1999     178834   
SYS_P48   247707   
SYS_P49   232646   
SYS_P50   259418   
The temporary table must have the same structure as the partitioned table. This is done with the first statement. To ensure that the new table is empty 'where 1=0' is added to the statement. The duration of the exchange partition command does not rely on the amount of data. If the table would have preloaded 10 million rows the exchange command would have taken as long because only the dictionary is updated. The statistics are only collected for the new partition.

Indexes and partitioning

An index for a partitioned table can be partitionied or unpartitioned. The following are the possibilities for indexes:
  • local prefixed: The index is partitioned like the underlying table(local) and the leading columns of the index key conform to the partitioning key of the index.(prefixed)
  • local nonprefixed: The index is partitioned like the underlying table but the leading columns of the index key do not conform to the partitioning key of the index.
  • global prefixed: The index has a different partitioning like the table and the leading columns of the index key must conform to the partitioning key because global non prefixed indexes are not supported by Oracle database. The following example shows a local prefixed index on the column time_id.
SQL> CREATE INDEX TIME_IDX ON SALES_PART (time_id) LOCAL COMPUTE STATISTICS
/
Index created.

SQL> SELECT ui.INDEX_NAME, up.PARTITION_NAME, up.NUM_ROWS, ui.PARTITIONING_TYPE FROM 
USER_IND_PARTITIONS up, USER_PART_INDEXES ui where up.index_name='TIME_IDX'
and ui.index_name='TIME_IDX'
/

INDEX_NAME PART_NAME NUM_ROWS PARTITI
---------- --------- -------- -------
TIME_IDX   SYS_P101  247707   RANGE
TIME_IDX   SYS_P102  232646   RANGE
TIME_IDX   SYS_P103  259418   RANGE
TIME_IDX   P1999     178834   RANGE
The index is range partitioned like the table. When using indexes it is important to know that they can become unavailable by DDL statements against the table. The example shows a global unpartitioned index on sales_id becoming unusable.
SQL> create unique index sales_idx on sales_part (sales_id) global 
/
Index created.

SQL> alter table sales_part drop partition SYS_P50;

Table altered.

SQL> select index_name, status from user_indexes where index_name = 'SALES_IDX'
/
INDEX_NAME                     STATUS
------------------------------ --------
SALES_IDX                       UNUSABLE
The index will become valid again after an index rebuild. From Oracle 9i there is a possibility to maintain global indexes automatically and from Oracle 10g also local indexes can be managed automatically. To use this feature you have to issue the 'UPDATE INDEXES' clause when creating the table.

Increase performance with partition pruning
If the goal is to increase performance with partitioning then partition pruning and partition wise joins will help. With partition pruning only the partitions which are relevant for the query are choosen. The partition criteria must be supplied in the where clause of the SQL statement. When accessing the table with an index the performance enhancement will be up to zero. Depending on the data, full table scans can have an incredible accelerated query.
SQL> explain plan for
select * from sales_part where time_id between 
to_date('01.01.1998', 'dd.mm.yyyy') and to_date('31.12.1998', 'dd.mm.yyyy');

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------
| Id  | Operation              | Name       | Pstart| Pstop 
-----------------------------------------------------------
|   0 | SELECT STATEMENT       |            |       |       
|   1 |  PARTITION RANGE SINGLE|            |     1 |     1 
|*  2 |   TABLE ACCESS FULL    | SALES_PART |     1 |     1 
-----------------------------------------------------------
(Output simplified)
The example shows static partition pruning where at the point of the compilation of the execution plan the partitions are chosen. With dynamic partition pruning the database decides at execution time which partitions to access. This will happen for example if you use bind variables or subqueries. In the given example only one of the three partitions are accessed and two-thirds of the data is untouched to answer the query. If the partitioning is built in an application afterwards then it is often neccesary to optimize queries in the where clause to use partition pruning.

Virtual column partitioning

In Oracle Database 11g some additional features are implemented. If no suitable partitioning key is available one can try to produce a key with virtual columns. Values in virtual columns are not stored in the database but a deterministic function, which is to calculate the values on demand is. Virtual columns can be partitioned with the three base partitioning mehods or with composite partitioning. If a product id consists of category id in the first two numbers then you can extract the category as a virtual column and partition the table with list partitioning using the category id like the example shows.
create table sales_list(
prod_id number, cust_id number,
time_id date, prod_category number(2) as
(substr(prod_id, 1, 2)) virtual)
partition by List (prod_category)
(partition database     values (10),
 partition applications values (20),
 partition middleware   values (30),
 partition others values (default))
/
Reference partitioning

Another new partitioning feature of Oracle Database 11g is the reference partitioning. If you want to have a delivery address for each sale in a different table then you can't partition the new table like the sales table because it has no time_id column. With reference partitionig a connection between the two tables can be made using a foreign key constraint and you can achieve equipartitioning with the sales table.
create table salesdeliver(
sales_id number not null,
address varchar2(400),
constraint fkc_salesdeliver_sales
   foreign key (sales_id) references sales_part)
partition by reference (fkc_salesdeliver_sales)
/
Note: To use the example the sales_part table must be created without reference partitioning and sales_id must be the primary key.

Conclusion

Partitioning is a varied powerful option which can often increase the efficiency in maintanance, availability and performance without changing of the application itself. The enhancements in Oracle Database 11g complete missing features. Customer wishes can be realized more exact using the new Oracle Database 11g partioning features reference partitioning, enhanced composite partitioning, partitioning with virtual columns and interval partitioning.

This Article applies to:Oracle Database 8 and higher, Oracle Database 11g (new features)
Initial published:09.05.2008
Last change:09.05.2008