VMCD.ORG » Bitmap index tips

对于BITMAP index的使用一直存在一些争议我们来看看下面几个观点：

1. Low cardinality – Some dabase vendor, like Oracle, provides very practical suggestion

If the number of distinct values of a column is less than 1% of the number of rows in the table, or if the values in a column are repeated more than 100 times, then the column is a candidate for a bitmap index.
B-tree indexes are most effective for high-cardinality data: that is, data with many possible values, such as CUSTOMER_NAME or PHONE_NUMBER.
There are 100 or more rows for each distinct value in the indexed column. When this limit is met, the bitmap index will be much smaller than a regular index, and you will be able to create the index much faster than a regular index.

2. No or little insert/update

Updating bitmap indexes take a lot of resources. Here are the suggestions
Building and maintaining an index structure can be expensive, and it can consume resources such as disk space, CPU, and I/O capacity. Designers must ensure that the benefits of any index outweigh the negatives of index maintenance.
Use this simple estimation guide for the cost of index maintenance: each index maintained by an INSERT, DELETE, or UPDATE of the indexed keys requires about three times as much resource as the actual DML operation on the table. What this means is that if you INSERT into a table with three indexes, then it will be approximately 10 times slower than an INSERT into a table with no indexes. For DML, and particularly for INSERT-heavy applications, the index design should be seriously reviewed, which might require a compromise between the query and INSERT performance.

3. Multiple Columns

One of the advantage is that multiple bitmap indexes can be merged and the column does not have to selective!More than one column in the table has an index that the optimizer can use to improve performance on a table scan Combining bitmap indexes on non-selective columns allows efficient AND and OR operations with a great number of rowids with minimal I/O

对于观点1 观点2 其实就是说明bitmap index 适合cardinality distinct小,并且并发DML很少的环境(OLAP) 但是对于大量的插入更新操作在cardinality很小的情况下无论是索引的大小还是插入的时间 bitmap index 还是占有一定的优势的但是随着distinct的上升无论index size还是插入的时间都会发生恐怖的变化,我们可以做一个简单的对比：

SQL*Plus: Release 11.2.0.3.0 Production on Thu Jun 7 16:59:35 2012

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

Elapsed: 00:00:00.00
SQL> create index user_data2.idx_temp_liu_id on user_data2.temp_liu(id);

Index created.

Elapsed: 00:00:00.46
SQL> insert into user_data2.temp_liu select 1 from dual connect by level < 40000001; 40000000 rows created. Elapsed: 00:03:36.70 SQL> commit;

Commit complete.

b-tree index distinct count =1 耗时 3:36.70

Elapsed: 00:00:01.14
SQL> select sum(bytes)/1024/1024/1024 from dba_segments where segment_name =’IDX_TEMP_LIU_ID’;

SUM(BYTES)/1024/1024/1024
————————-
1.06738281

Elapsed: 00:00:00.85
SQL> drop index user_data2.IDX_TEMP_LIU_ID;

Index dropped.

Elapsed: 00:00:01.75
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:06.79
SQL> drop index user_data2.IDX_TEMP_LIU_ID;

Index dropped.

Elapsed: 00:00:00.01
SQL> truncate table user_data2.temp_liu;

Table truncated.

Elapsed: 00:00:02.02
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.01
SQL> select count(*) from user_data2.temp_liu;

COUNT(*)
———-
0

Elapsed: 00:00:00.00
SQL> insert into user_data2.temp_liu select 1 from dual connect by level < 40000001; 40000000 rows created. Elapsed: 00:03:10.28 SQL> commit;

Commit complete.

Elapsed: 00:00:00.06
SQL> select sum(bytes)/1024/1024/1024 from dba_segments where segment_name =’IDX_TEMP_LIU_ID’;

SUM(BYTES)/1024/1024/1024
————————-
.005859375

Elapsed: 00:00:01.95

可以看到耗时小于b-tree 但是index size << b-tree index size 下面测试下distinct count =2的对比

SQL> truncate table user_data2.temp_liu;

Table truncated.

Elapsed: 00:00:00.11
SQL> drop index user_data2.IDX_TEMP_LIU_ID;

Index dropped.

Elapsed: 00:00:00.01
SQL> create index user_data2.idx_temp_liu_id on user_data2.temp_liu(id);

Index created.

Elapsed: 00:00:00.18
SQL>
SQL> insert into user_data2.temp_liu select mod(rownum,2) from dual connect by level < 40000001; 40000000 rows created. Elapsed: 00:10:33.48 SQL> commit;

Commit complete.

Elapsed: 00:00:01.09

SQL> truncate table user_data2.temp_liu ;

Table truncated.

Elapsed: 00:00:02.31
SQL> drop index user_data2.IDX_TEMP_LIU_ID;

Index dropped.

Elapsed: 00:00:00.01
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.17
SQL> insert into user_data2.temp_liu select mod(rownum,2) from dual connect by level < 40000001; 40000000 rows created. Elapsed: 00:03:43.77

b-tree 耗时接近10分钟 bitmap index 耗时 3:43s 差距继续扩大 bitmap_index << b-tree index

SQL> drop index user_data2.IDX_TEMP_LIU_ID;

Index dropped.

Elapsed: 00:00:02.12
SQL> truncate table user_data2.temp_liu ;

Table truncated.

Elapsed: 00:00:01.04

SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:00:19.68 SQL> truncate table user_data2.temp_liu;

Table truncated.

Elapsed: 00:00:00.27
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.01
SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:00:45.71 SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;

SUM(BYTES)
———-
2097152

Elapsed: 00:00:00.13
SQL> truncate table user_data2.temp_liu;

Table truncated.

Elapsed: 00:00:00.28
SQL> drop index user_data2.IDX_TEMP_LIU_ID
2 ;

Index dropped.

Elapsed: 00:00:00.01
SQL> create index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.01
SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:00:41.29 SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;

SUM(BYTES)
———-
277872640

Elapsed: 00:00:00.21
SQL> select distinct id from user_data2.temp_liu ;

ID
———-
3
4
1
2
5

Elapsed: 00:00:01.60

可以看到插入速度并没有得到明显的提升但是size 仍然bitmap << b-tree

SQL> truncate table user_data2.temp_liu;

Table truncated.

Elapsed: 00:00:00.24
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.02
SQL>
SQL>
SQL>
SQL> insert into user_data2.temp_liu select ceil (rownum/5) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:01:39.25 SQL> commit;

Commit complete.

Elapsed: 00:00:00.22
SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;

SUM(BYTES)
———-
104857600

Elapsed: 00:00:00.01

对比之前的插入操作插入1000万数据耗费时间 distinct count=5 << distinct count=2000000

SQL> truncate table user_data2.temp_liu ;

Table truncated.

Elapsed: 00:00:00.37
SQL> drop index user_data2.IDX_TEMP_LIU_ID
2 ;

Index dropped.

Elapsed: 00:00:00.01
SQL> create index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);

Index created.

Elapsed: 00:00:00.02
SQL> insert into user_data2.temp_liu select ceil (rownum/5) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:00:38.16 SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;

SUM(BYTES)
———-
177209344

可以看到插入速度已经远远大于 bitmap-index 此时的size 也已经接近b-tree size

总结一下这个小实验在distinct 大量增加的环境下bitmap 索引完全失去了优势大量的维护代价不但让索引的大小成倍增加维护索引的代价也是剧增并且在OLTP这种高并发的环境下BITMAP显得那么格格不入 BITMAP index 只能被用在DW DSS等低并发环境,当一条insert 执行的时候，如果column value=x的，由于要更新bitmap，需要lock value=x的所有行，所以千万别在oltp中使用bitmap index

可以给个流程图：

FROM Richard Foote

Bitmap indexes should only be considered in Data Warehouse, low concurrency DML type environments due to their locking implications and certainly pre 10g, Bitmap indexes had growth issues after significant DML changes. However it’s a complete nonsense to suggest that Bitmap indexes should only be considered with columns with “few” distinct values, else things will run 100s of times slower.

关于DML 操作对于bitmap index的影响我们可以先看一下bitmap index block的结构：

In Oracle 9.2 and below, insertion is very inefficient
When a row is inserted
if leaf row already exists for ROWID then bitmap is updated
otherwise a new leaf row is inserted for the eight bits around the new row

In Oracle 10g, insertion is more efficient
rows are inserted into existing leaf rows where possible
if necessary start ROWID or end ROWID is adjusted

大量的DML 操作造成的 ROWID adjusted 的代价是很大的所以在9i的库我们经常发现bitmap index变的十分巨大(曾经见过一个2T的bitmap index)

上面的写了很多看来bitmap index 真是一无是处了,但是在很多特定情况下 bitmap 还是具有一定优势的就如上面第三个观点所说的 Multiple Columns的情况下
bitmap index merge :
Combining bitmap indexes on non-selective columns allows efficient AND and OR operations with a great number of rowids with minimal I/O

其实可以通过下面的图客观的反映出来

另外bitmap join index也是DW环境一个很不错的选择

总结 when to use bitmap index ?

1.Low cardinality
—
2.little concurrency DML statement
—
3.Multiple Columns using index merge in DW
—
4.bitmap join index in DW
—
5.just use it when you think your OLTP system is not bad enough 🙂