对于BITMAP index的使用 一直存在一些争议 我们来看看下面几个观点:
1. Low cardinality – Some dabase vendor, like Oracle, provides very practical suggestion
If the number of distinct values of a column is less than 1% of the number of rows in the table, or if the values in a column are repeated more than 100 times, then the column is a candidate for a bitmap index.
B-tree indexes are most effective for high-cardinality data: that is, data with many possible values, such as CUSTOMER_NAME or PHONE_NUMBER.
There are 100 or more rows for each distinct value in the indexed column. When this limit is met, the bitmap index will be much smaller than a regular index, and you will be able to create the index much faster than a regular index.
2. No or little insert/update
Updating bitmap indexes take a lot of resources. Here are the suggestions
Building and maintaining an index structure can be expensive, and it can consume resources such as disk space, CPU, and I/O capacity. Designers must ensure that the benefits of any index outweigh the negatives of index maintenance.
Use this simple estimation guide for the cost of index maintenance: each index maintained by an INSERT, DELETE, or UPDATE of the indexed keys requires about three times as much resource as the actual DML operation on the table. What this means is that if you INSERT into a table with three indexes, then it will be approximately 10 times slower than an INSERT into a table with no indexes. For DML, and particularly for INSERT-heavy applications, the index design should be seriously reviewed, which might require a compromise between the query and INSERT performance.
3. Multiple Columns
One of the advantage is that multiple bitmap indexes can be merged and the column does not have to selective!More than one column in the table has an index that the optimizer can use to improve performance on a table scan Combining bitmap indexes on non-selective columns allows efficient AND and OR operations with a great number of rowids with minimal I/O
对于观点1 观点2 其实就是说明bitmap index 适合cardinality distinct小,并且并发DML很少的环境(OLAP) 但是对于大量的插入更新操作在cardinality很小的情况下 无论是索引的大小 还是插入的时间 bitmap index 还是占有一定的优势的 但是随着distinct的上升 无论index size还是插入的时间 都会发生恐怖的变化,我们可以做一个简单的对比:
SQL*Plus: Release 11.2.0.3.0 Production on Thu Jun 7 16:59:35 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL>
Elapsed: 00:00:00.00
SQL> create index user_data2.idx_temp_liu_id on user_data2.temp_liu(id);
Index created.
Elapsed: 00:00:00.46
SQL> insert into user_data2.temp_liu select 1 from dual connect by level < 40000001;
40000000 rows created.
Elapsed: 00:03:36.70
SQL> commit;
Commit complete.
b-tree index distinct count =1 耗时 3:36.70
Elapsed: 00:00:01.14
SQL> select sum(bytes)/1024/1024/1024 from dba_segments where segment_name =’IDX_TEMP_LIU_ID’;
SUM(BYTES)/1024/1024/1024
————————-
1.06738281
Elapsed: 00:00:00.85
SQL> drop index user_data2.IDX_TEMP_LIU_ID;
Index dropped.
Elapsed: 00:00:01.75
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:06.79
SQL> drop index user_data2.IDX_TEMP_LIU_ID;
Index dropped.
Elapsed: 00:00:00.01
SQL> truncate table user_data2.temp_liu;
Table truncated.
Elapsed: 00:00:02.02
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.01
SQL> select count(*) from user_data2.temp_liu;
COUNT(*)
———-
0
Elapsed: 00:00:00.00
SQL> insert into user_data2.temp_liu select 1 from dual connect by level < 40000001;
40000000 rows created.
Elapsed: 00:03:10.28
SQL> commit;
Commit complete.
Elapsed: 00:00:00.06
SQL> select sum(bytes)/1024/1024/1024 from dba_segments where segment_name =’IDX_TEMP_LIU_ID’;
SUM(BYTES)/1024/1024/1024
————————-
.005859375
Elapsed: 00:00:01.95
可以看到 耗时小于b-tree 但是index size << b-tree index size
下面测试下distinct count =2的对比
SQL> truncate table user_data2.temp_liu;
Table truncated.
Elapsed: 00:00:00.11
SQL> drop index user_data2.IDX_TEMP_LIU_ID;
Index dropped.
Elapsed: 00:00:00.01
SQL> create index user_data2.idx_temp_liu_id on user_data2.temp_liu(id);
Index created.
Elapsed: 00:00:00.18
SQL>
SQL> insert into user_data2.temp_liu select mod(rownum,2) from dual connect by level < 40000001;
40000000 rows created.
Elapsed: 00:10:33.48
SQL> commit;
Commit complete.
Elapsed: 00:00:01.09
SQL> truncate table user_data2.temp_liu ;
Table truncated.
Elapsed: 00:00:02.31
SQL> drop index user_data2.IDX_TEMP_LIU_ID;
Index dropped.
Elapsed: 00:00:00.01
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.17
SQL> insert into user_data2.temp_liu select mod(rownum,2) from dual connect by level < 40000001;
40000000 rows created.
Elapsed: 00:03:43.77
b-tree 耗时接近10分钟 bitmap index 耗时 3:43s 差距继续扩大 bitmap_index << b-tree index
SQL> drop index user_data2.IDX_TEMP_LIU_ID;
Index dropped.
Elapsed: 00:00:02.12
SQL> truncate table user_data2.temp_liu ;
Table truncated.
Elapsed: 00:00:01.04
SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001; 10000000 rows created. Elapsed: 00:00:19.68 SQL> truncate table user_data2.temp_liu;
Table truncated.
Elapsed: 00:00:00.27
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.01
SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001;
10000000 rows created.
Elapsed: 00:00:45.71
SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;
SUM(BYTES)
———-
2097152
Elapsed: 00:00:00.13
SQL> truncate table user_data2.temp_liu;
Table truncated.
Elapsed: 00:00:00.28
SQL> drop index user_data2.IDX_TEMP_LIU_ID
2 ;
Index dropped.
Elapsed: 00:00:00.01
SQL> create index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.01
SQL> insert into user_data2.temp_liu select ceil (rownum/2000000) from dual connect by level < 10000001;
10000000 rows created.
Elapsed: 00:00:41.29
SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;
SUM(BYTES)
———-
277872640
Elapsed: 00:00:00.21
SQL> select distinct id from user_data2.temp_liu ;
ID
———-
3
4
1
2
5
Elapsed: 00:00:01.60
可以看到 插入速度并没有得到明显的提升 但是size 仍然bitmap << b-tree
SQL> truncate table user_data2.temp_liu;
Table truncated.
Elapsed: 00:00:00.24
SQL> create bitmap index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.02
SQL>
SQL>
SQL>
SQL> insert into user_data2.temp_liu select ceil (rownum/5) from dual connect by level < 10000001;
10000000 rows created.
Elapsed: 00:01:39.25
SQL> commit;
Commit complete.
Elapsed: 00:00:00.22
SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;
SUM(BYTES)
———-
104857600
Elapsed: 00:00:00.01
对比之前的插入操作 插入1000万数据 耗费时间 distinct count=5 << distinct count=2000000
SQL> truncate table user_data2.temp_liu ;
Table truncated.
Elapsed: 00:00:00.37
SQL> drop index user_data2.IDX_TEMP_LIU_ID
2 ;
Index dropped.
Elapsed: 00:00:00.01
SQL> create index user_data2.IDX_TEMP_LIU_ID on user_data2.temp_liu (id);
Index created.
Elapsed: 00:00:00.02
SQL> insert into user_data2.temp_liu select ceil (rownum/5) from dual connect by level < 10000001;
10000000 rows created.
Elapsed: 00:00:38.16
SQL> select sum(bytes) from dba_segments where segment_name=’IDX_TEMP_LIU_ID’;
SUM(BYTES)
———-
177209344
可以看到 插入速度已经远远大于 bitmap-index 此时的size 也已经接近b-tree size
总结一下这个小实验 在distinct 大量增加的环境下bitmap 索引完全失去了优势 大量的维护代价 不但让索引的大小成倍增加 维护索引的代价也是剧增 并且在OLTP这种高并发的环境下BITMAP显得那么格格不入 BITMAP index 只能被用在DW DSS等低并发环境,当一条insert 执行的时候,如果column value=x的,由于要更新bitmap,需要lock value=x的所有行,所以千万别在oltp中使用bitmap index
可以给个流程图:
FROM Richard Foote
Bitmap indexes should only be considered in Data Warehouse, low concurrency DML type environments due to their locking implications and certainly pre 10g, Bitmap indexes had growth issues after significant DML changes. However it’s a complete nonsense to suggest that Bitmap indexes should only be considered with columns with “few” distinct values, else things will run 100s of times slower.
关于DML 操作对于bitmap index的影响 我们可以先看一下bitmap index block的结构:
In Oracle 9.2 and below, insertion is very inefficient
When a row is inserted
if leaf row already exists for ROWID then bitmap is updated
otherwise a new leaf row is inserted for the eight bits around the new row
In Oracle 10g, insertion is more efficient
rows are inserted into existing leaf rows where possible
if necessary start ROWID or end ROWID is adjusted
大量的DML 操作造成的 ROWID adjusted 的代价是很大的 所以在9i的库 我们经常发现bitmap index变的十分巨大(曾经见过一个2T的bitmap index)
上面的写了很多 看来bitmap index 真是一无是处了,但是在很多特定情况下 bitmap 还是具有一定优势的 就如上面第三个观点所说的 Multiple Columns的情况下
bitmap index merge :
Combining bitmap indexes on non-selective columns allows efficient AND and OR operations with a great number of rowids with minimal I/O
另外bitmap join index也是DW环境一个很不错的选择
总结 when to use bitmap index ?
1.Low cardinality
—
2.little concurrency DML statement
—
3.Multiple Columns using index merge in DW
—
4.bitmap join index in DW
—
5.just use it when you think your OLTP system is not bad enough 🙂