Data warehouse optimization

Performed by the database’s optimizer, and as a result it would either look at all rows for the specific date and scan for customer or look at all rows for the customer and scan for date. If, on the other hand, you defined a compound b-tree index using date and customer, it would use that index to locate the rows directly. The compound index would perform much better than either of the two simple indexes. Figure 9.7 Simplified b-tree index structure. Cars ID Type Color other 1DGS902 Sedan White 1HUE039 Sedan Silver 2UUE384 Coupe Red 2ZUD923 Coupe White 3ABD038 3KES734 3IEK299 3JSU823 3LOP929 3LMN347 3SDF293 Sedan Sedan Sedan Sedan White White Red Red Coupe Coupe Coupe Silver Silver Silver Cars.ID B-tree Index (Unique) ID Parent 1DGS902 1HUE039 2UUE384 2ZUD923 3ABD038 3KES734 3IEK299 3JSU823 3LOP929 3LMN347 3SDF293 Row 2 1 3 2 7 3 5 4 3 5 10 6 root 7 6 8 11 9 7 10 10 11 < 1 2 4 8 3 6 9 > 5 10 11 = Data Warehouse Optimization 303 B-Tree Index Advantages B-tree indexes work best in a controlled environment. That is to say, you are able to anticipate how the tables will be accessed for both updates and queries. This is certainly attainable in the enterprise data warehouse as both the update and delivery processes are controlled by the data warehouse development team. Careful design of the indexes provides optimal performance with minimal overhead. B-tree indexes are low maintenance indexes. Database vendors have gone to great lengths to optimize their index structures and algorithms to maintain balanced index trees at all times. This means that frequent updating of tables does not significantly degrade index performance. However, it is still a good idea to rebuild the indexes periodically as part of a normal maintenance cycle. B-Tree Index Disadvantages As mentioned earlier, b-tree indexes cannot be used in combination with each other. This means that you must create sufficient indexes to support the anticipated accesses to the table. This does not necessarily mean that you need to create a lot of indexes. For example, if you have a table that is queried by date and by customer and date, you need only create a single compound index using date and customer in that order to support both. The significance of column order in a compound index is another disadvantage. You may be required to create multiple compound indexes or accept that some queries will require sequential scans after exhausting the usefulness of an existing index. Which way you go depends on the indexes you have and the nature of the data. If the existing index results in a scan of a few dozen rows for a particular query, it probably isn’t worth the overhead to create a new index structure to overcome the scan. Keep in mind, the more index structures you create, the slower the update process becomes. B-tree indexes tend to be large. In addition to the columns that make up the index, an index row also contains 16 to 24 additional bytes of pointer and other internal data used by the database system. Also, you need to add as much as 40 percent to the size as overhead to cover nonleaf nodes and dead space. Refer to your database system’s documentation for its method of estimating index sizes. Bitmap Indexes Bitmap indexes are almost never seen in OLTP type databases, but are the darlings of dimensional data marts. Bitmap indexes are best used in environments whose primary purpose is to support ad hoc queries. These indexes, however, are high-maintenance structures that do not handle updating very well. Let’s examine the bitmap structure to see why.

46 trang | Chia sẻ: tlsuongmuoi | Lượt xem: 1811 | Lượt tải: 0

Bạn đang xem trước 20 trang tài liệu Data warehouse optimization, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên

Các file đính kèm theo tài liệu này:

mastering_data_warehouse_design_relational_and_dimensional_techniques00008_1626.pdf