By: Milind Zodge
Overview
When you design a Data warehouse or Data Mart you come across many Date data type attributes in dimensions and/or facts tables. And . In this article I have pointed out a design technique for Date columns for fact table which gives highest performance.
Design
Consider a data mart having a fact table "Order" with many columns like Order Number, Order Date, Shipped Date and Amount and a "Time" dimension which have an entry for each day. You use "time_id" for "Order Date" however most of the time "Shipped Date" is kept as a Date column.
Consider in your reporting system you want to design a report to report number of orders shipped in a particular year. Now you will have format the shipped date column so that you can compare its year portion to get result. If you have a massive fact table this query is going to take more time as it will not be using any index, well you can create index to solve this problem.
Now consider you have a report which report number of orders shipped in a particular month, day, quarter etc. To speed up this operation you will have create index probably more than one. However if we use the id column and index on that column then you can avoid the above problem
Add shipped_date_id column along with the Shipped Date column in the fact table. Derive the value by using Time dimension. So whenever you query you always use index.
Conclusion
This way you can achieve maximum performance without adding more indexes. You can just go to your time dimension get the required ids and join it with your fact table which will use index defined on "shipped_date_id" column.
No comments:
Post a Comment