Wednesday, December 16, 2009

Week 4 Data Warehouse and OLAP

This week, there are a lot of things for me to absorb. The lecture is very packed and almost run out of time :)

In this week, I have learned the differences between data warehouse and operational database. Ms Chong also went though the different type of data warehouse schemas and Dimension tables. Different OLAP servers is also covered in the lecture.

Difference between data warehouse and operational database

Data warehouse is made up of both internal and external data. By extracting all the necessary data from different source and perform ETL process will help you to create a data warehouse.

There are three different design of a data warehouse

  • Star Schema
  • Snowflake Schema
  • Constellation Schema
Star Schema
Star schema architecture is the simplest data warehouse design. It is a data modeling technique used to map multidimensional decision support data into relational database.

You may check out this link for detail information about star schema. This website explain the structure of a Star Schema and different components in it. I think it will be helpful as there are examples given.

Snowflake Schema

Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design.

Snowflake schemas are generally used when a dimensional table becomes very big and when a star schema can’t represent the complexity of a data structure

I like to recommend this blog as it has provide both advantage and disadvantage of Snowflake Schema

Constellation Schema

Constellation Schema is made up of two or more Fact table and it is usually used for bottom up approach. Different Fact table are linked by dimension table.

Slowly changing dimensions are dimensions which change over time. There are three type of slowly changing dimensions

Type 1

  • Overwrites the previous dimension information
  • Does not track changes
  • Usually results in in accurate analysis

Type 2

  • Added four more supplementary attributes to track history of a dimension
  • This allow the tracking of entire history but require large data storage

Type 3

  • Only implement tow more additional attributes
  • Tracking only the current and original state of a dimension member
OLAP
An OLAP (Online analytical processing) cube is a data structure that allows fast analysis of data

Relational OLAP (ROLAP)

  • Real time and flexible than cube
  • Query response is generally slower
  • Low Storage requirement
  • Greater scalability
Multidimensional OLAP (MOLAP)

  • It can process faster
  • Implement for cubes with frequent use and rapid query response

Hybrid OLAP (HOLAP)

  • It contain all the advantage of MOLAP and ROLAP but require large volumes of storage space.
A lot of information in one lecture, happy learning~
Cheers~

No comments:

Post a Comment