Monday, January 18, 2010

Week 15 Future of Data Warehouse, Data Mining and Data Visualisation

This week's lecture was all about trends and the future development of the data warehouse, and it gave me an overall picture of where data warehousing is heading. I was very surprised by the storage media covered in the lecture. According to Ms Chong, the future of data warehousing is not high-performance disk storage but an array of alternative storage, which comes in two forms. Near-line storage is an automated silo in which tape cartridges are handled automatically. Secondary storage is slower and less expensive, such as CD-ROMs and floppy disks.

I think this trend towards alternative storage is very reasonable, because data warehouse data is placed there once and then left alone, so it does not need to be updated at high speed. If data is accessed less often as it ages, it can be moved to secondary storage, which frees high-performance resources for new data and makes access to the newer data more efficient.
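
To make the idea of moving ageing data concrete, here is a minimal sketch in Python. It is only my own illustration, not something from the lecture or the article: the one-year threshold and the names `SECONDARY_AFTER` and `choose_tier` are assumptions I made up for the example.

```python
from datetime import date, timedelta

# Hypothetical threshold: data untouched for a year counts as "cold".
SECONDARY_AFTER = timedelta(days=365)


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from how long the data has been idle."""
    idle = today - last_accessed
    if idle >= SECONDARY_AFTER:
        # Rarely accessed data does not need fast, expensive disks.
        return "secondary storage"
    return "high-performance disk"


if __name__ == "__main__":
    today = date(2010, 1, 18)
    print(choose_tier(date(2008, 6, 1), today))
    print(choose_tier(date(2010, 1, 10), today))
```

A real data warehouse would probably look at more than the last access date, such as query frequency or business rules, but the principle of keeping the fast storage for the newer data is the same.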

I have read an article on this topic from Corporate Information Factory, written by W. H. Inmon. The article mentions that high-performance disk storage plays only a secondary role in the future of data warehousing. The real future of data warehousing is in storage media collectively known as "alternative storage". This supports what Ms Chong taught us in the lecture.

From the article I have extracted some reasons why high-performance disk storage is not the choice for the data warehouse.
  • Secondary storage is a form of disk storage whose disks are slower, significantly less expensive and less cached than high-performance storage.
  • Data warehouse data is very stable.
  • Far more data can be stored on near-line and/or secondary storage.
  • Secondary storage and near-line storage are getting cheaper at a faster rate than high-performance storage.

Most organizations are currently at the stage of a non-integrated information architecture. Over the next few years, companies will move towards an integrated information center where data formats and information are all standardized. According to Philip Howard, the most important trend is towards the integration of text mining and data mining.

The future of data mining is about reacting more quickly and offering better service, all with fewer people and at a lower cost.

The increase in hardware speed and capacity makes it possible to analyze data sets that were too large just a few years ago. However, as the available data grows exponentially, the industry is looking into automatic procedures for data mining.

Data mining is also being used to protect private information.
  • One current intrusion detection technique is misuse detection – scanning for malicious activity patterns known by signatures.
  • Another technique is anomaly detection where there is an attempt to identify malicious activity based on deviations from norms.
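
The two techniques above can be contrasted with a small Python sketch. This is my own simplified illustration: the signature strings, the function names and the threshold of 3 standard deviations are all assumptions, and a real intrusion detection system is far more sophisticated.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical signatures of known malicious activity.
SIGNATURES = ["' OR 1=1", "../../etc/passwd"]


def misuse_detect(event: str) -> bool:
    """Misuse detection: flag an event that matches a known signature."""
    return any(signature in event for signature in SIGNATURES)


def anomaly_detect(history: Sequence[float], value: float,
                   threshold: float = 3.0) -> bool:
    """Anomaly detection: flag a value that deviates strongly from the norm.

    The norm is estimated from past observations; the deviation is measured
    in standard deviations (a z-score), which is only one possible choice.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    print(misuse_detect("GET /../../etc/passwd"))   # matches a signature
    logins_per_hour = [10, 12, 11, 9, 10, 11]
    print(anomaly_detect(logins_per_hour, 50))      # far above the norm
```

Misuse detection can only catch attacks that are already known, while anomaly detection can catch new ones but may also raise false alarms when normal behaviour changes.
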

Stephen Few and Perceptual Edge mention that data visualization is increasingly taking its rightful place as an important part of business intelligence.

Data visualization has in recent years become an established area of study in academia. Many universities now have faculty members who focus on visualization and a few have excellent programs that serve the needs of many graduate students who produce worthwhile research studies and prototype applications.

Both of them expect that, for the next few years, data visualization will continue to pursue and mature the trends that have already begun. Dashboards, visual analytics, and even simple graphs will continue to develop and conform to best practices. They have also seen evidence that newer efforts are emerging that will soon develop into full-blown trends.

According to Stephen Few and Perceptual Edge, another expression of data visualization that has captured the imagination of many in the business world in recent years is geo-spatial visualization. The popularity of Google Earth and other similar Web services has contributed a great deal to this interest.

Another trend that has made the journey in recent years from the academic research community to commercial software tackles the problem of displaying large sets of quantitative data in the limited space of a screen. The most popular example of this is the treemap.
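
To show how a treemap fits many values into a fixed space, here is a minimal Python sketch of a one-level "slice-and-dice" layout, where each item gets a rectangle whose area is proportional to its value. The function name and the sample figures are made up by me for illustration; commercial tools use more refined layouts and nest the rectangles to show hierarchies.

```python
def slice_and_dice(items, x, y, width, height, vertical=True):
    """Split one rectangle into strips, one strip per (label, value) item.

    Each strip's area is proportional to the item's value. Values are
    assumed to be positive. Returns a list of (label, x, y, w, h) tuples.
    """
    total = sum(value for _, value in items)
    rects = []
    offset = 0.0
    for label, value in items:
        share = value / total
        if vertical:
            # Cut the rectangle into side-by-side columns.
            rects.append((label, x + offset, y, width * share, height))
            offset += width * share
        else:
            # Cut the rectangle into stacked rows.
            rects.append((label, x, y + offset, width, height * share))
            offset += height * share
    return rects


if __name__ == "__main__":
    # Hypothetical sales figures per region.
    sales = [("North", 50), ("South", 25), ("East", 25)]
    for rect in slice_and_dice(sales, 0, 0, 100, 10):
        print(rect)
```

To display a hierarchy, the same function could be called again on each rectangle with `vertical` flipped, so that every level is cut in the other direction.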

That's all I'd like to share for this week.

Happy Reading
Cheers
Chenyuan



