In this week's lecture, Ms Chong introduced us to text mining. Text mining is data mining performed on unstructured data. It is an important skill for any data analyst, as 80% of organizational information is in unstructured textual form.
Some examples of unstructured textual data are:
- Remarks of a call centre officer
- Open-ended questions from a survey
- Web sites
- Annual reports

After creating the attribute dictionary, the user has to remove common words which are useless for data mining (e.g. the, from, of, a).
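The common-word removal step can be sketched in a few lines of Python. This is only a minimal illustration: the `STOP_WORDS` set holds just the four example words from the sentence above, and the function name `remove_stop_words` is my own, not something from the lecture. A real attribute dictionary would use a much longer stop-word list.

```python
# Minimal sketch of common-word (stop-word) removal.
# STOP_WORDS contains only the example words mentioned in this post;
# a real list would be much longer.
STOP_WORDS = {"the", "from", "of", "a"}


def remove_stop_words(text):
    """Lower-case the text, split it on whitespace and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


remaining = remove_stop_words("The remarks of a call centre officer")
print(remaining)
```

Only the words that carry some meaning for the analysis are left as candidate attributes.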
Some challenges of textual data mining
Common sources for textual data mining include email documents and telephone transcripts. Performing text mining on these two kinds of data involves more difficult problems.
Common problems faced during textual data mining include spelling and grammar errors in the data (e.g. customer, cust, customar, csmr). Most of the time, the data analyst has to group all the different variants, such as customer, cust and customar, under one single attribute before doing any analysis.
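One possible way to group such variants is sketched below. It is my own assumption, not the method from the lecture: known abbreviations go into a hand-written `ALIASES` table, and misspellings are caught with the standard library's `difflib.get_close_matches`. The names `ALIASES`, `VOCABULARY` and `normalize`, and the cutoff of 0.8, are all hypothetical choices for this example.

```python
import difflib

# Hand-written table for abbreviations that are too short to match by similarity.
ALIASES = {"cust": "customer", "csmr": "customer"}

# Canonical attribute names that misspellings should be mapped onto.
VOCABULARY = ["customer"]


def normalize(token, cutoff=0.8):
    """Map a raw token onto its canonical attribute name, if one is found."""
    token = token.lower()
    if token in ALIASES:
        return ALIASES[token]
    # Fuzzy match catches misspellings such as "customar".
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


raw_tokens = ["customer", "cust", "customar", "csmr", "call"]
print([normalize(token) for token in raw_tokens])
```

The alias table is there because a short form like "csmr" is not similar enough to "customer" to pass a strict cutoff, while lowering the cutoff would start merging unrelated words.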
Other common problems are semantic analysis and syntax analysis.
How do we apply text mining to our daily operations?
Text mining can be applied to stop email spam or phishing through analysis of the document content.
It can automatically process a message or email and route it to the most appropriate department.
It can identify the most common problems reported to a help center.
Sometimes an organization receives hundreds of resumes; text mining can help to match resumes to open positions.
Text mining can also help us to monitor website activity and find out how users behave when browsing the website. The website admin can use this information to improve the website structure and make it more user friendly.
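To make the message routing application a bit more concrete, here is a very simplified sketch that scores a message against keyword sets and picks the best matching department. The department names, the keywords and the function `route_message` are all invented for this example; a real system would more likely train a classifier on past messages.

```python
import re

# Hypothetical departments and keywords, invented for this illustration.
DEPARTMENT_KEYWORDS = {
    "billing": {"invoice", "refund", "payment"},
    "technical support": {"error", "crash", "password"},
    "sales": {"quote", "pricing", "purchase"},
}


def route_message(message, default="general enquiries"):
    """Return the department whose keywords appear most often in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {
        department: len(words & keywords)
        for department, keywords in DEPARTMENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a default queue when no keyword matches at all.
    return best if scores[best] > 0 else default


print(route_message("I was charged twice, please send a refund and a new invoice"))
print(route_message("The app shows an error after I reset my password"))
```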
Just like any other data mining, text mining involves the 7 KDD steps.
I would like to recommend this page to my friends, as it explains typical applications for text mining in detail, as well as the different approaches to text mining.
Web mining was also covered in this lecture.
There are three different types of web mining:
- Web content mining
- Web structure mining
- Web usage mining

A web server log file is very useful, but it has some disadvantages. It is sometimes difficult to differentiate individual user sessions, because the same host address may be used by multiple users. To create a more accurate session file, the data analyst can combine referring pages with the host address to identify each individual user. Cookies are the best choice if they are allowed to be placed on the users' computers.
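The idea of combining the referring page with the host address can be sketched as follows. This is my own simplified reading of the heuristic, not an exact procedure from the lecture: a request joins the most recent session from the same host that already contains its referring page, otherwise it starts a new session. The log format `(host, referrer, page)` and the function name `build_sessions` are assumptions for this example.

```python
from collections import defaultdict


def build_sessions(log_entries):
    """Group (host, referrer, page) entries into sessions per host address."""
    sessions_by_host = defaultdict(list)  # host -> list of sessions (lists of pages)
    for host, referrer, page in log_entries:
        sessions = sessions_by_host[host]
        # Look for the most recent session that already visited the referring page.
        for session in reversed(sessions):
            if referrer in session:
                session.append(page)
                break
        else:
            # No session contains the referrer, so treat this as a new user session.
            sessions.append([page])
    return dict(sessions_by_host)


# Made-up log: two users sharing one host address ("-" means no referrer).
log = [
    ("10.0.0.1", "-", "/home"),
    ("10.0.0.1", "/home", "/products"),
    ("10.0.0.1", "-", "/careers"),
    ("10.0.0.1", "/products", "/cart"),
    ("10.0.0.1", "/careers", "/apply"),
]
sessions = build_sessions(log)
print(sessions)
```

Even though all five requests come from the same host address, the referring pages split them into two separate sessions.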
I find this case study of web usage mining very interesting.