Difference between classification and clustering in data mining? [closed]

Can someone explain what the difference is between classification and clustering in data mining?

If you can, please give examples of both to illustrate the main idea.


In general, in classification you have a set of predefined classes and want to know which class a new object belongs to.

Clustering tries to group a set of objects and find whether there is some relationship between the objects.

In the context of machine learning, classification is supervised learning and clustering is unsupervised learning.

Also have a look at Classification and Clustering on Wikipedia.




If you ask this question of anyone who works in data mining or machine learning, they will use the terms supervised learning and unsupervised learning to explain the difference between clustering and classification. So let me first explain the key words supervised and unsupervised.

Supervised learning: suppose you have a basket filled with fresh fruit, and your task is to put fruits of the same type together. Suppose the fruits are apples, bananas, cherries, and grapes. From previous experience you already know the shape of each fruit, so it is easy to arrange them by type. In data mining, that previous experience is called training data. You have already learned from the training data because it includes a response variable, which tells you that a fruit with such-and-such features is a grape, and likewise for every other fruit.

You get this kind of information from the training data. This type of learning is called supervised learning, and this type of problem falls under classification. Because you have already learned what each fruit looks like, you can do the job confidently.
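To make the supervised case concrete, here is a minimal sketch of a nearest-neighbour classifier for the fruit basket. The features (length in cm, weight in grams) and every number in the training data are invented for illustration; they are not from the answer above. The point is only that each training example carries a known label, and a new fruit receives the label of the most similar known fruit.

```python
import math

# Hypothetical training data: (length_cm, weight_g) paired with a known label.
# The label plays the role of the response variable.
TRAINING_DATA = [
    ((7.0, 150.0), "apple"),
    ((8.0, 170.0), "apple"),
    ((18.0, 120.0), "banana"),
    ((20.0, 130.0), "banana"),
    ((2.0, 8.0), "cherry"),
    ((2.5, 10.0), "cherry"),
    ((1.5, 5.0), "grape"),
    ((2.0, 6.0), "grape"),
]


def classify(features):
    """Return the label of the training example closest to `features`."""
    _, label = min(
        TRAINING_DATA,
        key=lambda example: math.dist(features, example[0]),
    )
    return label


# A new, unlabeled fruit is assigned one of the predefined classes.
new_fruit = (19.0, 125.0)
predicted = classify(new_fruit)
print(predicted)
```

In practice the features would be scaled first so that one of them (here, weight) does not dominate the distance; that step is left out to keep the sketch short.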

Unsupervised learning: suppose you have a basket filled with fresh fruit, and your task is again to put fruits of the same type together.

This time you know nothing about the fruits. You are seeing them for the first time, so how will you arrange fruits of the same type?

First, you pick up a fruit and choose one of its physical characteristics. Suppose you choose color.

Then you arrange the fruits by color, and the groups look something like this. Red group: apples and cherries. Green group: bananas and grapes. Next you take another physical characteristic, size, and the groups become: red and big: apples. Red and small: cherries. Green and big: bananas. Green and small: grapes. Job done, happy ending.

Here you did not learn anything beforehand, which means there is no training data and no response variable. This type of learning is known as unsupervised learning. Clustering falls under unsupervised learning.
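The grouping steps above can be sketched in a few lines. This is a simplified illustration, not a real clustering algorithm: it just groups items whose observed characteristics match, first by color and then by color and size. The basket contents and the `id` field are made up for the example, and no fruit names are given to the code.

```python
from collections import defaultdict

# Hypothetical unlabeled observations. The "id" is only a handle for
# printing; there is no class label and no training data.
basket = [
    {"id": 1, "color": "red", "size": "big"},
    {"id": 2, "color": "red", "size": "small"},
    {"id": 3, "color": "green", "size": "big"},
    {"id": 4, "color": "green", "size": "small"},
    {"id": 5, "color": "red", "size": "big"},
    {"id": 6, "color": "green", "size": "small"},
]


def group_by(items, features):
    """Group item ids by the values of the chosen characteristics."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[name] for name in features)
        groups[key].append(item["id"])
    return dict(groups)


# Step 1: group on color only, which gives two coarse groups.
by_color = group_by(basket, ["color"])

# Step 2: add size, which splits each color group in two.
by_color_and_size = group_by(basket, ["color", "size"])

for key, ids in by_color_and_size.items():
    print(key, ids)
```

Real clustering algorithms such as k-means work on numeric features and a distance measure rather than exact matches, but the idea is the same: the groups come from the data itself.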


Classification: you are given some new data and have to assign a label to it.

For example, a company wants to classify its prospective customers. When a new customer comes along, the company has to determine whether this customer is going to buy its products or not.

Clustering: you are given a set of historical transactions that record who bought what.

By applying clustering techniques, you can discover segments among your customers.
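As a rough sketch of customer segmentation, the code below runs a small k-means on made-up customer summaries. The two features (orders per year, average order value), the data points, and the choice of two segments with fixed starting centroids are all assumptions made for illustration. In a real setting these numbers would be derived from the transaction history.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical per-customer summaries: (orders_per_year, avg_order_value).
customers = [(2, 20), (3, 25), (1, 15), (30, 22), (28, 18), (32, 25)]

# Fixed starting centroids keep the example deterministic.
centroids, segments = kmeans(customers, [customers[0], customers[3]])

for centroid, members in zip(centroids, segments):
    print(centroid, members)
```

No labels are supplied anywhere. The two segments (occasional buyers and frequent buyers) emerge from the data, and it is then up to the analyst to interpret and name them.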