Implementation of Image Segmentation for Natural Images using Clustering Methods


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)


Saiful Islam¹, Dr. Majidul Ahmed²

¹Assistant Professor, Dhamdhama Anchalik College, Department of Computer Science (HoD), Assam, INDIA
²Assistant Professor, Gauhati Commerce College, Department of Information Technology (HoD), Assam, INDIA

Abstract— Segmentation of natural images is one of the fundamental problems in image processing and computer vision. Image segmentation is the process of partitioning an image into multiple meaningful regions, or sets of pixels, with respect to a particular application, and it is a critical and essential component of any image analysis system. Many image segmentation techniques exist in the literature; one of the most important of them, and one of the first to be used for the segmentation of natural images, is clustering. Clustering in image segmentation is defined as the process of identifying groups of similar image primitives. The purpose of clustering is to obtain meaningful results, effective storage and fast retrieval in various fields. The literature describes many clustering methods for natural image segmentation. In this paper, we implement and compare three of them for the segmentation of natural images: K-Means clustering, K-Medoids clustering and Hierarchical clustering.

Keywords— Clustering Methods, Hierarchical Clustering, K-Means Clustering, K-Medoids Clustering, MATLAB, Natural Image Segmentation.

I. INTRODUCTION

A. Image and Image Segmentation

An image is a two-dimensional array, or matrix, of square pixels arranged in columns and rows. Each pixel represents the color or gray level at a single point in the image. The word image comes from the Latin word 'imago'. Natural images consist of an overwhelming number of visual patterns generated by very diverse stochastic processes in nature, and they are particularly noisy because of the environment in which they were produced. A digital image is a numeric (normally binary) representation of a two-dimensional image. Any image from a scanner or a digital camera, or stored in a computer, is a digital image.

The segmentation of natural images has become a very important task in today's scenario. Image segmentation is a fundamental process in many image, video, and computer vision applications.

Segmentation of natural images is one of the classical problems in computer vision. The task of partitioning a natural image into regions with homogeneous texture, commonly referred to as image segmentation, is widely accepted as a crucial function for high-level image understanding, because it significantly reduces the complexity of content analysis of images. Image segmentation is typically used to locate objects and boundaries in images. It is one of the most difficult tasks in image processing, and it largely determines the quality of the final result of analysis. Natural image segmentation has many applications, such as medical imaging (diagnosis and treatment planning), face recognition, iris recognition, fingerprint recognition and traffic control systems.

B. Clustering

The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters; in other words, the objects in a cluster share some common characteristics. Clustering is a data mining technique used in statistical data analysis, pattern recognition, image analysis, etc.

Clustering is a classification technique. Given a vector of N measurements describing each pixel or group of pixels in an image, the similarity of the measurement vectors, and therefore their clustering in the N-dimensional measurement space, implies the similarity of the corresponding pixels or pixel groups.


Fig. 1. Similar data points grouped together into clusters.
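A minimal Python sketch of this measurement-vector view, assuming each pixel is described by N = 3 measurements (gray level, row, column). The feature choice, the helper names `pixel_features` and `euclidean`, and the tiny test image are hypothetical illustrations, not taken from the paper's experiments.

```python
import math


def pixel_features(image):
    """Turn a 2-D grayscale image (list of rows) into one measurement
    vector per pixel, in row-major order.

    Assumption: the vector is (gray level, row, column); using the gray
    level alone, or the RGB values, are equally valid choices.
    """
    return [(float(value), float(r), float(c))
            for r, row in enumerate(image)
            for c, value in enumerate(row)]


def euclidean(a, b):
    """Euclidean distance between two N-dimensional measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A tiny 2 x 3 "image": dark pixels on the left, bright pixels on the right.
image = [[10, 12, 200],
         [11, 13, 210]]
features = pixel_features(image)

# Two neighbouring dark pixels are close in measurement space ...
print(euclidean(features[0], features[1]))
# ... while a dark pixel and its bright neighbour are far apart.
print(euclidean(features[1], features[2]))
```

Small distances in this space are what a clustering method treats as similar pixels; scaling the position coordinates up or down changes how much spatial proximity counts relative to gray level.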

II. OBJECTIVES OF THIS PAPER

The objectives of this paper are as follows:

1. In this paper, we identify multiple objectives associated with image segmentation problems for natural images.

2. The main aim of this paper is to implement and compare three of the most important clustering methods, namely K-Means clustering, K-Medoids clustering and Hierarchical clustering, for natural image segmentation, and to identify their advantages and disadvantages.

III. METHODOLOGY (CLUSTERING METHODS)

Clustering methods are commonly applied in image segmentation and statistics. Clustering methods can be classified into supervised clustering and unsupervised clustering. Supervised clustering demands human interaction to decide the clustering criteria, and it includes hierarchical approaches such as relevance feedback methods. Unsupervised clustering, on the other hand, decides the clustering criteria by itself, and it includes density-based clustering methods. According to their characteristics, clustering algorithms can be roughly subdivided into partitional algorithms and hierarchical algorithms. A partitional algorithm divides a data set into a single partition, whereas a hierarchical algorithm divides a data set into a sequence of nested partitions. The literature contains many clustering methods for natural image segmentation; of these, we use the three most commonly used: K-Means clustering, K-Medoids clustering and Hierarchical clustering.

A. K-Means Clustering

The most popular clustering method for image segmentation is K-Means clustering. It is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem.

K-Means is a clustering algorithm used to determine the natural spectral groupings present in a data set. It is an iterative technique that partitions the image into K clusters (C_1, C_2, …, C_K), each represented by its center, or mean. The center of each cluster is calculated as the mean of all the instances belonging to that cluster. The main idea is to define K centroids, one for each cluster. These centroids should be placed carefully, because different initial locations lead to different results; a good choice is therefore to place them as far away from each other as possible. The next step is to take each point of the data set and associate it with the nearest centroid. When no point is pending, the first step is completed and an early grouping is done. At this point, K new centroids must be re-calculated as the centers of the clusters resulting from the previous step. With these K new centroids, a new binding is made between the same data points and the nearest new centroid. This creates a loop in which the K centroids change their location step by step until no more changes occur; in other words, the centroids no longer move. In this way, the algorithm aims at minimizing an objective function: the sum of squared distances between each data point and the centroid of its cluster.
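For reference, the objective function mentioned above is usually written as the within-cluster sum of squared distances. The notation (x for a pixel's measurement vector, c_j for the centroid of cluster C_j) is ours rather than the paper's.

```latex
J = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - c_j \rVert^{2},
\qquad
c_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

Neither the assignment step (moving each point to its nearest centroid) nor the update step (recomputing each c_j as a mean) can increase J, which is why the loop stops; the minimum it stops at is only local, so it depends on where the initial centroids were placed.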

The K-Means clustering algorithm is composed of the following steps:

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move.
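A minimal pure-Python sketch of these four steps, assuming each pixel is described only by its gray level and using a farthest-first rule for Step 1 (one way to place the centroids far apart; the paper's MATLAB implementation may initialise differently). The function names and the small test image are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two measurement vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Step 1 (one possible rule): spread the K initial centroids apart.

    Start from points[0], then repeatedly add the point that is farthest
    from its nearest already-chosen centroid.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iter=100):
    """Cluster a list of equal-length tuples; return (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Step 4: stop once no object changes group (centroids stop moving).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty group keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def segment_gray(image, k):
    """Segment a 2-D grayscale image by clustering its gray levels."""
    points = [(float(v),) for row in image for v in row]
    labels, _ = kmeans(points, k)
    width = len(image[0])
    return [labels[i:i + width] for i in range(0, len(labels), width)]


image = [[12, 15, 200, 198],
         [10, 14, 205, 201],
         [90, 95, 100, 97]]
for row in segment_gray(image, 3):
    print(row)
```

For this 3 x 4 test image the printed label rows are [0, 0, 1, 1], [0, 0, 1, 1] and [2, 2, 2, 2]: dark, bright and mid-gray pixels form three segments. With random initialisation instead of the deterministic farthest-first rule, results can vary between runs, so several restarts are usually compared.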

Experiment: The K-Means clustering algorithm is implemented using MATLAB and tested with the following image.


Advantages and disadvantages of K-Means clustering:

The advantages of K-Means clustering are as follows:

ʘ K-Means clustering works well when clusters are not well separated from each other, which is frequently encountered in images.

ʘ The K-Means algorithm is easy to implement.

ʘ Its time complexity is O(nKt), where n is the number of patterns, K the number of clusters and t the number of iterations, so it grows only linearly with n. It is faster than hierarchical clustering.

ʘ It works well with natural images.

The disadvantages of K-Means clustering are as follows:

ʘ The number of clusters must be specified in advance, which creates a problem for huge image databases.

ʘ K-Means clustering has problems when clusters are of different sizes, densities, or non-globular shapes.

ʘ K-Means clustering has problems when the data contains outliers.

ʘ It cannot show the clustering details as hierarchical clustering does.

B. K-Medoids Clustering

K-Medoids clustering is a partitional algorithm which, like K-Means, attempts to minimize a clustering error such as the SSE (Sum of Squared Error). The K-Means algorithm is sensitive to outliers, since an object with an extremely large value may substantially distort the distribution of data. Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster. The partitioning method can then still be performed based on the principle of minimizing the sum of the dissimilarities between each object and its corresponding reference point. This forms the K-Medoids clustering method. The algorithm is very similar to the K-Means algorithm; it differs mainly in its representation of the clusters. Each cluster is represented by its most central object, rather than by the implicit mean, which may not belong to the cluster.

The K-Medoids method can therefore be seen as a generalization of the K-Means algorithm. The basic strategy of K-Medoids clustering algorithms is to find K clusters in n objects by first arbitrarily choosing a representative object (medoid) for each cluster. Each remaining object is clustered with the medoid to which it is most similar. In this way, the K-Medoids method uses representative objects as reference points instead of the mean value of the objects in each cluster.
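Written out, the criterion that K-Medoids minimizes (the sum of dissimilarities between each object and the reference point of its cluster) looks as follows; the notation m_j for the medoid of cluster C_j and d for the dissimilarity measure is ours.

```latex
E = \sum_{j=1}^{K} \sum_{x \in C_j} d(x, m_j),
\qquad m_j \in D
```

The constraint m_j ∈ D is what separates this from K-Means: each representative must be an actual object of the data set D (a real pixel), and d may be any dissimilarity measure rather than a squared Euclidean distance.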

The algorithm takes the input parameter K, the number of clusters to be partitioned among a set of n objects. A typical K-Medoids algorithm for partitioning based on medoid or central objects is as follows:

Input:

K: The number of clusters.

D: A data set containing n objects.

Output:

A set of K clusters that minimizes the sum of the dissimilarities of all the objects to their nearest medoid.

Method:

Arbitrarily choose K objects in D as the initial representative objects.

The most common realization of K-Medoids clustering is the Partitioning Around Medoids (PAM) algorithm, which proceeds as follows:

1. Randomly select K of the n data points as the medoids.

2. Associate each data point with the nearest medoid.

3. For each medoid m and each data point o associated with m, swap m and o and compute the total cost of the configuration, i.e., the average dissimilarity of o to all the data points associated with m. Select the medoid o with the lowest configuration cost.

4. Repeat Steps 2 and 3 alternately until there is no change in the assignments.
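A minimal pure-Python sketch of these PAM steps, assuming one gray level per pixel and the sum of absolute differences (`manhattan`) as the dissimilarity; any other distance function can be passed in, which is what lets K-Medoids work with arbitrary distance measures. The helper names and test values are hypothetical. Step 3 as written only swaps a medoid with members of its own cluster, and the sketch follows that simpler variant; the original PAM formulation evaluates swaps between every medoid and every non-medoid in the whole data set, which typically finds better solutions at a higher cost. Choosing the lowest total dissimilarity is equivalent to choosing the lowest average, since the cluster size is fixed during the comparison.

```python
def manhattan(a, b):
    """Dissimilarity between two measurement vectors (sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Step 2: for every point, the position (in medoids) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]]))
            for p in points]


def total_cost(points, medoids, dist):
    """Sum of dissimilarities of all points to their nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k, dist, max_iter=100):
    """Return (labels, medoids); medoids are indices into points."""
    # Step 1: the paper selects K points at random; taking the first K
    # keeps this sketch deterministic.
    medoids = list(range(k))
    labels = assign(points, medoids, dist)
    for _ in range(max_iter):
        # Step 3: within each cluster, try every member o as the medoid and
        # keep the one with the lowest total dissimilarity to the cluster.
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[j])
                continue
            best = min(members,
                       key=lambda o: sum(dist(points[o], points[i]) for i in members))
            new_medoids.append(best)
        medoids = new_medoids
        new_labels = assign(points, medoids, dist)
        # Step 4: stop when the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, medoids


gray = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (250.0,)]
labels, medoids = k_medoids(gray, 2, manhattan)
print(labels, [gray[m] for m in medoids], total_cost(gray, medoids, manhattan))
```

On these test values the outlier 250 ends up in the second cluster, yet that cluster's medoid is still 11, an ordinary pixel value.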

Experiment: This method is implemented using MATLAB and tested with the following image.

Fig.3

Advantages and disadvantages of K-Medoids clustering:

The advantages of K-Medoids clustering are as follows:

ʘ K-Medoids clustering is more robust to noise and outliers than K-Means clustering and is hence more suitable for noisy images, although it is computationally more intensive.


The disadvantages of K-Medoids clustering are as follows:

ʘ K-Medoids clustering is computationally much costlier than K-Means clustering.

ʘ K-Medoids is applied when dealing with categorical data.

ʘ PAM works effectively for small data sets, but does not scale well for large data sets.

C. Hierarchical Clustering

Hierarchical clustering is one of the well-known methods for image segmentation. It groups patterns into clusters step by step and organizes the resulting clusters in the form of a tree, either by repeatedly merging small clusters into larger ones or by repeatedly splitting large clusters into smaller ones. The concept of hierarchical clustering is to construct a dendrogram representing the nested grouping of patterns (for images, pixels) and the similarity levels at which groupings change. Hierarchical clustering algorithms can be divided into two kinds:

1. Agglomerative (bottom up) hierarchical clustering:

Each object initially represents a cluster of its own. Then clusters are successively merged until the desired cluster structure is obtained.

The hierarchical agglomerative clustering algorithm consists of the following steps:

1. Set each pattern in the database as a cluster C_i and compute the proximity matrix, which contains the distance between each pair of patterns.

2. Use the proximity matrix to find the most similar pair of clusters and merge these two clusters into one cluster. Then update the proximity matrix.

3. Repeat Step 2 until all patterns are in one cluster or the required similarity level is reached.
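A small pure-Python sketch of these steps. The paper does not say how the distance between two clusters is measured, so single linkage (the distance between their two closest members) is assumed here, and the "required similarity" stopping rule is replaced by a requested number of clusters, `n_clusters`. The function names and test values are hypothetical.

```python
def manhattan(a, b):
    """Dissimilarity between two measurement vectors (sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerative(points, dist, n_clusters=1):
    """Single-linkage agglomerative clustering.

    Returns (clusters, merges). clusters is a list of lists of point indices;
    merges lists (cluster_a, cluster_b, distance) in the order the merges
    happened, which is the information a dendrogram draws.
    """
    # Step 1: every pattern is its own cluster; the proximity matrix holds
    # the distance between each pair of patterns.
    clusters = [[i] for i in range(len(points))]
    prox = [[dist(a, b) for b in points] for a in points]

    def linkage(ca, cb):
        # Single linkage (an assumption): distance between the two closest members.
        return min(prox[i][j] for i in ca for j in cb)

    merges = []
    # Step 3: repeat until the requested number of clusters remains.
    while len(clusters) > n_clusters:
        # Step 2: find the most similar (closest) pair of clusters ...
        pairs = [(x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        a, b = min(pairs, key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        merges.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        # ... and merge them. Cluster-to-cluster distances are recomputed from
        # the pattern-level matrix instead of updating a cluster-level matrix.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


gray = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (250.0,)]
clusters, merges = agglomerative(gray, manhattan, n_clusters=3)
print(clusters)
print([m[2] for m in agglomerative(gray, manhattan)[1]])
```

The recorded merge distances are the heights of the dendrogram: on these test values the first four merges happen at distance 1, the next at 7 and the last at 238, so cutting the tree anywhere between 7 and 238 isolates the outlier 250 as a cluster of its own.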

2. Divisive (top-down) hierarchical clustering:

All objects initially belong to one cluster.

Then the cluster is divided into sub-clusters, which are successively divided into their own sub-clusters. This process continues until the desired cluster structure is obtained.

The hierarchical divisive clustering algorithm consists of the following steps:

1. Start with one cluster containing the whole image.

2. Find the pattern $x_i$ in cluster $C_i$ that satisfies $d(x_i, C_i) = \max_{y \in C_i} d(y, C_i)$, where $i = 1, 2, \ldots, N$ and $N$ is the current number of clusters in the whole image.

3. Split $x_i$ out as a new cluster $C_{i+N}$, and then compute $d(y, C_i)$ and $d(y, C_{i+N})$ for every $y \in C_i$. If $d(y, C_i) > d(y, C_{i+N})$, split $y$ out of $C_i$ and merge it into $C_{i+N}$.

4. Repeat from Step 2 until the clusters no longer change.
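A pure-Python sketch of the divisive procedure under several assumptions the paper leaves open: d(y, C) is taken to be the average dissimilarity of y to the other members of C, the cluster with the largest diameter is split next, Step 3 moves one pattern at a time (the one with the largest d(y, C_i) - d(y, C_{i+N})), and splitting stops at a requested number of clusters, `n_clusters`, since repeating Steps 2 and 3 without such a limit would keep splitting until every pattern stands alone. These choices make the sketch resemble the DIANA algorithm; the names and test values are hypothetical.

```python
def manhattan(a, b):
    """Dissimilarity between two measurement vectors (sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def avg_dist(y, members, points, dist):
    """d(y, C): average dissimilarity of pattern y to the other members of C."""
    others = [m for m in members if m != y]
    if not others:
        return 0.0
    return sum(dist(points[y], points[m]) for m in others) / len(others)


def split_once(cluster, points, dist):
    """Steps 2-3: split one cluster C_i into (rest of C_i, new cluster C_{i+N})."""
    old = list(cluster)
    # Step 2: the pattern x_i with the largest d(x_i, C_i) starts the new cluster.
    x = max(old, key=lambda y: avg_dist(y, old, points, dist))
    old.remove(x)
    new = [x]
    # Step 3: move y from C_i to C_{i+N} while d(y, C_i) > d(y, C_{i+N});
    # the y with the largest difference is moved first, one at a time.
    while len(old) > 1:
        gains = {y: avg_dist(y, old, points, dist) - avg_dist(y, new, points, dist)
                 for y in old}
        y, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break
        old.remove(y)
        new.append(y)
    return old, new


def divisive(points, dist, n_clusters):
    """Top-down clustering; returns a list of lists of point indices."""
    clusters = [list(range(len(points)))]  # Step 1: one cluster for the whole image
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        # Assumption: split the cluster with the largest diameter next.
        target = max(splittable, key=lambda c: max(dist(points[a], points[b])
                                                   for a in c for b in c))
        clusters.remove(target)
        clusters.extend(split_once(target, points, dist))
    return clusters


gray = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,), (250.0,)]
print(divisive(gray, manhattan, n_clusters=3))
```

On these test values the first split separates the outlier 250, and the second separates the dark values 1 to 3 from the values 10 to 12.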

Experiment: The Hierarchical clustering method is implemented using MATLAB and tested with the following image.

Fig.4

Advantages and disadvantages of Hierarchical Clustering:

The advantages of hierarchical clustering are as follows:

ʘ The clustering process and the relationships between clusters can be understood simply by inspecting the dendrogram.

ʘ The result of hierarchical clustering is highly correlated with the characteristics of the original database.

ʘ We only need to compute the distances between patterns, instead of calculating the centroids of clusters.

ʘ Hierarchical clustering methods are more versatile.

ʘ It can easily handle any form of similarity or distance.

The disadvantages of hierarchical clustering are as follows:

ʘ Hierarchical clustering methods can never undo what was done previously; that is, there is no backtracking capability.

ʘ No objective function is directly minimized.

ʘ It has difficulty handling clusters of different sizes and convex shapes.


ʘ Hierarchical clustering methods do not scale up well with the number of observations.

IV. COMPARISONS OF THE VARIOUS CLUSTERING METHODS

We experimented with K-Means clustering, K-Medoids clustering and Hierarchical clustering on different natural images. The main differences between these clustering methods for natural image segmentation are as follows:

A. Comparisons between K-Means and K-Medoids clustering

1. K-Medoids is a generalization of K-Means clustering.

2. K-Medoids clustering is computationally more intensive, and much costlier, than K-Means clustering.

3. K-Means can work only with numerical, quantitative variable types, whereas K-Medoids can work with any distance measure.

4. The K-Medoids method is more robust than the K-Means algorithm in the presence of noise and outliers, because a medoid is less influenced by outliers or other extreme values than a mean.

5. Unlike the K-Means clustering algorithm, K-Medoids is much less sensitive to dirty (noisy) images.

6. K-Means clustering is comparatively more efficient than K-Medoids clustering.

7. K-Means clustering is easy to implement, whereas K-Medoids clustering is more complicated to implement.
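The robustness point about medoids and means can be checked with a few made-up gray levels: changing a single pixel into an extreme value (an outlier) drags the mean far from the bulk of the data, while the medoid, which must be one of the data values, stays where it was. The absolute difference is assumed as the dissimilarity, and the values are illustrative only.

```python
def mean(values):
    """Arithmetic mean: the reference point used by K-Means."""
    return sum(values) / len(values)


def medoid(values):
    """The data value with the smallest total absolute difference to all
    others: the reference point used by K-Medoids."""
    return min(values, key=lambda m: sum(abs(m - v) for v in values))


clean = [10, 11, 12, 13, 14, 15, 16]
dirty = [10, 11, 12, 13, 14, 15, 240]  # last gray level replaced by an outlier

print(mean(clean), medoid(clean))  # 13.0 13
print(mean(dirty), medoid(dirty))  # 45.0 13
```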

B. Comparisons between Hierarchical and Partitional (K-Means and K-Medoids) clustering.

1. Partitional clustering is faster than hierarchical clustering.

2. Hierarchical clustering requires only a similarity measure, while partitional clustering requires stronger assumptions, such as the number of clusters and the initial centers.

3. Hierarchical clustering does not require any input parameters, while partitional clustering algorithms require the number of clusters to start running.

4. Hierarchical clustering returns a much more meaningful and subjective division of clusters, whereas partitional clustering results in exactly K clusters.

5. Hierarchical clustering algorithms are more suitable than partitional clustering for categorical data, as long as a similarity measure can be defined accordingly.

6. Hierarchical clustering is a sequential partitioning process, which results in a hierarchical, nested cluster structure, while partitional clustering is an iterative partitioning process.

V. MATLAB

The name MATLAB stands for matrix laboratory. MATLAB is a software program that allows us to do data manipulation and visualization, image analysis, calculations, math and programming. It can be used to do very simple as well as very sophisticated tasks. MATLAB is a high-performance language for technical computing and an interactive system whose basic data element is an array that does not require dimensioning. MATLAB can import and export several image formats, such as BMP (Microsoft Windows Bitmap), GIF (Graphics Interchange Format), HDF (Hierarchical Data Format), JPEG (Joint Photographic Experts Group), PCX (Paintbrush), PNG (Portable Network Graphics), TIFF (Tagged Image File Format) and XWD (X Window Dump). MATLAB can also load raw data or other types of image data.

VI. CONCLUSION


Different clustering methods work better under different conditions. In our comparison, K-Means clustering performs better and is easier to implement than the other clustering methods.


AUTHORS PROFILE

Saiful Islam, Assistant Professor, HoD, Department of Computer Science, Dhamdhama Anchalik College, Dhamdhama, Nalbari (Assam), India. He completed his MCA (Computer Science & Engineering) in 2009 at Tezpur University, Assam, India. He is pursuing a PhD at CMJ University, Meghalaya, India, in the Department of Computer Science and Applications. His research interests are Image Processing, Data Mining, and Database Management Systems. He has three years of teaching experience. He is also Co-coordinator of the Study Centre at Dhamdhama Anchalik College under IDOL (Gauhati University).

Dr. Majidul Ahmed, Assistant Professor, HoD, Department of Information Technology, Gauhati