An approach to cbir using k-Means clustering and polygon based shape features - Dominic Mai

For an approach that does not use any semantical knowledge surprisingly good results were produced distinguishing broad categories like sunrise, sea or mountain. But the semantical gap still needs to be bridged for eliminating obvious fail classifications that are produced due to similar colorconfigurations in images that are from a human viewpoint have very little in common. An approach that uses classification techniques like Support Vector Machines or ADA Boost could drastically improve performance if it was possible to extract meaningful information (like people, indoor, buildings, etc.) out of an image in a first classification step.

8 trang | Chia sẻ: thucuc2301 | Lượt xem: 615 | Lượt tải: 0

Bạn đang xem nội dung tài liệu An approach to cbir using k-Means clustering and polygon based shape features - Dominic Mai, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên

Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 45 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên AN APPROACH TO CBIR USING K-MEANS CLUSTERING AND POLYGON BASED SHAPE FEATURES Dominic Mai, Toi Nguyen Van  Faculty of information Technology Thai Nguyen University ABSTRACT In this paper an approach to Content Based Image Retrieval (CBIR) is examined that uses K- means clustering for segmenting an image and then extracts global and local features using color and shape information over the extracted regions to compute a similarity measure between images. Although color is used as main information source for creating the clusters that an image is formed of, shape factors of the separated regions will be taken into account for the retrieval process as color is heavily dependent on the lighting of a scene. A fuzzy representation of features is chosen that suits the blurry nature of image segmentation. Từ khóa: Shape information, K-means, global feature, local feature, segment an image * INTRODUCTION In 2006 over 300 million photos were uploaded to flickr, one of the biggest photo sharing communities on the internet. This number is just for illustrating the fact that the number of pictures stored in electronic databases is increasing rapidly and the task of efficiently retrieving pictures stored in such a database is becoming more and more important. Text based search techniques can only be applied if pictures have been assigned meaningful labels describing semantic entities like people, outdoor scene, etc. Unfortunately, understanding a picture in the way humans do is a very hard task that is not yet solved by automated algorithms – this is also known as the semantic gap. As most of pictures do not come labeled or inside a labeled context (e.g. a website) automatic retrieval of images cannot be done using text retrieval techniques. Luckily, it is not necessary to semantically understand a picture for performing a satisfying retrieval on a database. In this paper an approach to Content Based Image Retrieval (CBIR) is examined that uses K- means clustering for segmenting an image and then extracts global and local features using color and shape information over the extracted regions to compute a similarity measure between images. Although color is used as main information source for creating the clusters that an image is formed of, it will not play a major role * Tel: 0912847077 for the retrieval process as the concept of color is quite difficult [1] – just think of the different white balance settings of your camera you have to make when shooting under different lighting conditions to make white appear as white. Therefore the shape of the found regions is analyzed using a polygonal representation of the shape’s boundary which is not dependent on color information. To account for the inherent uncertainty coming with the applied segmentation process – a good segmentation cannot be achieved without semantically understanding the image – a fuzzy representation of the regions is applied. A reference System has been implemented using JAVA. RELATED WORK This paper is mainly based on the ideas for segmentation and similarity computation presented in [3]. In the process of analyzing the image, it is partitioned into blocks of 4x4 which contain average color values over the three bands in the RGB color model. These blocks (Blockfeatures) are then clustered using an iterative K-means [2] with a small refinement that leads to superior performance over randomly choosing the initial point set. A Cauchy membership function is used to account for fuzziness in assigning Blockfeatures to their respective clusters. It turns out that only the center points of a cluster are needed for the comparison measure applied which leads to very little Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 46 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên computation costs concerning memory and processing time. Although proposed in [3] texture features yielded by extracting high frequency coefficients of an orthogonal transformation on the Y layer in YUV color space were not included as they yielded counterproductive results considering the task of clustering. Three shape features are computed over the reconstructed boundary of a region which is represented as a polygon. These features are compactness, solidity and eccentricity as proposed in [2], although a different approach to the computation of solidity is computed which was more suitable on the available data. Polygonal based feature extraction has shown superior results to the mere computation of normalized inertia of different orders. FEATURE EXTRACTION The first step in the processing pipe of an image is the extraction of features describing quadratic regions in the image. A size of 4x4 has proven a good tradeoff between the conservation of detail and the enhancement of processing time. In this preprocessing step the 3 color bands red green and blue of the 16 pixels are averaged to yield an 3R - feature vector (normalized to [0 . . 1] ) which is used for the K-means clustering algorithm. This averaging is equal to computing the DC coefficient of a discrete cosine transformation which is the most important for representing this Block of image data [6]. The loss of detailed structure is not severe as we will take account for blurry boundaries with a fuzzy assignment of these Blockfeatures to their clusters. The advantages are more robustness against noise and a drastically reduced computation time over working with single pixels. We will not use texture features based on higher order coefficients of an orthogonal base transformation (e.g. Wavelet or DCT) as their implications are counterintuitive to our task: While they work very good on extracting boundaries they have not been useful in distinguishing between different textures at object level. (This might be due to the relatively small area covered by the Blockfeature, so additional research is required). Also position information through x/y pixel coordinates have not been included as they tend to produce connected compact1 regions: Blockfeatures that are further away from the center point of the cluster will receive a penalty proportional to their distance to this center. Although it is usually true that semantic entities of a picture also form connected pixel-clusters, this has not proven useful with the used similarity measure. K-MEANS CLUSTERING The key idea of this CBIR approach is that similar semantic entities in different pictures will consist of pixels that are similar in color and have a sufficient dissimilarity from the background (this of course does not hold for all classes – e.g. think of a camouflaged animal). Put in other words, pixels that belong to an Object will yield Blockfeatures that are close together in respect to a distance measure in the underlying feature space. The distance measure we will use is the Euclidean distance which can be calculated through formula (1) in any metric vector space:    2 ),( ii babadist (1) In 2 respectively 3 dimensions this resembles our intuitive notion of distance and therefore also is an appropriate choice for higher dimensions. The K-means clustering algorithm is an iterative technique that tries to minimize intra cluster variance on a predefined number of clusters. Put in other words, the algorithm tries to minimize the distance between elements of a cluster and its representative center, also called centroid. The algorithm first presented in [2] works like this: k-means(data[], k) 1. randomly chose k centroids out of data[] 2. Assign each point of data[] the centroid which is the closest in respect to the distance measure 3. Re-compute the centroids locations by Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 47 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên averaging over the elements of a cluster 4. Stop when number of changes in assignment step drops below a threshold threshkm The algorithm is well studied and works fast but due to the random nature of the input data, it doesn’t produce the same results in every run and also it is not assured to provide a global optimum although it is sure to converge to a solution once initialized. But the major drawback is that the number of clusters needs to be specified in advance – but this is something we usually don’t know in advance as it is dependent on the contents in the analyzed image. Therefore an iterative approach is chosen which gradually increases the number of clusters and computes a quality measure on the found clustering. The quality measure applied is the average distance of the elements to their centroid which is a good indicator if further clustering would benefit the task. Two stop criteria are applied: Either when the quality measure drops below a specified threshold abs or when there is no significant change in two iteration steps (quality difference rel ) . Luckily K-means clustering is fast, so it is possible to start with k=2 and gradually increase the number of clusters. Empirical results show that normally no more than 8 clusters are being created. For the iterative K-means a slight change has been made for the initialization step which reduced the computation time compared to the random initialization: only the new cluster is initialized by chance, the others are taken from the previous segmentation result. To the overall segmentation process using the k- means algorithm three thresholds have to be set which work as abort criterion for the k-means ( threshkm ) or iterative k-means ( absrel  , ) algorithm respectively. It was not trivial to find an appropriate setting for the used picture set as a good segmentation is necessary for good retrieval performance. Parameters were set to 10threshkm 005.0 absrel  which resulted in a maximum of 8 regions per image and a computation time of ~1 sec on a 2GHZ processor. The centroids ĉi of the clusters and the average inter centroid difference are computed using equation (2) where C denotes the number of clusters.      1 0 1 )ˆ,ˆ( )1(* 2 c i c ik kif ccdist cc d (2) Note that due to the nature of the K–means clustering the centroid holds the average color values for its region and also is the regions center off mass. SHAPE FEATURES To be not only dependent on the color information three shape features are being computed over the found regions: 1. Compactness 2. Solidity 3. Eccentricity In a first step the polygonal representation of the 2- dimensional boundary of the Blockfeatures belonging to one region is extracted. Remember that the Blockfeature though being part of an arbitrary high dimensional feature space have their origin in the 2- dimensional pixel space they were originally created from. As the 2-dimensional data is likely to be scattered the exact reconstruction of the boundary is a hard task as it is not easy to determine if a Blockfeature is inside the object or part of its border when the Blockfeatures do not form a connected region in 2-d picture space. Luckily exact reconstruction is not necessary as we only need consistent result over similar “looking” regions identified by the K-means algorithm. The algorithm for constructing the polygonal shape is inspired by the way a radar sweep works: A beam is fired from the shapes center of mass and in every position the point which lies on the beam and is most far away from the center is added to the polygon. This algorithm works in time linear to the desired # of desired points for the polygon. With this strategy we are able to reconstruct Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 48 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên every convex shape and also star-like concave polygonal shapes (intuitive proof idea: In a star every point of the boundary is visible from the center). With these 2 classes a wide variety of possible shapes is covered and furthermore it is not too important as clusters might appear mixed when projected into 2-d picture space. Therefore we will account for this fact with the solidity measure that relates the Area covered by the Polygon to the Area obtained by counting the Blockfeatures that belong to this region (eq. (3)). real polygon A A Solidity (3) The Area covered by a polygon can be calculated in time linear to the number of vertexes. The compactness relates the Area of the polygon to its perimeter, obtaining a measure that indicates the equality of the examined shape to a circle where 1 would mean that the examined shape has perfect equality (eq. (4)). P A scompactnes 4  (4) The last shape feature that we use is the eccentricity which relates the short axis to the long axis of the best fitting ellipse. It can be computed using the (p, q) central order moments (eq. (5)). See [2] for details on computation. 2 11 2 02200220 2 11 2 02200220 4)( 4)( uuuuu uuuuu ecc    (5) Finally we yield a feature vector for a region which is element of [0. .1] 6 consistin of three color features which represent the average color of the region in the RGB color bands and the above described features reflecting properties of the shape the region has. Note that all described shape features are invariant to translation, rotation and scaling. FUZZY REPRESENTATION Due to the inherent insecurity in the segmentation process a fuzzy assignment of Blockfeatures to their respective centers is chosen. This means that every Bockfeature belongs to every region with a certain degree based on its distance to the region’s centroid. A membership function is chosen that maps the distance of a Blockfeature to its centroid to a membership degree in [0. .1]. There is a wide variety of functions yielding such a measure; we chose the Cauchy function (eq. (6)) due to its simplicity and computational advantages.          d cxdist C i i )ˆ,( 1 1 (6) The Cauchy function does a mapping of the distance of a Blockfeature to the centroid icˆ . It has two tunable parameters },{ d which determine the degree of fuzziness. In other words distances until a certain threshold are nearly perfectly mapped to one of the centers (value close to one) and distances bigger than this threshold are mapped on a smooth transition approaching 0 quickly. Figure 1: Cauchy function with 5,0  d In Figure 1 a Cauchy function is plotted with the indicated settings for  and d: d gives the distance which is mapped to 0.5 and  gives the smoothness of the transition. Bigger values for  result in a fast jump to zero (small degree of fuzziness) while smaller values indicate a slowly transition to zero (big degree of fuzziness). In our system d is set to the average inter cluster difference (eq. (2)) and 5 have shown good practical results. Luckily it is not necessary to store the set of membership-degree to each centroid for every Blockfeature as it turns out that due to the character of the Cauchy function and the similarity measure we will apply, only the centroids icˆ and the average inter cluster Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 49 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên difference will need to be stored. Note that the same ideas explained for the color features here hold for the shape features computed for each region: Due to the uncertainty in segmentation we will allow a certain distance between the shape features that is considered as equal. SIMILARITY MEASURE FOR REGIONS In this section we will introduce a similarity measure for fuzzy regions. In the next section a comparison scheme is introduced to compare whole images based on the similarity of their regions. At first we need to define the intersection operation on fuzzy sets, which will be the base for our similaritymeasure. Again we chose a computationally easy approach which maintains our intuitive notion of intersection between two regions. Recall that in standard set theory an element belongs to the intersection of two sets iff it belongs to set A and set B.  )(),(min)( xCxCxC BABA  (7) The intersection of two fuzzy sets is given by a membership function (eq. (7)) of an element [0..1] 6 that indicates its belonging to region A and region B. Based on this definition we now can define a similarity measure between two regions represented by their respective centroids ACˆ and BCˆ . )(sup),( 6]1..0[ xCBAsim BA x    (8) This similarity measure has the big advantage that due to the unimodality of our membership function the element x which defines the supremum must lie on a line connecting the centroids representing these two regions. Therefore we can compute the similarity measure using eq. (9)   ),()( )( ),( BAdistdd dd BAsim ba ba    (9) Details on the derivation process can be found in (2). Note that the parameters id and  correspond to the average inter cluster difference and fuzziness parameter introduced in the previous section. COMPARING IMAGES Based on the similarity measure introduced in the previous section we can now develop an algorithm for comparing two images and computing a similarity measure in [0..1]. The clustering produced by the K-means algorithm is dependent on the image contents and the choice of thresholds for abortion, therefore it cannot be guaranteed that similar objects are separated in the same way, producing the same number of clusters. This fact is illustrated in Figure 2 showing the results of different segmentations of the same image due to different settings of thresholds. Figure 2: Segmentation of the same image with different thresholds for K-means 005.0abs yields 4 clusters (middle); 05.0abs yields 2 clusters (right) (Red dots indicate the positions of the icˆ when projected to 2D picture space) To account for the fact that usually we have to compare images that are segmented into different numbers of clusters and that these clusters always come in random order, we make pair wise comparisons between each pair of clusters. This yields a similarity vector which has 21 KK  entries where iK is the number of clusters found by the K-means segmentation step. The first elements correspond to the regions 1..1 K of the first image, the second half of the similarity vector corresponds to the regions 2..1 K . For every entry the similarity of this region with all regions of the other image is computed using eq. (9). The highest similarity value is stored in the similarity vector. In this example a correspondence of [0.999, 0.991] has been found for comparing the right image to the middle one which eventually leads to a similarity of 0.97. This justifies the approach and shows that the clustering algorithm yields comparable results even though the clusters might look quite different at first glance. Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 50 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên In a second step the similarity vector  is collapsed to a value in [0. .1] by a scalar multiplication with a weight vector  . We observe that each entry in the similarity vector i ]1..0[ , 1 meaning that a perfect match for a region has been found in the other image. For assuring that the final similarity measure also is in [0. .1], the weight vector must fulfill the constraint given through eq.(10). 1 i (10) The easiest way to perform the weighing is to assign equal weights to all the components of  , which already leads to usable results but favors the image which has been segmented into more regions. There are several possible ways one can think of as a weighing scheme, here are some possible examples:  Equal weighing to all regions  Account for different number of regions in the images  Use area covered by the regions as an importance measure for assigning more weight to bigger areas (background favored) or to smaller areas (highlight favored)  Account for the position of the centroid icˆ and assign higher weights to centroids that lie in the center region of the image (center favored) or apply higher weights to centroids that lie in the border region of the image.  Assign weights based on the inter- cluster distance giving more weight to centroids that are further away from others There are many possible weighing schemes one can think of. In (2) a mixed approach accounting for area (background favored) and centroid position (border favored) is chosen. WEIGHING Practical tests with the implemented CBIR system indicate that the border/background favored weighing scheme generates superior results to the equal-, highlight-, and center- favored weighing schemes. The biggest subjective gain in retrieval performance comes from the weighing with the area percentage while only a small gain in performance results from the applying of the border favored weighing scheme. Also it turns out that the shape features tend to become an issue of robustness if given too much weight. While helping to find images with very similar contents taken under different lighting conditions they also produce results that show little similarity compared to the standards of a human observer. Therefore, an additional weighing step is applied before comparing the centroids using similarity measure (eq. (9)) by simply multiplying the different dimensions of the icˆ with a linear scaling factor. A scaling of 1 for the color Bands and 0.5 for the shape features has proven subjectively superior results to an equal weighing scheme. EXPERIMENTS A reference system has been implemented using JAVA 1.6. Only native JAVA functionality was used so it should run without problems under different operating systems although it was only tested on a Celeron 1.3 GHz/1.2GB RAM and a Pentium Dual Core 3 GHz/1GB Ram both Systems running Java 1.6.0 under a Windows XP operating system. The test database used contained 772 Images taken by the author containing a mix of several broad classes as indoor, outdoor, portrait, landscape, mountains, sea and beach under different lighting conditions (nearly all day/night times, artificial lighting). The pictures all were scaled down to a resolution of 384x265 or 265x384 respectively. Only a subjective evaluation has been done as it is a quite hard task obtaining ground truth for a CBIR system (5). In the first step a database was created calculating the segmentation and storing the relevant information (less than 1kb per file). A time-consuming step that luckily only has to be done once. On the Celeron 1.3GHz system the calculation time per image was approx. 5.3 sec. (vs. 4sec on the 3GHz proc.) In Figure 3 you can see a screenshot of the running system. The user chooses a query image either by selecting a file or using the Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 51 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên query this button which facilitates browsing through the image DB. If the selected file is not in the database, it is first processed and then compared against the database. Figure 3: CBIR System implemented using Java 1.6 The result is returned as an ordered list showing the rating generated by the similarity measure. Although the image has to be compared against the whole database the query process is quite fast with ~0.1sec on the Celeron 1.3GHz. Note that no optimized underlying data structure has been used for the database. In Figure 4 you can see a result set produced when queried with the image in the upper left corner. The system managed to find 8 of 10 images in the DB taken from a sunrise at Cat Ba Island. One image of the set was taken in during sunset which is perfectly valid to be included in the result while 2 images would represent the category (day, beach) which is not really appropriate and one image found (rightmost, 2nd row) is obviously wrong to be included from a semantical point of view. Figure 4: 12 first query results, query image is in the upper left corner, similarity rankings from [1..0.94] CONCLUSION For an approach that does not use any semantical knowledge surprisingly good results were produced distinguishing broad categories like sunrise, sea or mountain. But the semantical gap still needs to be bridged for eliminating obvious fail classifications that are produced due to similar color- configurations in images that are from a human viewpoint have very little in common. An approach that uses classification techniques like Support Vector Machines or ADA Boost could drastically improve performance if it was possible to extract meaningful information (like people, indoor, buildings, etc.) out of an image in a first classification step. REFERENCES [1]. Symon D’O. Cotton, School of Computer Science, University of Birmingham, England, (1995), Colour, colour spaces and the human visual system, Technical report. [2]. Cheng Chang, Liu Wenyin, Hongjiang Zhang, (2001), Image retrieval based on region shape similarity, Proceedings of SPIE, the International Society for Optical Engineering ISSN 0277-786X, vol. 4315, pp. 31-38. [3].Julia Vogel and Bernt, (2002), On Performance Characterization and Optimization for Image Retrieval, Proceedings of 7th European Conference on Computer Vision, Springer, pp. 49-63. [4]. Wong, J. A. Hartigan and M. A., (1979), Algorithm AS136: A k- means Clustering Algorithm. Applied Statistics, vol. 28, pp. 100-108. [5]. Yixin Chen, Department of Computer Science and Engineering, The Pennsylvania State University, (June 2003), A Machine Learning Approach to Content-Based Image Indexing and Retrieval, Ph.D. Dissertation. [6]. JPEG. t81.pdf. [Online]. AN APPROACH TO CBIR USING K-MEANS CLUSTERING AND POLYGON BASED SHAPE FEATURES Dominic Mai, Nguyễn Văn Tới Khoa Công nghệ thông tin – ĐH Thái Nguyên Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52 52 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên TÓM TẮT Trong bài báo này, một cách tiếp cận cho tra cứu ảnh dựa vào nội dung hình ảnh (CBIR) được khảo sát. Phương pháp này sử dụng thuật toán phân lớp dữ liệu K-mean để phân đoạn ảnh sau đó trích rút các đặc trưng toàn cục và đặc trưng địa phương bằng cách sử dụng thông tin màu sắc và hình dạng trên các vùng để tính toán mức độ tương tự giữa các hình ảnh. Mặc dù màu sắc là nguồn thông tin chính được sử dụng cho việc tạo các cụm nhưng hình dạng của các vùng cũng được đưa vào trong quá trình tra cứu ảnh. Một cách biểu diễn mờ cho các đặc trưng được lựa chọn, nó phù hợp với sự không rõ ràng của việc phân đoạn ảnh. Từ khóa: phân lớp dữ liệu, K-mean , đặc trưng toàn cục, đặc trưng địa phương, phân đoạn ảnh

Các file đính kèm theo tài liệu này:

brief_3834_9779_anapproachtocbirusingkmeansclustering_516_2052815.pdf