An approach to cbir using k-Means clustering and polygon based shape features - Dominic Mai
For an approach that does not use any
semantical knowledge surprisingly good
results were produced distinguishing broad
categories like sunrise, sea or mountain. But
the semantical gap still needs to be bridged
for eliminating obvious fail classifications
that are produced due to similar colorconfigurations in images that are from a
human viewpoint have very little in
common.
An approach that uses classification
techniques like Support Vector Machines or
ADA Boost could drastically improve
performance if it was possible to extract
meaningful information (like people, indoor,
buildings, etc.) out of an image in a first
classification step.
8 trang |
Chia sẻ: thucuc2301 | Lượt xem: 521 | Lượt tải: 0
Bạn đang xem nội dung tài liệu An approach to cbir using k-Means clustering and polygon based shape features - Dominic Mai, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
45
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
AN APPROACH TO CBIR USING K-MEANS CLUSTERING
AND POLYGON BASED SHAPE FEATURES
Dominic Mai, Toi Nguyen Van
Faculty of information Technology Thai Nguyen University
ABSTRACT
In this paper an approach to Content Based Image Retrieval (CBIR) is examined that uses K-
means clustering for segmenting an image and then extracts global and local features using
color and shape information over the extracted regions to compute a similarity measure between
images. Although color is used as main information source for creating the clusters that an image
is formed of, shape factors of the separated regions will be taken into account for the retrieval
process as color is heavily dependent on the lighting of a scene. A fuzzy representation of
features is chosen that suits the blurry nature of image segmentation.
Từ khóa: Shape information, K-means, global feature, local feature, segment an image
*
INTRODUCTION
In 2006 over 300 million photos were
uploaded to flickr, one of the biggest photo
sharing communities on the internet. This
number is just for illustrating the fact that the
number of pictures stored in electronic
databases is increasing rapidly and the task of
efficiently retrieving pictures stored in such a
database is becoming more and more
important. Text based search techniques can
only be applied if pictures have been assigned
meaningful labels describing semantic entities
like people, outdoor scene, etc.
Unfortunately, understanding a picture in the
way humans do is a very hard task that is not
yet solved by automated algorithms – this is
also known as the semantic gap.
As most of pictures do not come labeled or
inside a labeled context (e.g. a website)
automatic retrieval of images cannot be done
using text retrieval techniques.
Luckily, it is not necessary to semantically
understand a picture for performing a
satisfying retrieval on a database. In this
paper an approach to Content Based Image
Retrieval (CBIR) is examined that uses K-
means clustering for segmenting an image
and then extracts global and local features
using color and shape information over the
extracted regions to compute a similarity
measure between images.
Although color is used as main information
source for creating the clusters that an image
is formed of, it will not play a major role
*
Tel: 0912847077
for the retrieval process as the concept of
color is quite difficult [1] – just think of the
different white balance settings of your
camera you have to make when shooting
under different lighting conditions to make
white appear as white. Therefore the shape
of the found regions is analyzed using a
polygonal representation of the shape’s
boundary which is not dependent on color
information.
To account for the inherent uncertainty
coming with the applied segmentation
process – a good segmentation cannot be
achieved without semantically understanding
the image – a fuzzy representation of the
regions is applied.
A reference System has been implemented
using JAVA.
RELATED WORK
This paper is mainly based on the ideas for
segmentation and similarity computation
presented in [3]. In the process of analyzing
the image, it is partitioned into blocks of
4x4 which contain average color values
over the three bands in the RGB color
model. These blocks (Blockfeatures) are then
clustered using an iterative K-means [2] with
a small refinement that leads to superior
performance over randomly choosing the
initial point set.
A Cauchy membership function is used to
account for fuzziness in assigning
Blockfeatures to their respective clusters. It
turns out that only the center points of a
cluster are needed for the comparison
measure applied which leads to very little
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
46
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
computation costs concerning memory and
processing time.
Although proposed in [3] texture features
yielded by extracting high frequency
coefficients of an orthogonal transformation
on the Y layer in YUV color space were not
included as they yielded counterproductive
results considering the task of clustering.
Three shape features are computed over the
reconstructed boundary of a region which is
represented as a polygon. These features are
compactness, solidity and eccentricity as
proposed in [2], although a different
approach to the computation of solidity is
computed which was more suitable on the
available data. Polygonal based feature
extraction has shown superior results to the
mere computation of normalized inertia of
different orders.
FEATURE EXTRACTION
The first step in the processing pipe of an
image is the extraction of features describing
quadratic regions in the image. A size of 4x4
has proven a good tradeoff between the
conservation of detail and the enhancement of
processing time. In this preprocessing step the
3 color bands red green and blue of the 16
pixels are averaged to yield an
3R - feature
vector (normalized to [0 . . 1] ) which is used
for the K-means clustering algorithm. This
averaging is equal to computing the DC
coefficient of a discrete cosine
transformation which is the most important
for representing this Block of image data [6].
The loss of detailed structure is not severe as
we will take account for blurry boundaries
with a fuzzy assignment of these
Blockfeatures to their clusters. The
advantages are more robustness against noise
and a drastically reduced computation time
over working with single pixels.
We will not use texture features based on
higher order coefficients of an orthogonal
base transformation (e.g. Wavelet or DCT) as
their implications are counterintuitive to our
task: While they work very good on
extracting boundaries they have not been
useful in distinguishing between different
textures at object level. (This might be due to
the relatively small area covered by the
Blockfeature, so additional research is
required).
Also position information through x/y pixel
coordinates have not been included as they
tend to produce connected compact1 regions:
Blockfeatures that are further away from the
center point of the cluster will receive a
penalty proportional to their distance to this
center. Although it is usually true that
semantic entities of a picture also form
connected pixel-clusters, this has not proven
useful with the used similarity measure.
K-MEANS CLUSTERING
The key idea of this CBIR approach is that
similar semantic entities in different
pictures will consist of pixels that are
similar in color and have a sufficient
dissimilarity from the background (this of
course does not hold for all classes – e.g.
think of a camouflaged animal). Put in other
words, pixels that belong to an Object will
yield Blockfeatures that are close together in
respect to a distance measure in the
underlying feature space. The distance
measure we will use is the Euclidean distance
which can be calculated through formula (1)
in any metric vector space:
2
),( ii babadist (1)
In 2 respectively 3 dimensions this
resembles our intuitive notion of distance
and therefore also is an appropriate choice
for higher dimensions.
The K-means clustering algorithm is an
iterative technique that tries to minimize
intra cluster variance on a predefined number
of clusters. Put in other words, the algorithm
tries to minimize the distance between
elements of a cluster and its representative
center, also called centroid.
The algorithm first presented in [2] works
like this:
k-means(data[], k)
1. randomly chose k centroids out of data[]
2. Assign each point of data[] the centroid
which is the closest in respect to the distance
measure
3. Re-compute the centroids locations by
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
47
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
averaging over the elements of a cluster
4. Stop when number of changes in
assignment step drops below a threshold
threshkm
The algorithm is well studied and works fast
but due to the random nature of the input
data, it doesn’t produce the same results in
every run and also it is not assured to provide
a global optimum although it is sure to
converge to a solution once initialized.
But the major drawback is that the number of
clusters needs to be specified in advance – but
this is something we usually don’t know in
advance as it is dependent on the contents in
the analyzed image. Therefore an iterative
approach is chosen which gradually increases
the number of clusters and computes a
quality measure on the found clustering.
The quality measure applied is the average
distance of the elements to their centroid
which is a good indicator if further clustering
would benefit the task. Two stop criteria are
applied: Either when the quality measure
drops below a specified threshold abs or
when there is no significant change in two
iteration steps (quality difference rel ) .
Luckily K-means clustering is fast, so it is
possible to start with k=2 and gradually
increase the number of clusters. Empirical
results show that normally no more than 8
clusters are being created.
For the iterative K-means a slight change has
been made for the initialization step which
reduced the computation time compared to
the random initialization: only the new cluster
is initialized by chance, the others are taken
from the previous segmentation result.
To the overall segmentation process using the
k- means algorithm three thresholds have to
be set which work as abort criterion for the
k-means ( threshkm ) or iterative k-means
( absrel , ) algorithm respectively.
It was not trivial to find an appropriate
setting for the used picture set as a good
segmentation is necessary for good retrieval
performance. Parameters were set to
10threshkm
005.0 absrel
which resulted in a maximum of 8 regions per
image and a computation time of ~1 sec on a
2GHZ processor.
The centroids ĉi of the clusters and the
average inter centroid difference are
computed using equation (2) where C denotes
the number of clusters.
1
0 1
)ˆ,ˆ(
)1(*
2 c
i
c
ik
kif ccdist
cc
d (2)
Note that due to the nature of the K–means
clustering the centroid holds the average
color values for its region and also is the
regions center off mass.
SHAPE FEATURES
To be not only dependent on the color
information three shape features are being
computed over the found regions:
1. Compactness
2. Solidity
3. Eccentricity
In a first step the polygonal representation of
the 2- dimensional boundary of the
Blockfeatures belonging to one region is
extracted. Remember that the Blockfeature
though being part of an arbitrary high
dimensional feature space have their origin in
the 2- dimensional pixel space they were
originally created from. As the 2-dimensional
data is likely to be scattered the exact
reconstruction of the boundary is a hard task
as it is not easy to determine if a Blockfeature
is inside the object or part of its border when
the Blockfeatures do not form a connected
region in 2-d picture space.
Luckily exact reconstruction is not necessary
as we only need consistent result over similar
“looking” regions identified by the K-means
algorithm. The algorithm for constructing the
polygonal shape is inspired by the way a radar
sweep works: A beam is fired from the shapes
center of mass and in every position the point
which lies on the beam and is most far away
from the center is added to the polygon. This
algorithm works in time linear to the desired
# of desired points for the polygon.
With this strategy we are able to reconstruct
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
48
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
every convex shape and also star-like concave
polygonal shapes (intuitive proof idea: In a
star every point of the boundary is visible
from the center). With these 2 classes a wide
variety of possible shapes is covered and
furthermore it is not too important as clusters
might appear mixed when projected into 2-d
picture space. Therefore we will account for
this fact with the solidity measure that relates
the Area covered by the Polygon to the Area
obtained by counting the Blockfeatures that
belong to this region (eq. (3)).
real
polygon
A
A
Solidity (3)
The Area covered by a polygon can be
calculated in time linear to the number of
vertexes. The compactness relates the Area of
the polygon to its perimeter, obtaining a
measure that indicates the equality of the
examined shape to a circle where 1 would
mean that the examined shape has perfect
equality (eq. (4)).
P
A
scompactnes
4
(4)
The last shape feature that we use is the
eccentricity which relates the short axis to the
long axis of the best fitting ellipse. It can be
computed using the (p, q) central order
moments (eq. (5)). See [2] for details on
computation.
2
11
2
02200220
2
11
2
02200220
4)(
4)(
uuuuu
uuuuu
ecc
(5)
Finally we yield a feature vector for a region
which is element of [0. .1]
6
consistin of three
color features which represent the average
color of the region in the RGB color bands
and the above described features reflecting
properties of the shape the region has. Note
that all described shape features are invariant
to translation, rotation and scaling.
FUZZY REPRESENTATION
Due to the inherent insecurity in the
segmentation process a fuzzy assignment of
Blockfeatures to their respective centers is
chosen. This means that every Bockfeature
belongs to every region with a certain degree
based on its distance to the region’s centroid.
A membership function is chosen that maps
the distance of a Blockfeature to its centroid
to a membership degree in [0. .1]. There is a
wide variety of functions yielding such a
measure; we chose the Cauchy function (eq.
(6)) due to its simplicity and computational
advantages.
d
cxdist
C
i
i
)ˆ,(
1
1 (6)
The Cauchy function does a mapping of the
distance of a Blockfeature to the centroid
icˆ . It has two tunable parameters },{ d
which determine the degree of fuzziness. In
other words distances until a certain threshold
are nearly perfectly mapped to one of the
centers (value close to one) and distances
bigger than this threshold are mapped on a
smooth transition approaching 0 quickly.
Figure 1: Cauchy function with 5,0 d
In Figure 1 a Cauchy function is plotted with
the indicated settings for and d: d gives
the distance which is mapped to 0.5 and
gives the smoothness of the transition. Bigger
values for result in a fast jump to zero
(small degree of fuzziness) while smaller
values indicate a slowly transition to zero
(big degree of fuzziness).
In our system d is set to the average inter
cluster difference (eq. (2)) and 5 have
shown good practical results.
Luckily it is not necessary to store the set of
membership-degree to each centroid for every
Blockfeature as it turns out that due to the
character of the Cauchy function and the
similarity measure we will apply, only the
centroids
icˆ and the average inter cluster
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
49
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
difference will need to be stored.
Note that the same ideas explained for the
color features here hold for the shape features
computed for each region: Due to the
uncertainty in segmentation we will allow a
certain distance between the shape features
that is considered as equal.
SIMILARITY MEASURE FOR REGIONS
In this section we will introduce a similarity
measure for fuzzy regions. In the next
section a comparison scheme is introduced
to compare whole images based on the
similarity of their regions.
At first we need to define the intersection
operation on fuzzy sets, which will be the base
for our similaritymeasure. Again we chose a
computationally easy approach which
maintains our intuitive notion of intersection
between two regions. Recall that in standard
set theory an element belongs to the
intersection of two sets iff it belongs to set A
and set B.
)(),(min)( xCxCxC BABA (7)
The intersection of two fuzzy sets is given by
a membership function (eq. (7)) of an element
[0..1]
6
that indicates its belonging to region
A and region B. Based on this definition we
now can define a similarity measure between
two regions represented by their respective
centroids ACˆ and BCˆ .
)(sup),(
6]1..0[
xCBAsim BA
x
(8)
This similarity measure has the big advantage
that due to the unimodality of our
membership function the element x which
defines the supremum must lie on a line
connecting the centroids representing these
two regions. Therefore we can compute the
similarity measure using eq. (9)
),()(
)(
),(
BAdistdd
dd
BAsim
ba
ba
(9)
Details on the derivation process can be
found in (2). Note that the parameters id
and correspond to the average inter
cluster difference and fuzziness parameter
introduced in the previous section.
COMPARING IMAGES
Based on the similarity measure introduced in
the previous section we can now develop an
algorithm for comparing two images and
computing a similarity measure in [0..1]. The
clustering produced by the K-means
algorithm is dependent on the image
contents and the choice of thresholds for
abortion, therefore it cannot be guaranteed
that similar objects are separated in the same
way, producing the same number of clusters.
This fact is illustrated in Figure 2 showing
the results of different segmentations of the
same image due to different settings of
thresholds.
Figure 2: Segmentation of the same image with
different thresholds for K-means
005.0abs yields 4 clusters (middle);
05.0abs yields 2 clusters (right)
(Red dots indicate the positions of the icˆ when
projected to 2D picture space)
To account for the fact that usually we have to
compare images that are segmented into
different numbers of clusters and that these
clusters always come in random order, we
make pair wise comparisons between each
pair of clusters. This yields a similarity vector
which has 21 KK entries where iK is the
number of clusters found by the K-means
segmentation step. The first elements
correspond to the regions 1..1 K of the first
image, the second half of the similarity vector
corresponds to the regions 2..1 K . For every
entry the similarity of this region with all
regions of the other image is computed using
eq. (9). The highest similarity value is stored
in the similarity vector.
In this example a correspondence of [0.999,
0.991] has been found for comparing the right
image to the middle one which eventually
leads to a similarity of 0.97. This justifies the
approach and shows that the clustering
algorithm yields comparable results even
though the clusters might look quite different
at first glance.
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
50
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
In a second step the similarity vector is
collapsed to a value in [0. .1] by a scalar
multiplication with a weight vector . We
observe that each entry in the similarity
vector i ]1..0[ , 1 meaning that a perfect
match for a region has been found in the
other image. For assuring that the final
similarity measure also is in [0. .1], the
weight vector must fulfill the constraint
given through eq.(10).
1 i (10)
The easiest way to perform the weighing is
to assign equal weights to all the components
of , which already leads to usable results
but favors the image which has been
segmented into more regions.
There are several possible ways one can think
of as a weighing scheme, here are some
possible examples:
Equal weighing to all regions
Account for different number of
regions in the images
Use area covered by the regions as an
importance measure for assigning more
weight to bigger areas (background favored)
or to smaller areas (highlight favored)
Account for the position of the
centroid icˆ and assign higher weights to
centroids that lie in the center region of the
image (center favored) or apply higher
weights to centroids that lie in the border
region of the image.
Assign weights based on the inter-
cluster distance giving more weight to
centroids that are further away from others
There are many possible weighing schemes
one can think of. In (2) a mixed approach
accounting for area (background favored) and
centroid position (border favored) is chosen.
WEIGHING
Practical tests with the implemented CBIR
system indicate that the border/background
favored weighing scheme generates superior
results to the equal-, highlight-, and center-
favored weighing schemes. The biggest
subjective gain in retrieval performance
comes from the weighing with the area
percentage while only a small gain in
performance results from the applying of the
border favored weighing scheme.
Also it turns out that the shape features tend
to become an issue of robustness if given too
much weight. While helping to find images
with very similar contents taken under
different lighting conditions they also
produce results that show little similarity
compared to the standards of a human
observer. Therefore, an additional weighing
step is applied before comparing the centroids
using similarity measure (eq. (9)) by simply
multiplying the different dimensions of the
icˆ with a linear scaling factor. A scaling of 1
for the color Bands and 0.5 for the shape
features has proven subjectively superior
results to an equal weighing scheme.
EXPERIMENTS
A reference system has been implemented
using JAVA 1.6. Only native JAVA
functionality was used so it should run
without problems under different operating
systems although it was only tested on a
Celeron 1.3 GHz/1.2GB RAM and a Pentium
Dual Core 3 GHz/1GB Ram both Systems
running Java 1.6.0 under a Windows XP
operating system. The test database used
contained 772 Images taken by the author
containing a mix of several broad classes as
indoor, outdoor, portrait, landscape,
mountains, sea and beach under different
lighting conditions (nearly all day/night times,
artificial lighting). The pictures all were scaled
down to a resolution of 384x265 or 265x384
respectively. Only a subjective evaluation has
been done as it is a quite hard task obtaining
ground truth for a CBIR system (5).
In the first step a database was created
calculating the segmentation and storing the
relevant information (less than 1kb per file).
A time-consuming step that luckily only has
to be done once. On the Celeron 1.3GHz
system the calculation time per image was
approx. 5.3 sec. (vs. 4sec on the 3GHz proc.)
In Figure 3 you can see a screenshot of the
running system. The user chooses a query
image either by selecting a file or using the
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
51
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
query this button which facilitates browsing
through the image DB. If the selected file is
not in the database, it is first processed and
then compared against the database.
Figure 3: CBIR System implemented using Java 1.6
The result is returned as an ordered list
showing the rating generated by the
similarity measure. Although the image has
to be compared against the whole database
the query process is quite fast with ~0.1sec on
the Celeron 1.3GHz. Note that no optimized
underlying data structure has been used for
the database.
In Figure 4 you can see a result set produced
when queried with the image in the upper left
corner. The system managed to find 8 of 10
images in the DB taken from a sunrise at Cat
Ba Island. One image of the set was taken in
during sunset which is perfectly valid to be
included in the result while 2 images would
represent the category (day, beach) which is
not really appropriate and one image found
(rightmost, 2nd row) is obviously wrong to
be included from a semantical point of view.
Figure 4: 12 first query results, query image is in the
upper left corner, similarity rankings from [1..0.94]
CONCLUSION
For an approach that does not use any
semantical knowledge surprisingly good
results were produced distinguishing broad
categories like sunrise, sea or mountain. But
the semantical gap still needs to be bridged
for eliminating obvious fail classifications
that are produced due to similar color-
configurations in images that are from a
human viewpoint have very little in
common.
An approach that uses classification
techniques like Support Vector Machines or
ADA Boost could drastically improve
performance if it was possible to extract
meaningful information (like people, indoor,
buildings, etc.) out of an image in a first
classification step.
REFERENCES
[1]. Symon D’O. Cotton, School of Computer
Science, University of Birmingham, England,
(1995), Colour, colour spaces and the human
visual system, Technical report.
[2]. Cheng Chang, Liu Wenyin, Hongjiang Zhang,
(2001), Image retrieval based on region shape
similarity, Proceedings of SPIE, the International
Society for Optical Engineering ISSN 0277-786X,
vol. 4315, pp. 31-38.
[3].Julia Vogel and Bernt, (2002), On Performance
Characterization and Optimization for Image
Retrieval, Proceedings of 7th European Conference
on Computer Vision, Springer, pp. 49-63.
[4]. Wong, J. A. Hartigan and M. A., (1979),
Algorithm AS136: A k- means Clustering
Algorithm. Applied Statistics, vol. 28, pp. 100-108.
[5]. Yixin Chen, Department of Computer Science
and Engineering, The Pennsylvania State
University, (June 2003), A Machine Learning
Approach to Content-Based Image Indexing and
Retrieval, Ph.D. Dissertation.
[6]. JPEG.
t81.pdf. [Online].
AN APPROACH TO CBIR USING K-MEANS CLUSTERING
AND POLYGON BASED SHAPE FEATURES
Dominic Mai, Nguyễn Văn Tới
Khoa Công nghệ thông tin – ĐH Thái Nguyên
Dominic Mai Tạp chí KHOA HỌC & CÔNG NGHỆ 64(02): 45 - 52
52
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên
TÓM TẮT
Trong bài báo này, một cách tiếp cận cho tra cứu ảnh dựa vào nội dung hình ảnh (CBIR) được
khảo sát. Phương pháp này sử dụng thuật toán phân lớp dữ liệu K-mean để phân đoạn ảnh sau đó
trích rút các đặc trưng toàn cục và đặc trưng địa phương bằng cách sử dụng thông tin màu sắc và
hình dạng trên các vùng để tính toán mức độ tương tự giữa các hình ảnh. Mặc dù màu sắc là nguồn
thông tin chính được sử dụng cho việc tạo các cụm nhưng hình dạng của các vùng cũng được đưa
vào trong quá trình tra cứu ảnh. Một cách biểu diễn mờ cho các đặc trưng được lựa chọn, nó phù
hợp với sự không rõ ràng của việc phân đoạn ảnh.
Từ khóa: phân lớp dữ liệu, K-mean , đặc trưng toàn cục, đặc trưng địa phương, phân đoạn ảnh
Các file đính kèm theo tài liệu này:
- brief_3834_9779_anapproachtocbirusingkmeansclustering_516_2052815.pdf