ikmeans.h File Reference

Integer K-Means clustering. More...

#include "generic.h"

Go to the source code of this file.

Data Structures

struct  _VlIKMFilt
 Integer K-Means filter. More...

Typedefs

typedef vl_int32 vl_ikm_acc
typedef vl_uint8 vl_ikm_data

Enumerations

enum  VlIKMAlgorithms { VL_IKM_LLOYD, VL_IKM_ELKAN }
 IKM algorithms. More...

Functions

Create and destroy
VlIKMFiltvl_ikm_new (int method)
 Create a new IKM quantizer.
void vl_ikm_delete (VlIKMFilt *f)
 Delete IKM qunatizer.
Process data
void vl_ikm_init (VlIKMFilt *f, vl_ikm_acc const *centers, int M, int K)
 Inintialize quantizer with centers.
void vl_ikm_init_rand (VlIKMFilt *f, int M, int K)
 Inintialize quantizer with random centers.
void vl_ikm_init_rand_data (VlIKMFilt *f, vl_ikm_data const *data, int M, int N, int K)
 Inintialize with centers from random data.
int vl_ikm_train (VlIKMFilt *f, vl_ikm_data const *data, int N)
 Train clusters.
void vl_ikm_push (VlIKMFilt *f, vl_uint *asgn, vl_ikm_data const *data, int N)
 Project data to clusters.
vl_uint vl_ikm_push_one (vl_ikm_acc const *centers, vl_ikm_data const *data, int M, int K)
 Project one datum to clusters.
Retrieve data and parameters
VL_INLINE int vl_ikm_get_ndims (VlIKMFilt const *f)
 Get data dimensionality.
VL_INLINE int vl_ikm_get_K (VlIKMFilt const *f)
 Get the number of centers K.
VL_INLINE int vl_ikm_get_verbosity (VlIKMFilt const *f)
 Get verbosity level.
VL_INLINE int vl_ikm_get_max_niters (VlIKMFilt const *f)
 Get maximum number of iterations.
VL_INLINE vl_ikm_acc const * vl_ikm_get_centers (VlIKMFilt const *f)
 Get maximum number of iterations.
Set parameters
VL_INLINE void vl_ikm_set_verbosity (VlIKMFilt *f, int verb)
 Set verbosity level.
VL_INLINE void vl_ikm_set_max_niters (VlIKMFilt *f, int max_niters)
 Set maximum number of iterations.


Detailed Description

Integer K-means (IKM) is an implementation of K-means clustering (or vector quantization, VQ) for integer data. This is particuarlry useful for clustering large collections of visual descriptors.

Use the function vl_ikm_new() to create a IKM quantizer. Initialize the IKM quantizer with K clusters by vl_ikm_init() or similar function. Use vl_ikm_train() to train the quantizer. Use vl_ikm_push() or vl_ikm_push_one() to quantize new data.

Given data $x_1,\dots,x_N\in R^d$ and an a number of clusters $K$, the goal is to find assigments $a_i\in\{1,\dots,K\},$ and centers $c_1,\dots,c_K\in R^d$ so that the expected distortion

\[ E(\{a_{i}, c_j\}) = \frac{1}{N} \sum_{i=1}^N d(x_i, c_{a_i}) \]

is minimized. Here $d(x_i, c_{a_i})$ is the distortion, i.e. the cost we pay for representing $ x_i $ by $ c_{a_i} $. IKM uses the squared distortion $d(x,y)=\|x-y\|^2_2$.

Algorithms

Initialization

Most K-means algorithms are iterative and needs an initalization in the form of an initial choice of the centers $c_1,\dots,c_K$. We include the following options:

Lloyd

The Lloyd (also known as Lloyd-Max and LBG) algorithm iteratively:

This algorithm is not particularly efficient because all data points need to be compared to all centers, for a complexity $O(dNKT)$, where T is the total number of iterations.

Elkan

Elkan algorithm is an optimized variant of Lloyd. By making use of the triangle inequality, many comparisons of data points and centers are avoided, especially at later iterations. Usually 4-5 times less comparisons than Lloyd are preformed, determining a similar speedup in the execution time.

Author:
Brian Fulkerson

Andrea Vedaldi


Typedef Documentation

typedef vl_int32 vl_ikm_acc

IKM accumulator data type

typedef vl_uint8 vl_ikm_data

IKM data type


Enumeration Type Documentation

enum VlIKMAlgorithms

Enumerator:
VL_IKM_LLOYD  Lloyd algorithm
VL_IKM_ELKAN  Elkan algorithm


Function Documentation

void vl_ikm_delete ( VlIKMFilt f  ) 

Parameters:
f IKM qunatizer.

VL_INLINE vl_ikm_acc const * vl_ikm_get_centers ( VlIKMFilt const *  f  ) 

Parameters:
f IKM filter.
Returns:
maximum number of iterations.

VL_INLINE int vl_ikm_get_K ( VlIKMFilt const *  f  ) 

Parameters:
f IKM filter.
Returns:
number of centers K.

VL_INLINE int vl_ikm_get_max_niters ( VlIKMFilt const *  f  ) 

Parameters:
f IKM filter.
Returns:
maximum number of iterations.

VL_INLINE int vl_ikm_get_ndims ( VlIKMFilt const *  f  ) 

Parameters:
f IKM filter.
Returns:
data dimensionality.

VL_INLINE int vl_ikm_get_verbosity ( VlIKMFilt const *  f  ) 

Parameters:
f IKM filter.
Returns:
verbosity level.

void vl_ikm_init ( VlIKMFilt f,
vl_ikm_acc const *  centers,
int  M,
int  K 
)

Parameters:
f IKM quantizer.
centers centers.
M data dimensionality.
K number of clusters.

void vl_ikm_init_rand ( VlIKMFilt f,
int  M,
int  K 
)

Parameters:
f IKM quantizer.
M data dimensionality.
K number of clusters.

void vl_ikm_init_rand_data ( VlIKMFilt f,
vl_ikm_data const *  data,
int  M,
int  N,
int  K 
)

Parameters:
f IKM quantizer.
data data.
M data dimensionality.
N number of data.
K number of clusters.

VlIKMFilt* vl_ikm_new ( int  method  ) 

Parameters:
method Clustering algorithm.
The function allocates initializes a new IKM quantizer to operate based algorithm method.

method has values in the enumerations VlIKMAlgorithms.

Returns:
new IKM qunatizer.

void vl_ikm_push ( VlIKMFilt f,
vl_uint asgn,
vl_ikm_data const *  data,
int  N 
)

Parameters:
f IKM qunatizer.
asgn Assigments (out).
data data.
N number of data (N >= 1).
The function projects the data data on the integer K-kmeans clusters specified by the IKM quantizer f. Notice that the quantizer must be initialized.

vl_uint vl_ikm_push_one ( vl_ikm_acc const *  centers,
vl_ikm_data const *  data,
int  M,
int  K 
)

Parameters:
centers centers.
data datum to project.
K number of centers.
M dimensionality of the datum.
The function projects the specified datum data on the clusters specified by the centers centers.

Returns:
the cluster index.

VL_INLINE void vl_ikm_set_max_niters ( VlIKMFilt f,
int  max_niters 
)

Parameters:
f IKM filter.
max_niters maximum number of iterations.

VL_INLINE void vl_ikm_set_verbosity ( VlIKMFilt f,
int  verb 
)

Parameters:
f IKM filter.
verb verbosity level.

int vl_ikm_train ( VlIKMFilt f,
vl_ikm_data const *  data,
int  N 
)

Parameters:
f IKM qunatizer.
data data.
N number of data (N >= 1).
Returns:
-1 if an overflow may have occured.


Generated on Mon Jan 21 17:43:32 2008 for vlfeat by  doxygen 1.5.4