sift.h File Reference

Scale Invariant Feature Transform (SIFT). More...

#include <stdio.h>
#include "generic.h"

Go to the source code of this file.

Data Structures

struct  _VlSiftKeypoint
 SIFT filter keypoint. More...
struct  _VlSiftFilt
 SIFT filter. More...

Typedefs

typedef vl_single vl_sift_pix
 SIFT filter pixel type.

Functions

Create and destroy
VlSiftFiltvl_sift_new (int width, int height, int O, int S, int o_min)
 Create a new SIFT filter.
void vl_sift_delete (VlSiftFilt *f)
 Delete SIFT filter.
Process data
int vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im)
 Start processing a new image.
int vl_sift_process_next_octave (VlSiftFilt *f)
 Process next octave.
void vl_sift_detect (VlSiftFilt *f)
 Detect keypoints.
int vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k)
 Calculate the keypoint orientation(s).
void vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle)
 Compute the descriptor of a keypoint.
void vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma)
 Initialize a keypoint from its position and scale.
Retrieve data and parameters
VL_INLINE int vl_sift_get_octave_index (VlSiftFilt const *f)
 Get current octave index.
VL_INLINE int vl_sift_get_octave_num (VlSiftFilt const *f)
 Get number of octaves.
VL_INLINE int vl_sift_get_octave_first (VlSiftFilt const *f)
 Get first octave.
VL_INLINE int vl_sift_get_octave_width (VlSiftFilt const *f)
 Get current octave width.
VL_INLINE int vl_sift_get_octave_height (VlSiftFilt const *f)
 Get current octave height.
VL_INLINE int vl_sift_get_level_num (VlSiftFilt const *f)
 Get number of levels per octave.
VL_INLINE int vl_sift_get_keypoints_num (VlSiftFilt const *f)
 Get number of keypoints.
VL_INLINE double vl_sift_get_peak_thresh (VlSiftFilt const *f)
 Get peaks treashold.
VL_INLINE double vl_sift_get_edge_thresh (VlSiftFilt const *f)
 Get edges threshold.
VL_INLINE double vl_sift_get_norm_thresh (VlSiftFilt const *f)
 Get norm threshold.
VL_INLINE vl_sift_pixvl_sift_get_octave (VlSiftFilt const *f, int s)
 Get current octave data.
VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints (VlSiftFilt const *f)
 Get keypoints.
Set parameters
VL_INLINE void vl_sift_set_peak_thresh (VlSiftFilt *f, double t)
 Set peaks threshold.
VL_INLINE void vl_sift_set_edge_thresh (VlSiftFilt *f, double t)
 Set edges threshold.
VL_INLINE void vl_sift_set_norm_thresh (VlSiftFilt *f, double t)
 Set norm threshold.


Detailed Description

Author:
Andrea Vedaldi
Running the SIFT filter usually involves the following steps:

SIFT scale space

SIFT keypoints are computed at different scales. To this end, the image is projected in a Gaussian scale space by convoving it by isotropic Gaussian kernels of increasing variance. This results in a somewhat complex structure, illustrated next:

sift-ss.png

parameter alt. name standard value set by
$O$ O as big as possible vl_sift_new()
$o_{\mathrm{min}}$ o_min -1 vl_sift_new()
$S$ S 3 vl_sift_new()

SIFT detector

The SIFT frames (keypoints) are extracted based on peaks (local extrema) of the DoG scale space. Peaks are searched in a neighborhood of 3x3x3 samples (in space and scale). The previous figure shows the scale levels interested by this search (they are the ones at the intersection of two green arrows). Peaks are then quadratically interpolated. Finally they are filtered and the orientation(s) is computed as explained in the next sections.

Peak threshold

Peaks too short may be generated by noise and are discarded. This is done by comparing the absolute value of the DoG scale space at the peak with the peak threshold $t_p$ and discarding the peak this value is below the threshold.

Edge threshold

Peaks too flat are generated by edges and do not yield stable features. These peaks are detected and removed as follows. Given a peak $x,y,\sigma$, the algorithm evaluates the Jacobian of the $x,y$ slice of DoG scae space at the scale $\sigma$. Then the following score (similar to the Harri's function) is computed:

\[ \frac{(\mathrm{tr}\,G(x,y))^2}{\det G(x,y)} \]

This score as a minimum (equal to 4) when both eigenvaues of the Jacobian are equal (curved peak) and is bigger and bigger as one of the eigenvalues grows and the other stays small. Peaks are retained if the score is below the quantity $(t_e+1)(t_e+1)/t_e$, where $t_e$ is the edge threshold. Notice that this quantity has a minimum equal to 4 when $t_e=1$ and grows thereafter. Therefore the range of the edge threshold is $[1,\infty)$.

Norm threshold

Near-uniform patches do not yield stable features. By default, all descriptors will be computed, but when this option is set, descriptors who have a small norm before scaling will be set explicitly to zero.

Orientations

A peak in the DoG scale space fixes 3 parameters of the keypoints, i.e. position and scale. It remains to choose an orientation. In order to do this, SIFT computes an histogram of the gradient orientations in a Gaussian window of std. dev. 1.5 times bigger than the scale $\sigma$ of the keypoint.

sift-orient.png

The histogram is then smoothed and the maximum is selected. In addition to the biggest mode, up to other three modes whose amplitude is within the 80% of the biggest mode are retained too, returned as additional orientations.

parameter alt. name standard value set by
$t_e$ edge_thresh 10.0 vl_sift_set_edge_thresh()
$t_p$ peak_thresh 0 vl_sift_set_peak_thresh()

SIFT descriptor

The SIFT descriptor is a three dimensional histogram $h(\theta,x,y)$ of the orientation $\theta$ and position $(x,y)$ of the gradient inside a patch surronding the keypoint. The following picutres illustrates the geometry of the histogram:

sift-bins.png

While SIFT descriptor $h(\theta,x,y)$ is a 3-D array but it is usually presented as a vector. This vector is obtained by stacking the 3-D array being $\theta$ is the fastest varying index and y the slowest.

The histogram uses soft binning, so that bins partially overlap. There are $B_p$ bins along the x and y directions and $B_o$ along the $\theta$ direction.

The actual sizes of the descriptor depend on the size of the keypoint as follows:

The following table summarizes the descriptors parameters along with their standard vale.

parameter alt. name standard value
$B_p$ BP 4
$B_o$ BO 8
$m$ magnif 3

The keypoint coordinates (x,y) are expressed in the standard image convention (y axis pointing down). This also establishes the convention for expressing the angle th of a vector v (here v could be either the gradient of the image or the direction of the keypoint). To slightly complicate the matter, however, the index th of the descriptor h(th,x,y) follows the opposite convention (this is for compatibility with Lowe's original SIFT implementation), as shown by the figure:

sift-angle.png

Acknowledgments

Author:
Andrea Vedaldi

Function Documentation

void vl_sift_calc_keypoint_descriptor ( VlSiftFilt f,
vl_sift_pix descr,
VlSiftKeypoint const *  k,
double  angle0 
)

Parameters:
f SIFT filter.
descr SIFT descriptor (output)
k keypoint.
angle0 keypoint direction.
The function computes the SIFT descriptor of the keypoint k of orientation angle0. The function fills the buffer descr which must be large enough to hold the descriptor.

The function assumes that the keypoint is on the current octave. If not, it does not do anything.

int vl_sift_calc_keypoint_orientations ( VlSiftFilt f,
double  angles[4],
VlSiftKeypoint const *  k 
)

Parameters:
f SIFT filter.
angles orientations (output).
k keypoint.
The function computes the orientation(s) of the keypoint k. The function returns the number of orientations found (up to four). The orientations themselves are written to the vector angles.

Remarks:
The function requires the keypoint octave k->o to be equal to the filter current octave vl_sift_get_octave. If this is not the case, the function returns zero orientations.

The function requries the keypoint scale level k->s to be in the range s_min+1 and s_max-2 (where usually s_min=0 and s_max=S+2). If this is not the case, the function returns zero orientations.

Returns:
number of orientations found.

void vl_sift_delete ( VlSiftFilt f  ) 

Parameters:
f SIFT filter to delete.
The function frees the resources allocated by vl_sift_new().

void vl_sift_detect ( VlSiftFilt f  ) 

The function detect keypoints in the current octave filling the internal keypoint buffer. Keypoints can be retrieved by vl_sift_get_keypoints().

Parameters:
f SIFT filter.

Index GSS

For internal use only.

Index matrix A

VL_INLINE double vl_sift_get_edge_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold.

VL_INLINE VlSiftKeypoint const * vl_sift_get_keypoints ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
pointer to the keypoints list.

VL_INLINE int vl_sift_get_keypoints_num ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of keypoints.

VL_INLINE int vl_sift_get_level_num ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of leves per octave.

VL_INLINE double vl_sift_get_norm_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold.

VL_INLINE vl_sift_pix * vl_sift_get_octave ( VlSiftFilt const *  f,
int  s 
)

Parameters:
f SIFT filter.
s level index.
The level index s ranges in the interval s_min = -1 and s_max = S + 2, where S is the number of levels per octave.

Returns:
pointer to the octave data for level s.

VL_INLINE int vl_sift_get_octave_first ( VlSiftFilt const *  f  ) 

-------------------------------------------------------------------

Parameters:
f SIFT filter.
Returns:
index of the first octave.

VL_INLINE int vl_sift_get_octave_height ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
current octave height.

VL_INLINE int vl_sift_get_octave_index ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
index of the current octave.

VL_INLINE int vl_sift_get_octave_num ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
number of octaves.

VL_INLINE int vl_sift_get_octave_width ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
current octave width.

VL_INLINE double vl_sift_get_peak_thresh ( VlSiftFilt const *  f  ) 

Parameters:
f SIFT filter.
Returns:
threshold ;

void vl_sift_keypoint_init ( VlSiftFilt const *  f,
VlSiftKeypoint k,
double  x,
double  y,
double  sigma 
)

Parameters:
f SIFT filter.
k SIFT keypoint (output).
x x coordinate of the center.
y y coordinate of the center.
sigma scale.
The function initializes the structure k from the location x and y and scale sigma of the keypoint.

VlSiftFilt* vl_sift_new ( int  width,
int  height,
int  O,
int  S,
int  o_min 
)

Parameters:
width image width.
height image height.
O number of octaves.
S number of levels per octave.
o_min first octave index.
The function allocates and returns a new SIFT filter for the specified image and scale space geomtery.

Setting O to a negative value sets the number of octaves to the maximum possible value depending on the size of the image.

Returns:
the new SIFT filter.
See also:
vl_sift_delete().

int vl_sift_process_first_octave ( VlSiftFilt f,
vl_sift_pix const *  im 
)

Parameters:
f SIFT filter.
im image data.
The function starts processing a new image by computing its Gaussian scale spadce at the lower octave. It also empties the internal keypoint buffer.

Returns:
error code. The function returns VL_ERR_EOF if there are no more octaves to process.
See also:
vl_sift_process_next_octave().

int vl_sift_process_next_octave ( VlSiftFilt f  ) 

Parameters:
f SIFT filter.
The function computes the next octave of the Gaussian scale space. Notice that this clears the record of any feature detected in the previous octave.

Returns:
error code. The function returns the error VL_ERR_EOF when there are no more ocvatves to process.
See also:
vl_sift_process_first_octave().

VL_INLINE void vl_sift_set_edge_thresh ( VlSiftFilt f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.

VL_INLINE void vl_sift_set_norm_thresh ( VlSiftFilt f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.

VL_INLINE void vl_sift_set_peak_thresh ( VlSiftFilt f,
double  t 
)

Parameters:
f SIFT filter.
t threshold.


Generated on Mon Jan 21 17:43:33 2008 for vlfeat by  doxygen 1.5.4