aib.h File Reference

Agglomerative Information Bottleneck (AIB). More...

#include "generic.h"

Go to the source code of this file.

Typedefs

typedef vl_double vl_aib_prob
 AIB probability type.
typedef vl_uint vl_aib_node
 AIB node type.

Functions

vl_aib_node * vl_aib (vl_aib_prob *Pcx, vl_uint nlabels, vl_uint nvalues, double **cost)
 Runs AIB on Pcx.


Detailed Description

Author:
Brian Fulkerson

Andrea Vedaldi

This provides an implementation of Agglomerative Information Bottleneck (AIB) as first described in:

[Slonim] N. Slonim and N. Tishby. Agglomerative information bottleneck. In Proc. NIPS, 1999

AIB takes a discrete-valued feature $x$ and a label $c$ and gradually compresses $x$ by merging values, while preserving as much as possible of the mutual information $I(x,c)$.

While the algorithm is equivalent to the one described in [Slonim], it uses some speedups that enable it to handle much larger datasets. Let $N$ be the number of feature values and $C$ the number of labels. The algorithm of [Slonim] is $O(N^2)$ in space and $O(C N^3)$ in time. This algorithm is $O(N)$ in space and $O(C N^2)$ in time in common cases ($O(N^3 C)$ in the worst case).

Overview

Given a discrete feature $x \in \mathcal{X} = \{x_1,\dots,x_N\}$ and a category label $c = 1,\dots,C$ with joint probability $p(x,c)$, AIB computes a compressed feature $[x]_{ij}$ by merging two values $x_i$ and $x_j$. Among all the pairs $ij$, AIB chooses the one that yields the smallest loss in the mutual information

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i,c) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_j,c) \log \frac{p(x_j,c)}{p(x_j)p(c)} - \sum_c (p(x_i,c)+p(x_j,c)) \log \frac{p(x_i,c)+p(x_j,c)}{(p(x_i)+p(x_j))p(c)} \]

AIB iterates this procedure until the desired level of compression is reached.

Entropy constrained AIB

Here we suggest an `entropy constrained' version of AIB. This version optimizes the information-entropy trade-off

\[ F(x,c|\beta) = I(x,c) - \beta H(x), \quad \beta > 0. \]

Joining $ij$ yields an improvement

\[ \Delta_{ij} = F([x]_{ij},c|\beta) - F(x,c|\beta) \]

of this goal function. For small values of $\beta$ no merge is convenient (i.e. $\Delta_{ij} < 0$ for all pairs), but for $\beta$ large enough some merge yields a positive improvement $\Delta_{ij} \geq 0$. The minimum value of $\beta$ for which this happens is

\[ \beta = \min_{ij} \frac{I(x,c) - I([x]_{ij},c)}{H(x) - H([x]_{ij})}. \]

This also identifies the pair $ij$ that should be merged. Entropy constrained AIB is therefore the same as AIB, except that it works with the adjusted matrix

\[ D_{ij} = \frac{I(x,c) - I([x]_{ij},c)}{H(x) - H([x]_{ij})}. \]
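The threshold on $\beta$ follows by expanding the improvement $\Delta_{ij}$ in terms of the two trade-off functionals:

\[ \Delta_{ij} = \bigl(I([x]_{ij},c) - I(x,c)\bigr) + \beta \bigl(H(x) - H([x]_{ij})\bigr) \geq 0 \quad\Longleftrightarrow\quad \beta \geq \frac{I(x,c) - I([x]_{ij},c)}{H(x) - H([x]_{ij})}, \]

where the equivalence uses the fact that merging two values strictly decreases the entropy, so that $H(x) - H([x]_{ij}) > 0$; the ratio is non-negative because merging can only decrease the mutual information as well.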

Algorithm details

Computing $D_{ij}$ requires $O(C)$ operations. For example, in standard AIB we need to calculate

\[ D_{ij} = I(x,c) - I([x]_{ij},c) = \sum_c p(x_i,c) \log \frac{p(x_i,c)}{p(x_i)p(c)} + \sum_c p(x_j,c) \log \frac{p(x_j,c)}{p(x_j)p(c)} - \sum_c (p(x_i,c)+p(x_j,c)) \log \frac{p(x_i,c)+p(x_j,c)}{(p(x_i)+p(x_j))p(c)} \]

Thus in a basic implementation of AIB, finding the optimal pair $ij$ of feature values requires $O(CN^2)$ operations in total. In order to join all the $N$ values, we repeat this procedure $O(N)$ times, yielding $O(N^3 C)$ time and $O(1)$ space complexity (this does not account for the space needed to store the input).

The complexity can be improved by reusing computations. For instance, we can store the matrix $D = [D_{ij}]$ (which requires $O(N^2)$ space). After joining $ij$, all of the matrix $D$ is unchanged except for the rows and columns (the matrix is symmetric) of indexes $i$ and $j$. These two rows and columns are deleted, and a new row and column, whose computation requires $O(NC)$ operations, are added for the merged value $x_{ij}$. Finding the minimal element of the matrix still requires $O(N^2)$ operations, so this algorithm takes $O(N^2 C + N^3)$ time and $O(N^2)$ space.

We can obtain a much better expected complexity as follows. First, instead of storing the whole matrix $D$, we store only the smallest element (index and value) of each row as $(q_i, D_i)$ (since $D$ is symmetric, this is also the best element of each column). This requires only $O(N)$ space, and finding the minimal element of the matrix takes only $O(N)$ operations. After joining $ij$, we have to update this representation efficiently. This is easily done as follows:

- the entries for the deleted values $x_i$ and $x_j$ are removed, and the entry for the merged value $x_{ij}$ is computed from scratch, which requires $O(NC)$ operations;
- any other entry $(q_l, D_l)$ with $q_l \not\in \{i,j\}$ is still valid; it suffices to check whether the merged value $x_{ij}$ is closer to $x_l$ than the current minimum, which requires $O(C)$ operations;
- any entry $(q_l, D_l)$ with $q_l \in \{i,j\}$ refers to a deleted value, so the minimum of row $l$ must be recomputed by scanning the whole row, which requires $O(NC)$ operations.

This algorithm requires only $O(N)$ space and $O(\gamma(N) C N^2)$ time, where $\gamma(N)$ is the expected number of times the last case occurs. In common cases one has $\gamma(N) \approx \mathrm{const.}$, so the time saving is significant.
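The update step above can be sketched in C. Everything here is illustrative: the function name update_row_minima, the dist callback (standing in for the $O(C)$ computation of $D_{lm}$), and the array layout are assumptions made for the example, not the actual implementation behind aib.h.

```c
#include <math.h>

/* Toy stand-in for the O(C) distance computation D_lm. */
static double toy_dist(int a, int b) { return fabs((double)(a - b)); }

/* After merging values i and j into a new value k, refresh the per-row
 * minima: Dmin[l] holds the smallest distance in row l, q[l] its argmin. */
static void update_row_minima(int N, int i, int j, int k,
                              double *Dmin, int *q,
                              double (*dist)(int, int))
{
  int l;
  for (l = 0; l < N; ++l) {
    if (l == i || l == j || l == k) continue;
    if (q[l] != i && q[l] != j) {
      /* Cheap case: the stored minimum is still valid; only the new
         value k can beat it. O(C) work. */
      double d = dist(l, k);
      if (d < Dmin[l]) { Dmin[l] = d; q[l] = k; }
    } else {
      /* Expensive ("last") case: the stored minimum pointed to a
         deleted value, so the whole row is rescanned. O(NC) work. */
      int m;
      Dmin[l] = dist(l, k); q[l] = k;
      for (m = 0; m < N; ++m) {
        double d;
        if (m == l || m == i || m == j || m == k) continue;
        d = dist(l, m);
        if (d < Dmin[l]) { Dmin[l] = d; q[l] = m; }
      }
    }
  }
}
```

Only the rows whose stored minimum pointed at a deleted value pay the $O(NC)$ rescan; all others pay $O(C)$, which is where the expected speedup comes from.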

Author:
Brian Fulkerson

Function Documentation

vl_aib_node* vl_aib ( vl_aib_prob *  Pcx,
vl_uint  nlabels,
vl_uint  nvalues,
double **  cost 
)

Parameters:
Pcx joint probability table (column major).
nlabels number of rows in Pcx (number of labels).
nvalues number of columns in Pcx (number of feature values).
cost cost of each node (out).
The function runs Agglomerative Information Bottleneck (AIB) on the joint probability table Pcx, which has the labels c along the rows and the feature values x along the columns. AIB iteratively merges the two values of the feature x that cause the smallest decrease in the mutual information between the random variables x and c.

Merge operations are arranged in a binary tree. The nodes of the tree correspond to the original feature values and any other value obtained as a result of a merge operation. The nodes are indexed in breadth-first order, starting from the leaves. The first index is zero. In this way, the leaves correspond directly to the original feature values. In total there are 2*nvalues-1 nodes.

The function returns an array with one element per tree node. Each element is the index of the parent node. The parent of the root is equal to zero.

Feature values with null probability are ignored by the algorithm; their nodes have parents indexing a non-existent tree node (a value larger than 2*nvalues-1).

If cost is not equal to NULL, the function also returns a vector with the information level after each merge. cost has nvalues entries: the first is the value of the cost functional before any merge, and the others are the values after each of the nvalues-1 merges.

Returns:
the array of parents representing the tree. The array has 2*nvalues-1 elements.
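As a sketch of how the returned array can be consumed, the helper below walks from a leaf to the root, assuming (per the breadth-first numbering described above) that the root is the last node, index 2*nvalues-2. The function name depth_to_root and the use of plain unsigned in place of vl_aib_node are illustrative choices, not part of the aib.h API.

```c
/* Count the merges a leaf participates in by walking the parents array
 * (2 * nvalues - 1 entries) returned by vl_aib up to the root.
 * Illustrative helper; `unsigned` stands in for vl_aib_node here. */
static unsigned depth_to_root(const unsigned *parents,
                              unsigned nnodes, unsigned leaf)
{
  unsigned n = leaf, depth = 0;
  while (n != nnodes - 1) {   /* node nnodes - 1 is the root */
    n = parents[n];
    ++depth;
  }
  return depth;
}
```

For instance, with nvalues = 3 and merges (0,1) -> 3 and (3,2) -> 4, the parents array is {3, 3, 4, 4, 0}: leaf 0 is two merges away from the root, while leaf 2 is one merge away.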


Generated on Mon Jan 21 17:43:32 2008 for vlfeat by  doxygen 1.5.4