#include "generic.h"
Typedefs
    typedef vl_double vl_aib_prob
        AIB probability type.
    typedef vl_uint vl_aib_node
        AIB node type.

Functions
    vl_aib_node * vl_aib (vl_aib_prob *Pcx, vl_uint nlabels, vl_uint nvalues, double **cost)
        Runs AIB on Pcx.
[Slonim] N. Slonim and N. Tishby. Agglomerative information bottleneck. In Proc. NIPS, 1999
AIB takes a discrete valued feature x and a label c and gradually compresses x by merging values while preserving as much as possible the mutual information I(x,c).
While the algorithm is equivalent to the one described in [Slonim], it uses some speedups that enable handling much larger datasets. Let N be the number of feature values and C the number of labels. The algorithm of [Slonim] is O(N^2) space and O(CN^3) time. This algorithm is O(N) space and O(CN^2) time in common cases (O(CN^3) in the worst case).
AIB iterates this procedure until the desired level of compression is achieved.
Entropy constrained AIB trades the information loss dI_ij of a merge against the entropy decrease dH_ij that the merge buys, weighting the latter by a parameter beta >= 0. Joining x_i and x_j yields an improvement dF_ij = beta dH_ij - dI_ij of this goal function. For small values of beta, there is no convenience in performing any merge (i.e. dF_ij < 0 for all pairs ij). However, for beta big enough a merge will yield a positive improvement dF_ij >= 0. The minimum value of beta for which this happens is

    beta = min_ij dI_ij / dH_ij.

This also identifies the pair (i,j) that we should merge. Entropy constrained AIB is therefore the same as AIB, except that it works with the adjusted matrix D_ij = dI_ij - beta dH_ij.
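The selection rule can be sketched as follows, assuming the losses dI_ij and entropy decreases dH_ij have already been computed for the candidate pairs (a hypothetical helper, not part of the VLFeat API):

```c
#include <stddef.h>

/* Given the information losses dI[k] >= 0 and entropy decreases
 * dH[k] > 0 of the candidate merges, the smallest beta making some
 * merge an improvement is beta = min_k dI[k]/dH[k], attained at the
 * pair to merge. Returns the index of that pair and stores beta.
 * Illustrative sketch with hypothetical names. */
size_t ec_aib_select(double const *dI, double const *dH, size_t n,
                     double *beta)
{
  size_t best = 0;
  double best_ratio = dI[0] / dH[0];
  for (size_t k = 1; k < n; k++) {
    double r = dI[k] / dH[k];
    if (r < best_ratio) { best_ratio = r; best = k; }
  }
  if (beta) *beta = best_ratio; /* threshold at which merging pays off */
  return best;
}
```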
Thus in a basic implementation of AIB, finding the optimal pair of feature values requires O(CN^2) operations in total. In order to join all the N values, we repeat this procedure O(N) times, yielding O(CN^3) time and O(1) space complexity (this does not account for the space needed to store the input).
The complexity can be improved by reusing computations. For instance, we can store the matrix D = [D_ij] of information drops (which requires O(N^2) space). Then, after joining (i,j), all of the matrix D except the rows and columns (the matrix is symmetric) of indexes i and j is unchanged. These two rows and columns are deleted and a new row and column, whose computation requires O(CN) operations, are added for the merged value x_ij. Finding the minimal element of the matrix still requires O(N^2) operations, so the complexity of this algorithm is O(N^3 + CN^2) time and O(N^2) space.
We can obtain a much better expected complexity as follows. First, instead of storing the whole matrix D, we store the smallest element (index and value) of each row as (q_i, D_i) (notice that this is also the best element of each column, D being symmetric). This requires O(N) space only, and finding the minimal element of the matrix requires O(N) operations only. After joining (i,j), we have to update this representation efficiently. This is easily done as follows:

- the entries (q_i, D_i) and (q_j, D_j) are deleted;
- a new entry (q_ij, D_ij) is computed for the merged value x_ij, which requires O(CN) operations;
- for every other entry (q_k, D_k): if q_k is neither i nor j, the stored minimum is still valid and only the new element D_{k,ij} can beat it; if instead q_k = i or q_k = j, the minimum of row k must be recomputed from scratch, requiring O(CN) operations.

This algorithm requires only O(N) space and O(γ(N) CN^2) time, where γ(N) is the expected number of times we fall in the last case. In common cases one has γ(N) ≈ const, so the time saving is significant.
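The update of the row-minima representation can be sketched as follows. Distances are recomputed on demand through a callback, since the O(N) algorithm no longer stores the matrix D; all names here are hypothetical and for illustration only:

```c
#include <math.h>
#include <stddef.h>

typedef double (*aib_dist_fn)(size_t a, size_t b, void *ctx);

/* example callback: distances stored in a 5x5 row-major matrix */
static double lookup_dist(size_t a, size_t b, void *ctx)
{
  return ((double *)ctx)[a * 5 + b];
}

/* One update of the row minima (q[k], Dmin[k]) after values i and j
 * are merged into the new value ij. `alive` lists the surviving
 * values (ij included, i and j removed); `dist` recomputes a distance
 * D_{a,b} on demand. Returns how many full row re-scans were needed
 * (the last case above, whose expected count drives the complexity). */
size_t update_row_minima(size_t *q, double *Dmin,
                         size_t const *alive, size_t nalive,
                         size_t i, size_t j, size_t ij,
                         aib_dist_fn dist, void *ctx)
{
  size_t rescans = 0;
  for (size_t a = 0; a < nalive; a++) {
    size_t k = alive[a];
    if (k == ij) continue;
    if (q[k] != i && q[k] != j) {
      /* stored minimum survives: only the new column can beat it */
      double d = dist(k, ij, ctx);
      if (d < Dmin[k]) { q[k] = ij; Dmin[k] = d; }
    } else {
      /* minimum was in a deleted row/column: re-scan row k */
      rescans++;
      Dmin[k] = INFINITY;
      for (size_t b = 0; b < nalive; b++) {
        size_t l = alive[b];
        if (l == k) continue;
        double d = dist(k, l, ctx);
        if (d < Dmin[k]) { q[k] = l; Dmin[k] = d; }
      }
    }
  }
  /* fresh entry for the merged value */
  Dmin[ij] = INFINITY;
  for (size_t b = 0; b < nalive; b++) {
    size_t l = alive[b];
    if (l == ij) continue;
    double d = dist(ij, l, ctx);
    if (d < Dmin[ij]) { q[ij] = l; Dmin[ij] = d; }
  }
  return rescans;
}
```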
vl_aib_node* vl_aib (vl_aib_prob * Pcx,
                     vl_uint nlabels,
                     vl_uint nvalues,
                     double ** cost)
Parameters:
    Pcx        joint probability table (column major).
    nlabels    number of rows in Pcx (number of labels).
    nvalues    number of columns in Pcx (number of feature values).
    cost       cost of each node (out).
The function runs AIB on the joint probability table Pcx, iteratively merging the two values of the feature x that cause the smallest decrease in mutual information between the random variables x and c.
Merge operations are arranged in a binary tree. The nodes of the tree correspond to the original feature values and any other value obtained as a result of a merge operation. The nodes are indexed in breadth-first order, starting from the leaves. The first index is zero. In this way, the leaves correspond directly to the original feature values. In total there are 2*nvalues-1 nodes.
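This indexing convention can be decoded with a short helper. The parents array below is a made-up example consistent with the convention (leaves 0..3, merged nodes 4..6), not output of an actual vl_aib run:

```c
#include <stddef.h>

/* Climb the parent links from `node` to the root. In the vl_aib
 * encoding merged nodes always have higher indexes than their
 * children and node 0 is a leaf, so a stored parent of 0 can only
 * mark the root. Illustrative sketch. */
size_t aib_root(unsigned const *parents, size_t node)
{
  while (parents[node] != 0) node = parents[node];
  return node;
}
```

For nvalues = 4 and merges (0,1) -> 4, (4,2) -> 5, (5,3) -> 6, the parents array is {4, 4, 5, 6, 5, 6, 0} and every leaf climbs to root node 6.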
The function returns an array with one element per tree node. Each element is the index of the parent node. The root parent is equal to zero.
Feature values with null probability are ignored by the algorithm and their nodes have parents indexing a non-existent tree node (a value bigger than 2*nvalues-1).
If cost is not equal to NULL, then the function will also return a vector with the information level after each merge. cost has nvalues entries: the first is the value of the cost functional before any merge, and the others are the cost after each of the nvalues-1 merges.
Returns:
    the array of parents representing the tree. The array has 2*nvalues-1 elements.