Modellselektion von Clusteringverfahren mittels minimaler Beschreibungslaenge (bibtex)

by Alexander Selb

Abstract:

This diploma thesis presents the use of the \emphminimum description length (\emphMDL) principle for unsupervised clustering. Two frameworks are presented which determine the right size of neural networks used for \emphcrisp/hard and \emphfuzzy clustering purposes. The optimization procedure starts with a network consisting of a large number of network nodes and iteratively reduces the net size until a complexity criterion is met. The \emphMDL principle is used to measure the complexity of the network. Different instantiations of the \emphMDL algorithm, depending on the coding procedure, are possible and derived in the thesis. In addition, we demonstrate how the \emphMDL principle increases the robustness of the training algorithm by providing a criterion for \emphoutlier detection. Also a \emphgrowing component based on \emphMDL is presented to cope with the formation of new data clusters due to non-stationary data. In order to overcome the problem of initialization the \emphgrowing neural gas (GNG) algorithm is used which thereby also increases the computational efficiency of the \emphMDL algorithm. Furthermore, we present additional methods to increase the computational efficiency. Typical applications, like 2 dimensional clustering, data compression, \emphRGB color image segmentation and learning the \emphhand-eye coordination for robot control, are used to demonstrate the behaviour of the \emphMDL algorithm.

Reference:

Modellselektion von Clusteringverfahren mittels minimaler Beschreibungslaenge (Alexander Selb), Technical report, PRIP, TU Wien, 2000.

Bibtex Entry:

@TechReport{PTR-Selb00a, author = "Alexander Selb", institution = "PRIP, TU Wien", number = "PRIP-TR-062", title = {Modellselektion von {C}lusteringverfahren mittels minimaler {B}eschreibungslaenge}, year = "2000", url = "ftp://ftp.prip.tuwien.ac.at/pub/publications/trs/tr62.pdf", abstract = "This diploma thesis presents the use of the \emph{minimum description length} (\emph{MDL}) principle for unsupervised clustering. Two frameworks are presented which determine the right size of neural networks used for \emph{crisp/hard} and \emph{fuzzy} clustering purposes. The optimization procedure starts with a network consisting of a large number of network nodes and iteratively reduces the net size until a complexity criterion is met. The \emph{MDL} principle is used to measure the complexity of the network. Different instantiations of the \emph{MDL} algorithm, depending on the coding procedure, are possible and derived in the thesis. In addition, we demonstrate how the \emph{MDL} principle increases the robustness of the training algorithm by providing a criterion for \emph{outlier} detection. Also a \emph{growing} component based on \emph{MDL} is presented to cope with the formation of new data clusters due to non-stationary data. In order to overcome the problem of initialization the \emph{growing neural gas} (GNG) algorithm is used which thereby also increases the computational efficiency of the \emph{MDL} algorithm. Furthermore, we present additional methods to increase the computational efficiency. Typical applications, like 2 dimensional clustering, data compression, \emph{RGB} color image segmentation and learning the \emph{hand-eye} coordination for robot control, are used to demonstrate the behaviour of the \emph{MDL} algorithm.", }

Powered by bibtexbrowser