Machine learning with Lipschitz classifiers

Stuhlsatz, André

Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/5050

Title:	Machine learning with Lipschitz classifiers
Author(s):	Stuhlsatz, André
Referee(s):	Wendemuth, Andreas Findeisen, Rolf
Granting Institution:	Otto-von-Guericke-Universität Magdeburg
Issue Date:	2010
Extent:	Online-Ressource (PDF-Datei: 230 S., 3842 KB)
Type:	Hochschulschrift
Type:	PhDThesis
Language:	English
Publisher:	Universitätsbibliothek Otto von Guericke University Library, Magdeburg, Germany
URN:	urn:nbn:de:101:1-201104192945
Subjects:	Maschinelles Lernen Lipschitz-Bedingung Support-Vektor-Maschine Hochschulschrift Online-Publikation
Abstract:	The classification of complex patterns is one of the most impressive cognitive achievements of the human brain. Humans have the ability to recognize a complex image, like for example that of a known person, and to distinguish it from other objects within half a second. While for a solution of this task the brain has access to a massive parallelism and a vast, hierarchically organized, and auto-associative memory, common computer architectures are just able to a sequential processing of information stored in a non auto-associative memory. Even modern, parallelly operating, multi-processor systems are far away from the performance of our brain. However, nowadays, it is possible to solve complex and memory extensive pattern recognition problems, like the recognition of handwritten digits or the transcription of speech, satisfactorily with a common computer by the use of modern statistical and algorithmic learning approaches. One of the most successful pattern recognition methods is the so-called Support Vector Machine (SVM). The SVM is based on the learning paradigm of structural risk minimization, which outperforms empirical approaches if only few data is available for solving the considered classification problem. Although the SVM has proven very good recognition performances in many cases, the SVM also comes up with limitations, for example if specific a priori knowledge shall be used. In particular, the increasing complexity of applications requires a high adaptivity of the classification method to the specific problem. Also concerning this point, the SVM is limited due to a restricted variety of implementable classification functions. The objective of the present thesis is the development of new learning algorithms for the classification of patterns, that on the one hand overcome the limitations of the SVM, but on the other hand are based on the same theoretical concepts facilitating the good performance of the SVM. Two new algorithms will be presented that are justified by a theoretical generalization of the SVM, and which will be utilized for the first time for a practical implementation. In contrast to the SVM, the new methods make accessible a much larger function class for constructing a classifier. This is an important prerequisite for flexible adaptation of the classifier to difficult classification tasks with particular requirements as well as for the integration of a priori knowledge about the problem at hand. In this work, the way to implementable algorithms leads across different mathematical reformulations of the original problem. Starting with the theoretical generalization of the SVM, it results a restricted optimization problem that is difficult to solve in general. In a first step, this problem is expressed in terms of a restricted minimax-problem by a modification of the suitable classification functions to a still very large function class consisting of (affine-)linear combinations of at least one-time continuously differentiable functions. In the next step, the minimax-problem is converted into a so-called Semi-Infinite Problem (SIP). It turns out, that this particular mathematical problem is appropriate in order to obtain a solution of the original problem for the considered function class using well-known optimization methods. To further exploit the problem structure, an equivalent dual problem is derived from the SIP. Therefore, we prove a duality theorem about the equality of the optimal values of the dual and the original problem. For solving the dual problem, a multilevel iterative approach is developed from which the proposed algorithms follow by pursuing different solution strategies. Moreover, all sub-optimization methods of any stage necessary for an implementation in software are developed. Namely, these are an adapted interior-point-method, a simulated annealing based search heuristics and a particular gradient decent approach. Furthermore, options are depicted for an improvement of efficiency for future implementations. Besides the emphasis on the theoretical development of new learning methods and their practical implementations, all algorithms were implemented in the MATLAB(R) programming environment for the experimental part of the present thesis. Hence, they are also available for further research purposes in future. For the first time, classification results are explored and evaluated in comparison to the SVM on different data sets. As test data, an artificial 2d-dataset as well as two real-world datasets were used. In the concluding experiment, a scenario is prototypically considered to which the SVM is only inadequately applicable and which shall precisely prove the capability of the new methods in that case. It follows, regarding the considered datasets, the proposed learning methods reach comparably good classification accuracy like the SVM in standard applications. Moreover, the particular benefit of the new methods is reflected theoretically and experimentally in the ability to solve classification problems using decision functions that are not accessible to SVMs. Thereby, the underlying ideas, which make the SVM excel compared to other approaches with respect to generalization performances in case of few available learning information, are adequately transported into the proposed new environment. This opens the way for a design and a use of new classifiers that have not been implementable in a robust and generalizing basic concept so far.
URI:	https://opendata.uni-halle.de//handle/1981185920/11140 http://dx.doi.org/10.25673/5050
Open Access:	Open access publication
Appears in Collections:	Fakultät für Elektrotechnik und Informationstechnik

Files in This Item:

File	Description	Size	Format
thesis-as.pdf		3.84 MB	Adobe PDF	View/Open

Show full item record BibTeX EndNote