The rapidly growing amount of data, available from di?erent technologies in the ?eld of bio-sciences, high-energy physics, economy, climate analysis, and in several other scienti?c disciplines, requires a new generation of machine learning and statistical methods to deal with their complexity and hete- geneity. As data collections becomes easier, data analysis is required to be more sophisticated in order to extract useful information from the available data. Even if data can be represented in several ways, according to their structural characteristics, ranging from strings, lists, trees to graphs and other more complex data structures, in most applications they are typically represented as a matrix whose rows correspond to measurable characteristics called f- tures, attributes, variables, depending on the considered discipline and whose columns correspond to examples (cases, samples, patterns). In order to avoid confusion,we will talk about features and examples.In real-worldtasks,there canbe manymorefeatures than examples(cancer classi?cationbasedongene expressionlevels in bioinformatics) or there can be many more examples than features(intrusion detection in computer/networksecurity).
In addition, each example can be either labeled or not. Attaching labels allows to distinguish members of the same class or group from members of other classes or groups. Hence, one can talk about supervised and unsupervised tasks that can be solved by machine learning methods. Since it is widely accepted that no single classi?er or clustering algorithm canbesuperiortotheothers,ensemblesofsupervisedandunsupervisedme- ods are gaining popularity. A typical ensemble includes a number of clas- ?ers/clustererswhosepredictionsarecombinedtogetheraccordingtoacertain rule, e.g. majority vote.