Today, the program of the Black Box Discrete Optimization Benchmarking (BB-DOB) Workshop 2019 has been released. This third installment of the BB-DOB Workshop series (after BB-DOB@GECCO 2018 and BB-DOB@PPSN 2018) takes place at the Genetic and Evolutionary Computation Conference (GECCO'19) in Prague, Czech Republic, whose program can be found here. The long term aim of our workshop series is to produce, for the domain of discrete optimization: a well-motivated benchmark function testbed, an experimental set-up and methods for the generation of data output for post-processing and approaches for the proper presentation of the results in graphs and tables. The aims of this GECCO 2019 BB-DOB workshop are to finalize the benchmarking testbed for discrete optimization and to promote a discussion of which performance measures should be used. Our workshop has six accepted submissions and one exciting panel discussion. It will span over two sessions and take place on sunday, July 14, 08:30-12:30 in room Club A. The schedule will leave room for discussions and socializing. We are looking forward to welcome you in July. Don't miss out on the paper presentations and the panel discussion!
Session 1: Sunday, July 14, 2019, 08:30-10:20, room Club A
- 08:30-08:40: Pietro S. Oliveto. Introduction to the workshop.
- 08:40-09:05: Ivan Ignashov, Arina Buzdalova, Maxim Buzdalov, and Carola Doerr. Illustrating the Trade-Off between Time, Quality, and Success Probability in Heuristic Search.
Benchmarking aims to investigate the performance of one or several algorithms for a set of reference problems by empirical means. An important motivation for benchmarking is the generation of insight that can be leveraged for designing more efficient solvers, for selecting a best algorithm, and/or for choosing a suitable instantiation of a parametrized algorithm. An important component of benchmarking is its design of experiment (DoE), which comprises the selection of the problems, the algorithms, the computational budget, etc., but also the performance indicators by which the data is evaluated. The DoE very strongly depends on the question that the user aims to answer. Flexible benchmarking environments that can easily adopt to users’ needs are therefore in high demand. With the objective to provide such a flexible benchmarking environment, the recently released IOHprofiler not only allows the user to choose the sets of benchmark problems and reference algorithms, but provides in addition a highly interactive, versatile performance evaluation. However, it still lacks a few important performance indicators that are relevant to practitioners and/or theoreticians. In this discussion paper we focus on one particular aspect, the probability that a considered algorithm reaches a certain target value within a given time budget. We thereby suggest to extend the classically regarded fixed-target and fixed-budget analyses by a fixed-probability measure. Fixed-probability curves are estimated using Pareto layers of (target, budget) pairs that can be realized by the algorithm with the required certainty. We also provide a first implementation towards a fixed-probability module within the IOHprofiler environment.
- 09:05-09:30: Borja Calvo, Ofer M. Shir, Josu Ceberio, Carola Doerr, Hao Wang, Thomas Bäck, and Jose A. Lozano. Bayesian Performance Analysis for Black-Box Optimization Benchmarking.
The most commonly used statistics in Evolutionary Computation (EC) are of the Wilcoxon-Mann-Whitney-test type, in its either paired or non-paired version. However, using such statistics for drawing performance comparisons has several known drawbacks. At the same time, Bayesian inference for performance analysis is an emerging statistical tool, which has the potential to become a promising complement to the statistical perspectives offered by the aforementioned p-value type test. This work exhibits the practical use of Bayesian inference in a typical EC setting, where several algorithms are to be compared with respect to various performance indicators. Explicitly, we examine performance data of 11 evolutionary algorithms (EAs) over a set of 23 discrete optimization problems in several dimensions. Using this data, and following a brief introduction to the relevant Bayesian inference practice, we demonstrate how to draw the algorithms’ probabilities of winning. Apart from fixed-target and fixed-budget results for the individual problems, we also provide an illustrative example per groups of problems. We elaborate on the computational steps, explain the associated uncertainties, and articulate considerations such as the prior distribution and the sample sizing. We also present as a reference the classical p-value tests.
- 09:30-09:55: Maxim Buzdalov. Towards Better Estimation of Statistical Significance when Comparing Evolutionary Algorithms.
The use of well-established statistical testing procedures to compare the performance of evolutionary algorithms often yields pessimistic results. This requires increasing the number of independent samples, and thus the computation time, in order to get results with the necessary precision. We aim at improving this situation by developing statistical tests that are good in answering typical questions coming from benchmarking of evolutionary algorithms. Our first step, presented in this paper, is a procedure that determines whether the performance distributions of two given algorithms are identical for each of the benchmarks. Our experimental study shows that this procedure is able to spot very small differences in the performance of algorithms while requiring computational budgets which are by an order of magnitude smaller (e.g. 15x) compared to the existing approaches.
- 09:55-10:20: Aleš Zamuda. Binary 100-Digit Challenge using IEEE-754 Coded Numerical Optimization Scenarios (100b-Digit) and V-shape Binary Distance-based Success History Differential Evolution (DISHv).
This paper proposes a new discrete optimization benchmark 100b- Digit, a binary discretized version for the 100-Digit Challenge. The continuous version 100-Digit Challenge utilizing continuous input parameters for a fitness function was suggested for the competitions at 2019 GECCO and 2019 CEC, while this paper proposes an extension, a discrete version of the 100-Digit Challenge, discretizing using an IEEE-754 double precision floating-point representation for the search space variables. Furthermore, the paper also presents a viable recent state-of-theart algorithm discretization to be applicable on the new 100b-Digit benchmark. V-shape transfer function is applied for binarization of the variables from the Distance-based Success History Differential Evolution (DISH) to create the new DISHv algorithm, which is then run on the 100b-Digit benchmark. The preliminary results for the DISHv algorithm are then reported for a basic value-to-reach termination criterion over 100b-Digit functions. This combination of 100b-Digit benchmark and DISHv algorithm demonstrates how to bridge a gap between recent continuous optimizers and different discrete benchmarks, and vice-versa.
Session 2: Sunday, July 14, 2019 10:40-12:30, room Club A
- 10:40-11:05: Markus Wagner. Kinder Surprise's Debut in Discrete Optimisation — A Real-World Toy Problem that can be Subadditive.
Kinder SurpriseTM are chocolate eggs that house a small toy. Fans of the toys across the world are collecting and trading these toys, with some toys being very rare and highly priced. With this paper, we present and publish a data set that was extracted from the German Kinder Surprise pricing guide for 2019. The data set contains prices and photos of 187 sets of figurines produced between 1979 and 2018. In total 2366 items are included. We also provide a few first ideas for using this data as a benchmark data set for discrete optimisation, i.e., for subadditive functions with matroid constraints, for combinatorial auctions, and for functions with occasionally violated conditions.
- 11:05-11:30: Carola Doerr, Furong Ye, Naama Horesh, Hao Wang, Ofer M. Shir, and Thomas Bäck. Benchmarking Discrete Optimization Heuristics with IOHprofiler.
Automated benchmarking environments aim to support researchers in understanding how different algorithms perform on different types of optimization problems. Such comparisons carry the potential to provide insights into the strengths and weaknesses of different approaches, which can be leveraged into designing new algorithms. Carefully selected benchmark problems are also needed as training sets in the context of algorithm selection and configuration. With the ultimate goal to create a meaningful benchmark set for iterative optimization heuristics, we compile and assess in this work a selection of discrete optimization problems that subscribe to different types of fitness landscapes. All problems have been implemented and tested within IOHprofiler, our recently released software built to assess iterative heuristics solving combinatorial optimization problems. For each selected problem we compare performances of eleven different heuristics. Apart from fixed-target and fixed-budget results for the individual problems, we also derive ECDF results for groups of problems. To obtain these, we have implemented an add-on for IOHprofiler, which allows aggregation of performance data across different benchmark functions.
- 11:30-12:30: Panel Discussion on Benchmarking. Members:
- Thomas Bäck, Leiden University, Leiden, The Netherlands
- Johann Dreo, Thales Research & Technology, Palaiseau, France
- Jose A. Lozano, University of the Basque Country, Bilbao, Spain
- Boris Naujoks, TH Köln, Cologne, Germany
- Olivier Teytaud, Facebook Artificial Intelligence lab, Paris, France
- Vanessa Volz, Game AI Research Group, Queen Mary University of London, London, UK
- Markus Wagner, University of Adelaide, Adelaide, SA, Australia
- Moderator: Carola Doerr, Sorbonne University, Paris, France