From November 9 to 13, 2020, the Lorentz Center Workshop "Benchmarked: Optimization Meets Machine Learning" is jointly organized by Carola Doerr, Thomas Stützle, Mike Preuss, Marc Schoenauer, and Joaquin Vanschoren as an online event. It brings together experts from all fields of benchmarking and automated algorithm configuration and selection, with a focus on optimization, and has more than 100 registered participants. The goal of the workshop is to discuss the impact of automated decision-making on heuristic optimization, and more specifically, how the possibility to automatically select and configure optimization heuristics changes the requirements for their benchmarking. The key objectives of this Lorentz Center workshop are:
- to develop a joint vision on the next generation of benchmarking optimization heuristics in the context of automated algorithm selection and configuration, and
- to design a clear road-map guiding the research community towards this vision.
The participants discuss what an ideal benchmarking environment would look like, how such an "ideal tool" compares to existing software, and how the gap can be closed by improving the compatibility between ongoing and future projects. The aim is to design a full benchmarking engine that ranges from modular algorithm frameworks over problem instance generators and landscape analysis tools to automated algorithm configuration and selection techniques, all the way to a statistically sound evaluation of the experimental data.
In this setting, Prof. Thomas Weise organized a first breakout session on "Data Formats for Benchmarking" and co-organizes a second one. The rationale behind these sessions is that technical details are often ignored in research, even though technicalities such as the data format used for storing experimental results can have a big influence on it. The data format determines what information will be available after an experiment, and hence what information is available for evaluation. It also determines whether the experiment will be easy to replicate, whether the results can be validated, and which tools can be used for evaluating them. The data format may even determine how an experiment can be executed (in parallel? in a distributed fashion? can experiments be restarted?). The goal of the breakout sessions is to collect thoughts and ideas about suitable data formats for storing the output of experiments in optimization and machine learning, and to compile a set of requirements for a good format and structure. If these requirements are well understood, it may be possible to eventually define a simple and clear standard for the future, or at least some guidelines that help researchers not to miss any detail that should be considered when storing experimental data.
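To make the idea of such a format concrete, here is a purely illustrative sketch (not a format proposed or endorsed at the workshop; all field names are hypothetical): a JSON-lines log in which each optimization run stores a header with the information needed to replicate it, followed by one record per improvement of the best-so-far objective value. Because each line is plain JSON, the log can be parsed by standard tools, appended to while the run is ongoing, and inspected after a crash.

```python
import json

# Hypothetical JSON-lines log for one run of a (minimizing) optimizer:
# a header record with the configuration needed to replicate the run,
# improvement records, and a final summary record.
records = [
    {"type": "header", "algorithm": "rls", "problem": "instance_01", "seed": 42},
    {"type": "improvement", "evals": 1, "best_f": 55},
    {"type": "improvement", "evals": 17, "best_f": 54},
    {"type": "end", "total_evals": 10000, "best_f": 50},
]

# Writing: one JSON object per line, so the file can grow incrementally.
with open("run.log", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Reading: any JSON-capable tool can evaluate the results, and the
# header carries the replication information (algorithm, problem, seed).
with open("run.log") as f:
    parsed = [json.loads(line) for line in f]

assert parsed[0]["seed"] == 42
assert parsed[-1]["best_f"] == 50
```

A text-based, line-oriented format like this trades some storage efficiency for transparency: the file remains human-readable and partially usable even if the run is interrupted mid-write.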