Today, our article "Automatically discovering clusters of algorithm and problem instance behaviors as well as their causes from experimental data, algorithm setups, and instance features" has appeared in the Applied Soft Computing journal published by Elsevier. It describes our research on Automating Scientific Research in Optimization. This article may mark the first contribution in which a significant part of the high-level work of a researcher in the fields of optimization and machine learning is automated by a process that applies several machine learning steps.
Thomas Weise, Xiaofeng Wang, Qi Qi, Bin Li, and Ke Tang. Automatically discovering clusters of algorithm and problem instance behaviors as well as their causes from experimental data, algorithm setups, and instance features. Applied Soft Computing Journal (ASOC), 73:366–382, December 2018.
doi:10.1016/j.asoc.2018.08.030 / share link (valid until November 6, 2018)
In heuristic optimization, we aim to find good solutions for computationally hard problems. Solving the Travelling Salesman Problem, for instance, means finding the shortest tour that visits n cities and returns to its starting point. Due to their complexity, such problems often cannot be solved to optimality in feasible time. Algorithms therefore often start with a more or less random initial guess and then improve it step by step. Performance thus has two dimensions: the runtime we grant the algorithm before we stop it and take the best-so-far result, and the solution quality of that best-so-far result. Since there are not yet sufficient theoretical tools to assess the performance of such algorithms, researchers conduct many experiments and compare the results. This usually means applying many different setups of an algorithm to many different instances of a problem type. Since optimization algorithms are often randomized, multiple repetitions of each experiment are needed. Evaluating such experimental data is not easy. Moreover, as the result of the evaluation, we do not just want to know which algorithm performs best and which problem is the hardest: a researcher wants to know why.
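To make the kind of data we are talking about concrete, here is a minimal sketch (not taken from our article) of such an anytime optimization run: a simple hill climber on a hypothetical toy objective that logs (runtime, best-so-far quality) pairs, the two performance dimensions discussed above.

import random

def objective(x):
    # hypothetical toy problem: minimize the distance of x to the point (3, ..., 3)
    return sum((xi - 3.0) ** 2 for xi in x)

def hill_climb(dim=5, max_steps=1000, seed=0):
    rng = random.Random(seed)
    # start with a more or less random initial guess
    x = [rng.uniform(-10.0, 10.0) for _ in range(dim)]
    best = objective(x)
    log = [(0, best)]  # (consumed runtime in steps, best-so-far quality)
    for step in range(1, max_steps + 1):
        # step-by-step improvement: perturb the current guess
        y = [xi + rng.gauss(0.0, 0.5) for xi in x]
        fy = objective(y)
        if fy < best:  # accept only improvements
            x, best = y, fy
            log.append((step, best))
    return log

# one run yields one (runtime, quality) progress curve; since the algorithm
# is randomized, several independent repetitions (seeds) are needed per setup
runs = [hill_climb(seed=s) for s in range(10)]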
In this paper, we introduce a process that can 1) automatically model the progress of algorithm setups on different problem instances based on data collected in experiments, 2) use these models to discover clusters of algorithm (or problem instance) behaviors, and 3) propose causes why a certain algorithm setup (or problem instance) belongs to a certain algorithm (or problem instance) behavior cluster. These high-level conclusions are presented in the form of decision trees relating algorithm parameters (or instance features) to cluster ids. We emphasize the duality of analyzing algorithm setups and problem instances. Our process is implemented as open source software and tested in two case studies, on the Maximum Satisfiability Problem and the Traveling Salesman Problem. Besides its basic application to raw experimental data, which yields clusters and explanations of "quantitative" algorithm behavior, our process also allows for "qualitative" conclusions when fed with data that is normalized based on problem features or algorithm parameters. It can also be applied recursively, e.g., to further investigate the behavior of the algorithms in the cluster with the best-performing setups on the problem instances belonging to the cluster of hardest instances. Both use cases are investigated in the case studies. We conclude our article with a comprehensive analysis of the drawbacks of our method and with suggestions on how it can be improved.
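The following rough sketch (again ours for illustration, not the implementation from the article) shows the spirit of steps 2 and 3, assuming that step 1 has already reduced each setup's progress to a fixed-length model vector. All data here is placeholder data; the parameter names are hypothetical. We cluster the model vectors and then fit a decision tree mapping algorithm parameters to cluster ids, which yields human-readable rules as candidate causes of the behavior clusters.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# placeholder data: 40 setups, each with a 3-parameter progress model
# fitted in step 1, plus the algorithm parameters defining each setup
model_vectors = rng.normal(size=(40, 3))          # step 1 output (placeholder)
setup_params = rng.integers(0, 4, size=(40, 2))   # hypothetical parameters

# step 2: discover clusters of algorithm behaviors from the progress models
clusters = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(model_vectors)

# step 3: explain cluster membership in terms of the algorithm parameters
tree = DecisionTreeClassifier(max_depth=3, random_state=1)
tree.fit(setup_params, clusters)
print(export_text(tree, feature_names=["population_size", "operator"]))

By the duality mentioned above, the same two steps apply to problem instances: cluster the per-instance progress models and fit a tree on instance features instead of algorithm parameters.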