In this section, we survey our most important results. Results that were obtained in cooperation with other research areas are discussed in the section on collaborations. Our research provides foundations for the cluster: geometry is a key mode for information representationl search, data mining, and information retrieval are required for efficient data access; ad-hoc networks, routing algorithms and algorithmic game theory are essential for the network infrastructure; data in multimodal applicatations is massive; and security and reliability are major concerns.
Computational Geometry and Geometric Computing
Computational Geometry seeks to understand the issues that arise when computing with a large number of geometric objects. These issues range from developing and analyzing algorithms, to determining the computational complexity of geometric problems, and to analyzing the combinatorics of geometric configurations. We have achieved progress in all these issues.
Elbassioni et al. present a 4-approximation algorithm for the problem of placing the fewest guards on a 1.5D terrain so that every point of the terrain is seen by at least one guard. This improves on the previous best approximation factor of 5. Their method is based on rounding the linear programming relaxation of the corresponding covering problem. Bringmann and Friedrich consider the computation of the volume of the union of high-dimensional geometric objects. While showing that this problem is #P-hard even for very simple bodies, they give a fast FPRAS for all objects where one can (1) test whether a given point lies inside the object, (2) sample a point uniformly, and (3) calculate the volume of the object in polynomial time. It suffices to be able to answer all three questions approximately. It implies that Klee's measure problem can be approximated efficiently even though it is #P-hard and hence cannot be solved exactly in polynomial time in the number of dimensions unless P=NP. Their algorithm also allows efficient approximation of the volume of the union of convex bodies given by weak membership oracles.
Tverberg's Theorem is a deep result about partitioning finite point sets into parts whose convex hulls intersect. Basit et al. use a technique that Tverberg and Vrecica recently discovered and show the following extension of the centerpoint theorem. Given any set P of n points in the plane, and a parameter 1/3 ≤ c ≤ 1, one can always find a disk D such that any closed half-space containing D contains at least cn points of P. Furthermore, D contains at most (3c - 1)n/2 points of P (the case c = 1 is trivial - take any D containing P; the case c = 1/3 is the centerpoint theorem). They also show that, for all c, this bound is tight up to a constant factor, and extend the upper bound to ℜd.
The computation of 2D arrangements of arbitrary algebraic curves is a fundamental problem in 2D geometric computing. Eigenwillig and Kerber show how to compute it with a refined Bentley-Ottmann sweep-line algorithm. The geometric primitives required are the cylindrical algebraic decompositions of the plane for one or two curves. We compute them by a new and efficient method that combines adaptive-precision root finding (see below) with a small number of symbolic computations. The implementation is efficient and produces the mathematically correct arrangement, undistorted by rounding error, for any set of input segments. Our algorithm is implemented in the EXACUS library AlciX.
In 3D, we have made significant progress on computing the topology of a single surface. Berberich, Kerber, and Sagraloff present a method to compute the exact topology of a real algebraic surface S, implicitly given by a polynomial ƒ ∈ Q[x,y,z] of arbitrary total degree N. Additionally, our analysis provides geometric information as it supports the computation of arbitrary precise samples of S including critical points. We compute a stratification ΩS of S into O(N5) nonsingular cells, including the complete adjacency information between these cells. The algorithm uses numerical and combinatorial methods to minimize costly symbolic computations. A complete C++-implementation of the stratification algorithm is available. It shows good performance on benchmark problems from algebraic geometry.
Isolating the roots of one- and two-dimensional polynomial systems lies at the core of the algorithms above. Mehlhorn and Sagraloff describe a bisection algorithm for root isolation of polynomials with real coefficients. It is assumed that the coefficients can be approximated with arbitrary precision; exact computation in the field of coefficients is not required. We refer to such coefficients as bitstream coefficients.
Search, Data Mining and Information Retrieval
We mention three results. Each constitutes significant progress on a well-studied and fundamental problem. R. Seidel considers the complexity of sorting strings in the model that counts comparisons between symbols and not just comparisons between strings. He shows that for any set of strings S, the complexity of sorting S can naturally be expressed in terms of the trie induced by S. This holds not only for lower bounds but also for the running times of various algorithms. Thus this "data-specific" analysis allows a direct comparison of different algorithms running on the same data. He gives such "data-specific" analysis for various versions of quicksort and versions of mergesort. As a corollary, he arrives at a very simple analysis of quicksorting random strings, which so far had required rather sophisticated mathematical tools.
Cuckoo hashing is the hashing data structure of choice. In Cuckoo hashing, each element can be stored in k random locations out of the n locations of the table. Fountoulakis and Pangiotou prove that for each k ≥ 3 with probability 1 - o(1) the maximum number of elements that can be hashed is (1 - o(1))c*kn, and more items prevent successful allocation. The constant ck is determined explicitly.
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis.
But even the smoothed analysis so far are unsatisfactory, as the bounds are still superpolynomial in the number n of data points. Arthur et al. settle the smoothed running time of the k-means method. They show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
Certifying Algorithms
A certifying algorithm is an algorithm that produces, with each output, a certificate or witness (easy-to-verify proof) that the particular output has not been compromised by a bug. A user of a certifying algorithm inputs x, receives the output y and the certificate w, and then checks, either manually or by use of a program, that w proves that y is a correct output for input x. In this way, he or she can be sure of the correctness of the output without having to trust the algorithm. We put forward the thesis that certifying algorithms are far superior to non-certifying algorithms, and that for complex algorithmic tasks, only certifying algorithms are satisfactory. Acceptance of this thesis would lead to a change in how algorithms are taught and researched. The widespread use of certifying algorithms would greatly enhance the reliability of algorithmic software. In we survey the state of the art in certifying algorithms and add to it. In particular, we start a theory of certifying algorithms and prove that the concept is universal. Many safety-critical embedded systems are subject to certification requirements; some systems may be required to meet multiple sets of certification requirements from different certification authorities. Certification requirements in such "mixed-criticality" systems give rise to interesting scheduling problems that cannot be addressed satisfactorily using techniques from conventional scheduling theory. Bonifaci et al. study a formal model for representing such mixed-criticality workloads. They quantify, via the metric of processor speedup factor, the effectiveness of two techniques, reservation-based scheduling and priority-based scheduling, that are widely used in scheduling such mixed-criticality systems, showing that the latter of the two is superior to the former. They also show that the speedup factors are tight for these techniques.
Ad-Hoc Networks and Routing Algorithms
Broadcast is one of the fundamental operations in networks. Doerr, Friedrich and Sauerwald propose and analyse a quasirandom analogue to the classical push model for disseminating information in networks ("randomized rumor spreading"). In the classical model, in every round, each informed node chooses a neighbor at random and informs it. Results of Frieze and Grimmett (Discrete Appl. Math. 1985) show that this simple protocol succeeds in spreading a rumor from one node of a complete graph to all others within O(log n) rounds. For the network structured as a hypercube or a random graph G(n; p) with p ≤ (1 + ε)(log n)/n, O(log n) rounds also suffice (Feige, Peleg. Raghavan, and Upfal, Random Struct. Algorithms 1990). In the quasirandom model, we assume that each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position in the list, but from then on informs its neighbors in the order given by the list. Surprisingly, irrespective of the orders of the lists, the above-mentioned bounds still hold. In addition, we also show an O(log n) bound for sparsely connected random graphs G(n; p) with p = (log n + ƒ(n))/n, where ƒ(n) ≥ ω(1) and ƒ(n) = O(log log n). Here, the classical model needs strictly more rounds. Hence, the quasirandom model achieves similar or better broadcasting times with a greatly reduced use of random bits.
Algorithmic Game Theory
Scheduling on related machines (Q||Cmax) is one of the most important problems in the field of algorithmic mechanism design. Each machine is controlled by a selfish agent, and her valuation can be expressed via a single parameter, her speed. Archer and Tardos showed that a (non-polynomial) allocation that minimizes the makespan can be truthfully implemented. On the other hand, if we leave out the game-theoretic issues, the complexity of the problem has been completely settled - the problem is strongly NP-hard, while there exists a PTAS. This problem is the most well-studied in single-parameter algorithmic mechanism design. It gives an excellent ground for exploring the boundary between truthfulness and efficient computation. Since the work of Archer and Tardos, quite a lot of deterministic and randomized mechanisms have been suggested. Recently, a breakthrough result showed that a randomized, truthful-in-expectation PTAS exists. On the other hand, for the deterministic case, the best known approximation factor is 2.8. It has been a major open question whether there exists a deterministic truthful PTAS, or whether truthfulness has an essential, negative impact on the computational complexity of the problem. Christodoulou and Kovacs give a definitive answer to this important question by providing a truthful deterministic PTAS.
Handling and Manipulating Massive Data
The streaming model was developed to handle massive data sets. In this setting, one assumes that one is only given a constant amount of main memory to work with, and the (much larger) input can be scanned one or multiple times. Mestre et al. study the maximum weight matching problem in the semi-streaming model (they assume that main memory is large enough to hold a matching, but not large enough to hold the whole input graph) and improve on the current best one-pass algorithm due to Zelke (Proc. STACS 2008) by devising a deterministic approach whose performance guarantee is 4.91 + eps. In addition, they study preemptive online algorithms, a sub-class of one-pass algorithms where one is only allowed to maintain a feasible matching in memory at any point in time. All known results prior to Zelke belong to this sub-class. They provide a lower bound of 4.967 on the competitive ratio of any such deterministic algorithm, and hence show that future improvements will have to store in memory a set of edges which is not necessarily a feasible matching.
Security
Security is an aspect orthogonal to efficiency. Information-flow analysis is a powerful technique for reasoning about the sensitive information exposed by a program during its execution. Backes et al. present the first automatic method for information-flow analysis that discovers what information is leaked and computes its comprehensive quantitative interpretation. The leaked information is characterized by an equivalence relation on secret artifacts, and is represented by a logical assertion over the corresponding program variables. Our measurement procedure computes the number of discovered equivalence classes and their sizes. This provides a basis for computing a set of quantitative properties, which includes all established information-theoretic measures in quantitative information flow. Our method exploits an inherent connection between formal models of qualitative information flow and program verification techniques. We provide an implementation of our method that builds upon existing tools for program verification and information-theoretic analysis. Our experimental evaluation indicates the practical applicability of the presented method.
Handling NP-Completeness
See the report on Jiong Guo's IRG.