Sublinear Algorithms, Online Packing, Hashing and Clustering

Research output: Book/Report › Ph.D. thesis › Research

In this thesis we study the theoretical properties of several algorithmic problems. The first two problems concern completing a certain task in sublinear time, that is, using a number of operations that is sublinear in the input size. Short descriptions of these tasks follow.
Earth Mover’s Distance: We are given two distributions μ, ν over a generic metric space (M, d). A transport plan between μ and ν is a distribution ξ over M × M that has μ and ν as its marginals. We define the cost of a transport plan ξ as E_{(x,y)∼ξ}[d(x, y)]. Our task is to compute the minimum cost over all transport plans between μ and ν.
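To make the definition concrete, the following is a minimal sketch that computes the Earth Mover’s Distance between two finite distributions by solving the transport linear program directly with scipy. It is only an illustration of the quantity being estimated, not the sublinear-time algorithm studied in the thesis; the function name emd and the example points are illustrative choices.

```python
# Illustrative only: brute-force LP formulation of Earth Mover's Distance for
# finite distributions. Not the sublinear-time algorithm from the thesis.
import numpy as np
from scipy.optimize import linprog

def emd(mu, nu, dist):
    """Minimum transport cost between discrete distributions mu and nu.

    mu: length-n probability vector, nu: length-m probability vector,
    dist: n x m matrix of ground distances d(x, y).
    """
    n, m = len(mu), len(nu)
    c = dist.reshape(-1)                      # cost of each entry xi[i, j]
    # Marginal constraints: sum_j xi[i, j] = mu[i] and sum_i xi[i, j] = nu[j].
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun                            # E_{(x,y)~xi}[d(x, y)] at the optimum

# Example: two distributions on the points {0, 1, 2} with |x - y| as the metric.
points = np.array([0.0, 1.0, 2.0])
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.0, 0.5, 0.5])
dist = np.abs(points[:, None] - points[None, :])
print(emd(mu, nu, dist))                      # 1.0
```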
Sum estimation via weighted sampling: We are given a set U of items, where each item u ∈ U has a weight w(u). We can sample elements from U with probability proportional to their weight. Our task is to estimate the total weight ∑_{u∈U} w(u) using as few samples as possible.
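As a toy illustration of the sampling model, the sketch below simulates one natural collision-based estimator, assuming we may also read the weight w(u) of each sampled item. It relies on E[w(sample)] = S₂/W and P[two samples collide] = S₂/W², where S₂ = ∑_u w(u)², so their ratio is W. It is not claimed to be the estimator or match the sample-complexity guarantees analyzed in the thesis; the helper names are illustrative.

```python
# Toy simulation of a collision-style estimator for W = sum_u w(u), assuming
# the weight of a sampled item can also be read. Illustrative only.
import random

def weighted_sampler(weights):
    """Black-box proportional sampler: returns index u with probability w(u)/W."""
    def sample():
        return random.choices(range(len(weights)), weights=weights, k=1)[0]
    return sample

def estimate_total_weight(sample, weight_of, t):
    """Estimate W from t proportional samples via the ratio
    (mean sampled weight) / (pairwise collision rate) -> (S2/W) / (S2/W^2) = W."""
    items = [sample() for _ in range(t)]
    mean_w = sum(weight_of(u) for u in items) / t
    pairs = t * (t - 1) // 2
    collisions = sum(items[i] == items[j]
                     for i in range(t) for j in range(i + 1, t))
    if collisions == 0:
        return float("inf")            # too few samples to observe a collision
    return mean_w * pairs / collisions

weights = [random.randint(1, 100) for _ in range(1000)]
sample = weighted_sampler(weights)
print(sum(weights), estimate_total_weight(sample, lambda u: weights[u], t=2000))
```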
Next, we study two problems on the online packing of polygons, described below. In both cases, we give upper and lower bounds on the competitive ratio.
Online packing of rectangles: We are given a sequence of axis-parallel rectangles online and we have to irrevocably place them on the plane so as to minimize the area or perimeter of their axis-parallel bounding box.
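To make the online model concrete, here is a minimal baseline strategy: place each arriving rectangle immediately to the right of everything placed so far, with its bottom edge on y = 0, and track the resulting bounding box. This naive shelf-style placement only illustrates what "irrevocable online placement" means; it is not the competitive algorithm analyzed in the thesis.

```python
# Naive online baseline for axis-parallel rectangle packing. Illustrative only;
# not the algorithm or bounds from the thesis.
def pack_online(rectangles):
    x_cursor, height = 0.0, 0.0
    placements = []
    for (w, h) in rectangles:                    # rectangles arrive one by one
        placements.append((x_cursor, 0.0, w, h)) # irrevocable placement
        x_cursor += w                            # bounding-box width grows by w
        height = max(height, h)                  # bounding-box height = tallest item
    area = x_cursor * height
    perimeter = 2 * (x_cursor + height)
    return placements, area, perimeter

_, area, perim = pack_online([(2, 1), (1, 3), (4, 2)])
print(area, perim)                               # 21.0 and 20.0 for this sequence
```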
Translational packing of convex polygons: We are given a sequence of convex polygons online and we have to irrevocably place them in a given container on the plane without rotating them. The goal is to use as little container space as possible. We prove a surprising superconstant lower bound on the competitive ratio for several well-studied container types.
Besides packing, we study hash functions.
Locally uniform hashing: We design Tornado tabulation, a new tabulation-based hash function that aims to bridge the gap between the (unrealistic) fully random hashing often assumed in the analysis of algorithms and practical hash functions with weak theoretical guarantees. In theory, Tornado has strong distributional properties, ensuring that its keys enjoy a certain local full randomness. Moreover, Tornado is practical: it can be implemented in a few lines of C code.
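For background, the sketch below shows plain simple tabulation hashing, the basic building block of the tabulation-based family that Tornado tabulation belongs to: split the key into characters and XOR together one random table entry per character. It is not the Tornado construction itself, and the class name and parameters are illustrative.

```python
# Background sketch of simple tabulation hashing (not Tornado itself):
# one fully random lookup table per 8-bit key character, combined by XOR.
import random

class SimpleTabulation:
    def __init__(self, key_bytes=4, out_bits=32, seed=None):
        rng = random.Random(seed)
        # One table of 256 random out_bits-bit values per key character.
        self.tables = [[rng.getrandbits(out_bits) for _ in range(256)]
                       for _ in range(key_bytes)]
        self.key_bytes = key_bytes

    def hash(self, key):
        h = 0
        for i in range(self.key_bytes):
            char = (key >> (8 * i)) & 0xFF      # i-th 8-bit character of the key
            h ^= self.tables[i][char]           # XOR the looked-up random value
        return h

tab = SimpleTabulation(seed=42)
print(hex(tab.hash(0xDEADBEEF)))
```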
Finally, we include a result on clustering.
Local-search seeding for k-means: The most popular heuristic for k-means clustering, Lloyd’s algorithm, inherits its theoretical guarantees from the seeding strategy employed to initialize cluster centers. We design a local-search algorithm to initialize cluster centers that improves the approximation ratios of previous seeding strategies. We claim that our algorithm is practical and include experiments to support its effectiveness.
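The following is a generic sketch of the local-search seeding idea: start from D²-sampling (k-means++ seeding), then repeatedly sample a candidate point proportionally to its current cost and swap it with an existing center whenever that lowers the total k-means cost. It follows the general local-search-seeding recipe but is not claimed to be the exact algorithm, swap rule, or guarantees from the thesis; all function names are illustrative.

```python
# Generic local-search seeding sketch: k-means++ seeding followed by
# cost-proportional candidate sampling and single-center swaps. Illustrative only.
import numpy as np

def cost(points, centers):
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def dsquared_sample(points, centers, rng):
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return points[rng.choice(len(points), p=d2 / d2.sum())]

def local_search_seeding(points, k, swaps=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=1)]       # first center uniform
    while len(centers) < k:                                  # k-means++ seeding
        centers = np.vstack([centers, dsquared_sample(points, centers, rng)])
    for _ in range(swaps):                                   # local-search phase
        candidate = dsquared_sample(points, centers, rng)
        best_centers, best_cost = centers, cost(points, centers)
        for i in range(k):                                   # try swapping out each center
            trial = centers.copy()
            trial[i] = candidate
            c = cost(points, trial)
            if c < best_cost:
                best_centers, best_cost = trial, c
        centers = best_centers
    return centers

points = np.vstack([np.random.default_rng(1).normal(c, 0.3, size=(50, 2))
                    for c in [(0, 0), (5, 5), (0, 5)]])
print(local_search_seeding(points, k=3))
```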
Original language: English
Publisher: Department of Computer Science, Faculty of Science, University of Copenhagen
Number of pages: 240
Publication status: Published - 2024
