Poster
Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
Alina Ene · Alessandro Epasto · Vahab Mirrokni · Hoai-An Nguyen · Huy Nguyen · David Woodruff · Peilin Zhong
West Exhibition Hall B2-B3 #W-1015
Modern datasets often involve enormous amounts of information arriving over time — think of monitoring website visits, social media posts, or network activity. In these settings, it's crucial to make fast, smart decisions using limited memory. This paper tackles the classic maximum coverage problem: imagine you're allowed to pick a few groups (like news sources) to follow, and you want to cover as many different topics (items) as possible. We design the first algorithm that solves this problem in a challenging setting where the data is constantly changing — items can be added or removed from groups — and the algorithm must update its decisions quickly and with very little memory.Beyond this, we apply our techniques to a timely and important task: figuring out which features in a dataset could make individuals easy to identify. This is vital for protecting privacy in fields like healthcare or finance. Our approach leads to a massive speedup — up to 210 times faster — compared to previous solutions.