Lecture #17: Dimension reduction (continued)
1. An equivalent view of estimating $F_2$
Again, you have a data stream of elements $a_1, a_2, \ldots$, each element drawn from the universe $U$. This stream defines a frequency vector $x \in \mathbb{Z}_{\geq 0}^{|U|}$, where $x_i$ is the number of times element $i$ is seen. Consider the following algorithm for computing $F_2 := \|x\|_2^2 = \sum_i x_i^2$.
Take a (suitably random) hash function $h: U \to \{-1, +1\}$. Maintain a counter $C$, which starts off at zero. Every time an element $i$ comes in, increment the counter: $C \leftarrow C + h(i)$. And when queried, we reply with the value $C^2$.
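The single-counter algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy stream, the names `run_counter` and `sign`, and the choice to store the hash as an explicit table of signs are all made up here. Averaging the reply over every possible sign assignment shows what a fully random hash gives in expectation.

```python
import itertools


def run_counter(stream, sign):
    """Feed the stream through one counter; sign[i] plays the role of the hash of i."""
    counter = 0
    for item in stream:
        counter += sign[item]  # add +1 or -1, depending on the element's sign
    return counter ** 2  # the reply to a query is the squared counter


# Hypothetical toy stream over the universe {0, 1, 2}: frequencies (1, 2, 3).
stream = [0, 1, 1, 2, 2, 2]
true_f2 = 1 ** 2 + 2 ** 2 + 3 ** 2

# Average the reply over all 2^3 sign assignments, i.e. over a fully random hash.
replies = [run_counter(stream, sign) for sign in itertools.product((-1, 1), repeat=3)]
mean_reply = sum(replies) / len(replies)
```

Any single reply can be far from `true_f2` (here the replies range from 0 to 36), but `mean_reply` matches it exactly.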
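Hence, having seen the stream that results in the frequency vector $x$, the counter will have the value $C = \sum_{i \in U} x_i h(i)$. Does $C^2$ at least have the right expectation? It does:
$$E[C^2] = \sum_{i,j} x_i x_j \, E[h(i) h(j)] = \sum_i x_i^2 = \|x\|_2^2,$$
since $h(i)^2 = 1$ and $E[h(i) h(j)] = 0$ for $i \neq j$.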
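And what about the variance? Recall that $\mathrm{Var}(C^2) = E[C^4] - (E[C^2])^2$, so let us calculate
$$E[C^4] = \sum_{i,j,l,m} x_i x_j x_l x_m \, E[h(i) h(j) h(l) h(m)] = \sum_i x_i^4 + 6 \sum_{i<j} x_i^2 x_j^2,$$
because only the terms in which every index appears an even number of times survive. Since $(E[C^2])^2 = \sum_i x_i^4 + 2 \sum_{i<j} x_i^2 x_j^2$, this gives
$$\mathrm{Var}(C^2) = 4 \sum_{i<j} x_i^2 x_j^2 \leq 2 \, (E[C^2])^2.$$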
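What does Chebyshev say then?
$$\Pr\left[ \, |C^2 - E[C^2]| > \epsilon \, E[C^2] \, \right] \leq \frac{\mathrm{Var}(C^2)}{(\epsilon \, E[C^2])^2} \leq \frac{2}{\epsilon^2}.$$
Not that hot: in fact, this is usually more than $1$.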
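But suppose we take a collection of $k$ such independent counters $C_1, C_2, \ldots, C_k$, and given a query, take the average of their squares $\bar{C} = \frac{1}{k} \sum_{t=1}^{k} C_t^2$, and return $\bar{C}$. The expectation of the average remains the same, but the variance falls by a factor of $k$. And we get
$$\Pr\left[ \, |\bar{C} - E[\bar{C}]| > \epsilon \, E[\bar{C}] \, \right] \leq \frac{2}{k \epsilon^2}.$$
So, our probability of error on any query is at most $\delta$ if we take $k = \frac{2}{\epsilon^2 \delta}$.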
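The averaging step might look as follows in Python. This is a rough sketch, not a tuned implementation: the function name `estimate_f2`, its parameters, and the use of explicitly stored fully random sign tables (rather than a compact hash function) are assumptions made for illustration, and the number of counters is taken to be 2 / (eps^2 * delta), rounded up.

```python
import math
import random


def estimate_f2(stream, universe_size, eps, delta, seed=0):
    """Average the squares of k independent +/-1 counters."""
    rng = random.Random(seed)
    # Number of counters suggested by the Chebyshev bound, rounded up.
    k = math.ceil(2 / (eps ** 2 * delta))
    # Assumption: one explicit table of random signs per counter.
    signs = [[rng.choice((-1, 1)) for _ in range(universe_size)] for _ in range(k)]
    counters = [0] * k
    for item in stream:
        for t in range(k):
            counters[t] += signs[t][item]
    return sum(c * c for c in counters) / k


# Hypothetical toy stream with frequencies (1, 2, 3), so the true value is 14.
estimate = estimate_f2([0, 1, 1, 2, 2, 2], universe_size=3, eps=0.5, delta=0.1)
```

Each update touches all the counters, so the update time grows with their number; that is the price paid for the variance reduction.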
1.1. Hey, those calculations look familiar
Sure. This is just a restatement of what we did in lecture. There we took a matrix $M$ and filled it with random $\{-1, +1\}$ values, so each row of $M$ corresponds to a hash function from $U$ to $\{-1, +1\}$. And taking $k$ rows in the matrix corresponds to the variance reduction step at the end.
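To see the correspondence concretely, here is a small Python check that updating one counter per row while streaming gives the same numbers as multiplying the sign matrix by the frequency vector. The matrix size, the toy stream, and the helper names `streaming_counters` and `matrix_times_vector` are hypothetical choices for this sketch.

```python
import random


def streaming_counters(stream, matrix):
    """One counter per matrix row, updated as each stream element arrives."""
    counters = [0] * len(matrix)
    for item in stream:
        for row_index, row in enumerate(matrix):
            counters[row_index] += row[item]
    return counters


def matrix_times_vector(matrix, x):
    """Plain matrix-vector product, computed after the fact from the frequencies."""
    return [sum(entry * x_i for entry, x_i in zip(row, x)) for row in matrix]


rng = random.Random(0)
universe_size, num_rows = 4, 5
# Each row of random signs acts as one hash function on the universe {0, 1, 2, 3}.
matrix = [[rng.choice((-1, 1)) for _ in range(universe_size)] for _ in range(num_rows)]
stream = [0, 3, 3, 1, 0, 3]
x = [stream.count(i) for i in range(universe_size)]  # frequency vector
same = streaming_counters(stream, matrix) == matrix_times_vector(matrix, x)
```

The streaming version never builds the frequency vector explicitly; it only keeps one number per row.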
1.2. Limited Independence
How much randomness do you need for the hash functions? Indeed, hash functions which are $4$-wise independent suffice for the above proofs to go through, since the calculations only ever involve products of at most four hash values. And how does one get a $4$-wise independent hash function? Watch this blog (and the HWs) for details.
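Until then, here is one standard construction, sketched in Python, and not necessarily the one the course has in mind: evaluate a random polynomial of degree at most three over a prime field, which gives 4-wise independent field values, and read off a sign from the low bit. The prime `P`, the name `make_sign_hash`, and the low-bit trick are assumptions of this sketch; because `P` is odd, the resulting signs carry a tiny bias of order 1/P, so they are only approximately uniform.

```python
import random

# Mersenne prime 2^61 - 1. Assumption: universe elements are integers in [0, P).
P = (1 << 61) - 1


def make_sign_hash(rng):
    """Return a hash built from a random polynomial of degree at most 3 over Z_P."""
    # Four random coefficients, so four bounded-size numbers describe the whole hash.
    coeffs = [rng.randrange(P) for _ in range(4)]

    def h(i):
        # Horner evaluation of the polynomial at i, modulo P.
        value = 0
        for a in coeffs:
            value = (value * i + a) % P
        # Map the low bit of the field value to a sign in {-1, +1}.
        return 1 if value & 1 else -1

    return h


h = make_sign_hash(random.Random(1))
signs = [h(i) for i in range(10)]
```

The point of the construction is space: the hash is described by four coefficients instead of one stored sign per element of the universe.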