HyperLogLog – what is it?

Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality. When you are dealing with a large dataset, the memory is going to be a bounding factor. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality.