CountSketches, Feature Hashing and the Median of Three
Publikation: Bidrag til bog/antologi/rapport › Konferencebidrag i proceedings › Forskning › fagfællebedømt
Dokumenter
- Fulltext
Forlagets udgivne version, 450 KB, PDF-dokument
In this paper, we revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector v to a vector of dimension (2t - 1)s, where t, s > 0 are integer parameters. It is known that a CountSketch allows estimating coordinates of v with variance bounded by parallel to v parallel to/s. For t > 1, the estimator takes the median of 2t - 1 independent estimates, and the probability that the estimate is off by more than 2 parallel to v parallel to/root s is exponentially small in t. This suggests choosing t to be logarithmic in a desired inverse failure probability. However, implementations of CountSketch often use a small, constant t. Previous work only predicts a constant factor improvement in this setting. Our main contribution is a new analysis of CountSketch, showing an improvement in variance to O(min{parallel to v parallel to, parallel to v parallel to/s}) when t > 1. That is, the variance decreases proportionally to s, asymptotically for large enough s.
Originalsprog | Engelsk |
---|---|
Titel | Proceedings of the 38 th International Conference on Machine Learning |
Redaktører | M Meila, T Zhang |
Forlag | PMLR |
Publikationsdato | 2021 |
Sider | 6011-6020 |
Status | Udgivet - 2021 |
Begivenhed | 38th International Conference on Machine Learning (ICML) - Virtual Varighed: 18 jul. 2021 → 24 jul. 2021 |
Konference
Konference | 38th International Conference on Machine Learning (ICML) |
---|---|
By | Virtual |
Periode | 18/07/2021 → 24/07/2021 |
Navn | Proceedings of Machine Learning Research |
---|---|
Vol/bind | 139 |
ISSN | 2640-3498 |
Links
- https://proceedings.mlr.press/v139/
Forlagets udgivne version
ID: 301135523