论文标题
多项式空间中的高维几何流
High-Dimensional Geometric Streaming in Polynomial Space
论文作者
论文摘要
许多现有用于流数学数据分析的现有算法受到空间复杂性的指数依赖性的困扰,这对于处理高维数据集是不希望的。特别是,一旦$ d \ geq \ log n $,尽管自从[AHV04]自[AHV04]以来,诸如维持凸船体和Löwner-John椭圆形的问题等问题,例如保持凸船体和Löwner-John椭圆形的问题,但没有已知的非平凡流算法。 我们通过用$ \ mathrm {poly}(d,\ log n)$ factor扭曲来交易,同时将这些结果改善为$ \ mathrm {poly}(d,\ log n)$ strace。我们通过设计第一个流媒体算法来实现这些结果,该算法用于维护$ \ ell_ \ infty $ undty $ usppace嵌入,并带有$ \ mathrm {poly}(d,\ log n)$ space和$ \ mathrm {poly}(d,poly}(d,\ log n)$ santfort。我们的算法在\ emph {在线内核}模型中也提供了类似的保证。一路上,我们通过用$ \ log n $依赖性替换日志条件编号依赖性来提高在线数值线性代数的结果,从而回答了[BDM+20]的问题。我们的技术在杠杆分数,数值线性代数中的基本对象和计算几何形状之间提供了一种新颖的联系。 对于$ \ ell_p $子空间嵌入,我们在空间和一通流媒体算法之间几乎提供了最佳的权衡。例如,我们使用$ o(d^2 \ log n)$ space和$ o(((d \ log n)^{1/2-1/p})$ p> 2 $畸变,而先前的确定性算法则相反,我们在$ p> 2 $中给出了确定性核心。 我们的技术在离线环境中具有影响,在该设置中,我们在空间复杂性和子空间草图数据结构的变形之间给出了最佳的权衡。为此,我们给出了[LT80]的“密度变化”定理的基本证明,并使其成为算法。
Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and Löwner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic.