Paper title
Denoising individual bias for a fairer binary submatrix detection
Paper authors
Paper abstract
Low-rank representation of a binary matrix is powerful for disentangling sparse individual-attribute associations and has been widely applied. Existing binary matrix factorization (BMF) and co-clustering (CC) methods often assume i.i.d. background noise. However, this assumption is easily violated in real data, where heterogeneous row- or column-wise probabilities of binary entries produce disparate element-wise background distributions and undermine the validity of existing methods. We propose a binary data denoising framework, BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and by eliminating the binary attributes that are more likely to come from the background. BIND is supported by thoroughly derived mathematical properties of the row- and column-wise mixture distributions. Our experiments on synthetic and real-world data demonstrate that BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.
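To make the data setting in the abstract concrete, the sketch below simulates a binary matrix in which a planted all-ones submatrix is overlaid on a background whose entry-wise probability varies by row and by column. It is only an illustration of non-i.i.d. background noise and is not the BIND procedure. The function name `simulate_binary_matrix`, the rank-one form of the background probability (row rate times column rate), and all parameter values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def simulate_binary_matrix(n_rows=200, n_cols=100,
                           block=(slice(0, 40), slice(0, 20)), seed=0):
    """Planted all-ones block OR-ed with a heterogeneous Bernoulli background.

    Assumption for illustration: the background probability of entry (i, j)
    is row_rate[i] * col_rate[j], so it differs across rows and columns.
    """
    rng = np.random.default_rng(seed)
    row_rate = rng.uniform(0.05, 0.6, size=n_rows)   # per-row propensity
    col_rate = rng.uniform(0.05, 0.6, size=n_cols)   # per-column propensity
    background_prob = np.outer(row_rate, col_rate)   # element-wise probability

    # Draw each background entry independently with its own probability.
    background = rng.random((n_rows, n_cols)) < background_prob

    # Plant the true pattern: a fully dense submatrix.
    pattern = np.zeros((n_rows, n_cols), dtype=bool)
    pattern[block] = True

    observed = (pattern | background).astype(np.uint8)
    return observed, pattern, background_prob


observed, pattern, background_prob = simulate_binary_matrix()

# Expected background density per row; a wide spread means the noise
# cannot be described by a single shared Bernoulli rate.
row_background_density = background_prob.mean(axis=1)
print("min / max expected row density:",
      row_background_density.min(), row_background_density.max())
```

In a matrix generated this way, a high-propensity row can contain about as many ones from background alone as a row that truly belongs to the planted pattern, which is the situation the abstract describes as breaking methods that assume i.i.d. noise.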