Paper title
Dependence-robust confidence intervals for capture-recapture surveys
Paper authors
Paper abstract
Capture-recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Covid-19 infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When $k$ capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a $2^k$ contingency table in which one element -- the number of individuals appearing in none of the samples -- remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., not point-identified). Stringent assumptions about the dependence between samples are often used to achieve point-identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a non-trivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals, and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
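To make the identification problem concrete, the following minimal sketch covers the simplest case, $k = 2$. It is not the paper's procedure: it expresses dependence through an assumed bound on the odds ratio between the two samples (a hypothetical parameterization chosen for illustration, whereas the abstract refers to bounds on pairwise capture probabilities), and it ignores sampling variability, so the result is a plug-in interval and not a confidence set. The function names and the example counts are invented.

```python
from itertools import product


def observed_cells(k):
    """List the capture histories of a 2^k table that can be observed.

    The all-zero history (captured in no sample) is excluded because
    its count is never observed.
    """
    return [h for h in product((0, 1), repeat=k) if any(h)]


def population_bounds(n11, n10, n01, theta_lo, theta_hi):
    """Plug-in bounds on population size N for two samples.

    n11: units captured in both samples
    n10: units captured in sample 1 only
    n01: units captured in sample 2 only
    theta_lo, theta_hi: ASSUMED bounds on the odds ratio
        theta = (n11 * n00) / (n10 * n01),
    where n00 is the unobserved count. theta = 1 corresponds to
    independent samples (the Lincoln-Petersen estimate).
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    if not 0 <= theta_lo <= theta_hi:
        raise ValueError("need 0 <= theta_lo <= theta_hi")
    observed = n11 + n10 + n01
    # Implied value of n00 when theta = 1.
    n00_independent = n10 * n01 / n11
    return (observed + theta_lo * n00_independent,
            observed + theta_hi * n00_independent)


# Invented counts, for illustration only.
point = population_bounds(20, 80, 60, 1.0, 1.0)     # independence
interval = population_bounds(20, 80, 60, 0.5, 2.0)  # weaker assumption
print(point, interval)
```

With the odds ratio fixed at 1 the two bounds coincide, which is point-identification under independence. Relaxing the assumption to a range of odds ratios yields an interval of population sizes that are all consistent with the same observed counts.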