Title
Cumulative deviation of a subpopulation from the full population
Authors
Abstract
Assessing equity in the treatment of a subpopulation often involves assigning numerical "scores" to all individuals in the full population such that similar individuals get similar scores; matching via propensity scores or appropriate covariates is common, for example. Given such scores, individuals with similar scores may or may not attain similar outcomes independent of the individuals' memberships in the subpopulation. The traditional graphical methods for visualizing inequities are known as "reliability diagrams" or "calibration plots." These bin the scores into a partition of all possible values and, for each bin, plot both the average outcome for individuals in the subpopulation alone and the average outcome for all individuals; comparing the graph for the subpopulation with that for the full population gives some sense of how the averages for the subpopulation deviate from the averages for the full population. Unfortunately, real data sets contain only finitely many observations, limiting the usable resolution of the bins, and so the conventional methods can obscure important variations due to the binning. Fortunately, plotting the cumulative deviation of the subpopulation from the full population, as proposed in this paper, sidesteps the problematic coarse binning. The cumulative plots encode subpopulation deviation directly as the slopes of secant lines for the graphs. Slope is easy to perceive even when the constant offsets of the secant lines are irrelevant. The cumulative approach avoids binning that smooths over deviations of the subpopulation from the full population. Such cumulative aggregation furnishes both high-resolution graphical methods and simple scalar summary statistics (analogous to those of Kuiper and of Kolmogorov and Smirnov used in statistical significance testing for comparing probability distributions).
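The cumulative construction described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the details here are assumptions rather than the paper's exact method: the function name cumulative_deviation is hypothetical, the subpopulation is assumed to be a subset of the full population with distinct scores, and each subpopulation member is compared against the average outcome of all individuals whose scores lie nearest to that member's score.

```python
import numpy as np


def cumulative_deviation(scores, outcomes, in_subpop):
    """Cumulative deviation of a subpopulation from the full population.

    Assumptions (not taken from the abstract): subpopulation scores are
    distinct, and the reference value for each subpopulation member is
    the mean outcome of all individuals whose score is nearest to it.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    in_subpop = np.asarray(in_subpop, dtype=bool)

    # Sort the subpopulation by score.
    order = np.argsort(scores[in_subpop], kind="stable")
    s = scores[in_subpop][order]
    r = outcomes[in_subpop][order]
    n = s.size

    # One narrow bin per subpopulation member, with edges at the
    # midpoints between consecutive subpopulation scores.
    edges = (s[:-1] + s[1:]) / 2
    bin_index = np.searchsorted(edges, scores, side="right")
    totals = np.bincount(bin_index, weights=outcomes, minlength=n)
    counts = np.bincount(bin_index, minlength=n)
    reference = totals / counts  # full-population average per bin

    # Cumulative sums of the differences, starting from zero.
    cumulative = np.concatenate(([0.0], np.cumsum(r - reference) / n))

    # Scalar summaries in the style of Kolmogorov-Smirnov and Kuiper.
    kolmogorov_smirnov = np.max(np.abs(cumulative))
    kuiper = np.max(cumulative) - np.min(cumulative)
    return s, cumulative, kolmogorov_smirnov, kuiper
```

Plotting the returned cumulative values against k/n for k = 0, 1, ..., n gives a graph in which the slope of the secant line between two abscissae equals the average deviation of the subpopulation from its reference over that range of scores, so a steep stretch marks a range of scores where the subpopulation departs from the full population.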