Paper title
A note on 'Collider bias undermines our understanding of COVID-19 disease risk and severity' and how causal Bayesian networks both expose and resolve the problem
Paper authors
Abstract
An important recent preprint by Griffith et al. highlights how 'collider bias' in studies of COVID-19 undermines our understanding of disease risk and severity. The bias typically arises because the data are restricted to people who have been tested for COVID-19, among whom healthcare workers are overrepresented. For example, collider bias caused by smokers being underrepresented in the dataset may (at least partly) explain empirical results that suggest smoking reduces the risk of COVID-19. We extend the work of Griffith et al. by making more explicit use of graphical causal models to interpret observed data. We show that their smoking example can be clarified and improved using Bayesian network models with realistic data and assumptions. We also show that there is an even more fundamental problem for risk factors like 'stress', which, unlike smoking, are more rather than less prevalent among healthcare workers. In this case, because of a combination of collider bias from the biased dataset and the fact that 'healthcare worker' is a confounding variable, studies are likely to conclude wrongly that stress reduces rather than increases the risk of COVID-19. Indeed, the same flawed reasoning could even lead to the conclusion that 'being in close contact with COVID-19 people' reduces the risk of COVID-19. To avoid such potentially erroneous conclusions, any analysis of observational data must take account of the underlying causal structure, including colliders and confounders. If analysts fail to do this explicitly, then any conclusions they draw about the effect of specific risk factors on COVID-19 are likely to be flawed.
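The smoking argument in the abstract can be illustrated with a very small Bayesian network evaluated by exact enumeration. The sketch below is not the model from the paper: the four-node structure (healthcare worker, smoker, COVID-19, tested) follows the abstract's description, but every probability is a hypothetical value chosen only to make the effect visible, and all names (`P_H`, `P_S`, `P_C`, `P_T`, `covid_risk`) are introduced here for illustration. By construction, smoking raises COVID-19 risk by a factor of 1.25 in both occupational groups.

```python
from itertools import product

# Hypothetical conditional probability tables (illustrative, NOT from the paper).
P_H = 0.05                                   # P(healthcare worker)
P_S = {True: 0.05, False: 0.25}              # P(smoker | healthcare worker?)
P_C = {(True, True): 0.125, (True, False): 0.10,     # P(COVID | H, S):
       (False, True): 0.025, (False, False): 0.02}   # smoking multiplies risk by 1.25
P_T = {(True, True): 0.90, (True, False): 0.60,      # P(tested | H, C):
       (False, True): 0.06, (False, False): 0.02}    # testing depends on job and infection


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given boolean value."""
    return p if value else 1.0 - p


def joint(h, s, c, t):
    """Joint probability of one full assignment, factorised along the network."""
    return (bern(P_H, h) * bern(P_S[h], s)
            * bern(P_C[(h, s)], c) * bern(P_T[(h, c)], t))


def covid_risk(**given):
    """P(C = True | given) by exact enumeration; keys are 'h', 's', 't'."""
    num = den = 0.0
    for h, s, c, t in product([True, False], repeat=4):
        world = {"h": h, "s": s, "c": c, "t": t}
        if all(world[key] == val for key, val in given.items()):
            p = joint(h, s, c, t)
            den += p
            if c:
                num += p
    return num / den


# Whole population: smokers vs non-smokers.
population = {s: covid_risk(s=s) for s in (True, False)}
# Naive analysis restricted to tested people (the biased dataset).
tested = {s: covid_risk(s=s, t=True) for s in (True, False)}
# Same tested data, but stratified by healthcare-worker status.
tested_by_job = {(h, s): covid_risk(h=h, s=s, t=True)
                 for h in (True, False) for s in (True, False)}

print("population:", population)
print("tested only:", tested)
print("tested, stratified by job:", tested_by_job)
```

Under these assumed numbers, smokers have the higher COVID-19 risk in the whole population (roughly 2.6% against 2.5%), yet in the tested-only data they appear to have the lower risk (roughly 9.7% against 11.5%). Stratifying the tested data by healthcare-worker status restores the correct direction in both groups, which is the kind of correction that becomes available once the causal structure is made explicit.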