Paper Title
SDRQuerier: A Visual Querying Framework for Cross-national Survey Data Recycling
Authors
Abstract
Public opinion surveys are a powerful tool for studying people's attitudes and behaviors from a comparative perspective. However, even worldwide surveys provide only partial geographic and temporal coverage, which hinders comprehensive knowledge production. To broaden the scope of comparison, social scientists turn to ex-post harmonization of variables from datasets that cover similar topics but different populations and/or years. The resulting datasets can be analyzed as a single source and accessed flexibly through many data portals. However, such portals offer little guidance for exploring the data in depth or for querying data according to user-customized needs. As a result, it remains challenging for social scientists to efficiently identify data relevant to their studies and to evaluate their theoretical models on the sliced data. To overcome these challenges, within the Survey Data Recycling (SDR) international cooperation research project, we propose SDRQuerier and apply it to the harmonized SDR database, which features over two million respondents interviewed in a total of 1,721 national surveys that are part of 22 well-known international projects. We design SDRQuerier to solve three practical challenges that social scientists routinely face. First, a BERT-based model provides customized data queries through research questions or keywords. Second, we propose a new visual design that showcases the availability of the harmonized data at different levels, helping users decide whether empirical data exist to address a given research question. Lastly, SDRQuerier discloses the underlying relational patterns among substantive and methodological variables in the database, helping social scientists rigorously evaluate or even improve their regression models. Through case studies in which multiple social scientists used the system to address their daily challenges, we demonstrate the novelty and effectiveness of SDRQuerier.
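The abstract does not describe how the BERT-based query model is built, so the following is only a rough sketch of the general idea of matching a research question to variable descriptions by vector similarity. It is not SDRQuerier's implementation: a plain bag-of-words vector stands in for BERT embeddings, and the catalog entries, variable names, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical variable catalog; names and descriptions are illustrative
# and are NOT taken from the SDR database.
CATALOG = {
    "trust_parliament": "trust in the national parliament",
    "protest_participation": "took part in a lawful public demonstration",
    "interest_politics": "how interested the respondent is in politics",
}


def embed(text):
    """Stand-in for a sentence encoder: a bag-of-words count vector.

    Assumption: a real system would use BERT embeddings here instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_variables(query, catalog=CATALOG):
    """Return variable names with nonzero similarity, most similar first."""
    q = embed(query)
    scored = [(cosine(q, embed(desc)), name) for name, desc in catalog.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    print(rank_variables("Does trust in parliament differ across countries?"))
```

Swapping `embed` for a contextual encoder would let the same ranking step match paraphrases that share no surface words with the variable descriptions, which is the kind of query a keyword-overlap vector cannot handle.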