Paper Title

Synthetic Data Generation for Economists

Authors

Allison Koenecke, Hal Varian

Abstract

As more tech companies engage in rigorous economic analyses, we are confronted with a data problem: in-house papers cannot be replicated due to use of sensitive, proprietary, or private data. Readers are left to assume that the obscured true data (e.g., internal Google information) indeed produced the results given, or they must seek out comparable public-facing data (e.g., Google Trends) that yield similar results. One way to ameliorate this reproducibility issue is to have researchers release synthetic datasets based on their true data; this allows external parties to replicate an internal researcher's methodology. In this brief overview, we explore synthetic data generation at a high level for economic analyses.
