Paper title
Unsupervised embedding of trajectories captures the latent structure of scientific migration
Paper authors
Paper abstract
Human migration and mobility drive major societal phenomena, including epidemics, economies, innovation, and the diffusion of ideas. Although human mobility and migration have been heavily constrained by geographic distance throughout history, advances and globalization are making other factors, such as language and culture, increasingly important. Advances in neural embedding models, originally designed for natural language, provide an opportunity to tame this complexity and open new avenues for the study of migration. Here, we demonstrate the ability of the word2vec model to encode nuanced relationships between discrete locations from migration trajectories, producing an accurate, dense, continuous, and meaningful vector-space representation. The resulting representation provides a functional distance between locations, as well as a digital double that can be distributed, reused, and itself interrogated to understand the many dimensions of migration. We show that the unique power of word2vec to encode migration patterns stems from its mathematical equivalence with the gravity model of mobility. Focusing on the case of scientific migration, we apply word2vec to a database of three million migration trajectories of scientists, derived from the affiliations listed on their publication records. Using techniques that leverage the semantic structure of the embedding, we demonstrate that embeddings can learn the rich structure that underpins scientific migration, such as cultural, linguistic, and prestige relationships, at multiple levels of granularity. Our results provide a theoretical foundation and methodological framework for using neural embeddings to represent and understand migration both within and beyond science.
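To make the idea of treating migration trajectories like sentences more concrete, the following minimal sketch shows how (center, context) location pairs, the kind of input a skip-gram word2vec model trains on, might be extracted from trajectories. It is an illustration under stated assumptions, not the paper's pipeline: the toy trajectories, the institution names, the helper `skipgram_pairs`, and the window size of 1 are all hypothetical.

```python
from collections import Counter


def skipgram_pairs(trajectory, window=1):
    """Yield (center, context) pairs for locations within `window` steps."""
    for i, center in enumerate(trajectory):
        lo = max(0, i - window)
        hi = min(len(trajectory), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, trajectory[j]


# Hypothetical toy data: each trajectory is the ordered list of
# institutions that one scientist has been affiliated with.
trajectories = [
    ["univ_a", "univ_b", "univ_c"],
    ["univ_a", "univ_b"],
    ["univ_c", "univ_d"],
]

# Count how often each ordered pair of locations co-occurs in a window.
pair_counts = Counter(
    pair for traj in trajectories for pair in skipgram_pairs(traj, window=1)
)

for (center, context), count in sorted(pair_counts.items()):
    print(center, context, count)
```

In this sketch, locations that frequently appear next to each other in trajectories produce more training pairs, which is what would pull their vectors closer together in a skip-gram embedding. The window size controls how far apart in a career two affiliations can be and still count as context for each other.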