Paper title

Structural invariants and semantic fingerprints in the "ego network" of words

Paper authors

Kilian Ollivier, Chiara Boldrini, Andrea Passarella, Marco Conti

Paper abstract

Well-established cognitive models coming from anthropology have shown that, due to the cognitive constraints that limit our "bandwidth" for social interactions, humans organize their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. In order to investigate this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find regularities at both the structural and semantic level. At the former, we find that a concentric layered structure (which we call ego network of words, in analogy to the ego network of social relationships) captures very well how individuals organise the words they use. The size of the layers in this structure grows regularly (approximately 2-3 times with respect to the previous one) when moving outwards, and the two penultimate external layers consistently account for approximately 60% and 30% of the used words, irrespective of the total number of layers of the user. For the semantic analysis, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in the ring. We find that ring #1 has a special role in the model. It is semantically the most dissimilar and the most diverse among the rings. We also show that the topics that are important in the innermost ring also have the characteristic of being predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.
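
The layer-growth figure quoted in the abstract (each layer roughly 2-3 times the size of the previous one) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy word counts, and in particular the fixed count thresholds used to separate rings, which stand in for whatever procedure the authors actually use to find ring boundaries. The sketch also assumes that a layer is cumulative (layer i contains rings 1 to i), with ring 1 holding the most frequently used words.

```python
from collections import Counter


def rings_from_counts(counts, thresholds):
    """Assign each word to a ring by usage count (ring 1 = most used).

    `thresholds` is a hypothetical list of minimum counts in descending
    order, one per ring except the outermost, which takes the remainder.
    """
    rings = [[] for _ in range(len(thresholds) + 1)]
    for word, n in counts.items():
        for i, t in enumerate(thresholds):
            if n >= t:
                rings[i].append(word)
                break
        else:
            # No threshold matched: the word falls in the outermost ring.
            rings[-1].append(word)
    return rings


def layer_sizes(rings):
    """Cumulative sizes, assuming layer i is the union of rings 1..i."""
    sizes, total = [], 0
    for ring in rings:
        total += len(ring)
        sizes.append(total)
    return sizes


def scaling_ratios(sizes):
    """Ratio between each layer size and the size of the previous layer."""
    return [outer / inner for inner, outer in zip(sizes, sizes[1:])]


# Toy data (invented): 2 very frequent words, 4 moderately frequent, 12 rare.
toy_counts = Counter({"the": 50, "and": 40, "data": 12,
                      "model": 11, "graph": 10, "word": 9})
toy_counts.update({f"rare{i}": 1 for i in range(12)})

toy_rings = rings_from_counts(toy_counts, thresholds=[30, 8])
toy_sizes = layer_sizes(toy_rings)
toy_ratios = scaling_ratios(toy_sizes)
print(toy_sizes, toy_ratios)
```

With this invented data the layers contain 2, 6 and 18 words, so each layer is 3 times the size of the one inside it. The numbers are chosen to land in the range the abstract reports and are not results from the paper.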
