Paper Title
General Mechanism of Evolution Shared by Proteins and Words
Paper Authors
Abstract
Complex systems, such as life and language, are governed by principles of evolution. The analogy and comparison between biology and linguistics\cite{alphafold2, RoseTTAFold, lang_virus, cell language, faculty1, language of gene, Protein linguistics, dictionary, Grammar of pro_dom, complexity, genomics_nlp, InterPro, language modeling, Protein language modeling} provide a computational foundation for characterizing and analyzing protein sequences, human corpora, and their evolution. However, no general mathematical formulation has so far been proposed to explain the origin of the quantitative hallmarks shared by life and language. Here we show several new statistical relationships shared by proteins and words, which motivate a general mechanism of evolution with explicit formulations that incorporates both previously known and new characteristics. We find that natural selection can be quantified through an entropic formulation based on the principle of least effort, which determines the sequence variations that survive in evolution. In addition, by introducing a function connection network, we explain the origin of power-law behavior and how changes in the environment stimulate the emergence of new proteins and words. Our results demonstrate not only the correspondence between genetics and linguistics across their different hierarchies but also new fundamental physical properties of the evolution of complex adaptive systems. We anticipate that our statistical tests can serve as quantitative criteria for examining whether an evolutionary theory of sequences is consistent with the regularities of real data. Meanwhile, this correspondence broadens the bridge for exchanging existing knowledge, spurs new interpretations, and opens a Pandora's box of potentially revolutionary challenges. For example, does linguistic arbitrariness conflict with the dogma that structure determines function?