Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness

Paper title

Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness

Authors

Liang, Paul Pu

Abstract

Having a rich multimodal inner language is an important component of human intelligence that enables several necessary core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum (2021), we describe the desiderata of a multimodal language called Brainish, comprising words, images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other. We define the syntax and semantics of Brainish before operationalizing this language through the lens of multimodal artificial intelligence, a vibrant research area studying the computational tools necessary for processing and relating information from heterogeneous signals. Our general framework for learning Brainish involves designing (1) unimodal encoders to segment and represent unimodal data, (2) a coordinated representation space that relates and composes unimodal features to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation). Through discussing how Brainish is crucial for communication and coordination in order to achieve consciousness in the CTM, and by implementing a simple version of Brainish and evaluating its capability of demonstrating intelligence on multimodal prediction and retrieval tasks on several real-world image, text, and audio datasets, we argue that such an inner language will be important for advances in machine models of intelligence and consciousness.
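The three-part framework in the abstract (unimodal encoders, a coordinated representation space, and decoders for prediction or retrieval) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `make_encoder`, `fuse`, and `retrieve` are hypothetical, the encoders are fixed random linear maps rather than learned models, and composition by averaging unit vectors is an assumption chosen only for brevity.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed):
    """Hypothetical unimodal encoder: a fixed random linear map into the shared space.

    A real system would learn these weights; random values are an assumption
    made only to keep the sketch self-contained.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(features):
        return [sum(w * x for w, x in zip(row, features)) for row in weights]

    return encode


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors in the shared space."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def fuse(vectors):
    """Compose unimodal embeddings by averaging their unit-normalized forms."""
    units = [normalize(v) for v in vectors]
    return normalize([sum(col) / len(units) for col in zip(*units)])


def retrieve(query, candidates):
    """Retrieval-style decoder: index of the candidate closest to the query."""
    return max(range(len(candidates)), key=lambda i: cosine(query, candidates[i]))


# Usage: two modalities with different input sizes share a 3-dimensional space.
text_encoder = make_encoder(in_dim=4, out_dim=3, seed=0)
image_encoder = make_encoder(in_dim=6, out_dim=3, seed=1)
z_text = text_encoder([1.0, 0.0, 0.0, 1.0])
z_image = image_encoder([0.5, 0.1, 0.0, 0.2, 0.9, 0.3])
joint = fuse([z_text, z_image])
```

Because the encoders here are untrained, the joint vector carries no real cross-modal meaning. The sketch only shows how inputs of different dimensionality can be mapped into one space, composed, and compared by similarity.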
