Paper title
Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics
Paper authors
Paper abstract
As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process when such formats are unsuitable for an RDBMS. To tap into this dark data, domain experts analyze and extract insights and integrate them into data repositories. This process can involve out-of-DBMS, ad-hoc analysis and processing, resulting in ETL, engineering effort, and suboptimal performance. While AI systems based on ML models can automate the analysis process, they often generate further context-rich answers. Using multiple sources of truth, either for training the models or in the form of knowledge bases, further exacerbates the problem of consolidating the data of interest. We envision an analytical engine co-optimized with components that enable context-rich analysis. Firstly, as the data from different sources or resulting from model answering cannot be cleaned ahead of time, we propose using online data integration via model-assisted similarity operations. Secondly, we aim for holistic cost- and rule-based pipeline optimization across relational and model-based operators. Thirdly, with increasingly heterogeneous hardware and equally heterogeneous workloads ranging from traditional relational analytics to generative model inference, we envision a system that adapts just in time to complex analytical query requirements. To solve increasingly complex analytical problems, ML offers attractive solutions that must be combined with traditional analytical processing and benefit from decades of database community research to make scalability and performance effortless for the end user.