Paper title
A Rich Recipe Representation as Plan to Support Expressive Multi Modal Queries on Recipe Content and Preparation Process
Paper authors
Abstract
Food is not only a basic human necessity but also a key factor driving a society's health and economic well-being. As a result, the cooking domain is a popular use case for demonstrating decision-support (AI) capabilities in service of benefits like precision health, with tools ranging from information retrieval interfaces to task-oriented chatbots. An AI here should understand concepts in the food domain (e.g., recipes, ingredients), be tolerant to failures encountered while cooking (e.g., browning of butter), handle allergy-based substitutions, and work with multiple data modalities (e.g., text and images). However, recipes today are handled as textual documents, which makes it difficult for machines to read them, reason over them, and handle ambiguity. This calls for a better representation of recipes, one that overcomes the ambiguity and sparseness of current textual documents. In this paper, we discuss the construction of a machine-understandable rich recipe representation (R3), in the form of plans, from recipes available in natural language. R3 is infused with additional knowledge, such as information about allergens, images of ingredients, and possible failures and tips for each atomic cooking step. To show the benefits of R3, we also present TREAT, a tool for recipe retrieval that uses R3 to perform multi-modal reasoning on the recipe's content (plan objects: ingredients and cooking tools), food preparation process (plan actions and time), and media type (image, text). R3 leads to improved retrieval efficiency and new capabilities that were hitherto not possible with a textual representation.
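To make the idea of a recipe represented as a plan more concrete, the following is a minimal Python sketch of what such a structure and a simple query over it might look like. It is an illustration only and not the actual R3 schema or the TREAT implementation: the class names (`Ingredient`, `Step`, `Recipe`), all field names, the `find` helper, and the sample recipes are hypothetical and chosen solely to mirror the elements named in the abstract (plan objects, plan actions, time, allergens, failures, tips, images).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    """A plan object; every field name here is an assumption, not the R3 schema."""
    name: str
    allergens: List[str] = field(default_factory=list)
    image: Optional[str] = None  # hypothetical path or URL to an ingredient image


@dataclass
class Step:
    """One atomic cooking step, modelled as a plan action (hypothetical layout)."""
    action: str                  # e.g. "melt", "bake"
    objects: List[str]           # ingredients and tools the action touches
    minutes: float = 0.0
    failures: List[str] = field(default_factory=list)
    tips: List[str] = field(default_factory=list)


@dataclass
class Recipe:
    title: str
    ingredients: List[Ingredient]
    steps: List[Step]

    def total_minutes(self) -> float:
        return sum(step.minutes for step in self.steps)

    def allergens(self) -> set:
        return {a for ing in self.ingredients for a in ing.allergens}


def find(recipes, without_allergen=None, with_action=None, max_minutes=None):
    """Filter recipes on content (allergens) and process (actions, time)."""
    hits = []
    for recipe in recipes:
        if without_allergen is not None and without_allergen in recipe.allergens():
            continue
        if with_action is not None and all(
            step.action != with_action for step in recipe.steps
        ):
            continue
        if max_minutes is not None and recipe.total_minutes() > max_minutes:
            continue
        hits.append(recipe)
    return hits


# Made-up sample data, for illustration only.
cookies = Recipe(
    "Butter cookies",
    [Ingredient("butter", ["dairy"]), Ingredient("flour", ["gluten"])],
    [
        Step("melt", ["butter", "saucepan"], 5,
             failures=["butter browns"], tips=["use low heat"]),
        Step("bake", ["dough", "oven"], 12),
    ],
)
salad = Recipe(
    "Fruit salad",
    [Ingredient("apple"), Ingredient("orange")],
    [Step("chop", ["apple", "orange", "knife"], 10)],
)

dairy_free = find([cookies, salad], without_allergen="dairy")
baked = find([cookies, salad], with_action="bake")
quick = find([cookies, salad], max_minutes=15)
```

Because allergens, actions, and durations are explicit fields rather than free text, queries such as "dairy-free recipes" or "recipes that involve baking" become simple filters. With plain text, the same questions would first require parsing and disambiguating the recipe.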