Textual Description for Mathematical Equations

Paper Title

Textual Description for Mathematical Equations

Authors

Ajoy Mondal, C. V. Jawahar

Abstract

Reading mathematical expressions or equations in document images is very challenging because of the large variability of mathematical symbols and expressions. In this paper, we pose reading a mathematical equation as the task of generating a textual description that interprets the internal meaning of the equation. Inspired by the natural image captioning problem in computer vision, we present a mathematical equation description (MED) model, a novel end-to-end trainable deep neural network based approach that learns to generate a textual description for reading mathematical equation images. Our MED model consists of a convolutional neural network as an encoder that extracts features from input mathematical equation images, and a recurrent neural network with an attention mechanism that generates descriptions of those images. Because no data sets of mathematical equation images with textual descriptions are available, we generate two data sets for experimental purposes. To validate the effectiveness of our MED model, we conduct a real-world experiment to see whether students are able to write equations by only reading or listening to their textual descriptions. The experiments show that the students are able to write most of the equations correctly by reading their textual descriptions only.
