Paper Title
Design and Implementation of a Quantum Kernel for Natural Language Processing
Paper Author(s)
Paper Abstract
Natural language processing (NLP) is the field that attempts to make human language accessible to computers, and it relies on applying a mathematical model to express the meaning of symbolic language. One such model, DisCoCat, defines how to express both the meaning of individual words and their compositional nature. This model can be naturally implemented on quantum computers, leading to the field of quantum NLP (QNLP). Recent experimental work used quantum machine learning techniques to map from text to class labels using the expectation value of the quantum-encoded sentence. Theoretical work has been done on computing the similarity of sentences but relies on an unrealized quantum memory store. The main goal of this thesis is to leverage the DisCoCat model to design a quantum-based kernel function that can be used by a support vector machine (SVM) for NLP tasks. Two similarity measures were studied: (i) the transition amplitude approach and (ii) the SWAP test. A simple NLP meaning classification task from previous work was used to train the word embeddings and evaluate the performance of both models. The Python module lambeq and its related software stack were used for the implementation. The explicit model from previous work was used to train word embeddings and achieved a testing accuracy of $93.09 \pm 0.01$%. Both SVM variants achieved higher testing accuracies: $95.72 \pm 0.01$% for approach (i) and $97.14 \pm 0.01$% for approach (ii). The SWAP test was then simulated under a noise model defined by the real quantum device ibmq_guadalupe. The explicit model achieved an accuracy of $91.94 \pm 0.01$% while the SWAP test SVM achieved 96.7% on the testing dataset, suggesting that the kernelized classifiers are resilient to noise. These encouraging results motivate further investigation of our proposed kernelized QNLP paradigm.
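To make the kernel construction concrete, the following is a minimal illustrative sketch, not the thesis implementation: it computes the fidelity kernel K(i, j) = |<psi_i|psi_j>|^2, which is the quantity a SWAP test estimates, and passes it to scikit-learn's SVC through the precomputed-kernel interface. Random normalized vectors stand in for the sentence states that lambeq-generated DisCoCat circuits would produce, and the toy dataset, labels, and helper names are assumptions made for illustration only.

# Minimal sketch: a fidelity-based quantum kernel K(i, j) = |<psi_i|psi_j>|^2
# (the quantity a SWAP test estimates) combined with a classical SVM via
# scikit-learn's precomputed-kernel interface. Random normalized vectors stand
# in for sentence states; the data here is a toy placeholder, not the thesis dataset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def random_state(dim: int) -> np.ndarray:
    # Random normalized complex state vector of the given dimension.
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def fidelity_kernel(states_a, states_b) -> np.ndarray:
    # Gram matrix of squared overlaps |<a|b>|^2 between two sets of states.
    overlaps = np.array([[np.vdot(a, b) for b in states_b] for a in states_a])
    return np.abs(overlaps) ** 2

# Toy "sentence states" on two qubits (dimension 4) with alternating labels.
train_states = [random_state(4) for _ in range(20)]
train_labels = np.arange(20) % 2
test_states = [random_state(4) for _ in range(5)]

# Precompute the kernel matrices and train the SVM on them.
K_train = fidelity_kernel(train_states, train_states)
K_test = fidelity_kernel(test_states, train_states)

clf = SVC(kernel="precomputed")
clf.fit(K_train, train_labels)
print(clf.predict(K_test))

In practice, the statevector overlaps would be replaced by SWAP-test (or transition-amplitude) estimates obtained from circuit execution, with shot noise and device noise entering only through the kernel entries; the SVM itself remains entirely classical.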