Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Paper title

Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji

Authors

Hamarashid, Hozan K., Saeed, Soran A., Rashid, Tarik A.

Abstract

Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing in a conversation consumes time. A few previous studies have focused on the Kurdish language, including the use of next word prediction. However, the lack of a Kurdish text corpus presents a challenge. Moreover, the lack of a sufficient number of N-grams for the Kurdish language, for instance five-grams, is the reason next word prediction is rarely used for Kurdish. Furthermore, the improper display of several Kurdish letters in the RStudio software is another problem. This paper provides a Kurdish corpus, creates five, and presents a unique research work on next word prediction for Kurdish Sorani and Kurmanji. The N-gram model is used for next word prediction to reduce the time spent typing in the Kurdish language. In addition, little work has been conducted on next word prediction for Kurdish; thus, the N-gram model is utilized to suggest text accurately. To do so, R programming and RStudio are used to build the application. The model achieves 96.3% accuracy.
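As a rough illustration of how an N-gram model can suggest the next word, here is a minimal sketch. It is not the authors' implementation: the paper builds its application in R and RStudio, while this sketch uses Python. The function names (`build_ngram_counts`, `predict_next`), the longest-context-first fallback, and the English toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_ngram_counts(sentences, max_n=5):
    """Count which words follow each context of 0 to max_n - 1 words.

    Hypothetical helper: whitespace tokenization is an assumption and
    would need adapting for real Kurdish text.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break
                context = tuple(tokens[i - k:i])  # the k words before `word`
                counts[context][word] += 1
    return counts


def predict_next(counts, text, max_n=5, top_k=3):
    """Suggest up to top_k next words, trying the longest context first.

    Falling back to shorter contexts is an assumed strategy, not one
    stated in the abstract.
    """
    tokens = text.split()
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = counts.get(context)
        if followers:
            return [word for word, _ in followers.most_common(top_k)]
    return []


# Toy English corpus standing in for a real Kurdish corpus (assumption).
toy_corpus = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the sofa",
]
toy_counts = build_ngram_counts(toy_corpus, max_n=5)
suggestions = predict_next(toy_counts, "cat sat on the", top_k=2)
```

With `max_n=5`, the four words already typed form the context of a five-gram. When that exact context was never seen, the sketch retries with the last three words, then two, and so on down to plain word frequencies.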
