MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters

Paper Title

MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters

Authors

Jannatul Ferdous, Suvrajit Karmaker, A K M Shahariar Azad Rabby, Syed Akhter Hossain

Abstract

The recognition of Bangla handwritten compound characters has been an essential problem for many years. In recent years, application-driven research in machine learning and deep learning has gained interest, most notably handwriting recognition, because it has important applications such as Bangla OCR. MatriVasha is a project for recognizing Bangla handwritten compound characters. Compound character recognition is currently an important topic because of its varied applications, and it helps digitize old forms and information reliably. Unfortunately, there is a lack of a comprehensive dataset that can categorize all types of Bangla compound characters. MatriVasha is an attempt to bring consistency to compound characters, which is challenging because each person has a unique writing style. MatriVasha proposes a dataset intended for recognizing 120 (one hundred twenty) Bangla compound characters, consisting of 2,552 (two thousand five hundred fifty-two) isolated handwritten characters written by unique writers and collected from within Bangladesh. The dataset is aimed at problems in district-, age-, and gender-based handwriting research, because the samples were collected from a variety of districts and age groups and from equal numbers of males and females. As of now, our proposed dataset is the most extensive dataset for Bangla compound characters. It is intended to support building recognition techniques for handwritten Bangla compound characters. In the future, this dataset will be made publicly available to help widen research.
