Paper Title
Masakhane -- Machine Translation For Africa
Paper Authors
Paper Abstract
Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To begin to address the identified problems, MASAKHANE, an open-source, continent-wide, distributed, online research effort for machine translation for African languages, was founded. In this paper, we discuss our methodology for building the community and spurring research from the African continent, as well as outline the success of the community in terms of addressing the identified problems affecting African NLP.