Paper Title
An Algebraic Approach for High-level Text Analytics
Paper Authors
Paper Abstract
Text analytics tasks such as word embedding, phrase mining, and topic modeling are placing increasing demands on existing database management systems and posing challenges for them. In this paper, we present a novel algebraic approach based on associative arrays. Our data model and algebra bring together relational operators and text operators, which enables interesting optimization opportunities for hybrid data sources that contain both relational and textual data. We demonstrate the expressive power of the approach on several real-world text analytics tasks.