Paper title
Extracting actionable information from microtexts
Paper authors
Paper abstract
Microblogs such as Twitter are a powerful source of information. Part of this information can be aggregated beyond the level of individual posts. Some of this aggregated information refers to events that could or should be acted upon in the interest of e-governance, public safety, or other areas of public interest. Moreover, a significant amount of this information, if aggregated, could complement existing information networks in a non-trivial way. This dissertation proposes a semi-automatic method for extracting actionable information that serves this purpose. First, we show that predicting time to event is possible in both in-domain and cross-domain scenarios. Second, we suggest a method that facilitates defining relevance for an analyst's context and using this definition to analyze new data. Finally, we propose a method that integrates the machine-learning-based relevant-information classification method with a rule-based information classification technique to classify microtexts. Fully automating microtext analysis has been our goal since the first day of this research project. Our efforts in this direction showed us the extent to which this automation can be realized. In most cases, we first developed an automated approach and then extended and improved it by integrating human intervention at various steps. Our experience confirms previous work stating that well-designed human intervention or contribution in the design, realization, or evaluation of an information system either improves its performance or enables its realization. As our studies and results pointed us toward the necessity and value of human involvement, we drew on previous studies of designing such involvement and customized our approaches to benefit from human input.