Title

Binarised Regression with Instance-Varying Costs: Evaluation using Impact Curves

Authors

Matthew Dirks, David Poole

Abstract

Many evaluation methods exist, each for a particular prediction task, and a number of prediction tasks are commonly performed, including classification and regression. In binarised regression, binary decisions are generated from a learned regression model (or real-valued dependent variable), which is useful when the division between instances that should be predicted positive or negative depends on the utility. For example, in mining, the boundary between a valuable rock and a waste rock depends on the market price of various metals, which varies with time. This paper proposes impact curves to evaluate binarised regression with instance-varying costs, where some instances are much worse to classify as positive (or negative) than others; e.g., it is much worse to throw away a high-grade gold rock than a medium-grade copper-ore rock, even if the mine wishes to keep both because both are profitable. We show how to construct an impact curve for a variety of domains, including examples from healthcare, mining, and entertainment. Impact curves optimize binary decisions across all utilities of the chosen utility function, identify the conditions where one model may be favoured over another, and quantitatively assess improvement between competing models.
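The abstract describes binarised regression with instance-varying costs only in words. The Python sketch below is a rough illustration of the general idea. It is not the paper's definition of an impact curve. It assumes a hypothetical linear payoff, where keeping rock i is worth `price * grade_i - cost` and discarding it is worth 0. Under that assumption, it binarises regression predictions at a price-dependent threshold, scores each kept instance by its own true value (so misclassification costs differ per instance), and sweeps the price to get one value per utility setting. The names `realised_value`, `value_curve`, and `regret_curve` are illustrative only.

```python
from typing import List, Sequence, Tuple


def realised_value(y_true: Sequence[float], y_pred: Sequence[float],
                   price: float, cost: float) -> float:
    """Total value obtained when each instance is binarised at a price-dependent threshold.

    Hypothetical cost model (an assumption, not taken from the paper): keeping
    instance i is worth price * y_true[i] - cost, discarding it is worth 0.
    The decision uses the model's prediction: keep i only if
    price * y_pred[i] - cost > 0.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        if price * yp - cost > 0:        # binary decision from the regression output
            total += price * yt - cost   # instance-varying payoff, using the true value
    return total


def value_curve(y_true: Sequence[float], y_pred: Sequence[float],
                prices: Sequence[float], cost: float) -> List[Tuple[float, float]]:
    """Sweep the utility parameter (here a market price) and record the realised value."""
    return [(p, realised_value(y_true, y_pred, p, cost)) for p in prices]


def regret_curve(y_true: Sequence[float], y_pred: Sequence[float],
                 prices: Sequence[float], cost: float) -> List[Tuple[float, float]]:
    """Value lost, at each price, relative to an oracle that decides using y_true."""
    return [(p, realised_value(y_true, y_true, p, cost)
                - realised_value(y_true, y_pred, p, cost))
            for p in prices]


if __name__ == "__main__":
    grades = [0.5, 2.0, 3.0, 0.2]    # true grades of four rocks (made-up data)
    model_a = [0.6, 1.8, 2.5, 0.9]   # predictions from two hypothetical models
    model_b = [0.1, 2.2, 3.4, 0.1]
    prices = [0.5, 1.0, 2.0]
    for name, preds in [("A", model_a), ("B", model_b)]:
        print(name, regret_curve(grades, preds, prices, cost=1.0))
```

With this made-up data, both models match the oracle at the two lower prices. At a price of 2.0, model A keeps a low-grade rock whose true value is negative and loses about 0.6 relative to the oracle. Model B loses nothing at any of the three prices. Asking which model is favoured at which utility setting is the kind of comparison impact curves are meant to support. The paper's own construction may differ from this simplified stand-in.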