Paper Title
Locally Weighted Regression with different Kernel Smoothers for Software Effort Estimation
Paper Authors
Paper Abstract
Estimating software effort has remained a largely unsolved problem for decades. One of the main obstacles to building accurate estimation models is the heterogeneous nature of software data, which often has a complex structure. Typically, effort estimation models built from local data tend to be more accurate than models built from the entire dataset. Previous studies have focused on clustering techniques and decision trees to generate local, coherent data subsets that can help in building local prediction models. However, these approaches may fall short because of limitations in finding optimal clusters and in handling noisy data. In this paper, we use a more sophisticated locality approach that can mitigate these shortcomings: Locally Weighted Regression (LWR). This method provides an efficient way to learn from local data by building an estimation model that combines multiple local regression models in a k-nearest-neighbor-based model. The main factor affecting the accuracy of this method is the choice of the kernel function used to derive the weights for the local regression models. This paper investigates how the choice of kernel affects the performance of Locally Weighted Regression on the software effort estimation problem. After comprehensive experiments with 7 datasets, 10 kernels, 3 polynomial degrees, and 4 bandwidth values, for a total of 840 Locally Weighted Regression variants (7 × 10 × 3 × 4), we found that: 1) uniform kernel functions cannot outperform non-uniform kernel functions, and 2) kernel type, polynomial degree, and bandwidth have no specific effect on the estimation accuracy.
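To make the role of the kernel concrete, the following is a minimal sketch of a single LWR prediction, not the authors' implementation. It assumes NumPy, interprets the bandwidth as the fraction of training projects used as nearest neighbors (the hypothetical `span` parameter), and shows only two kernels, uniform and tricube, as illustrative choices; the abstract does not list the 10 kernels studied. The names `lwr_predict`, `uniform_kernel`, and `tricube_kernel` are introduced here for illustration only.

```python
import numpy as np


def uniform_kernel(u):
    """Equal weight for every neighbor inside the window (assumed form)."""
    return np.where(np.abs(u) <= 1.0, 1.0, 0.0)


def tricube_kernel(u):
    """Non-uniform kernel: weight decays smoothly with scaled distance."""
    return np.where(np.abs(u) < 1.0, (1.0 - np.abs(u) ** 3) ** 3, 0.0)


def lwr_predict(X, y, x_query, kernel, span=0.5, degree=1):
    """Predict y at x_query from a kernel-weighted local polynomial fit.

    span   : assumed bandwidth convention, the fraction of training
             points used as the k nearest neighbors.
    degree : degree of the local polynomial (no cross terms, for brevity).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    n, d = X.shape

    # Select the k nearest neighbors of the query point.
    k = min(n, max(degree * d + 1, int(np.ceil(span * n))))
    dist = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dist)[:k]

    # Scale distances by the window radius and turn them into weights.
    h = dist[nearest].max() + 1e-12
    w = kernel(dist[nearest] / h)

    # Polynomial design matrix centered at the query, so the fitted
    # intercept is the prediction at x_query.
    Z = X[nearest] - x_query
    cols = [np.ones((k, 1))] + [Z ** p for p in range(1, degree + 1)]
    A = np.hstack(cols)

    # Weighted least squares via row scaling by sqrt(weight).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[nearest] * sw, rcond=None)
    return float(beta[0])
```

Swapping `uniform_kernel` for `tricube_kernel` changes only how strongly distant neighbors influence the local fit, which is the design dimension the study varies together with the polynomial degree and the bandwidth.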