Estimation by analogy is a method in which the proposed project is compared with earlier projects of a similar nature for which ample development information is available. Its advantage over other methods is that it is based on actual experience. However, it can be ineffective when no similar past projects exist or when the accessible historical data are imprecise. In , a genetic algorithm is applied within analogy to minimize the time involved in selecting historic projects. Liu et al.  proposed a statistical framework for eliminating noise and enhancing the results of the analogy method. Although analogy-based estimation is one of the best-known methods of cost prediction, fuzzy techniques using fuzzy numbers have been employed to improve accuracy in many areas, such as Control Engineering . Fuzzy logic has been the focus of important recent research investigations; during the early nineties it gained significance as a theoretical approach . Fuzzy systems are also resourceful in evaluating effort using a two-sided Gaussian membership function , by assigning an accurate degree of compatibility.
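As an illustration of the two-sided Gaussian membership function mentioned above, the following minimal sketch assigns a degree of compatibility to a project attribute. All parameter values here are hypothetical and chosen only for illustration; the cited work's actual membership parameters are not reproduced here.

```python
import math

def two_sided_gaussian(x, mean1, sigma1, mean2, sigma2):
    """Two-sided Gaussian membership function: a Gaussian shoulder below
    mean1, a flat top of degree 1.0 between mean1 and mean2, and a
    Gaussian shoulder above mean2."""
    if x < mean1:
        return math.exp(-((x - mean1) ** 2) / (2 * sigma1 ** 2))
    if x > mean2:
        return math.exp(-((x - mean2) ** 2) / (2 * sigma2 ** 2))
    return 1.0

# Degree of compatibility of a project size of 120 with a fuzzy
# "medium size" concept whose core is [100, 150] (illustrative numbers).
print(two_sided_gaussian(120, 100, 20, 150, 30))  # inside the flat top -> 1.0
```

Values inside the core receive full membership, while membership decays smoothly (rather than dropping to zero) on either side, which is what allows a project to belong partially to a concept such as "medium size".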
Nowadays the agile software development process has become popular in industry and is replacing traditional methods of software development. However, accurate effort estimation in this paradigm remains a challenge. Hence, industry must be able to estimate the effort necessary for software development using agile methodology efficiently. For this, different techniques such as expert opinion, analogy and disaggregation are adopted by researchers and practitioners, but no proper mathematical model exists. The existing techniques are ad hoc and thus prone to error. One popular approach for calculating the effort of agile projects mathematically is the Story Point Approach (SPA). In this study, an attempt has been made to enhance the prediction accuracy of the agile software effort estimation process using SPA. To do this, different types of neural networks (General Regression Neural Network (GRNN), Probabilistic Neural Network (PNN), Group Method of Data Handling (GMDH) Polynomial Neural Network and Cascade-Correlation Neural Network) are used. Finally, the performance of the models generated using the various neural networks is compared and analyzed.
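To make the GRNN family concrete, the sketch below implements the core of a General Regression Neural Network (a kernel-weighted average of stored training outputs, in the Nadaraya-Watson style). The story-point data and the smoothing parameter are entirely hypothetical; this is not the study's actual model or dataset.

```python
import math

def grnn_predict(x_train, y_train, x_query, sigma=0.5):
    """GRNN prediction: a Gaussian-kernel-weighted average of the stored
    training efforts, weighted by distance between the query project and
    each training example."""
    weights = []
    for x in x_train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, x_query))
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))
    total = sum(weights)
    if total == 0:
        return sum(y_train) / len(y_train)  # fall back to the mean effort
    return sum(w * y for w, y in zip(weights, y_train)) / total

# Hypothetical data: features = (story points, team velocity), target = effort
X = [(20, 2.0), (35, 2.5), (50, 3.0)]
y = [120.0, 200.0, 310.0]
print(grnn_predict(X, y, (35, 2.5)))  # an exact match returns ~200.0
```

Because the prediction is a convex combination of training efforts, a GRNN never extrapolates outside the range of observed efforts, which is one reason it is attractive for small, noisy effort datasets.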
Software effort estimation has been researched and developed, in both algorithmic and machine learning forms, since the 1960s. Estimation based on expert judgement is one of the earliest and most widely used methods. Expert judgement is a process of estimating the software that results from an assessment conducted by experts who are experienced in software projects. One well-known estimation technique is Planning Poker, which is often used in Agile software development methodologies . There is also Function Points, an estimation method proposed by  that uses function points as a unit of size of the software to be developed. COCOMO, the Constructive Cost Model, is one of the most popular algorithmic methods . COCOMO I classifies projects into three classes: Organic, Semidetached and Embedded . The Use Case Points method proposed by  estimates software effort with several effort drivers, including UCW, UUCW, ECF and TCF; UCP itself is derived from the Function Points method using 20 or 28 productivity factors. Moreover, there is regression analysis, introduced by  and , which analyzes the relationship between two or more independent and dependent variables. The Bayesian Belief Network (BBN) is a method with a causal-relationship approach described as a directed acyclic graph, in which nodes represent discrete or continuous random variables and arcs represent probabilistic dependencies between them.
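The basic COCOMO I equation referenced above can be shown directly; the (a, b) coefficients below are the standard published Basic COCOMO values for the three project classes.

```python
# Basic COCOMO I: effort (person-months) = a * KLOC**b, with coefficients
# chosen by project class (standard published Basic COCOMO values).
COCOMO_COEFFS = {
    "organic": (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}

def cocomo_effort(kloc, project_class):
    """Estimate effort in person-months from size in kilo lines of code."""
    a, b = COCOMO_COEFFS[project_class]
    return a * kloc ** b

print(round(cocomo_effort(32, "organic"), 1))  # ~91.3 person-months
```

The exponent b > 1 captures the diseconomy of scale: doubling the code size more than doubles the estimated effort, and the effect grows from organic to embedded projects.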
Estimation by analogy means creating estimates for new projects by comparing them to similar projects from the past. Since algorithmic techniques have the disadvantage of needing model calibration, estimation by analogy is an alternative approach; the process itself is much simpler, although it requires a considerable amount of computation. However, not all organizations have the historical data needed to use analogy satisfactorily as a means of estimation. ISBSG (International Software Benchmarking Standards Group) maintains and exploits a repository of international software project metrics to help software and IT business customers with project estimation, risk analysis, productivity and benchmarking .
The augmentation method extended SMOTE [39] from classification to regression by attributing class imbalance to the most predictive SEE feature, using the following procedure. First, the Pearson correlation between each feature and effort is calculated, and the feature with the largest correlation is considered the most predictive; this is usually a size-related feature or an estimate of completion date or effort, such as functional size or lines of code. Then, the entire set of training examples is cast into three classes (i.e. small, medium and large) of similar size. For instance, if functional size has the largest correlation with effort, with its minimum and maximum being 60 and 780 respectively, the data set would be divided into three parts according to the functional size value: [60, 300), [300, 540) and [540, 780). Finally, conventional SMOTE is used to generate synthetic projects in the small and medium classes to balance the data distribution, thus increasing the overall data set size. These synthetic projects, together with the real ones, are passed to an analogy-based estimation method (i.e. k-NN) in order to obtain better prediction performance.
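The binning and oversampling steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the Pearson-correlation step is omitted (the index of the most predictive feature is taken as given), and the interpolation uses a random partner rather than a true k-nearest-neighbour as in canonical SMOTE.

```python
import random

def bin_by_most_predictive(projects, efforts, feature_idx):
    """Split (project, effort) pairs into small/medium/large classes using
    equal-width bins over the most predictive feature, as in the text."""
    values = [p[feature_idx] for p in projects]
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3
    bins = {"small": [], "medium": [], "large": []}
    for p, e in zip(projects, efforts):
        v = p[feature_idx]
        if v < lo + width:
            bins["small"].append((p, e))
        elif v < lo + 2 * width:
            bins["medium"].append((p, e))
        else:
            bins["large"].append((p, e))
    return bins

def smote_like(pairs, n_new, rng):
    """Generate synthetic (project, effort) pairs by interpolating between
    two randomly chosen real examples, SMOTE-style."""
    out = []
    for _ in range(n_new):
        (p1, e1), (p2, e2) = rng.sample(pairs, 2)
        g = rng.random()  # interpolation factor in [0, 1)
        out.append(([a + g * (b - a) for a, b in zip(p1, p2)],
                    e1 + g * (e2 - e1)))
    return out
```

With a functional-size range of [60, 780] the equal-width bins come out as [60, 300), [300, 540) and [540, 780], matching the worked example in the text.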
Expert estimation techniques have been widely accepted among software professionals; hence most of the research in the last decade has focused on expert estimation -. Jørgensen  evoked best-practice guidelines and provided suggestions on how to implement them in software organizations. Jørgensen  suggested practical guidelines for expert-judgement-based software effort estimation, and a manifesto on expert judgement and formal models. A hybrid approach for rule learning, induction, selection and extraction in fuzzy rule-based systems was introduced; the model combines a fuzzy rule-based system with Genetic Algorithms (GA) and expert judgement automation using the Pittsburgh approach . It was suggested to eliminate unnecessary linguistic terms. To increase the performance and reduce the complexity of fuzzy-logic-based methods, a fuzzy membership function  is used. GA-based feature selection and machine learning methods have been used for parameter optimization in software effort estimation . Random prediction was employed for different datasets, and standardized accuracy was evaluated through random prediction . Work on software defect association mining suggested that high support and confidence levels may not result in higher prediction accuracy, and that a sufficient number of rules is a precondition for high prediction accuracy . Ensembles of learning machines have been applied to improve software cost estimation . Menzies  suggested four kinds of mining for predictive modelling: algorithm mining, landscape mining, decision mining and discussion mining. Estimation by analogy can be significantly improved by dynamic selection of nearest neighbours in project data .
Another proposal  is the use of a subset selection algorithm based on fuzzy logic for analogy software effort estimation models. Validation using two established datasets (ISBSG, Desharnais) shows that using a fuzzy feature subset selection algorithm in analogy software effort estimation contributes significant results. Another proposal based on the same logic is by , who propose a hybrid system combining fuzzy logic and estimation by analogy, referred to as Fuzzy Analogy, using COCOMO'81 as the dataset. The use of fuzzy sets supports continuous belongingness (membership) of elements to a given concept (such as a small software project) , thus alleviating the dichotomy problem (yes/no)  that caused similar projects to have different estimated efforts. Fuzzy logic also improves the interpretability of the model, allowing the user to view, evaluate, criticize and adapt it.
Approaches based on analogy have shown promise in the field of software effort estimation, and their use has increased among researchers in this area . Authors such as  classify the analogy-based technique as a machine learning technique. This technique has been advocated as a potential method for efficient effort estimation, since it allows modeling the complexity of the relationship between effort and the variables included in the context of the software project (e.g. team data, project data), elements whose relationship is normally not linear. Wen et al. carried out a systematic literature review in which they identified eight types of machine learning techniques. Case-based reasoning (CBR) and artificial neural networks (ANN) were the most used techniques for estimating effort, representing 37% and 26%, respectively.
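A minimal case-based (analogy) estimator of the kind discussed above can be sketched as a k-nearest-neighbour lookup over historical projects. The feature choice, distance measure and adaptation rule (here a plain mean of the k neighbours' efforts) are all assumptions for illustration; real analogy tools vary on each of these decisions.

```python
import math

def analogy_estimate(history, query, k=3):
    """Analogy-based (CBR-style) effort estimate: rank past projects by
    Euclidean distance over their feature vectors and return the mean
    actual effort of the k most similar ones."""
    ranked = sorted(history, key=lambda rec: math.dist(rec["features"], query))
    nearest = ranked[:k]
    return sum(rec["effort"] for rec in nearest) / len(nearest)

# Hypothetical historical projects: features = (size in KLOC, team size)
history = [
    {"features": (10, 4), "effort": 50.0},
    {"features": (12, 5), "effort": 60.0},
    {"features": (40, 12), "effort": 300.0},
    {"features": (45, 10), "effort": 320.0},
]
print(analogy_estimate(history, (11, 4), k=2))  # mean of the two small projects
```

In practice, features are usually normalized first so that a large-range feature (such as KLOC) does not dominate the distance, which is one of the empirical decisions analogy methods must make.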
A complicating factor is that the respondents may have interpreted pre-defined categories, e.g., analogy-based estimation, differently. We have grouped the estimation methods into three main categories: expert judgment-based methods, model-based methods and "other". Model-based estimates include formal estimation models such as COCOMO, use-case-based estimation, FPA metrics and other algorithm-driven methods. The "other" category contains methods that are not "pure" estimation methods, e.g., capacity-related and price-to-win-based methods, and methods that can be used in combination with other models (top-down and bottom-up). An overview is presented in Table 3. An 'X' in the table indicates that the alternative was not an option in the survey. In the McAulay column, we have joined three different software cost model method alternatives of that study. The original study found that 11% applied Function Point Analysis, 2% lines-of-code-based models and 0% Halstead metrics.
Software effort estimation is one of the major and most important tasks in software engineering. It is used for calculating the amount of effort required for the development and management of a software project. It is a challenging task because, whenever we estimate the effort for a particular software project, we tend to overestimate or underestimate the actual effort required. Resources are limited, so the effort must be estimated carefully. Effort estimates are usually measured in person-months for a software project. They must be very accurate, because inaccurate estimates may lead to financial loss or loss of reputation for the organization. Various algorithmic and non-algorithmic methods are used to estimate the effort of a particular software project.
The IITRI study was significant because it analyzed the results of applying seven cost models (PRICE-S, two variants of COCOMO, System-3, SPQR/20, SASET, SoftCost-Ada) to eight Ada-specific programs. Ada was specifically designed for, and is the principal language used in, military applications and, more specifically, weapons system software. Weapons system software is different from the normal corporate type of software, commonly known as Management Information System (MIS) software: it is real-time and uses a high proportion of complex mathematical coding. Up to 1997, the DOD mandated Ada as the required language unless a waiver was approved. Lloyd Mosemann stated that the results of this study, like those of other studies, showed that estimating accuracy improved with calibration. The best results were achieved by the SEER-SEM model, which was accurate to within 30 percent, 62 percent of the time.
Lava Prasad Kafle  conducted a case study of 5 companies and reviewed over 50 papers spanning a 30-year period. Based on these studies, it was suggested that test-effort estimation data need to be recorded and revised, and that historical evidence has to be taken into account to avoid underruns or overruns in testing. The study found that, in practice, the companies used expert judgement and empirical-evidence-based models for estimating verification and validation testing cost and effort. The main findings of the case study were: (a) test effort is calculated in the same way as total project effort estimation, using the expert judgement method, and (b) the estimation error of testing effort seems to correlate closely with the estimation error of the total project. The study concluded that companies can further reduce the cost and effort estimation errors of verification and validation testing, and of projects generally, through detailed analysis of their processes and with the help of consultants.
An important aspect of planning and managing a software engineering project is estimating the cost of the project. Several methods have been used to estimate the cost of a software project, and the Analogy method is a recent method that produces relatively accurate estimates. This paper presents the results of a study refining the Analogy method, covering the determination of a mathematical model for selecting similar projects as estimation references, as well as the determination of effort and cost estimates. The study yields an improved cost estimation technique with more complete cost parameters, so that the resulting estimates are relatively better than those of the standard Analogy method.
Software engineers try to estimate effort and cost accurately in the software industry. A major goal of the software engineering community is to develop useful models that can precisely predict effort. Many empirical, equation-based software estimation models have been developed over the last two to four decades, based on effort estimation, such as Jones and Software Productivity Research's Checkpoint model, Putnam and Quantitative Software Measurement's SLIM model, Park and PRICE Systems' PRICE-S model, Jensen and the SEER-SEM model, Rubin and the Estimacs model, and Boehm and the COCOMO model [Putnam, 1992; Jones, 1997; Park, 1988; Jensen, 1983; Rubin, 1983; Boehm, 1981; Boehm, 1995; Walkerden, 1997; Conte, 1986; Fenton, 1991; Masters, 1985; Mohanty, 1981]. These approaches impose a few restrictions that are often violated by software engineering data, which has resulted in inaccurate empirical models that do not perform well when used for effort prediction. This paper focuses on approximate effort estimation with the help of an equation that relates kilo lines of code (KLOC) and a fuzzy multiplier.
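One plausible shape for a KLOC-plus-fuzzy-multiplier estimate is a power-law effort equation scaled by a defuzzified fuzzy number. The sketch below is only an illustration of that general idea, not the paper's actual equation: the coefficients, the triangular fuzzy number and the centroid defuzzification rule are all assumptions.

```python
def defuzzify_triangular(low, mode, high):
    """Centroid (mean) of a triangular fuzzy number (low, mode, high)."""
    return (low + mode + high) / 3.0

def fuzzy_effort(kloc, a=2.4, b=1.05, fuzzy_mult=(0.9, 1.0, 1.2)):
    """Effort in person-months from a KLOC-based power law, scaled by a
    defuzzified fuzzy multiplier. All coefficient values are illustrative,
    not taken from the paper under discussion."""
    return a * kloc ** b * defuzzify_triangular(*fuzzy_mult)

print(round(fuzzy_effort(10), 1))  # ~27.8 person-months with these values
```

The fuzzy multiplier lets the estimator express the adjustment factor as a range with a most-likely value instead of a single crisp number; defuzzification then collapses it to one scalar for the final estimate.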
Published surveys on estimation practice suggest that expert estimation is the dominant strategy when estimating software development effort. For example, the study of software development estimation practice at Jet Propulsion Laboratory reported in Hihn and Habib-Agahi (1991a) found that 83% of the estimators used "informal analogy" as their primary estimation technique, 4% "formal analogy" (defined as expert judgment based on documented projects), 6% "rules of thumb", and 7% "models". The investigation of Dutch companies described in Heemstra and Kusters (1991) concludes that 62% of the organizations that produced software development estimates based the estimates on "intuition and experience" and only 16% on "formalized estimation models". Similarly, a survey conducted in New Zealand, Paynter (1996), reports that 86% of the responding software development organizations applied "expert estimation" and only 26% applied "automated or manual models" (an organization could apply more than one method). A study of the information systems development department of a large international financial company, Hill et al. (2000), found that no formal software estimation model was used. Jørgensen (1997) reports that 84% of the estimates of software development projects conducted in a large telecom company were based on expert judgment, and Kitchenham et al. (2002) report that 72% of the project estimates of a software development company were based on "expert judgment". In fact, we were not able to find any study reporting that most estimates were based on formal estimation models. The estimation strategy categories and definitions are probably not the same in the different studies, but there is nevertheless strong evidence to support the claim that expert estimation is more frequently applied than model-based estimation. This strong reliance on expert estimation is not unusual. Similar findings are reported in, for example, business forecasting; see Remus et al. (1995) and Winklhofer et al. (1996).
In , the authors compared the performance of different soft computing techniques, such as PSO-tuned COCOMO and fuzzy logic, with traditional effort estimation structures. Their results showed that the proposed model outperformed traditional effort estimation structures on NASA's software effort data set. In , a decision-tree-based algorithm was used to perform software effort estimation. In addition, the authors presented empirical evidence of performance variations for several approaches, including Linear Regression, Artificial Neural Networks (ANN) and Support Vector Machines (SVM), and pointed to the suitability of the experimented ML approaches in the area of effort estimation. In their performance comparison with other traditional algorithms, their results in terms of error rate were better than those of the other techniques.
Academic researchers and practitioners have long been searching for more accurate estimation models and methods. In 1987, Kemerer  posed two research questions: a) are models that do not use source lines of code as accurate as those that do? and b) are the models available in the open literature as accurate as proprietary models? Empirical data were gathered to compare the performance of four software cost estimation models: SLIM, COCOMO, ESTIMACS and Function Points. However, the results were inconclusive.
6.1 Provide Feedback on Estimation Accuracy and Development Task Relations
There has been much work on frameworks for "learning from experience" in software organizations, e.g., work on experience databases (Basili, Caldierea et al. 1994; Houdek, K et al. 1998; Jørgensen, Sjøberg et al. 1998; Engelkamp, Hartkopf et al. 2000) and frameworks for post-mortem (project experience) reviews (Birk, Dingsøyr et al. 2002). These studies do not, as far as we know, provide empirical results on the relation between type of feedback and estimation accuracy improvement. The only software study on this topic (Ohlsson, Wohlin et al. 1998), to our knowledge, suggests that outcome feedback, i.e., feedback relating the actual outcome to the estimated outcome, did not improve estimation accuracy. Human judgment studies from other domains support this disappointing lack of estimation improvement from outcome feedback; see for example (Balzer, Doherty et al. 1989; Benson 1992; Stone and Opel 2000). This is no great surprise, since little estimation accuracy improvement is possible from feedback such as "the effort estimate was 30% too low". One situation where outcome feedback is reported to improve estimation accuracy is when the estimation tasks are "dependent and related" and the estimator was initially under-confident, i.e., underestimated her/his own knowledge on general knowledge tasks (Subbotin 1996). In spite of the poor improvement in estimation accuracy, outcome feedback is useful, since it improves the assessment of the uncertainty of an estimate (Stone and Opel 2000; Jørgensen and Teigen 2002). Feedback on estimation accuracy should, for that reason, be included in the estimation feedback.
The estimation process often involves empirical decisions (such as the choice of similarity measures in the analogy-based method) (Briand and Wieczorek 2002). Briand and Wieczorek (2002) defined a hierarchical scheme starting from two major classes (model-based methods, non-model-based methods) that are further divided into several sub-classes; the sub-classes contain further divisions, and so on. Although the authors claimed that their system covers most types of estimation methods, the hierarchical system has a more complicated tree-type structure, with more intermediate nodes than other, flatter systems, and each intermediate node needs its own definition (such as 'data driven' and 'proprietary'). Boehm et al. (2000) proposed a simpler but comprehensive framework consisting of six major classes: parametric models, expert judgment, learning-oriented techniques, regression-based methods, dynamic-based models, and composite methods. Directly under each major class are the estimation methods, and this system can include most types of estimation methods (Boehm et al. 2000). Our classification system is modified from Boehm's framework, with consideration given to balancing the number of recent publications under each major class.