Software Testing and Quality Assurance on Sampling Inspection through Statistical Learning Theory

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)

K.K. Suresh¹ and Gandhiya Vendhan.S²

¹Professor and Head, ²Ph.D Research Scholar,

Department of Statistics, Bharathiar University, Coimbatore. ¹[email protected], ²[email protected]

Abstract— In this paper, an engineering statistical model is proposed for the prediction of control and assurance in software engineering. The paper uses statistical learning theory as a framework for studying the properties of learning in software quality testing, and applies acceptance sampling and statistical quality control to software testing and quality assurance through sampling inspection.

Keywords—Software Testing, Learning Theory, Statistical Quality Control, Six Sigma Applications.

I. INTRODUCTION

Learning theory provides the theoretical basis for many of today’s learning algorithms and is arguably one of the most beautifully developed branches of the statistical sciences. Providing the basis for new learning algorithms, however, was not the only motivation for developing statistical learning theory.

This paper attempts to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We do not assume that the reader has a deep background in mathematics, statistics or computer science. Given the nature of the subject matter, however, some familiarity with mathematical concepts and notation and some intuitive understanding of basic probability are required. There exist many excellent references to more technical surveys of the mathematics of statistical learning theory: the monographs by one of the founders of statistical learning theory (Vapnik, 1995; Vapnik, 1998), a brief overview of statistical learning theory in Schölkopf and Smola (2002), more technical overview papers such as Bousquet et al. (2003), Mendelson (2003), Boucheron et al. (2005) and Herbrich and Williamson (2002), and the monograph by Devroye et al. (1996).

The literature review in this paper provides an introduction to the study of supervised learning within the framework of regularization theory and statistical learning theory. For a detailed review of the theoretical aspects of this subject, see Evgeniou et al. (1999). In supervised learning, or learning from examples, a machine is trained, instead of programmed, to perform a given task on a number of input-output pairs.

In a probabilistic setting, a second fundamental problem studied by statistical learning theory is how well the chosen function generalizes, that is, how well it estimates the output for new inputs. The purpose of statistical learning theory is to provide a framework for studying the problem of inference, that is, of gaining knowledge, making predictions, making decisions or constructing models from a set of data. This is studied in a statistical framework, that is, there are assumptions of a statistical nature about the underlying phenomena. Indeed, a theory of inference should be able to give a formal definition of words like learning, generalization and overfitting, and also to characterize the performance of learning algorithms so that, ultimately, it may help design better learning algorithms.

This paper is organized as follows. We first outline the key concepts of Statistical Learning Theory and Software Engineering in Sections II and III, respectively. Section IV presents Six Sigma concepts and Statistical Quality Control (SQC), two important techniques which can be theoretically justified within the proposed framework. That section also discusses implementation issues and a few applications of Six Sigma and SQC, which have recently gained much attention from the analysis community. Finally, Section V presents our conclusions.

II. STATISTICAL LEARNING THEORY (SLT)

This section formulates the problem of learning in a statistical setting, distinguishing between empirical and structural risk and introducing the concept of capacity control, in connection with acceptance sampling inspection in statistical quality control.

A. Independent sampling

For example, consider pattern recognition for handwritten digits. Given some images of handwritten digits, the task is to train a machine to automatically recognize new handwritten digits. For this task the training set usually consists of a large collection of digits written by many different people. Here it is safe to assume that those digits form an independent sample from the whole population of all handwritten digits. As an example where the independence assumption is heavily violated, consider the case of drug discovery. This is a field in pharmacy where people try to identify chemical compounds which might be helpful for designing new drugs. Machine learning is being used for this purpose: the training examples consist of chemical compounds Xi with a label Yi which indicates whether this compound is useful for drug design or not. It is expensive to find out whether a chemical compound possesses certain properties that render it a suitable drug, because this would require running extensive lab experiments. As a result, only rather few compounds Xi have known labels Yi, and those compounds have been carefully selected in the first place. Here we cannot assume that the Xi are a representative sample drawn independently from some distribution of chemical compounds, as the labeled compounds are hand-selected according to some non-random process.

B. Bayesian method

Traditionally, statistical inference is performed in a model-based framework. As opposed to the agnostic approach taken in SLT, one assumes that the underlying probability distribution comes from some particular class of probability distributions. This class is usually indexed by one or several parameters, that is, it has the form of a parametric family of probability distributions. For example, one could consider the class of normal distributions, indexed by their means and variances.

C. Bayesian Approach to Statistics.

From a technical point of view, the main difference between the Bayesian and the frequentist approach is that the Bayesian approach introduces a prior distribution on the parameter space. That is, we define a distribution which, for each parameter, encodes how likely we find it that this is a good parameter to describe our problem.

The important point is that this prior distribution is defined before we get to see the points from which we would like to learn. It should just encode our past experiences or any other kind of prior knowledge we might have.

As opposed to making complicated confidence statements like the ones in the standard SLT approach or the traditional frequentist approach to statistics, in the end one has statements like “with probability 95% we selected the correct parameter”. This comes at a price, though. The most vehement objection to the Bayesian approach is often the introduction of the prior itself. It is sometimes argued, as indicated by certain consistency results for the Bayesian approach (Berger, 1985), that the influence of the prior vanishes as the amount of data grows. However, this argument is somewhat misleading. On a finite data set, the whole point of the prior is that it should bias our inference towards solutions that we consider more likely. So it is perhaps appropriate to say that the Bayesian approach is a convenient method for updating the beliefs about solutions which we held before taking the data into account. Among Bayesian practitioners it is generally accepted that even though priors are wrong, most of the time they are quite useful, in that Bayesian averaging over parameters leads to good generalization behavior. One point in favor of working with prior distributions is that they are a nice tool for stating assumptions on the underlying problem in a rather explicit way. In practice, whether or not we should apply Bayesian methods thus depends on whether we are able to encode our prior knowledge in the form of a distribution over solutions. All in all, as in the standard SLT approach, the Bayesian framework implicitly deals with overfitting by looking at the trade-off between data fit and model complexity.

The literature on general Bayesian statistics is huge. A classic is Cox (1961), which introduces the fundamental axioms that allow one to express beliefs using probability calculus. Jaynes (2003) puts those axioms to work and addresses many practical and philosophical concerns. Another general treatment of Bayesian statistics can be found in O’Hagan (1994). A gentle introduction to Bayesian methods for machine learning can be found in Tipping (2003); a complete monograph on machine learning with a strong focus on Bayesian methods is Bishop (2006).
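To make the update from prior to posterior concrete, the following minimal Python sketch applies it to a defect rate. It assumes a Beta prior combined with binomial inspection data, a conjugate pairing chosen here purely for illustration; the function names and all numbers are hypothetical and are not taken from this paper.

```python
def update_beta_prior(alpha, beta, defects, inspected):
    """Return the Beta posterior parameters after observing inspection data.

    Assumption: each inspected item is defective independently with the same
    unknown probability, so a Beta(alpha, beta) prior is conjugate.
    """
    if not 0 <= defects <= inspected:
        raise ValueError("defects must lie between 0 and inspected")
    return alpha + defects, beta + (inspected - defects)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior belief: defect rate around 10%, encoded as Beta(1, 9).
prior = (1.0, 9.0)
# Hypothetical data: 3 defective items among 10 inspected.
posterior = update_beta_prior(*prior, defects=3, inspected=10)
print(beta_mean(*prior), beta_mean(*posterior))
```

In this example the posterior mean lies between the prior mean and the observed defect fraction, which illustrates how the prior biases inference on a finite data set.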

D. Binomial Tails

Recall that the functions we consider are binary valued. So, if we consider a fixed function f, the distribution of n Pnf is actually a binomial law with parameters Pf and n, since we are summing n i.i.d. random variables f(Zi) which can either be 0 or 1 and are equal to 1 with probability E[f(Zi)] = Pf. Denoting p = Pf, we can obtain an exact expression for the deviations of Pnf from Pf. Since this expression is not easy to manipulate, we have used an upper bound provided by Hoeffding’s inequality. However, there exist other, sharper upper bounds.
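The gap between the exact binomial tail and Hoeffding’s bound can be checked numerically. The sketch below is illustrative only: it uses the one-sided form of Hoeffding’s inequality, P(Pnf - Pf >= eps) <= exp(-2 n eps^2), and the function names and example values are assumptions rather than part of this paper.

```python
import math


def binomial_upper_tail(n, p, eps):
    """Exact P(Pn f - p >= eps), where n * Pn f follows Binomial(n, p)."""
    # Smallest count k with k / n >= p + eps (small slack guards float rounding).
    k_min = max(0, math.ceil(n * (p + eps) - 1e-9))
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def hoeffding_bound(n, eps):
    """One-sided Hoeffding upper bound exp(-2 * n * eps**2)."""
    return math.exp(-2 * n * eps**2)


# Hypothetical setting: n = 100 observations, p = 0.1, deviation eps = 0.1.
n, p, eps = 100, 0.1, 0.1
print(binomial_upper_tail(n, p, eps), hoeffding_bound(n, eps))
```

The exact tail never exceeds the Hoeffding value; how much smaller it is depends on n, p and eps, which is why sharper bounds can be worthwhile.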

III. QUALITY REVOLUTION

A quality movement was started in Japan during the 1940s and 1950s by William Edwards Deming, Joseph M. Juran and Kaoru Ishikawa. Circa 1947, W. Edwards Deming visited India and then continued on to Japan, where he had been asked to join a statistical mission responsible for planning the 1951 Japanese census. During that visit, Deming invited statisticians to a dinner meeting and told them how important they were and what they could do for Japan. In March 1950, he returned to Japan at the invitation of Managing Director Kenichi Koyanagi of the Union of Japanese Scientists and Engineers (JUSE) to teach a course to Japanese researchers, workers, executives and engineers on statistical quality control (SQC) methods. Statistical quality control is a discipline based on measurements and statistics. Decisions are made and plans are developed based on the collection and evaluation of actual data in the form of metrics, rather than on intuition and experience. The SQC methods use seven basic quality management tools: Pareto analysis, cause-and-effect diagram, flow chart, trend chart, histogram, scatter diagram and control chart.

In July 1950, Deming gave an eight-day seminar based on the Shewhart methods of statistical quality control for Japanese engineers and executives. He introduced the plan-do-check-act (PDCA) cycle in the seminar, which he called the Shewhart cycle. The Shewhart cycle illustrates the following activity sequence: setting goals, assigning them to measurable milestones and assessing the progress against those milestones. Deming’s 1950 lecture notes formed the basis for a series of seminars on SQC methods sponsored by the JUSE and provided the criteria for Japan’s famed Deming Prize. Deming’s work has stimulated several different kinds of industries such as those for radios, transistors, cameras, binoculars, sewing machines and automobiles.

Between circa 1950 and circa 1970, automobile industries in Japan, in particular Toyota Motor Corporation, came up with an innovative principle to compress the time period from customer order to banking payment, known as the “lean principle.” The objective was to minimize the consumption of resources that added no value to a product. The lean principle has been defined by the National Institute of Standards and Technology (NIST) Manufacturing Extension Partnership program as a systematic approach to identifying and eliminating waste through continuous improvement, flowing the product at the pull of the customer in pursuit of perfection. It is commonly believed that lean principles were started in Japan by Taiichi Ohno of Toyota, but Henry Ford had been using parts of lean as early as circa 1920, as evidenced by his own words (Henry Ford, 1926).

Walter Andrew Shewhart was an American physicist, engineer and statistician and is known as the father of statistical quality control. Shewhart worked at Bell Telephone Laboratories from its foundation in 1925 until his retirement in 1956. His work was summarized in his book Economic Control of Quality of Manufactured Product, published by McGraw-Hill in 1931. In 1938, his work came to the attention of physicist W. Edwards Deming, who developed some of Shewhart’s methodological proposals in Japan from 1950 onward and named his synthesis the Shewhart cycle.

In 1954, Joseph M. Juran of the United States proposed raising the level of quality management from the manufacturing units to the entire organization. He stressed the importance of systems thinking that begins with product requirements, design, prototype testing, proper equipment operations and accurate process feedback. Juran’s seminar also became a part of the JUSE’s educational programs. Juran spurred the move from SQC to TQC (total quality control) in Japan. This included companywide activities and education in quality control (QC), audits, quality circles and promotion of quality management principles. The term TQC was coined by an American, Armand V. Feigenbaum, in his 1951 book Quality Control: Principles, Practice and Administration. It was republished in 2004. By 1968, Kaoru Ishikawa, one of the fathers of TQC in Japan, had outlined the key elements of TQC management.

A. Software Reliability

In statistical testing, a model is developed to characterize the population of uses of the software, and the model is used to generate a statistically correct sample of all possible uses of the software. Performance on the sample is used as a basis for conclusions about general operational reliability.
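As a rough illustration of how a usage model can generate a sample of uses, the Python sketch below treats the model as a Markov chain over user actions and draws random test sequences from it. The states, transition probabilities and function name are hypothetical and serve only to show the idea; they do not describe any particular software product.

```python
import random

# Hypothetical usage model: each state maps to its possible next states
# and the assumed probability of each transition.
USAGE_MODEL = {
    "Start": {"Login": 1.0},
    "Login": {"Search": 0.7, "Logout": 0.3},
    "Search": {"View": 0.6, "Search": 0.2, "Logout": 0.2},
    "View": {"Search": 0.5, "Logout": 0.5},
    "Logout": {"Exit": 1.0},
}


def generate_test_case(model, rng, start="Start", end="Exit", max_steps=1000):
    """Draw one usage sequence by walking the chain from start to end."""
    path = [start]
    state = start
    while state != end and len(path) < max_steps:
        transitions = model[state]
        state = rng.choices(
            list(transitions), weights=list(transitions.values())
        )[0]
        path.append(state)
    return path


rng = random.Random(2012)  # fixed seed so the sample is reproducible
sample = [generate_test_case(USAGE_MODEL, rng) for _ in range(5)]
for case in sample:
    print(" -> ".join(case))
```

Each generated sequence is one test case; running the software on many such sequences and recording failures gives the sample performance from which operational reliability is estimated.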

B. Software Testing and Quality Assurance

The concept of software quality is very broad, and therefore it is useful to look at it from different viewpoints. In 1984, Garvin analyzed the concept of quality of products in general, not specifically software products, from five viewpoints, namely the transcendental view, user view, manufacturing view, product view and value-based view. In the transcendental view, quality is something that can be perceived only through experience. The user view concerns the extent to which the product meets user needs and expectations. According to the manufacturing view, quality is perceived through conformance to a manufacturing process. According to the product view, good internal qualities of a product translate into good external qualities of the product. The value-based view concerns how much a user is willing to pay for a certain level of quality.

It is useful to measure quality for three reasons. First, measurement allows us to develop baselines for quality. Second, since quality improvement has an associated cost, it is important to know how much quality improvement is achieved for a certain cost. Finally, it is useful to know the present level of quality so that further improvement can be planned.

Gilb’s technique for quality measurement, stated as follows, is useful in measuring quality factors which are not amenable to direct measurement: the quality concept is successively broken down into its component parts until each component can be expressed in terms of some directly measurable attributes.

Measuring the manufacturing view is easier. The two widely used metrics are defect count and rework cost. The first metric refers to how many defects have been detected whereas the second metric refers to how much it costs to fix the known defects. A defect count can be analyzed to give us a better idea about the development process. For example each defect can be traced to a phase of the software development process where the defect got introduced. Improvement should be made to the phases where a large fraction of the defects get introduced. Similarly improvements should be made to the phases where critical defects are introduced. A second example of analyzing defect count is to identify the modules containing a large number of defects.

A third example of analyzing defect count is separating defects detected during development from those detected during operation. If a large number of defects are found during operation one can conclude that the test process needs much improvement. A fourth example of using defect count is to be able to compare different modules in terms of defect density.
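The defect-count analyses described above can be sketched in a few lines of Python. The defect log, the module sizes and the choice of defects per thousand lines of code (KLOC) as the density measure are all assumptions made for illustration; the paper does not prescribe a particular normalization.

```python
from collections import Counter

# Hypothetical defect log: (module, phase where introduced, found in operation?)
defects = [
    ("parser", "design", False),
    ("parser", "coding", False),
    ("parser", "coding", True),
    ("ui", "coding", False),
    ("db", "requirements", True),
]
# Hypothetical module sizes in thousands of lines of code (KLOC).
module_kloc = {"parser": 4.0, "ui": 10.0, "db": 2.0}


def defect_density(defect_log, sizes):
    """Defects per KLOC for each module, so modules can be compared."""
    counts = Counter(module for module, _, _ in defect_log)
    return {module: counts.get(module, 0) / kloc for module, kloc in sizes.items()}


def phase_fractions(defect_log):
    """Fraction of all defects introduced in each development phase."""
    counts = Counter(phase for _, phase, _ in defect_log)
    return {phase: n / len(defect_log) for phase, n in counts.items()}


def operation_fraction(defect_log):
    """Fraction of defects that were detected only during operation."""
    return sum(1 for *_, in_operation in defect_log if in_operation) / len(defect_log)


print(defect_density(defects, module_kloc))
print(phase_fractions(defects))
print(operation_fraction(defects))
```

A high density in one module, a dominant phase in the phase fractions, or a large operation fraction each point to a different place where the process should be improved.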

Similarly, rework cost can be analyzed in two parts: prerelease rework cost and postrelease rework cost. The prerelease rework cost is a measure of development efficiency, whereas the postrelease rework cost is a measure of the delivered quality of the system.

Another concept of software quality, commonly known as McCall’s quality factors, was proposed by McCall, Richards and Walters in 1977. They studied the concept of software quality in terms of quality factors and quality criteria. A quality criterion is an attribute of a quality factor. McCall et al. identified 23 quality criteria. Some examples of quality criteria are modularity, traceability, simplicity and completeness. In a nutshell, a quality factor represents a behavioral characteristic of a system, and McCall et al. suggested 11 quality factors: correctness, reliability, efficiency, integrity, usability, maintainability, testability, flexibility, portability, reusability and interoperability. The quality factors are categorized into three classes: product operation, product revision and product transition. Product operation concerns correctness, reliability, efficiency, integrity and usability. Product revision concerns maintainability, testability and flexibility. Product transition concerns portability, reusability and interoperability.

A global initiative for understanding the concept of software quality has been undertaken by experts around the world. Their collaborative effort has led to the standardization of the quality concept by the ISO in the form of the documents ISO 9126 and ISO 9000:2000. ISO 9126 is about quality characteristics, whereas ISO 9000:2000 is a quality assurance standard. What is called a quality factor in the McCall model is called a quality characteristic in the ISO 9126 model. However, there are several differences between the two models. The ISO 9126 model focuses on characteristics visible to the users, whereas the McCall model emphasizes internal quality as well.

IV. MATURITY MODEL WITH SIX SIGMA APPLICATIONS

Another maturity model that organizations frequently adopt to address quality and customer satisfaction issues is known as the Six Sigma maturity model. Six Sigma was created by some of America’s most gifted CEOs, people like Motorola’s Bob Galvin, Allied Signal’s Larry Bossidy and GE’s Jack Welch.

A. DMAIC

Six Sigma is a business-driven, multifaceted approach to process improvement, reduced costs and increased profits. To achieve Six Sigma, a process must not produce more than 3.4 defects per million opportunities. The Six Sigma DMAIC methodology, consisting of the five steps define, measure, analyze, improve and control, is an improvement system for existing processes falling below specification and looking for incremental improvement. The Six Sigma DMADV methodology, consisting of the five steps define, measure, analyze, design and verify, is an improvement system used to develop new processes or products at Six Sigma quality levels. It can also be employed if a current process requires more than just incremental improvement.
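The threshold of 3.4 defects per million opportunities (DPMO) can be related to a sigma level with a short calculation. The Python sketch below is a minimal illustration under one stated assumption: it uses the customary 1.5-sigma long-term shift, a convention that is not discussed in this paper. The function names and sample figures are hypothetical.

```python
from statistics import NormalDist


def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def sigma_level(dpmo_value, shift=1.5):
    """Sigma level for a DPMO value strictly between 0 and 1,000,000.

    Assumption: the conventional 1.5-sigma long-term shift is added to the
    normal quantile of the process yield.
    """
    process_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(process_yield) + shift


# Hypothetical process: 17 defects in 1000 units with 5 opportunities each.
observed = dpmo(defects=17, units=1000, opportunities_per_unit=5)
print(observed, sigma_level(observed))
# The Six Sigma target of 3.4 DPMO maps to a level of about 6 under this convention.
print(sigma_level(3.4))
```

Tracking this figure before and after each DMAIC cycle gives a simple measure of whether the improve and control steps had the intended effect.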

B. Statistical Quality Control (SQC)

Organizations further improve the quality of software products by driving the testing process with statistical sampling, measurement of confidence levels, trustworthiness and software reliability goals. This goal is stronger than the software quality evaluation goal of an earlier level. It may be recalled that the quality evaluation goal focuses on different kinds of software qualities, such as functionality, reliability, usability and robustness. Automated tools are used for defect collection and analysis. Usage modeling is used to perform statistical testing, where a usage model is selected from a subset of all possible usages of the software. From statistical testing one can draw conclusions about the general operational performance of the software product. Subgoals that support statistical quality control are as follows:

• The test group establishes high-level measurable quality goals such as test case execution rate, defect arrival rate and total number of defects that can be found during testing.

• Managers ensure that the new quality goals form a part of the test plan.

• The test group is trained in statistical testing and analysis methods: Pareto analysis, cause-and-effect diagram, flow chart, trend chart, histogram, scatter diagram and control chart.

• User inputs are gathered for usage modeling.
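Driving testing with statistical sampling can be illustrated by a single acceptance sampling plan: inspect n sampled items (here, test cases) and accept if at most c of them fail. The Python sketch below computes the probability of acceptance for such a plan under a binomial model. The plan parameters and function name are hypothetical, and the binomial model assumes independent items with a common failure probability.

```python
import math


def acceptance_probability(n, c, p):
    """P(accept) for a single sampling plan: at most c failures among n items.

    Assumption: the number of failures follows Binomial(n, p).
    """
    return sum(
        math.comb(n, d) * p**d * (1 - p) ** (n - d)
        for d in range(c + 1)
    )


# Hypothetical plan: sample n = 50 test cases, accept if at most c = 2 fail.
for p in (0.01, 0.05, 0.10):
    print(p, round(acceptance_probability(50, 2, p), 4))
```

Plotting the acceptance probability against p gives the operating characteristic curve of the plan, which shows how well the plan separates good releases from poor ones.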

C. Acceptance Testing

This study began with an introduction to two types of acceptance testing: user acceptance testing and business acceptance testing. Next, the paper described acceptance criteria in terms of quality attributes. Formulation of acceptance criteria is governed by the business goals of the customer’s organization.

We presented an outline of an acceptance test plan and described in detail how to create such a plan. In developing an acceptance test plan, emphasis must be put on showing that the system works according to the customer’s expectations rather than on passing a comprehensive set of tests, because rigorous testing is assumed to have already occurred during the system testing phase.

Next, the paper discussed the execution of acceptance tests, which is an important activity performed by the customer with much needed support from the developers. Three major activities were identified and discussed: (i) providing training to the customer’s test engineers, (ii) fixing problems during acceptance testing, and (iii) resolving issues concerning any discrepancy related to acceptance criteria. After that, we described the generation of an acceptance test report, which must be completed at the end of acceptance testing. Finally, we explained how user stories are used in XP as acceptance criteria and how acceptance test cases are created from them. These tests are reviewed, automated and executed multiple times per day as a regression acceptance test suite in the presence of onsite customers.

V. CONCLUSIONS

Statistical learning theory provides a framework within which analysis tools can be developed and analyzed. Two important approaches discussed within this framework are statistical learning theory and Six Sigma applications. In the Bayesian view of learning, the data only serve to update one’s prior. One starts with a probability distribution over hypotheses and ends up with a somewhat different distribution that reflects what one has seen in between. For a subjective Bayesian, learning is thus nothing but an update of one’s beliefs which is consistent with the rules of probability theory. Statements regarding how well the inferred solution works are generally not made, nor are they necessary for an orthodox Bayesian.

One can show that if the data generating mechanism is benign, then one can assert that the difference between the training error and test error of a hypothesis from the class is small. Benign here can take different guises; typically it refers to the fact that there is a stationary probability law that independently generates all individual observations, although other assumptions can also be incorporated. The class of hypotheses plays a role analogous to the prior; however, it does not need to reflect one’s beliefs. Rather, the statements that we obtain are conditional on that class, in the sense that if the class is bad, then the result of our learning procedure will be unsatisfactory in that the upper bounds on the test error will be too large. Typically, either the training error will be too large, or the confidence term, depending on the capacity of the function class, will be too large. It is appealing, however, that statistical learning theory generally avoids metaphysical statements about aspects of the underlying dependency and thus is precise by referring to the difference between training and test error. While the above are the two main theoretical schools of learning, there are other variants, some of which we have mentioned in this paper.
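The role of the confidence term can be illustrated for the simplest case, a finite class of N binary-valued hypotheses. Combining Hoeffding’s inequality with a union bound gives, with probability at least 1 - delta, test error <= training error + sqrt((ln N + ln(1/delta)) / (2n)) for every hypothesis in the class. The Python sketch below evaluates this term; the finite-class setting, the function names and the numbers are assumptions made for illustration and are not a result derived in this paper.

```python
import math


def confidence_term(n, num_hypotheses, delta):
    """Capacity-dependent term sqrt((ln N + ln(1/delta)) / (2 n)).

    Assumption: a finite class of N binary-valued hypotheses and n
    independent observations from a fixed distribution.
    """
    return math.sqrt(
        (math.log(num_hypotheses) + math.log(1 / delta)) / (2 * n)
    )


def bound_on_test_error(training_error, n, num_hypotheses, delta=0.05):
    """Upper bound on the test error that holds with probability 1 - delta."""
    return training_error + confidence_term(n, num_hypotheses, delta)


# Hypothetical numbers: 1000 observations, 100 hypotheses, training error 0.08.
print(confidence_term(1000, 100, 0.05))
print(bound_on_test_error(0.08, 1000, 100))
```

The term grows with the size of the class and shrinks with the sample size, which is the trade-off between capacity and data described in the paragraph above.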

REFERENCES

[1] G.H. Walton, J.H. Poore, and C.J. Trammell, “Statistical Testing of Software Based on a Usage Model,” Software Practice and Experience, Vol.25, No. 1, January 1995, pp.97-108.

[2] J.A. Whittaker and M.G. Thomason, “A Markov Chain Model for Statistical Software Testing,” IEEE Transactions on Software Engineering, Vol.20, No.10, October 1994, pp.812-824.

[3] M. Harry, “Six Sigma: The Breakthrough Management Strategy Revolutionizing the World’s Top Corporations”, Random House, New York, 2000.

[4] T. Pyzdek, “The Six Sigma Handbook”, McGraw-Hill Professional, New York, 2001.

[5] Vapnik, V.N., “Statistical Learning Theory”. Wiley, New York, 1998.

[6] V. Vapnik, A. Chervonenkis, “Necessary and sufficient conditions for uniform convergence of means to mathematical expectations”, Theory Prob. Applic. 26(3), 532-553, 1981.

[7] S. Mendelson, “Rademacher averages and phase transitions in Glivenko-Cantelli classes”, IEEE Transactions on Information Theory, 48(1), 251-263, 2002.

[8] S. Mendelson, “A few notes on statistical learning theory”, in Advanced Lectures on Machine Learning, volume LNCS 2600, pages 1–40. Springer, 2003.

[9] S. Boucheron, O. Bousquet, and G. Lugosi, “Theory of classification: A survey of some recent advances”, ESAIM: Probability and Statistics, 9:323–375, 2005.

[10] U. von Luxburg and G. Rätsch, “Advanced Lectures on Machine Learning”, pages 169–207. Springer, Berlin, 2003.

[11] N. Baskiotis, M. Sebag, M.-C. Gaudel, and S. Gouraud, “A Machine Learning Approach for Statistical Software Testing,” in Proceedings, International Joint Conference on Artificial Intelligence, Hyderabad, India, Morgan Kaufmann, San Francisco, January 6-12, 2007, pp.2274-2278.

[12] S. Prowell, C. Trammell, R. Linger, and J. Poore, “Cleanroom Software Engineering”, Addison-Wesley, Reading, MA, 1999.