Deloitte Supply Chain Analytics Workbook

(1) Deloitte Consulting Advanced Analytics Group Presents: Supply Chain Analytics Unit 1 Workbook.

(2) Contents
Welcome to Supply Chain Analytics Unit 1
How to Use this Workbook
Section 1 – Fundamentals of Operations Research (Part I)
Section 2 – Network Problems (Part I)
Section 3 – Applied Statistics (Part I)
Section 4 – Fundamentals of Operations Research (Part II)
Section 5 – Network Problems (Part II)
Section 6 – Applied Statistics (Part II)
Solutions

(3) Deloitte Advanced Analytics Group.
Welcome to Supply Chain Analytics Unit 1
One of Deloitte’s top priorities is to support the development of skills and knowledge that enable practitioners to provide the highest level of client service. In support of this objective, Deloitte’s Advanced Analytics Group (DAAG) created a set of courses and learning materials to expand the client service and technical capabilities of practitioners interested in Supply Chain Analytics.
Supply Chain Analytics Unit 1 comprises six courses that serve as prerequisites for Unit 2. Unit 2 introduces advanced topics in Supply Chain Analytics such as Network, Inventory and Transport Optimization. Unit 1 provides the foundation and knowledge needed to solve the business problems outlined in Unit 2. The Unit 1 courses should be taken in the order in which they are presented.
Supply Chain Analytics Unit 1 Workbook 1.

(4) How to Use this Workbook
This workbook is designed to support the Unit 1 Supply Chain Analytics training and to provide the tools and information needed for that training. This workbook will:
• Summarize key learning objectives
• Provide an opportunity for reflection and a framework for understanding what can occur on client engagements
• Provide application-based activities to embed learning and make it practical
• Point to resources and tools that will assist in applying learning objectives
As you proceed through Unit 1, have this workbook available to complete all of the activities and maximize the impact of the learning. The Course Information and Activities section has suggested activities to help you apply what you are learning and prepare you for Unit 2.

(5) Section 1 – Fundamentals of Operations Research (Part I)

Basic Concepts of Linear Programming

Overview of Linear Programming
Linear Programming is a technique for arriving at an optimal decision when that decision is affected by various factors and constraints. A Linear Programming problem consists of two parts: an Objective Function and Constraints. The objective function can be maximized or minimized. Constraints are usually expressed as inequalities; they exist because certain limitations restrict the range of a variable’s possible values.

Approach to Problem Solving
• Identify the objective of the problem
• Identify the decision variables and the constraints on them
• Write the objective function and constraints in terms of the decision variables
• Add any implicit constraints
• Arrange the equations into an organized format

Assumptions in Linearity
• Proportionality
• Additivity
• Divisibility
• Certainty

Exercise
Question 1.1: A diet is to contain at least 200 grams of carbohydrates, 100 grams of fat and 150 grams of protein. Two foods A and B are available. Food A costs $2 per pound and food B costs $4 per pound. A pound of food A contains 10 grams of carbohydrates, 20 grams of fat and 15 grams of protein. A pound of food B contains 25 grams of carbohydrates, 10 grams of fat and 20 grams of protein. Formulate the problem as a Linear Programming problem so as to find the minimum cost for a diet that consists of a mixture of these two foods and also meets the minimum requirements.

(6)
Nutrient content is in grams per pound of food; requirements are in grams.

Food Type      Carbohydrates   Fat    Protein   Cost ($ per pound)
A              10              20     15        2
B              25              10     20        4
Requirement    200             100    150

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note what you have learned about solving the previous Linear Programming problem.
________________________________________________________________
________________________________________________________________

The general form for a maximized objective function and constraints in Linear Programming is represented as follows.
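
A common way to write this general form is sketched below. The symbols are assumptions chosen for illustration rather than taken from the workbook: $x_j$ are the $n$ decision variables, $c_j$ the objective coefficients, $a_{ij}$ the constraint coefficients and $b_i$ the right-hand sides of the $m$ constraints.

```latex
\begin{aligned}
\text{Maximize}   \quad & Z = \sum_{j=1}^{n} c_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1, \dots, m \\
                        & x_j \ge 0, \qquad j = 1, \dots, n
\end{aligned}
```

A minimization problem such as the diet exercise has the same shape, with "Minimize" in place of "Maximize" and "at least" requirements written as $\ge$ constraints.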

(7) Linear Programming Optimization Methods

Graphical Method of Solution
The graphical method is a simple way to solve Linear Programming problems when there are two decision variables, x1 and x2. We usually write these decision variables as x and y instead of x1 and x2. The graphical method includes two major steps:
• The determination of the solution space that defines the feasible region
• The determination of the optimal solution from the feasible region

Defining the Feasible Region
The following three steps are used to determine the feasible solution of a Linear Programming problem:
1. Since the two decision variables x and y are non-negative, consider only the first quadrant of the xy-plane
2. Draw the line for each constraint
• Each line divides the first quadrant into two regions
• Area under constraint 1: All the points in this area satisfy the inequality 3x + 4y ≤ 12
• Area under constraint 2: All the points in this area satisfy the inequality 5x + 3y ≤ 15
3. Each point within the feasible solution meets all the constraints
Thus, the intersection of the two areas is the feasible area or feasible solution of the Linear Programming problem.
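
The feasible region described above can also be checked numerically. The following is a minimal sketch, not part of the workbook: it encodes the two example constraints together with x ≥ 0 and y ≥ 0, tests whether a point is feasible, and lists the corner points by intersecting the boundary lines pairwise. The function names `is_feasible` and `corner_points` are hypothetical.

```python
from itertools import combinations

# Each constraint is a tuple (a, b, c) meaning a*x + b*y <= c.
# Non-negativity is written in the same form: -x <= 0 and -y <= 0.
CONSTRAINTS = [
    (3.0, 4.0, 12.0),   # constraint 1: 3x + 4y <= 12
    (5.0, 3.0, 15.0),   # constraint 2: 5x + 3y <= 15
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]


def is_feasible(x, y, constraints=CONSTRAINTS, tol=1e-9):
    """Return True if the point (x, y) satisfies every constraint."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


def corner_points(constraints=CONSTRAINTS):
    """Intersect every pair of boundary lines and keep the feasible intersections."""
    corners = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the 2x2 system of boundary equations
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if is_feasible(x, y, constraints):
            # "+ 0.0" turns a floating-point -0.0 into 0.0 for cleaner output
            corners.add((round(x, 6) + 0.0, round(y, 6) + 0.0))
    return sorted(corners)


if __name__ == "__main__":
    print(is_feasible(1, 1))
    print(corner_points())
```

Enumerating corners this way only scales to small problems, but it mirrors what the graph shows: the feasible region is the polygon whose vertices are the feasible intersections of the constraint lines.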

(8) Optimal Solution
The optimal solution to a Linear Programming problem occurs at a corner point of the feasible region. Another way to reach the optimal solution is to plot the objective function for some arbitrary value, like 6x + 5y = 12. Since we want to maximize 6x + 5y, we plot another line for 6x + 5y = 20.
• This line is parallel to the first line and lies in the direction in which the objective function increases. If we want to maximize 6x + 5y, then we move the line in the increasing direction
• We can move the line until it leaves the feasible region. The last point it will touch before it leaves the feasible region is the corner point (2,3)
• This point is the feasible point that has the highest value of the objective function and is optimal

Exercise
Question 1.2: Using the graphical method of solution, find the feasible solution for a decorative item dealer whose Linear Programming problem is to maximize the profit function below.

Objective Function: Z = 50x + 18y
Constraints:
2x + y ≤ 100
x + y ≤ 80
x ≥ 0, y ≥ 0

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Linear Programming problem using the graphical method.
________________________________________________________________
________________________________________________________________

(9) Business Applications
Linear programming is used to facilitate decision-making in business when there are multiple trade-offs involved and an optimal outcome must be reached under various conditions. While it has roots in operations and supply chain, it has applications across business functions and service lines.

Marketing Applications
• Media Selection
• Market Research

Financial Applications
• Portfolio Selection
• Financial Planning

Production Management
• A Make-or-Buy Decision
• Production Scheduling
• Workforce Assignment

Product-Mix Applications
• SKU Rationalization
• Blending Problems

(10) Section 2 – Network Problems (Part I)

Overview of Network Problems

Fundamentals of Network Flow Problems
Network Flow Problems are applied to business issues that can be formulated in a network structure with nodes and arcs, and solved using special-purpose algorithms.

Common Terms
Node: Specific location in a network that can be of various types such as origin, destination, and transshipment nodes.
Arc: Connector of two nodes, and the path between nodes along which materials move. Arcs can be one-way or two-way in nature.
Flow: Movement of materials / resources between nodes along an arc.
Capacity: Limitations on the amount of materials that can flow through an arc. Arcs can possess both lower and upper capacity constraints.

Business Applications of Network Flow Problems
• Distribution and transportation systems
• Telecommunication networks
• Oil & gas
• Aerospace
• Manufacturing
• Telecommunications

(11) Illustrative Examples of Business Applications of Network Flow Problems

Application: Distribution Networks
• Sample Business Application: What quantity of goods should be sent from which plant given demand at a distribution center (DC)?
• Physical Analog of Nodes: Plants, Distribution Centers, Warehouses
• Physical Analog of Arcs: Road, Rail and Air Routes
• Flow: Materials, Goods, Finished Products

Application: Transportation Systems
• Sample Business Application: What is the maximal number of vehicles that can be routed through a road system?
• Physical Analog of Nodes: Intersections, Airports, Rail Yards
• Physical Analog of Arcs: Highways, Airline Routes, Railbeds
• Flow: Passengers, Freight, Vehicles, Operators

Application: Manufacturing Scheduling
• Sample Business Application: What is the optimal assignment of jobs to machines?
• Physical Analog of Nodes: Machines, Jobs
• Physical Analog of Arcs: Processing Time
• Flow: Assignment and Sequencing of Jobs

Overview of Solution Methods

Solution Methods for Network Flow Problems
Common network flow problems can be solved primarily using three methods:
• Rule-Based Algorithms: Problem-specific, optimal, less flexible
• Linear Programming-Based Optimization: More flexible, time-consuming, commercial solver tool-based
• Heuristics: Easy, but may be sub-optimal

Considerations for Choice of Solution Method

Solution Driven Considerations
• Problem Size
• Problem Complexity
• Desired Accuracy Levels
• Impact of Assumptions

Resource Driven Considerations
• Availability of Solvers
• Availability of Trained Resources
• Cost Implications
• Available Time

(12) Shortest Path Problem

Overview of the Shortest Path Problem
The Shortest Path Problem is a network problem with the primary objective of finding the shortest route between any pair of nodes in a network. There are multiple forms of this problem, and most forms have corresponding specific algorithms that are more efficient than the standard algorithm.
Decision: Which arcs to travel on?
Objective: Minimize the distance (or time) from the origin to the destination.

Dijkstra's Algorithm – Standard Form
Step 1. Assign a permanent label [0,S] to the starting node (Node 1). The 0 indicates the distance from the node to itself, and S indicates that it is the starting node.
Step 2. Assign tentative labels to the nodes that can be reached directly from Node 1. In a label, the first number is the direct distance from Node 1, and the second number is the preceding node in the route from Node 1.
Step 3. Identify the tentatively labeled node with the shortest distance value, and declare that node permanently labeled. If all nodes are permanently labeled, go to Step 5.
Step 4. For each node without a permanent label that can be reached directly from the newly permanently labeled node:
• If the node already has a tentative label, calculate the distance from Node 1 to it through the newly permanently labeled node. If this is less than the existing distance, replace the tentative label with the new distance and preceding node; otherwise leave the label unchanged
• If the node is not yet labeled, create a tentative label indicating the distance from Node 1 through the newly permanently labeled node
Then go to Step 3.
Step 5. The permanent labels identify the shortest distance from Node 1 to each node, and the preceding node on that shortest route. To find the shortest route to a given node, start at that node and work backwards along the preceding nodes until Node 1 is reached.
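
The labeling steps above can be sketched in code. This is an illustrative implementation under stated assumptions, not the workbook's own: the network `GRAPH` and its arc lengths are hypothetical, a priority queue (`heapq`) stands in for "identify the tentatively labeled node with the shortest distance", and arc lengths are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, prev): shortest distance and preceding node for each reachable node."""
    dist = {source: 0}       # tentative and permanent distance values
    prev = {source: None}    # preceding node on the best route found so far
    permanent = set()        # nodes whose labels are permanent
    heap = [(0, source)]     # tentative labels ordered by distance

    while heap:
        d, node = heapq.heappop(heap)   # Step 3: smallest tentative label
        if node in permanent:
            continue                    # stale heap entry, label already permanent
        permanent.add(node)
        # Step 4: update labels of nodes reachable from the new permanent node
        for neighbor, length in graph.get(node, {}).items():
            if neighbor in permanent:
                continue
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def route(prev, target):
    """Step 5: work backwards along preceding nodes to recover the route."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]


# Hypothetical network: GRAPH[i][j] is the length of the arc from node i to node j.
GRAPH = {
    1: {2: 4, 3: 2},
    2: {1: 4, 3: 1, 4: 5},
    3: {1: 2, 2: 1, 4: 8},
    4: {2: 5, 3: 8},
}

if __name__ == "__main__":
    dist, prev = dijkstra(GRAPH, 1)
    print(dist[4], route(prev, 4))
```

The dictionary `prev` plays the role of the second number in each label, and the set `permanent` records which labels can no longer change.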

(13) Linear Programming Method – Standard Form
$x_{ij}$ = binary variable indicating whether the arc between the ith and jth nodes is chosen
$c_{ij}$ = The distance or length of arc (i,j)

Variations to Shortest Path Problem
• Single-Source Shortest Path Problem
• Single-Destination Shortest Path Problem
• All-Pairs Shortest Path Problem

Sample List of Algorithms for Shortest Path Problem
• Dijkstra's Algorithm
• Floyd-Warshall Algorithm
• Johnson's Algorithm
• Perturbation Theory
• Bellman-Ford Algorithm
• A* Search Algorithm

Business Applications of the Shortest Path Problem
The Shortest Path Algorithms determine the path with the least weight (weight can be cost, distance, time, etc.) between any pair of nodes in a network. Some business applications of these algorithms include:
• Flight reservations
• Internet packet routing
• Driving directions
• Telecom network routing
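
For reference, one standard way to write the Linear Programming form of the Shortest Path Problem with the variables $x_{ij}$ and $c_{ij}$ defined above is sketched below. The symbols $s$ (origin node) and $t$ (destination node) are assumptions introduced here, and the workbook's own statement of the model may be arranged differently.

```latex
\begin{aligned}
\text{Minimize}   \quad & \sum_{(i,j)} c_{ij}\, x_{ij} \\
\text{subject to} \quad & \sum_{j} x_{ij} - \sum_{j} x_{ji} =
  \begin{cases}
     1 & \text{if } i = s \\
    -1 & \text{if } i = t \\
     0 & \text{otherwise}
  \end{cases} \qquad \text{for every node } i \\
                        & x_{ij} \in \{0, 1\} \qquad \text{for every arc } (i,j)
\end{aligned}
```

The constraints say that one unit of flow leaves the origin, one unit arrives at the destination, and every other node passes on exactly what it receives, so the chosen arcs form a route from $s$ to $t$.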

(14) Exercise
Question 2.1: Choose the arcs to travel on such that the distance between node 1 and node 8 is minimized.

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Shortest Path problem.
________________________________________________________________
________________________________________________________________

Minimum Spanning Tree Problem

Overview of the Minimum Spanning Tree Problem
The Minimum Spanning Tree Problem is typically used to connect all the nodes of a given network such that the total weight of the arcs used is minimized. A Minimum Spanning Tree provides the optimal set of arcs with minimal total arc cost, time, distance or other similar measure.
Decision: Which arcs to choose such that all nodes are connected to the network?
Objective: Minimize the total weight of the arcs chosen.
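
Kruskal's algorithm is one well-known way to build such a tree: sort the arcs by weight and add each arc unless it would close a loop. The sketch below is illustrative only; the arc list `ARCS`, the 0-based node numbering and the function name `kruskal` are assumptions, and the network is assumed to be connected.

```python
def kruskal(num_nodes, arcs):
    """Return the arcs (u, v, weight) of a minimum spanning tree.

    arcs is a list of (weight, u, v) tuples with nodes numbered 0..num_nodes-1.
    """
    parent = list(range(num_nodes))  # union-find forest: each node starts alone

    def find(node):
        # Follow parent links to the representative of the node's component,
        # shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, u, v in sorted(arcs):   # cheapest arcs first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # arc joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
        # otherwise the arc would create a loop and is skipped
    return tree


# Hypothetical network of 4 nodes: (weight, node, node)
ARCS = [(4, 0, 1), (2, 0, 2), (1, 1, 2), (5, 1, 3), (8, 2, 3)]

if __name__ == "__main__":
    tree = kruskal(4, ARCS)
    print(tree, sum(weight for _, _, weight in tree))
```

The union-find structure answers the "Decision" question efficiently: an arc is chosen only when its two end nodes are not yet connected to each other.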

(15) Linear Programming Method – Standard Form
To solve this problem, the network is divided into all possible combinations of two subsets of nodes such that each pair of subsets together makes up the whole network.
$x_{ij}$ = The arc between the ith and jth nodes in a network of n nodes
$c_{ij}$ = The distance or length of arc (i,j)
A = Every possible subset of nodes within the network
B = Complement of A

Note: The optimal solution will use (n-1) arcs to connect a network of n nodes. Using more than (n-1) arcs will result in redundant arcs and/or the formation of loops.

Variations to the Minimum Spanning Tree Problem
• Optimum Communication Spanning Tree
• Steiner Trees

Sample List of Algorithms for Minimum Spanning Tree Problem
• Prim’s Algorithm
• Boruvka’s Algorithm
• Reverse-Delete Algorithm
• Edmonds’ Algorithm
• Kruskal’s Algorithm

Business Applications of the Minimum Spanning Tree Problem
The Minimum Spanning Tree problem is used to determine the smallest spanning tree that is needed to connect a set of nodes in a network. The typical variables include distance, cost, time, etc. Some business applications of this problem include:
• Design of telecommunications networks
• Airline routing
• Design of a lightly used transportation network to minimize the total cost of providing the links
• Finding routes with maximum bottleneck capacity in a computer network
• Network design of high voltage electrical transmission lines

(16) Exercise: Identify the Minimum Spanning Tree for the sample problem given below.
Question 2.2: Fly High Airlines wants to establish connectivity to all the major ports in the country using the shortest-distance routes. Connect all the ports in the network such that the overall distance of the network is minimized.

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Minimum Spanning Tree problem.
________________________________________________________________
________________________________________________________________

Maximal Flow Problem

Overview of the Maximal Flow Problem
The Maximal Flow Problem is used to determine the maximum amount of flow of a given item (vehicles, fluid, materials, etc.) that can enter and exit a network in a specific period of time. Flow is transmitted through each node in the network as efficiently as possible. Typically, each arc is subject to certain flow restrictions (vehicles per hour, gallons per hour), and the maximum capacity restriction is referred to as the flow capacity for that arc. In its simplest form, it is assumed that for each node, inflow to the node is equal to the outflow from the node (no inventory). In this case, capacity restrictions are not assigned to the nodes.
Decision: How much flow on each arc?
Objective: Maximize flow through the network from an origin to a destination.

(17) Linear Programming Method – Standard Form
To solve this problem, add a new arc from node n (the output node) back to Node 1 (the input node). This arc carries the total flow through the network, and the flow over this arc must be maximized. One variable is associated with each arc and represents the quantity of flow through that arc, and there is a flow constraint for each node.

$x_{ij}$ = The flow across the arc from the ith to the jth node
$u_{ij}$ = Maximal capacity on the arc from the ith to the jth node
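
Using the variables defined above, the model can be written as the following sketch. It assumes, as the text describes, that $x_{n1}$ is the flow on the added return arc from node n to Node 1 and that this return arc has no capacity limit; the exact layout in the workbook may differ.

```latex
\begin{aligned}
\text{Maximize}   \quad & x_{n1} \\
\text{subject to} \quad & \sum_{j} x_{ji} - \sum_{j} x_{ij} = 0 \qquad \text{for every node } i \\
                        & 0 \le x_{ij} \le u_{ij} \qquad \text{for every arc } (i,j) \text{ other than the return arc}
\end{aligned}
```

Because the return arc closes the loop, the conservation constraint (inflow equals outflow) can be applied to every node, including the input and output nodes.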

(18) Variations to Maximal Flow Problem
• Capacity Constraints
• Max-Flow Min-Cut Theorem

Sample List of Algorithms for Maximal Flow Problem
• Ford-Fulkerson Algorithm
• Edmonds-Karp Algorithm
• Dinitz Blocking Flow Algorithm
• General Push-Relabel Maximum Flow Algorithm

Business Applications of the Maximal Flow Problem
The Maximal Flow Problem can be used to determine the optimal flow of materials (such as vehicles, oil, etc.) through each arc of a given network such that the amount of flow through the entire network is maximized. Some business applications of this problem include:
• Oil flow through a pipeline network
• Project selection
• Airline scheduling
• Material flow through a company’s distribution network
• Water supply through a system of aqueducts

Exercise
Question 2.3: The local water conservation authority is constructing new ducts for water supply in the city. The capacity of each duct is provided in the network representation given below. Choose the route to maximize the water flow from node 1 to node 9.

Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Maximal Flow problem.
________________________________________________________________
________________________________________________________________
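
As an illustration of the Edmonds-Karp algorithm named in the list above, the sketch below repeatedly finds a shortest augmenting path by breadth-first search and pushes the bottleneck amount of flow along it. The capacities in `CAPACITY` are hypothetical and are not the network from Question 2.3; the function name `max_flow` is likewise an assumption.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: capacity[u][v] is the capacity of the arc from u to v."""
    # Residual network: forward arcs start at full capacity, reverse arcs at 0.
    residual = {u: dict(arcs) for u, arcs in capacity.items()}
    for u, arcs in capacity.items():
        for v in arcs:
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for an augmenting path with spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, spare in residual[u].items():
                if spare > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left, so the flow is maximal

        # Bottleneck: smallest spare capacity along the path found
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck flow along the path and update residuals
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical network: flow runs from node 1 to node 4.
CAPACITY = {
    1: {2: 10, 3: 5},
    2: {3: 15, 4: 5},
    3: {4: 10},
}

if __name__ == "__main__":
    print(max_flow(CAPACITY, 1, 4))
```

The reverse arcs in `residual` let a later path undo part of an earlier routing decision, which is what allows the method to reach the true maximum rather than stopping at the first set of routes it finds.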

(19) Section 3 – Applied Statistics (Part I)

Statistical Tools
There are several statistical tools and packages that are commercially available to solve statistics problems. Even common software like Microsoft Excel has “Add-Ins” with significant statistical capabilities.

Using Statistical Tools
Primary tools used to solve the different statistical problems:

Course Topic: Primary Tools
• Simple Regression and Correlation: MS Excel, SPSS, SAS, Systat
• Multiple Regression and Correlation: MS Excel, SPSS, SAS, Systat
• Time Series Analysis and Forecasting: SPSS, SAS, Systat
• Discriminant and Logit Analysis: SPSS, SAS, Systat
• Factor Analysis and Clustering: SPSS, SAS, Systat

Key analyses supported by the Analysis ToolPak:
• Regression
• Sampling
• Rank and percentile
• t-Test: Two Sample for Means
• Correlation
• Covariance

(20) Probability Distributions

Random Variables
A variable is random if it assumes different values as a result of the outcome of a random experiment, for example, a coin toss. There are two types of random variables:

Discrete Random Variable
A discrete random variable is one for which the number of possible outcomes can be counted, and for each possible outcome, there is a measurable and positive probability.
Example: Number of days it rains in a given month, number of patients visiting a clinic on each day of the previous week.

Continuous Random Variable
A continuous random variable is one for which the number of possible outcomes is infinite, even if lower and upper bounds exist.
Example: The actual amount of daily rainfall between zero and 10 inches is a continuous random variable because the amount of rainfall can take on an infinite number of values.

Expected Value of a Random Variable
The expected value of a random variable is obtained by multiplying each value that the random variable can assume by the probability of occurrence of that value, and then adding up all these products.

$E(x) = x_1 P_1 + x_2 P_2 + \dots + x_n P_n$

$x_i$ = Value of the random variable
$P_i$ = Probability of occurrence of that value
n = Number of values the random variable can assume

Exercise
Question 3.1: Suppose Jim goes to two movies 10% of all weekends, he goes to one movie 40% of the time, and he goes to no movies 50% of the time. What is the expected value for the number of movies he goes to during a weekend?

Review the correct answer in the Solutions section.
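
The expected value formula translates directly into a few lines of code. The sketch below uses a hypothetical distribution, `DAILY_ORDERS`, that is not taken from the workbook, and a hypothetical helper name `expected_value`; it also checks that the probabilities sum to one before computing the weighted sum.

```python
def expected_value(distribution):
    """distribution maps each value of the random variable to its probability."""
    total_probability = sum(distribution.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # E(x) = x1*P1 + x2*P2 + ... + xn*Pn
    return sum(value * probability for value, probability in distribution.items())


# Hypothetical example: number of rush orders received per day
DAILY_ORDERS = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

if __name__ == "__main__":
    print(expected_value(DAILY_ORDERS))
```

The same function can be used to check an answer to Question 3.1 by passing in the values and probabilities from that exercise.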

(21) Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Expected Value problem.
________________________________________________________________
________________________________________________________________

Probability Distributions
Probability distributions arise from experiments where the outcome is subject to chance. A Probability Distribution describes the probabilities of all the possible outcomes for a random variable, such as getting tails on the toss of a coin or the probability that a call center representative will convert a sale on a given call.

Characteristics
• The probabilities of all possible outcomes must sum to one
• It is a listing of the probabilities of all the outcomes that could result if an experiment was conducted

Example
A simple Probability Distribution is that for the roll of one fair die: there are six possible outcomes and each one has a probability of 1/6, so they sum to one. The Probability Distribution of all the possible returns on the S&P index is a more complex version of the same idea.
A frequency distribution is different from a Probability Distribution. A frequency distribution lists the observed frequencies of all outcomes of an experiment that was actually conducted. A Probability Distribution is a listing of the probabilities of all the outcomes that could result if the experiment was conducted.

(22) Types of Probability Distributions
The nature of the experiment dictates which Probability Distribution may be appropriate for modeling the resulting random outcomes. There are two types of probability distributions:

Discrete Probability Distribution (appropriate for discrete random variables)
Discrete Probability Distributions can assume only certain outcomes. The outcomes are mutually exclusive. Examples:
• The number of students in a class
• The number of children in a family
• The number of cars entering a carwash in an hour
• Number of home mortgages approved by Coastal Federal Bank last week

Continuous Probability Distribution (used for continuous random variables)
Continuous Probability Distributions can assume an infinite number of values within a given range. Examples:
• The time it takes an executive to drive to work
• The length of an afternoon nap
• The length of time of a particular phone call
• The distance students travel to class

Types of Discrete Probability Distributions
Binomial: Binomial Distributions describe discrete data resulting from an experiment known as the Bernoulli Process.
Poisson: Poisson Distributions express the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event.

Binomial Distributions
Standard Formula for Binomial Distributions

$p(r) = \dfrac{n!}{r!\,(n-r)!}\; p^{r} q^{\,n-r}$

$\mu = np \qquad \sigma = \sqrt{npq}$

Where the following standard notations apply:
p(r) = Probability of r successes in n trials
p = Characteristic probability or probability of success
q = 1 – p = Probability of failure
r = Number of successes desired
n = Number of trials undertaken
μ = Population mean
σ = Standard deviation
! denotes “factorial”; 5! = 5*4*3*2*1 = 120

(23) Exercise
Question 3.2: The probability of converting a sale on any given call for an outbound call center representative is 0.6. If the representative takes 6 calls per hour, what is the probability that he/she will convert exactly 2 sales?
Solution: Apply the binomial formula just discussed with the following values:
n = 6
r = 1, 2, 3, …, 6
p = 0.6
q = 0.4

The Binomial Distributions for a variety of situations can be calculated in this manner and are illustrated below.

Reflective Question: What is the probability that he/she will convert up to 5 sales per hour?

Exercise Reflection: Use the space below to note the important things you have learned about solving problems using Binomial Distributions.
________________________________________________________________
________________________________________________________________
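
The binomial calculation in this exercise can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_cdf` are hypothetical helpers, not part of any package mentioned in the workbook.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def binomial_cdf(r, n, p):
    """Probability of at most r successes in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))


if __name__ == "__main__":
    n, p = 6, 0.6
    # Probability of converting exactly 2 sales out of 6 calls
    print(binomial_pmf(2, n, p))
    # Full distribution for r = 0, 1, ..., 6
    for r in range(n + 1):
        print(r, round(binomial_pmf(r, n, p), 5))
    # Probability of converting up to 5 sales (the Reflective Question)
    print(binomial_cdf(5, n, p))
```

Summing the individual probabilities, as `binomial_cdf` does, is the direct way to answer "up to r" questions; the same result can be obtained as one minus the probability of the excluded outcomes.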

(24) Poisson Distributions
Standard Formula for Poisson Distributions

$f(x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$

$\mu = \lambda \qquad \sigma = \sqrt{\lambda}$

f(x) = Probability of x occurrences in an interval
λ = Mean number of occurrences in an interval
e ≈ 2.71828
μ = Population mean
σ = Standard deviation

Types of Continuous Probability Distributions - Normal Distributions
A Probability Distribution is called continuous if its cumulative distribution function is continuous.

Description
• A Normal Distribution, also known as the Gaussian distribution, describes continuous data where the random variable can assume any value within a given range, and the Probability Distribution is continuous
• The Normal Distribution is very important in statistics as it has properties that make it applicable to a wide variety of situations, and it comes close to matching the observed frequency distributions of many phenomena
• The areas under the curve represent probabilities, and the total area under the normal curve is 1.00
• As the tails never reach the horizontal axis, the theoretical model can assign a positive probability to values that are impossible in practice, but not much accuracy is lost by ignoring values far out in the tails
• Although the Normal Distribution is continuous, it can be used to approximate the Binomial Distribution whenever np and nq are both at least 5

(25) Characteristics
• The curve is bell-shaped and has a single peak (unimodal)
• The mean of the normally distributed population lies at the center of the normal curve
• Due to its symmetry, the mean, median and mode have the same value
• The tails of the Normal Distribution extend indefinitely and never touch the horizontal axis

Standard Deviation
The areas under the curve represent probabilities, and the total area under the normal curve is 1.00. It can be noted that:
• Approximately 68% of the values in a normally distributed population lie within +/- 1 standard deviation of the mean
• Approximately 95.5% of the values in a normally distributed population lie within +/- 2 standard deviations of the mean
• Approximately 99.7% of the values in a normally distributed population lie within +/- 3 standard deviations of the mean
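
These percentages can be verified from the cumulative distribution function of the Normal Distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(k, mean=0.0, std_dev=1.0):
    """Share of a normal population lying within k standard deviations of the mean."""
    dist = NormalDist(mean, std_dev)
    return dist.cdf(mean + k * std_dev) - dist.cdf(mean - k * std_dev)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(share_within(k) * 100, 2))
```

The result does not depend on the particular mean and standard deviation passed in, which is why the rule can be stated for any normally distributed population.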

(26) Testing for Normality

Graphical Method:
• Comparing the histogram plotted for all residuals (error terms) to a normal curve is a quick test of normality of data
• The normal probability plot is a formal graphical tool to confirm normality. In a normal probability plot, the data is plotted against a theoretical Normal Distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality
• Other rigorous statistical methods used to test for normality include Pearson’s Chi-Square Test, the Anderson-Darling Test, and the Shapiro-Wilk Test
• When removing data that lies two to three standard deviations from the mean, always go back and verify that other metrics (spend, revenue, etc.) are not disproportionately affected or reduced
Testing data for normality is critical, since assuming that the data is normally distributed and keeping only values within ±2σ or ±3σ of the mean may lead to the exclusion of important data points.

Other Common Probability Distributions

Continuous Uniform Distribution
• The Continuous Uniform Distribution [U(a,b)] is a family of Probability Distributions such that for each member of the family, all intervals of the same length on the distribution's support are equally probable
• Probability Density Function: $f(x) = \dfrac{1}{b-a}$ for $a < x < b$; $f(x) = 0$ for $x < a$ or $x > b$
• Population Mean = $(a + b)/2$
• Variance = $(b - a)^2/12$
• Standard Deviation = $(b - a)/\sqrt{12}$
• One of the most common applications of this distribution is to generate random numbers

(27) Exponential Distribution
• The Exponential Distribution represents a process in which events occur continuously and independently at a constant average rate
• Probability Density Function: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$; $f(x) = 0$ for $x < 0$, where $\lambda > 0$ is the parameter of the distribution, called the rate parameter
• Population Mean = $1/\lambda$
• Variance = $1/\lambda^2$
• Standard Deviation = $1/\lambda$
• Service times of bank tellers, call center agents, etc. may be modeled as Exponential Distributions. Other applications include situations where certain events occur with a constant probability per unit length

(28) Gamma Distribution
• The Gamma Distribution is a two-parameter family of continuous Probability Distributions. It has a scale parameter θ and a shape parameter k
• Probability Density Function: $f(x; k, \theta) = \dfrac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}$ for $x > 0$ and $k, \theta > 0$
• Population Mean = $k\theta$
• Variance = $k\theta^2$
• Standard Deviation = $\theta\sqrt{k}$
• The Gamma Distribution is frequently used to model waiting times; for instance, in life testing, the waiting time until death is a random variable which is frequently modeled with a Gamma Distribution

Student’s t Distribution
• Student's t-Distribution (or simply the t-distribution) is a Probability Distribution used to model a normally distributed population when the sample size is small
• Probability Density Function: $f(t) = \dfrac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$, where ν is the number of degrees of freedom and Γ is the gamma function
• Population Mean = 0 for ν > 1, otherwise undefined
• Variance = $\nu/(\nu - 2)$ for ν > 2, otherwise undefined
• Standard Deviation = $\sqrt{\nu/(\nu - 2)}$ for ν > 2, otherwise undefined
• Student’s t-Distribution is used when the population standard deviation must be estimated from the data

(29) Sampling Techniques

Overview
Sampling is the part of statistical practice concerned with the selection of individual observations intended to yield knowledge about a population of concern, especially for the purposes of statistical inference. The stages of the sampling process are:
• Defining the population of concern
• Specifying a sampling frame, a set of items or possible events to measure
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process

(30) Central Limit Theorem
The Central Limit Theorem states that the sampling distribution of the mean approaches normality as the sample size increases.
• The theorem describes the relationship between the shape of a Population Distribution and the shape of the sampling distribution of the mean
• The importance of this theorem is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the nature of the distribution for that population other than what we can get from the sample
The charts illustrate that the distribution of sample means approaches normality as the sample size increases. Since the Normal Distribution is described by just two parameters (mean and standard deviation), we can now better estimate the characteristics of the entire population.
[Charts: distribution of sample means for n = 1 and n = 5]

(31) [Charts: distribution of sample means for n = 10 and n = 25]
Types of Sampling Techniques
There are two types of sampling techniques:
• Judgment Sampling
• Random Sampling
Methods of Random Sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
Common examples of sampling bias are:
• Data Mining Bias
• Sample Selection Bias
• Survivorship Bias
• Look-Ahead Bias
• Time Period Bias
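
To make the Central Limit Theorem described earlier concrete, the following minimal sketch uses simple random sampling from a clearly non-normal population. The exponential population, the rate of 1.0, the 5,000 trials, and the function name `sample_mean_spread` are illustrative assumptions, not part of the workbook. In theory, the standard deviation of the sample means should shrink roughly as σ/√n.

```python
import random
import statistics


def sample_mean_spread(n, trials=5000, rate=1.0, seed=0):
    """Return (mean, std dev) of `trials` sample means, each from n exponential draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.fmean(means), statistics.stdev(means)


for n in (1, 5, 10, 25):
    centre, spread = sample_mean_spread(n)
    # Theory: centre is close to 1/rate, spread is close to (1/rate) / sqrt(n)
    print(f"n={n:>2}  mean of sample means={centre:.3f}  std dev={spread:.3f}")
```

Plotting a histogram of the sample means for each n would reproduce the kind of charts referred to in the Central Limit Theorem discussion.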

(33) Hypothesis Testing
Description of Hypothesis Testing
Hypothesis testing is a method of making statistical decisions using experimental data. It determines whether experimental results contain enough information to cast doubt on conventional wisdom.
Steps in Hypothesis Testing
• Begin with an assumption or a hypothesis that is made about a population parameter
• Collect sample data and conduct statistical analysis for the sample, which is then used to determine the likelihood that the hypothesized population parameter is correct
Key Concepts
• Null Hypothesis: This is the statement of the assumed or hypothesized value of the population parameter before we begin sampling. It is denoted by H0. The null hypothesis is the default, conservative assumption. The test checks whether the data provides sufficient evidence to reject it in favor of the alternate
• Alternate Hypothesis: Whenever the null hypothesis is rejected, the conclusion that is accepted is called the alternate hypothesis and is denoted by HA or H1
• Significance Level: The probability of rejecting the null hypothesis when the null hypothesis is actually true (i.e., a false positive, also called a Type I error)
• Two-Tailed and One-Tailed Tests: A two-tailed test will reject the null hypothesis if the sample mean is significantly higher or lower than the hypothesized population mean, so it has two rejection regions. This can be contrasted with the one-tailed test, where there is only one rejection region
• Standard Error: The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method
Five-Step Process for Hypothesis Testing
Step 1: State your hypotheses. Decide whether this is a two-tailed or one-tailed test. Select a level of significance appropriate for this decision.
Step 2: Decide which distribution (t or z) is appropriate (using the Decision Table for Distribution Selection) and find the critical values for the chosen level of significance from the appropriate table.
Step 3: Calculate the standard error of the sample statistic. Use the standard error to convert the observed value of the sample statistic to a standardized value.
Step 4: Sketch the distribution and mark the position of the standardized sample value and the critical values for the test.
Step 5: Compare the value of the standardized sample statistic with the critical values for this test and interpret the result.

(34) Decision Table for Distribution Selection.
Exercise
Question 3.4: Fizz-O, a leading cola manufacturing and distribution company, is considering expanding its operations in New Jersey and Delaware. As part of developing its expansion strategy, Fizz-O wants to establish whether the average annual consumption of cola in these two states is different from that of the entire US. Fizz-O's marketing team has already conducted a survey of 400 people (identified using random sampling) in each of the two states and determined the state-wise cola consumption levels: the sample average is 1.6 gallons/year in NJ and 2.0 gallons/year in DE. It is known that the average cola consumption across the US is 1.2 gallons/year with standard deviation = 6.
Solution:
Step 1: State your hypotheses and decide whether this is a two-tailed or one-tailed test.
Define the Hypothesis:
Decide Test: This is a two-tailed test because our business decision is impacted if the state-level annual average consumption is either higher than or lower than the national annual average consumption.
Reflective Question: What will be the appropriate level of significance for this decision?
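
Steps 3 and 5 of the five-step process can be sketched for Question 3.4 as follows. This is an illustration under stated assumptions rather than the workbook's official solution: it assumes H0: μ = 1.2 against HA: μ ≠ 1.2 for each state, a z test (population standard deviation known, n = 400), and a significance level of 0.05, which the workbook leaves as a reflective question. The function name `two_tailed_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Return (z, critical z, reject H0?) for a two-tailed z test of the mean."""
    std_error = pop_sd / sqrt(n)                   # Step 3: standard error
    z = (sample_mean - pop_mean) / std_error       # Step 3: standardized value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Step 2: critical value
    return z, z_crit, abs(z) > z_crit              # Step 5: compare


for state, mean in (("NJ", 1.6), ("DE", 2.0)):
    z, z_crit, reject = two_tailed_z_test(mean, pop_mean=1.2, pop_sd=6, n=400)
    print(f"{state}: z = {z:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
```

With the assumed 0.05 level, the standard error is 6/√400 = 0.3, so the NJ sample (z ≈ 1.33) does not fall in the rejection region while the DE sample (z ≈ 2.67) does. A different significance level could change that conclusion.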

(35) Exercise Reflection: Use the space below to note the important things you have learned about solving the previous Hypothesis testing problem. ________________________________________________________________ ________________________________________________________________.

(36) Section 4 – Fundamentals of Operations Research (Part II).
Basic Concepts of Integer Programming
Integer Programming Concepts
When all the variables are required to be integers, the integer program is called an All Integer Program. When some, but not all, of the variables are required to be integers, the integer program is called a Mixed Integer Program. In many applications of Integer Programming, one or more integer variables are required to equal either 0 or 1. Such variables are called binary variables. If all variables are 0-1 variables, it is a 0-1 Integer Program. The Linear Program that results from dropping the integer requirements is called the Linear Program Relaxation of the Integer Program.
Cost of Production
In many fixed cost applications, the cost of production has two components:
• Set up cost, which is a fixed cost
• Variable cost, which is directly related to the production quantity
Set up cost is included in a model for a production application using binary variables (1 to produce, 0 not to produce).
Exercise
Question 4.1: Three raw materials are used to produce three products (in tons): a fuel additive, a solvent base, and a laundry detergent.

(37) The company has 20 tons of Material A, 5 tons of Material B, and 21 tons of Material C, and is interested in determining the optimal production quantities for the upcoming planning period.
Solution
Step 1: Formulate the Linear Program.
Step 2: Conversion to Integer Programming Form.
Reflective Question: What will be the final Cost Model for the problem?
Exercise Reflection: Use the space below to note the important things you have learned about solving problems using the Integer Programming model. ________________________________________________________________ ________________________________________________________________.
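
As a generic illustration of how a set up cost enters a model through a binary variable, a fixed-charge formulation can be sketched as below. All symbols are assumptions introduced here, not the workbook's data: $x_j$ is the tons of product j produced, $y_j$ is 1 if product j is produced and 0 otherwise, $p_j$ is the profit contribution per ton, $f_j$ is the set up cost, and $M_j$ is an upper bound on the production of product j. The raw material constraints of the exercise would be added unchanged.

```latex
\begin{aligned}
\max\ & \sum_{j} \left( p_j x_j - f_j y_j \right) \\
\text{s.t.}\ & x_j \le M_j\, y_j && \text{for every product } j \\
& y_j \in \{0, 1\},\quad x_j \ge 0 && \text{for every product } j
\end{aligned}
```

The linking constraint $x_j \le M_j y_j$ forces $y_j = 1$, and therefore the set up cost $f_j$, whenever any quantity of product j is produced.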

(38) Sensitivity Analysis
Sensitivity analysis is the study of how changes in the coefficients of a Linear Program affect the optimal solution.
Optimization Using Excel Solver
Excel Solver is an add-in for Microsoft Excel that can be used to solve Linear Programs. You can install Microsoft Excel Solver by selecting the Microsoft Office Button > Excel Options > Add-Ins > Solver Add-In. Some of the other integer linear programming software packages available on the market are:
• MPSX – MIP
• OSL
• CPLEX
• LINDO

(39) Section 5 – Network Problems (Part II).
Minimum Cost Flow Problem
Overview of the Minimum Cost Flow Problem
The Minimum Cost Flow Problem is used to send flow from a set of supply nodes to a set of demand nodes through the arcs of a network, at minimum total cost, and without violating the lower and upper bounds on flows through the arcs. This problem is used for moving only one product / commodity at a time.
Decision: How much flow should be sent along each arc, given the lower and upper bounds on each arc?
Objective: Minimize total cost.
For each arc:
x = Cost of transportation per unit
y = Lower capacity constraint
z = Upper capacity constraint

(40) For each supply node: [a = Available supply of commodity X].
For each demand node: [b = Demand for commodity X].
Linear Programming Method – Standard Form
i = Index for origins, i = 1, 2, 3…m
j = Index for destinations, j = 1, 2, 3…n
$c_{ij}$ = Cost per unit shipped from origin i to destination j
$s_i$ = Supply or capacity in units at origin i
$d_j$ = Demand in units at destination j
$l_{ij}$ = Lower bound on the flow from origin i to destination j
$u_{ij}$ = Capacity on the flow from origin i to destination j
$x_{ij}$ = Number of units shipped from origin i to destination j, where $x_{ij}$ is only defined for arcs that exist in the network
Variations to Minimum Cost Flow Problem
• Assignment Problem
• Transshipment Problem
• Transportation Problem
• Shortest Path Problem
• Maximal Flow Problem
• Unbalanced Minimum Cost Flow Problems
Sample List of Algorithms for Minimum Cost Flow Problem
• Negative Cycle Algorithm
• Successive Shortest Path Algorithm
• Primal-Dual Algorithm
• Out-of-Kilter Algorithm
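
Using the notation defined in the standard form above, one common way to write the Minimum Cost Flow model is sketched below. The net supply $b_i$ is shorthand introduced here as an assumption: $b_i = s_i$ at a supply node, $b_i = -d_i$ at a demand node, and $b_i = 0$ at a pure transshipment node. The sketch also assumes a balanced problem (total supply equals total demand).

```latex
\begin{aligned}
\min\ & \sum_{(i,j)} c_{ij}\, x_{ij} \\
\text{s.t.}\ & \sum_{j} x_{ij} - \sum_{k} x_{ki} = b_i && \text{for every node } i \\
& l_{ij} \le x_{ij} \le u_{ij} && \text{for every arc } (i,j)
\end{aligned}
```

The first constraint is flow conservation (flow out minus flow in equals net supply); the second enforces the lower and upper bounds on each arc.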

(41) Transportation Problem
Overview of the Transportation Problem
The Transportation Problem can be used to minimize the cost of shipping goods from multiple origins to multiple destinations. It is typically used in distribution planning, where the quantity of goods available at a supply location is limited, and the quantity of goods required at each demand location is known. This is a more specific form of the Minimum Cost Flow Problem.
Decision: How much to ship along each arc between any origin and destination?
Objective: Minimize shipping cost.
Linear Programming Method – Standard Form
i = Index for origins, i = 1, 2, 3…m
j = Index for destinations, j = 1, 2, 3…n
$x_{ij}$ = Number of units shipped from origin i to destination j
$c_{ij}$ = Cost per unit shipped from origin i to destination j
$s_i$ = Supply or capacity in units at origin i
$d_j$ = Demand in units at destination j
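
With the symbols listed above, the usual textbook form of the Transportation Problem can be sketched as follows. This is the standard formulation written in the workbook's notation, offered as a reference sketch rather than a reproduction of the workbook's own display.

```latex
\begin{aligned}
\min\ & \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij}\, x_{ij} \\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ij} \le s_i && i = 1, \dots, m \\
& \sum_{i=1}^{m} x_{ij} = d_j && j = 1, \dots, n \\
& x_{ij} \ge 0 && \text{for all } i, j
\end{aligned}
```

The supply constraints limit what leaves each origin, and the demand constraints require each destination to receive exactly what it needs.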

(42) Variations to Transportation Problem
Assignment Problem:
• All supply and demand values equal 1 and the amount shipped over each arc is either 0 or 1
• Primarily used for assignment of resources to specific tasks, such as project staffing in large corporations and deployment of armed forces personnel
Supply vs. Demand Problem:
• Total supply is not equal to total demand
• For this type of business problem, you can create a dummy supply or demand node that acts as a catch-all for the excess demand or supply
Other Problems:
• Objective function is maximized rather than minimized (e.g., profit criterion)
• Routes that have specified capacity restrictions or minimums
• Some routes may be unacceptable
Sample List of Algorithms for Transportation Problem
• Northwest Corner Rule
• Minimum Cost Method
• Vogel's Approximation Method
• Stepping Stone Method
• Modified Distribution Method
Multi-Commodity Flow Problem
Overview of the Multi-Commodity Flow Problem
The Multi-Commodity Flow Problem is a Network Flow Problem that has multiple commodities flowing through a network, where each commodity has different supply and demand nodes and each arc has capacity restrictions. When flows must be integers the problem is NP-complete, so approximate algorithms are commonly used for large instances.
Decision: How much of each commodity should be sent through each arc, given supply-demand and capacity constraints?
Objective: Flow assignment that satisfies the constraints.
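
The Northwest Corner Rule, listed above among the algorithms for the Transportation Problem, can be sketched in a few lines. It ignores costs and only builds an initial feasible shipping plan, which methods such as Stepping Stone or Modified Distribution would then improve. The function name and the supply and demand figures are hypothetical, and the sketch assumes a balanced problem (add a dummy node first if supply and demand differ).

```python
def northwest_corner(supply, demand):
    """Return an initial feasible shipping plan for a balanced transportation problem."""
    if sum(supply) != sum(demand):
        raise ValueError("Problem must be balanced; add a dummy node first.")
    supply, demand = list(supply), list(demand)  # work on copies
    plan = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])  # ship as much as possible in this cell
        plan[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1  # origin exhausted: move down one row
        else:
            j += 1  # destination satisfied: move right one column
    return plan


# Hypothetical data: three origins and three destinations, 75 units in total
for row in northwest_corner([20, 30, 25], [10, 25, 40]):
    print(row)
```

Each row of the plan sums to the origin's supply and each column sums to the destination's demand.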

(43) For each arc:
x = Cost of transportation per unit (varies for each commodity)
y = Lower capacity constraint
z = Upper capacity constraint
For each supply node:
a = Available supply of commodity A
b = Available supply of commodity B
c = Available supply of commodity C
For each demand node:
d = Demand for commodity A
e = Demand for commodity B
f = Demand for commodity C
Linear Programming Method – Standard Form
k = Index for commodities, k = 1, 2, 3…K
$c^{k}_{ij}$ = Cost per unit of commodity k along arc (i,j)
$u_{ij}$ = Capacity on arc (i,j)
$s^{k}_{i}$ = Available supply of commodity k at node i
$d^{k}_{j}$ = Required quantity (demand) of commodity k at node j
$x^{k}_{ij}$ = Flow of commodity k along arc (i,j), where $x^{k}_{ij}$ is defined only for those arcs that exist in the network

(44) Variations to Multi-Commodity Flow Problem
• Minimum Cost Multi-Commodity Flow Problem: This problem is applied where there is a cost associated with sending flow on each arc that needs to be minimized
• Maximum Multi-Commodity Flow Problem: This problem is applied where there are no hard demands on each commodity, but the total throughput has to be maximized
• Maximum Concurrent Flow Problem: This problem is applied where the task is to maximize the minimal fraction of the flow of each commodity to its demand
Sample List of Algorithms for Multi-Commodity Flow Problem
• Dantzig-Wolfe Decomposition
• Frank-Wolfe Algorithm
• Lagrangian Relaxation
• Augmented Lagrangian Relaxation
• Proximal Decomposition
Which Problem to Choose?
Common Limitations of Network Flow Problems
• Most business problems may not perfectly fit into the format of a particular Network Flow Problem. These problems can be used as a basis for conceptualizing other heuristics
• Most special purpose algorithms can be used to solve only single objectives. It may be necessary to use Linear Programming or other heuristics if there are additional constraints or objectives
• When arc values in a particular network are negative, customized algorithms need to be used
• Depending on the nature of the business problem, objectives may need to be maximized or minimized. For example, given that the Shortest Path Algorithm always identifies a minimum value solution, it may not be ideal to apply the algorithm to situations that involve a profit criterion

(45) Dynamic Programming
Overview of Dynamic Programming
Dynamic Programming is a problem solving approach that decomposes a large, complex problem into multiple smaller problems that are easier to solve. The Dynamic Programming approach results in the optimal solution for the large problem once all the smaller problems have been solved.
Dynamic Programming Method – Standard Form
$x_n$ = State variable, which represents the input to stage n (the output from stage n + 1)
$d_n$ = Decision variable at stage n
$t_n$ = Stage transformation function that determines the stage n output
$r_n$ = Return function for stage n, which represents the payoff or value for a stage
N = Number of stages in the dynamic program; the stage index n varies from 1 to N
The general expression for the stage transformation function is $x_{n-1} = t_n(x_n, d_n)$
The general expression for the return function is $r_n(x_n, d_n)$
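
The notation above can be tied together with a small allocation example. In this sketch, which rests entirely on assumed data, three units of a resource are split across N = 3 stages; the state $x_n$ is the number of units still available entering stage n, the decision $d_n$ is the number of units given to stage n, the transformation is $x_{n-1} = x_n - d_n$, and the return comes from a hypothetical payoff table `RETURNS`. The recursion used is $f_n(x_n) = \max_{d_n} [\, r_n(x_n, d_n) + f_{n-1}(t_n(x_n, d_n)) \,]$ with $f_0 = 0$.

```python
from functools import lru_cache

# Hypothetical payoff table: RETURNS[n][d] = return from giving d units to stage n
RETURNS = {
    1: [0, 5, 8, 9],
    2: [0, 4, 10, 11],
    3: [0, 6, 7, 12],
}


def transform(x_n, d_n):
    """Stage transformation t_n: units left over as input to stage n - 1."""
    return x_n - d_n


@lru_cache(maxsize=None)
def best(n, x_n):
    """Return (optimal value, decisions for stages n..1) with x_n units entering stage n."""
    if n == 0:
        return 0, ()
    options = []
    for d_n in range(min(x_n, len(RETURNS[n]) - 1) + 1):
        value, rest = best(n - 1, transform(x_n, d_n))  # solve the smaller problem
        options.append((RETURNS[n][d_n] + value, (d_n,) + rest))
    return max(options)


value, decisions = best(3, 3)
print(value, decisions)  # decisions are ordered (d_3, d_2, d_1)
```

Each call to `best` solves one smaller problem once (results are cached), and the answer to the large problem is assembled from those stage-level solutions.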

(46) Section 6 – Applied Statistics (Part II).
Simple Regression and Correlation
Overview of Simple Regressions and Correlations
Regressions and correlations deal with the determination of relationships between variables. Both regression and correlation analyses help to determine the nature and strength of a relationship between variables. Regression analysis is used to develop an estimating equation, which is a mathematical description of the relationship between a known variable and an unknown variable. Correlation analysis is used to determine the degree to which the variables are related. In essence, correlation analysis is used to decide how well the estimating equation actually describes the relationship.
Causality between Variables
Regression is usually applied where a causal relationship is believed to exist between the independent and dependent variables. For example, in the relationship between advertising spend and sales, an increase in advertising spend is expected to cause an increase in sales. A strong statistical relationship does not by itself prove causation.

(48) Scatter Diagram
A Scatter Diagram is a chart on which paired observations of two variables are plotted as points. Some of the uses of a Scatter Diagram are:
• Helps visually identify whether there are any patterns indicating that the variables are related
• Identifies the kind of line and required estimating equation that describes the relationship
Different types of scatter diagrams are:

(49) Equation for a Straight Line
To fit a regression line mathematically, it is necessary to "fit" a line such that it minimizes the total squared error between the estimated points on the line and the actual observed points that were used to draw it. Squaring the errors magnifies (or penalizes) larger errors, and prevents positive and negative errors from cancelling each other out.

(50) Formulas for the Method of Least Squares.
The slope of a line (b) obtained using linear least squares fitting is called the Regression Coefficient.
Estimating the Regression Equation
Several statistical packages are readily available that estimate the regression equation and provide the coefficients.
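
As a minimal sketch of the Method of Least Squares, the function below computes the intercept and slope of the line Y = a + bX using the standard textbook formulas b = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²) and a = Ȳ − b·X̄. The form Y = a + bX, the function name `fit_line`, and the data are assumptions for illustration; the workbook only names the slope b.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b = sxy / sxx          # slope (the Regression Coefficient)
    a = y_bar - b * x_bar  # Y-intercept
    return a, b


# Hypothetical observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

For this small data set the fitted line works out to Y = 2.20 + 0.60 X, which can be checked by hand from the two formulas.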

(51) Example: Microsoft Excel.
For any set of values for X and Y, Excel can be used to rapidly plot the linear trend line and derive the regression equation.
Standard Error of Estimate
The reliability of the estimating equation is measured by the Standard Error of Estimate, denoted by $s_e$. It measures the variability, or scatter, of the observed values around the regression line. Statistical packages calculate $s_e$ and provide the value as part of the output.
Formula for Standard Error.
Interpretation
Assuming the observed points are normally distributed around the regression line, $s_e$ can be used to form bounds around the line as follows:
• 68% of the points can be found within a band of +/- 1 $s_e$ around the regression line
• 95.5% of the points can be found within a band of +/- 2 $s_e$ around the regression line
• 99.7% of the points can be found within a band of +/- 3 $s_e$ around the regression line
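
The Standard Error of Estimate for a simple regression is commonly computed as $s_e = \sqrt{\sum (Y - \hat{Y})^2 / (n - 2)}$, where $\hat{Y} = a + bX$ is the value estimated by the regression line. The sketch below applies that textbook formula; the function name, the data, and the coefficients a = 2.2 and b = 0.6 (taken to be the least squares fit for this data) are illustrative assumptions.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys, a, b):
    """Return s_e for the line Y = a + b*X fitted to the points (xs, ys)."""
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(squared_errors / (len(xs) - 2))  # n - 2 degrees of freedom


# Hypothetical observations and their assumed least squares coefficients
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_e = standard_error_of_estimate(xs, ys, a=2.2, b=0.6)
print(f"s_e = {s_e:.3f}")
print(f"Approximate 95.5% band at X = 3: {2.2 + 0.6 * 3:.2f} +/- {2 * s_e:.2f}")
```

The divisor is n − 2 because two coefficients (a and b) were estimated from the data.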

(52) Correlation Analysis
Correlation Analysis is a statistical tool that is used to describe the degree to which one variable is linearly related to another. It is used in conjunction with regression analysis to measure how well the regression line explains the variation of the dependent variable.
The sign of r indicates the direction of the relationship between the two variables.
• r² = 1 and r = 1 means that the two variables are perfectly correlated and the slope of the line is positive
• r² = 0 and r = 0 means that the two variables are not linearly correlated at all
• r² = 1 and r = -1 means that the two variables are perfectly negatively correlated and the slope of the line is negative
For example, r² = 0.45 means that only 45% of the total variation in the dependent variable is explained by the regression line. It is important to note that r² measures only the strength of a linear relationship between two variables.
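
The coefficient r and its square can be computed directly from the data with the standard Pearson formula, as in this minimal sketch. The function name and the observations are assumptions for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Return the Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

For this data r² is 0.6, so 60% of the variation in Y is explained by the regression line, and the positive sign of r matches a positive slope.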

(53) Multiple Regression and Correlation
Overview of Multiple Regression
More than one independent variable is used to estimate the dependent variable in order to increase the accuracy of the estimate. For example, there is a positive relationship between demand for sunglasses and various demographic characteristics (age, income) of the buyers; that is, demand varies directly with changes in those characteristics. This process is called multiple regression and correlation, and is based on the same assumptions and processes we discussed in simple regression.
Example
Sale of Beer = β0 + β1(Temperature) + β2(NASDAQ Levels) + β3(Price of Beer) + β4(…) + β5(…) + …
Three Step Process – Multiple Regression and Correlation Analysis
Step 1: Describe the Multiple Regression Equation
Step 2: Examine the Multiple Regression Standard Error of Estimate
Step 3: Use Multiple Correlation Analysis to determine how well the regression equation describes the observed data, and refine the model by adding or changing terms as necessary
Assumptions
Some of the key assumptions in Multiple Regression Analysis are:
• Normality
• Linearity
• Reliability
• Homoscedasticity
Standard Estimating Equation for Multiple Regression.

(54) The multiple regression equation contains several types of terms that are introduced based on the situation. Some of the types are:
• Linear Terms: Terms that affect the dependent variable linearly, for example X1, X2
• Non-Linear Terms: Terms that affect the dependent variable non-linearly, for example X3²
• Dummy Variables: Terms that represent qualitative factors like gender and can have discrete values or levels
• Interaction Variables: Terms that represent the combined effect of two independent variables on the dependent variable, for example X1X2
Sample Multiple Regression Equation.
Dummy / Binary Variables
Dummy, or Binary, variable regression models involve the use of categorical (non-quantitative) variables with two or more levels. The number of dummy variables used is one less than the number of levels of the categorical variable.
Examples
• Gender is a categorical variable with two levels that can be coded as 0 and 1
• States in the U.S. is a categorical variable with 50 possible levels
Interaction Variables
An interaction variable is a variable often used in regression analysis, formed by the multiplication of two independent variables. An interaction regression model is used when the response to one independent variable differs depending on the level of another independent variable.
Multiple Regression model equation with interaction term:
where β3X1X2 is the interaction term.
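
The two ideas above, using one fewer dummy variable than there are levels and forming an interaction term by multiplication, can be sketched as follows. The function names, the region labels, and the numbers are hypothetical; the first level listed is treated as the baseline and is represented by all zeros.

```python
def dummy_columns(values, levels):
    """Encode a categorical column as len(levels) - 1 dummy (0/1) columns.

    The first level is the baseline, so it is coded as all zeros.
    """
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


def interaction(x1, x2):
    """Return the interaction variable X1*X2, observation by observation."""
    return [a * b for a, b in zip(x1, x2)]


# Hypothetical categorical variable with three levels -> two dummy variables
regions = ["East", "West", "South", "East"]
print(dummy_columns(regions, levels=["East", "West", "South"]))

# Hypothetical independent variables and their interaction term
print(interaction([1, 2, 3], [4, 5, 6]))
```

Dropping one level avoids a perfect linear relationship among the dummy columns, which would otherwise make the regression coefficients impossible to estimate uniquely.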

(55) Standard Error of Estimate for Multiple Regression.

(56) Overview and Effect of Multicollinearity (Model Issue)
• Multicollinearity is a statistical phenomenon in which two or more predictor variables in a Multiple Regression model are highly correlated with each other, making it hard to separate their individual effects on the dependent variable
• While conducting Multiple Regression analysis, the regression coefficients become less reliable as the degree of correlation between the independent variables increases
• With Multicollinearity, the variables may be collectively very significant while individually appearing not significant
• Although it may still be possible to make estimations when Multicollinearity is present, results may change erratically in response to small changes in the model or the data
• This is particularly important because it is not possible to accurately predict how the dependent variable will change as you tweak any of the independent variables that are correlated with another independent variable
Indicators of Multicollinearity (Model Issue)
• Large changes in regression coefficients when an independent variable or additional observations are added
• The model as a whole does a good job explaining the data, but none or few of the coefficients are statistically significant by themselves
• Variance Inflation Factor (VIF) > 5, where VIF = 1 / (1 - R²) and R² is obtained by regressing that independent variable on the other independent variables
Common Remedies for Multicollinearity (Model Issue)
• Drop one of the variables that is causing the Multicollinearity, at the risk of introducing bias in the coefficients of the remaining variables
• Obtain more data
Overview and Effects (Heteroscedasticity & Error Trends)
• Ideally, residuals or error terms are randomly scattered around 0 (the horizontal line), providing a relatively even distribution. Heteroscedasticity is indicated when the residuals are not evenly scattered around the line. Example: The spread of the error term could vary or increase across observations, something that is often the case with cross-sectional or time series measurements
• Heteroscedasticity does not mean your coefficients are wrong, but rather that the precision of the model varies across the range of the data (for example, it becomes less accurate as term values increase)
• Heteroscedasticity often occurs when there is a large difference among the sizes of the observations
• Seeing other trends (e.g., a nonlinear relationship) in the residuals will clue you in to missing model terms
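
For the special case of two independent variables, the R² in the VIF formula is simply the squared correlation between them, so the indicator can be sketched without a regression package. The function name and both data sets are hypothetical; with more than two predictors, R² would instead come from regressing each predictor on all the others.

```python
def vif_two_predictors(x1, x2):
    """Return VIF = 1 / (1 - R^2), where R^2 is the squared correlation of x1 and x2."""
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1 / (1 - r_squared)


# Hypothetical predictors: moderately correlated, then almost identical
x1 = [1, 2, 3, 4, 5]
print(vif_two_predictors(x1, [2, 4, 5, 4, 5]))            # below the threshold of 5
print(vif_two_predictors(x1, [1.1, 1.9, 3.2, 3.9, 5.1]))  # well above 5
```

A VIF above the workbook's threshold of 5 suggests that one of the two predictors carries little information that the other does not.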

(57) Detection and Remedy (Heteroscedasticity & Error Trends).
• Residual plots (plots of the error terms) in Multiple Regression Analysis allow visual detection of heteroscedasticity
• Dealing with heteroscedasticity is reasonably straightforward but a little technical. Techniques are widely available and can be found through textbooks, SMEs, etc.
• For dealing with other error trends, you need to add additional terms to your model. For example, if you see a parabola in the error terms, you should try adding an x² term
Exercise: Using what you've just learned, interpret the output for the following problem:
Question 6.1: For this problem, we'll return to the case of Moondrop Airline Corporation (MAC) from the Network Problems course. MAC has expanded its operations to cover 15 terminals and has recently conducted a survey across these terminals for the month of February. The information collected covers sales, spend on promotions, number of competing airlines at that terminal, and the number of passengers who flew for free.
Solution
Step 1: Input.

(58) Step 2: Output.
Reflective Question: To arrive at the multiple regression equation, how are the coefficients interpreted?
Exercise Reflection: Use the space below to note the important things you have learned about solving problems using the Multiple Regression and Correlation Analysis. ________________________________________________________________ ________________________________________________________________

(59) Factor Analysis, Clustering & Discriminant Analysis
Overview of Factor Analysis
Factor analysis (closely related to, and often carried out using, PCA – Principal Component Analysis) is a statistical method used for data reduction and summarization. Observed variables are represented in terms of variables which are unobserved (factors). It investigates whether a number of variables of interest are linearly related to a small number of unobserved factors.
Benefits of Factor Analysis
The primary benefit of Factor Analysis is that a large number of correlated variables can be reduced to a manageable level:
• A smaller number of factors results in ease of interpretation and reduced complexity
• Effects of Multicollinearity are eliminated as the factors are orthogonal to each other
Commercially available statistics packages can be used to conduct this analysis. Excel doesn't have the capability to conduct Factor Analysis.
Factor Tables
The parameters (coefficients) of the linear function between observed variables and unobserved factors are provided in the output table:
Variables              Luxury    Factor 2  Factor 3
Prestige               0.7655    0.1242    0.3343
Strong Brand           0.9876    0.3423    0.5684
Variable 3             0.4566    0.4533    0.8977
Variable 4             0.3424    0.9856    0.3455
…                      0.4666    0.6753    0.3453
World-class Service    0.7643    0.2342    0.5564
Value for Money        0.1226    0.4674    0.7896
…                      0.6773    0.3433    0.8996
…                      0.3453    0.8772    0.3453

(60) Variables
Each variable is weighted proportionately to its involvement in the factor. The more involved a variable, the higher the score (positive or negative depending on the direction of the relation). Scores on multiple variables of each sample can be converted to a limited number of factors using a linear equation derived from a factor loading table:
Factors
The factor scores are unobserved and abstract; therefore, their direct interpretation is not available. Once we have factor scores, we can use them as independent variables in regression as follows:
Types of Factor Analysis
• Exploratory Factor Analysis
• Confirmatory Factor Analysis
Applications of Factor Analysis
Some of the fields and business situations in which Factor Analysis is used are:
• Behavioral sciences and psychometrics
• Social sciences
• Marketing
• Product management
• Operations research
• Other applied sciences that deal with large quantities of data

(61) Overview of Clustering.
Clustering is used to identify the intrinsic grouping in a set of objects and classify them into relatively homogeneous groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.
Cluster Dendrogram
A dendrogram is a graphical representation of a hierarchy of nested cluster solutions, starting from a one-cluster solution all the way through to an n-cluster solution. Drawing a perpendicular line through the dendrogram corresponding to a particular distance shows the cluster solution at that level of distance.
Method of Clustering
• Hierarchical Methods
• Partitioning Methods
Applications of Clustering
• Market segmentation
• Market structure analysis
• Petroleum geology
• Data mining
• Pattern recognition
• Image analysis
• Biology and numerical taxonomy
Overview of Discriminant Analysis
The objective of Discriminant Analysis is to classify objects (people, items, etc.) into two or more groups based on the features of the objects.

(62) Approaches to Discriminant Analysis
Discriminant analysis is an analysis of dependence method where the dependent variables are categorical in nature, dividing the set of observations into mutually exclusive and collectively exhaustive groups. A categorical variable classifies objects into categories (e.g., good/bad, high/medium/low, etc.). Typically, G – 1 variables (each a binary indicator) describe membership in G mutually exclusive and collectively exhaustive groups. The output of discriminant analysis is an equation (similar to the regression equation) involving independent variables, which is used to calculate the discriminant score, together with a cut-off score used to assign each item to a group. Commercially available statistics packages can be used to conduct this analysis.
Discriminant Analysis Tool Output – Standard Form
The score for each object whose group membership we want to predict can be calculated using the canonical discriminant function. The decision on which group the object belongs to is made by comparing the score with a calculated cut-off score.
Canonical Discriminant Function Coefficients:
Functions at Group Centroids:
For each object, the discriminant score can be calculated using the equation. This score can be compared to the cut-off score to determine into which group the item can be classified.
Common Methods of Discriminant Analysis
• Fisher's Approach
• Mahalanobis' Approach
Applications of Discriminant Analysis
• Product management
• Bankruptcy prediction
• Marketing research
• Credit scoring
• Face recognition

(63) Solutions.
Solution 1.1: Let the diet contain x units of A and y units of B.
Total cost = 2x + 4y
Objective Function: Minimize Z = 2x + 4y
Constraints:
10x + 25y ≥ 200
20x + 10y ≥ 100
15x + 20y ≥ 150
x ≥ 0, y ≥ 0
Solution 1.2:
Step 1: Since x ≥ 0 and y ≥ 0, we consider only the first quadrant of the xy-plane.
Step 2: We draw straight lines for the equations 2x + y = 100 and x + y = 80. To determine two points on the straight line 2x + y = 100: put y = 0, so 2x = 100 and x = 50, which means (50, 0) is a point on the line; put x = 0, so y = 100, which means (0, 100) is the other point on the line. Plotting these two points on graph paper, draw the line that represents 2x + y = 100.

(64) This line divides the first quadrant into two regions, say R1 and R2. Choose a point, say (1, 0), in R1. The point (1, 0) satisfies the inequality 2x + y ≤ 100. Therefore R1 is the required region for the constraint 2x + y ≤ 100.
Similarly, draw the straight line x + y = 80 by joining the points (0, 80) and (80, 0). Find the required region, say R1', for the constraint x + y ≤ 80.
The intersection of the regions R1 and R1' is the feasible region of the Linear Programming problem. Therefore every point in the shaded region OABC is a feasible solution, since each such point satisfies all the constraints, including the non-negativity constraints.

(65) Solution 2.1:
Solution 2.2:

(66) Solution 2.3:
Solution 3.1: If the possible outcomes for an experiment are $a_1, a_2, \dots, a_n$, and if the probabilities of these outcomes are $p_1, p_2, \dots, p_n$, then the expected value is
$E = a_1 p_1 + a_2 p_2 + \dots + a_n p_n$
Expected Value E = 0(0.50) + 1(0.40) + 2(0.10) = 0.6.
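
The expected value formula in Solution 3.1 translates directly into code. This minimal sketch uses the outcomes and probabilities from the solution; the function name and the check that the probabilities sum to 1 are additions for illustration.

```python
def expected_value(outcomes, probabilities):
    """Return E = a1*p1 + a2*p2 + ... + an*pn."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("Probabilities must sum to 1.")
    return sum(a * p for a, p in zip(outcomes, probabilities))


# Outcomes 0, 1, 2 with probabilities 0.50, 0.40, 0.10, as in Solution 3.1
print(round(expected_value([0, 1, 2], [0.50, 0.40, 0.10]), 2))
```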
