2015 IEEE International Conference on Big Data (Big Data)

Big Data Process Analytics for Continuous Process Improvement in Manufacturing

Nenad Stojanovic

NISSATECH INNOVATION CENTRE DOO
Nis, Serbia
Nenad.Stojanovic@nissatech.com

Marko Dinic

NISSATECH INNOVATION CENTRE DOO
Nis, Serbia
Marko.Dinic@nissatech.com

Ljiljana Stojanovic

FZI Forschungszentrum Informatik am KIT
Karlsruhe, Germany
Ljiljana.Stojanovic@fzi.de

Abstract: One of the most important challenges in manufacturing is continuous process improvement, which requires new insights into the behavior and quality control of processes in order to understand the optimization/improvement potential. This paper elaborates on the use of big data-driven clustering for the efficient discovery of real-time anomalies in a process and their root-cause analysis. Our approach extends traditional clustering algorithms (like k-means) with methods for better understanding the nature of the clusters, and provides a very efficient big data realization. We argue that this approach paves the way for a new generation of quality management tools based on big data analytics that will extend traditional statistical process control and empower Lean Six Sigma through big data processing. The proposed approach has been applied to improving process control at Whirlpool (washing machine tests, factory in Italy), and we present the most important findings from the evaluation study.

Keywords: big data, manufacturing, quality control

I. INTRODUCTION

Due to the dynamically changing business environment and, especially, the permanently increasing competition, one of the most important challenges in manufacturing nowadays is continuous process improvement (CPI), defined as an ongoing activity aimed at improving processes, products and services through sustainable changes over a period of time.

Most CPI strategies incorporate the Lean Six Sigma1 principle, which is a combination of techniques and tools from both the Six Sigma methodologies and the Lean Enterprise2. The Six Sigma methodology is based on the concept that "process variation" can be reduced using statistical tools. The goal of Lean is to identify and eliminate non-essential and non-value-added steps in a business process in order to streamline production, improve quality and gain customer loyalty.

Lean Six Sigma practitioners have been improving processes for years through statistical analysis of process data, in order to identify the critical parameters and the variables that have the most impact on the performance of a value stream and to control their variations. However, the main constraint is the complexity of the statistical calculations that have to be applied to the (large) datasets. A well-known example is multivariate statistical analysis. Standard quality control charting techniques (e.g., Shewhart charts, X-bar and R charts, etc.) are applicable only to a single variable and cannot be applied to modern production processes with hundreds of important variables that need to be monitored. Indeed, the diversity of process measurement technologies, from conventional process sensors to images, videos and indirect measurement technologies, has compounded the variety, volume and complexity of process data3. For example, it is typical in a modern FAB (semiconductor manufacturing) that over 50,000 statistical process control charts are monitored to control the quality of over 300 manufacturing steps in the fabrication of a chip [1].

1 https://en.wikipedia.org/wiki/Lean_Six_Sigma
2 https://en.wikipedia.org/wiki/Lean_enterprise

Moreover, the generated data can be extremely big: e.g., in-process monitoring in additive manufacturing (3D printing) produces 100 MB – 1 GB of data, whereas in-process geometry inspection generates 1–10 GB of data per part4.

Although process operations are rich in data, without effective analytical tools and efficient computing technology to derive information from that data, it is often the case that the data is compressed and archived for record keeping and only retrieved for emergency analysis after the fact, rather than being used routinely in the decision-making process.

In this paper we present a novel big data approach for continuous process improvement that exploits the above-mentioned opportunities to enable a better understanding of the (dynamic) nature of a process and to boost innovation.

We argue that by performing big data analytics on past process data we can model what is, statistically, usual/normal for a selected period and check the variations from that model in real time (as Six Sigma requires). Additionally, these data-driven models can support the root-cause analysis that should provide insights into what can be eliminated as waste in the process (as Lean requires). However, due to the above-mentioned variety and volume of data, the analytics must be a) robust, dealing with differences efficiently, and b) scalable, realized in an extremely parallel way.

3 http://wenku.baidu.com/view/59f0c1bd84254b35effd346f.html?re=view
4 Sigma Labs, In-process Quality Assurance, Industrial 3D Printing Conference

The proposed approach has been applied to improving process control in a Whirlpool factory in Italy, based on washing machine tests. In this paper we present the most important findings from the evaluation study.

The paper is organized as follows: in Section 2 we present some challenges for big data analytics, in Section 3 we present our approach for big data clustering, Section 4 provides details about the case study, and Section 5 summarizes the results.

II. CHALLENGES FOR BIG DATA ANALYTICS FOR PROCESS IMPROVEMENT

At the core of improving a process is understanding the nature of the process: what is its normal/usual behavior? However, the pace of change is continuously increasing and introduces new computational challenges for continuous process improvement. There are two main issues that challenge the traditional Lean Six Sigma approach to continuous improvement:

• the number of parameters that can be measured in a process, and the corresponding size of the data to be analyzed, is exploding (note that this is strongly influenced by supply-chain networks, which expand the space of interest dramatically), and

• process variations cannot be checked against predefined (expert) rules – the dynamics of the process context requires changing the rules to be applied.

Therefore, the detection of variations is no longer a question of optimizing statistical formulas, but rather the challenge of defining what is "normal/usual" in a dynamically changing business environment. This is where big data comes into play:

• by being inherently data-driven, big data processing of manufacturing data is able to generate valid models of process behavior;

• by being very scalable, big data processing is able to work in high-dimensional spaces of interest with low latency;

• by using unsupervised learning, big data processing can continuously improve/adapt the performance of the underlying task.

In this paper we present how variations in a manufacturing process can be detected using unsupervised data analytics, namely big data clustering.

We define the following three requirements that should be satisfied by a big data approach for detecting anomalies:

• R1 (Precision): how to define what is similar (metrics) – what is usual?

• R2 (Interpretation): how to understand why something is not similar – why is it unusual?

• R3 (Scalability): how to ensure that, by using as much data as possible, the results of the processing are calculated as fast as possible?

III. OUR APPROACH FOR BIG DATA CLUSTERING

Clustering algorithms aim to identify groups of similar objects and produce partitions of a given dataset. A number of clustering algorithms exist, from partitioning algorithms such as K-means5, through hierarchical algorithms that form a tree of clusters (a dendrogram) by performing clustering on different levels, to density-based algorithms such as DBSCAN [2] that group objects based on the neighborhood of each object. All of these algorithms have their own purpose, advantages and weaknesses, so great caution is needed when choosing the appropriate clustering method.

K-medoids6 is a partitioning algorithm, similar to K-means, that uses medoids to represent clusters. Unlike the K-means algorithm, where centroids are used to represent clusters, in the case of K-medoids the medoid is one of the objects from the dataset that best represents the cluster. PAM (Partitioning Around Medoids) is the most common realization of this algorithm. The basic steps of the K-medoids algorithm are initialization, assignment of objects to the closest medoid, and selection of a new medoid for each of the clusters. Medoid selection is the most expensive step of the algorithm. FAMES is a medoid selection algorithm that tries to overcome this problem.

K-medoid algorithms try to find optimal medoids in the dataset, but finding a single medoid requires O(n²) distance calculations. This makes the algorithm practically unusable for larger datasets. FAMES (FAst MEdoid Selection) [3] improves the K-medoids algorithm by offering a fast selection of good representatives.
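To make the cost argument concrete, the following minimal Java sketch (not from the paper's implementation; names are illustrative) shows the naive medoid update for a single cluster; the nested loop over all cluster members is exactly the O(n²) step that FAMES is designed to avoid.

import java.util.List;
import java.util.function.BiFunction;

/** Naive medoid update for one cluster: O(n^2) distance evaluations. */
final class NaiveMedoid {

    /** Returns the index of the member minimizing the sum of distances to all other members. */
    static <T> int selectMedoid(List<T> cluster, BiFunction<T, T, Double> distance) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int i = 0; i < cluster.size(); i++) {
            double cost = 0.0;
            for (int j = 0; j < cluster.size(); j++) {
                if (i != j) {
                    cost += distance.apply(cluster.get(i), cluster.get(j));
                }
            }
            if (cost < bestCost) {
                bestCost = cost;
                best = i;
            }
        }
        return best;
    }
}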

Our solution combines the scalable K-means|| [4] (K-means parallel) initialization with a K-medoids-like algorithm that relies on FAMES for medoid selection. K-means|| is an improvement of the K-means++ [5] algorithm. The major downside of K-means++ is its inherently sequential nature, which makes it difficult to use for big data because it requires k passes over the data to find a good initial set of centers. K-means||, on the other hand, is able to find good initial medoids in n iterations, where n is usually much smaller than k. Another good property of this initialization approach is that it can easily be distributed, thereby achieving greater speed than K-means++, and it can be used for much bigger datasets.

The solution we propose improves the aforementioned FAMES algorithm so that it can be used for clustering large datasets in a parallel, distributed fashion over a cluster of machines, such as a Hadoop cluster. FAMES in its original form was designed to execute on a single machine, so the main challenge was to parallelize (distribute) the execution of this algorithm in the form of MapReduce jobs so that it could run on a cluster of machines. This includes defining a number of independent steps (independent primarily in the sense of data dependency) that can be performed in parallel, and combining the results of those steps to get the final result.

A Hadoop cluster consists of a number of commodity machines that execute MapReduce [6] jobs. A MapReduce job splits the input dataset into independent chunks that can be processed by Mappers in parallel, sorts and shuffles the intermediate results, and sends them to Reducers, which aggregate the outputs from multiple Mappers to produce the final result. This concept powers Hadoop and enables it to process data too big to fit into the memory of a single computer. Our solution is implemented as a sequence of MapReduce jobs.

To determine the appropriate number of clusters, the elbow method7 is used. The elbow method comes from the assumption that the optimal number of clusters can be determined by performing clustering over a number of iterations, increasing the number of clusters in each iteration, and observing the total distance between the cluster representatives and the objects assigned to their clusters. The number of clusters is increased as long as a significant improvement in clustering quality exists. Also, clustering is performed multiple times for the same number of clusters to get the clustering of the greatest quality (which depends on the initial medoids). The steps of our solution are presented below.

5 https://en.wikipedia.org/wiki/K-means_clustering
6 https://en.wikipedia.org/wiki/K-medoids
7 https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

adapt for clustering (produce standard input for the algorithm for different datasets)

while (ratio of two sequential clustering errors is below the specified limit)
{
    for (specified number of repetitions for the same number of clusters)
    {
        perform initialization:
            generate random seed
            select initial medoids using the K-means|| algorithm

        perform clustering:
        for (specified number of clustering iterations)
        {
            assign points to medoids
            medoid selection:
                find first pivot
                find second pivot
                find X and M
                select new medoids as points closest to X_M
        }
        final assignment to medoids
    }
}
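As a rough, single-machine illustration of the outer loop above, the Java sketch below shows an elbow-style driver: it increases the number of clusters, repeats clustering several times per k, keeps the best run, and stops once the relative improvement in total error drops below a convergence threshold. The names runClustering, ClusteringResult and the parameters are assumptions standing in for the actual MapReduce pipeline, not the paper's code.

import java.util.List;
import java.util.function.BiFunction;

/** Illustrative elbow-style driver; runClustering(...) stands in for the MapReduce pipeline. */
final class ElbowDriver {

    record ClusteringResult(List<double[]> medoids, double totalError) {}

    static ClusteringResult run(List<double[]> data,
                                BiFunction<List<double[]>, Integer, ClusteringResult> runClustering,
                                int initialK, int step, int maxK,
                                int repetitionsPerK, double convergenceThreshold) {
        ClusteringResult best = null;
        double previousError = Double.POSITIVE_INFINITY;
        for (int k = initialK; k <= maxK; k += step) {
            ClusteringResult bestForK = null;
            for (int r = 0; r < repetitionsPerK; r++) {          // repeat to escape poor initializations
                ClusteringResult candidate = runClustering.apply(data, k);
                if (bestForK == null || candidate.totalError() < bestForK.totalError()) {
                    bestForK = candidate;
                }
            }
            best = bestForK;
            // Stop once the relative improvement over the previous k is no longer significant.
            double improvement = (previousError - bestForK.totalError()) / previousError;
            if (previousError != Double.POSITIVE_INFINITY && improvement < convergenceThreshold) {
                break;
            }
            previousError = bestForK.totalError();
        }
        return best;
    }
}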

Figure 1 illustrates the clustering approach; we explain it briefly here. To be able to work with different datasets, we reserve space for an Adapter, whose purpose is to adapt the dataset of interest to the uniform input of the algorithm. Each of the phases in the diagram is a MapReduce job. The Mappers mostly have the purpose of finding the distances between all the objects in the dataset and a small subsample (or possibly only one object), while the Reducers consider the distances provided by the Mappers and find the best representatives for the next stage. Combiners are also used, when possible, to optimize the procedure. The whole dataset is provided as the input to a MapReduce job, while the small subsample is given through the distributed cache, a Hadoop feature that allows a small amount of data to be distributed to all machines in the cluster. All data is transferred between phases in the form of Writables, a serialization construct customized for the needs of Hadoop, and written to HDFS in the form of SequenceFiles. This allows faster data transfer and less data to be written to HDFS.

The first step of the K-medoids algorithm is to select the initial medoids. As already explained, initialization is performed using the K-means|| algorithm.

The Initialization block contains two units. The Generate random seed unit selects a specified number of objects as the seed for initialization. The Select initial medoids unit is a simplification of the actual process of selecting initial medoids; this part is the concrete realization of the K-means|| algorithm. First, a number of iterations is performed to create a sketch of the data. This sketch contains objects that are good representatives of the whole dataset; these representatives are found such that they are spread apart as widely as possible. The difference between the K-means|| approach and the K-means++ approach is that K-means|| uses oversampling to select good representatives in fewer iterations. Second, once the sketch is created, we assign weights to the objects in the sketch (based on the distances between the sketch objects and all the other objects) and select the final initial medoids. In essence, this leads to the selection of spread-apart medoids that have many objects around them, which seems appropriate, since we want the initial medoids to lie in the cores of the clusters.
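As a hedged, single-machine illustration of the weighting step just described (the actual version runs as MapReduce jobs), each sketch point can be weighted by the number of dataset objects that are closest to it:

import java.util.List;
import java.util.function.BiFunction;

/** Illustrative weighting of the K-means|| sketch on a single machine. */
final class SketchWeighting {

    /**
     * weights[i] = number of dataset objects whose closest sketch point is sketch.get(i).
     * Heavily weighted, well-separated sketch points are then kept as initial medoids.
     */
    static <T> long[] weigh(List<T> sketch, List<T> data, BiFunction<T, T, Double> distance) {
        long[] weights = new long[sketch.size()];
        for (T object : data) {
            int closest = 0;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < sketch.size(); i++) {
                double d = distance.apply(object, sketch.get(i));
                if (d < best) { best = d; closest = i; }
            }
            weights[closest]++;
        }
        return weights;
    }
}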

The Clustering block contains two parts: assigning points to medoids and selecting new medoids. The Assigning points to medoids unit assigns each object to its closest medoid based on the specified distance measure (in the first iteration it calculates distances to the initial medoids). During this phase, the starting points for the FAMES algorithm are also selected for each of the clusters. The Medoid selection block is the actual implementation of the FAMES algorithm, adapted for Hadoop. In each cluster, a starting point was selected during the medoid assignment phase, as explained. Each of the following units works in the same way for each of the clusters. Finding first pivot calculates the distances from the objects in a cluster to the starting point of that cluster and selects the first pivot as the point farthest from the starting point. Finding second pivot works in a similar way, with the difference that it considers distances from the first pivot. The Finding X and M unit considers the distances of the objects to the first pivot, the distances of the objects to the second pivot, and the distance between the first and second pivot, and finds the X_i distances and the M distance as explained in [3]. Finally, we have all we need to select the new medoids. The Selecting new medoids as points closest to X_M unit finds the new medoids based on the aforementioned distances and the previously found M distance.
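The pivot steps amount to simple distance scans. The following single-machine sketch is only an illustration: the exact X and M definitions follow [3], and the projection formula used below is an assumption based on the usual pivot-projection form.

import java.util.List;
import java.util.function.BiFunction;

/** Illustrative FAMES-style pivot computation for a single cluster. */
final class PivotSelection {

    /** Index of the cluster member farthest from the given reference object. */
    static <T> int farthestFrom(T reference, List<T> cluster, BiFunction<T, T, Double> distance) {
        int farthest = 0;
        double max = -1.0;
        for (int i = 0; i < cluster.size(); i++) {
            double d = distance.apply(reference, cluster.get(i));
            if (d > max) { max = d; farthest = i; }
        }
        return farthest;
    }

    /**
     * X value of an object with respect to the two pivots; this is the usual
     * pivot-projection form and stands in for the exact definition in [3].
     */
    static double x(double dToFirstPivot, double dToSecondPivot, double dBetweenPivots) {
        return (dToFirstPivot * dToFirstPivot
                + dBetweenPivots * dBetweenPivots
                - dToSecondPivot * dToSecondPivot) / (2.0 * dBetweenPivots);
    }
}

With these helpers, the first pivot would be farthestFrom(startingPoint, ...), the second pivot farthestFrom(firstPivot, ...), and the new medoid the cluster member whose X value is closest to the M distance defined in [3].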

This whole procedure is repeated to find the clustering of best quality and to select the appropriate number of clusters, as presented in the pseudo-code shown previously.


As a result, for each cluster we get its medoid and the average distance of the objects to their cluster's medoid.

The algorithm offers a number of parameters that allow fine-tuning of the clustering procedure (an illustrative configuration sketch follows the list). The following parameters exist:

• Adapter – allows adaptation of the dataset to the standard input of the clustering algorithm;

• Distance measure – allows specification of custom distance measures;

• Initial number of clusters – used for determining the number of clusters with the elbow method;

• Step – by how much to increase the number of clusters in the next iteration;

• Maximal number of clusters – if the convergence threshold has not been reached, this limits the maximal number of clusters;

• Convergence threshold – determines when to stop increasing the number of clusters;

• L – how many objects to add to the sketch in each iteration of the initialization;

• Number of iterations for sketch selection – how many iterations of adding to the sketch, from which the initial medoids will be selected;

• Number of iterations per clustering – how many times to repeat the procedure of assigning points to medoids and selecting new medoids;

• Number of repetitions – how many times to repeat clustering for the same number of clusters to get a clustering of better quality.
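For illustration only, such a configuration could be captured in a simple parameter object passed to the job driver; all field names and values below are assumptions, since the actual parameter interface is not published.

/** Illustrative parameter object for the clustering job; names and values are assumptions. */
public class ClusteringConfig {
    public String adapterClass = "WhirlpoolTestAdapter";  // Adapter for the dataset of interest (hypothetical name)
    public String distanceMeasure = "DTW";                // custom distance measure
    public int initialNumberOfClusters = 2;               // starting k for the elbow method
    public int step = 1;                                   // k increment per elbow iteration
    public int maximalNumberOfClusters = 10;               // hard limit if convergence is not reached
    public double convergenceThreshold = 0.05;             // stop when the error improvement drops below this
    public int l = 5;                                       // objects added to the sketch per initialization round
    public int sketchIterations = 5;                        // rounds of sketch building (K-means||)
    public int iterationsPerClustering = 10;                // assignment + medoid-selection rounds
    public int repetitions = 3;                             // re-runs per k to pick the best clustering
}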


IV. USE CASE: WHIRLPOOL WASHING MACHINE TESTING

The problem setup for the Whirlpool washing machine test use case is as follows:

• Washing machine functional tests are provided by Whirlpool;

• A functional test is performed on every washing machine that is assembled and is used to examine whether the machine functions properly;

• Various parameters are measured, such as power, speed and water inlet;

• The size of the data is very large (too large to be processed on a single machine);

• The goal is to detect anomalies: washing machines that behaved strangely during the functional test;

• By detecting anomalies during functional tests, the quality of production can be increased;

• Data is provided in a compressed form, so the first step is to decompress it;

• Based on the analysis, a solution for detecting anomalies automatically should be implemented.

The goal is to find a way to define normal parameter values and to implement a solution that will be able to identify unusual patterns in functional tests. The dataset provided by Whirlpool contains three parameters:

• Power;

• Speed;

• Total water inlet.

An initial analysis of the dataset was performed in order to find correlations between the parameters, as presented in Figure 2. The goal of our analysis is to describe normal behavior and to discover anomalies as behavior that does not conform to the defined model. To do that, we use the clustering approach described in the previous section. We briefly elaborate on the arguments for using that approach.

First of all, we have defined the process of detecting anomalies as follows (a scoring sketch is given after the list):

• Cluster the data using a clustering algorithm that will not only produce clusters, but will also produce cluster representatives;

• The cluster representatives and cluster variances form the model;

• Each new measurement is compared to the existing model, and its dissimilarity from the model determines its status as anomalous or normal.
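A minimal sketch of the last step, assuming the model consists of one medoid and one average intra-cluster distance per cluster; the tolerance factor and all names are illustrative, not taken from the paper:

import java.util.List;
import java.util.function.BiFunction;

/** Illustrative real-time check of a new measurement against the clustering model. */
final class AnomalyCheck {

    record ClusterModel(double[] medoid, double averageDistance) {}

    /**
     * A new test is flagged as anomalous when its distance to the closest medoid
     * exceeds that cluster's average distance by an illustrative tolerance factor.
     */
    static boolean isAnomalous(double[] newTest, List<ClusterModel> model,
                               BiFunction<double[], double[], Double> distance,
                               double toleranceFactor) {
        double bestDistance = Double.POSITIVE_INFINITY;
        ClusterModel closest = null;
        for (ClusterModel cluster : model) {
            double d = distance.apply(newTest, cluster.medoid());
            if (d < bestDistance) { bestDistance = d; closest = cluster; }
        }
        return closest == null || bestDistance > toleranceFactor * closest.averageDistance();
    }
}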

We must note that the dataset being analyzed has some specificities, so it is necessary to take another look at it, this time considering multiple tests at the same time.

The Power parameter values of multiple tests are presented in Figure 3. We can notice that even though all the graphs present the value of the same parameter, they may have very different shapes. This has a huge impact on our analysis: there is a need for a robust measure that is able to cluster time series based on the shape of the series.


Figure 3. Power parameter of multiple tests

Another question arises from the latter condition, clustering time series based on shape: what should the cluster representative look like for a cluster that contains series of different shapes? Algorithms such as the aforementioned K-means produce a cluster representative as some kind of mean or average of all the elements in the cluster. But that would not be an acceptable solution in our case, for the following reason: different tests may be performed under different conditions. By conditions we mostly mean different timings. In one test the centrifuge can be started a fraction of a second earlier than in another, or can have a slightly longer duration. That fraction of a second makes the mean of the series impossible to use as a representative. Because of that, we turn to another group of clustering algorithms, those that select one of the objects from the cluster as the representative of the cluster, such as K-medoids. That object is the one that is most similar to all the other objects in its cluster, and it is usually called a medoid. This is the reason we selected K-medoids as the clustering algorithm.

First of all, we consider the initial medoid selection. This question is more relevant than it may seem: the quality of the final clustering heavily depends on the initial medoid selection, due to the phenomenon of local minima. One option would be to select k objects randomly from the dataset, but that can result in poor clustering. That is why we use a different kind of initialization in our implementation. The idea is to select objects that are placed somewhere in the core of the existing clusters as the initial medoids. In this way, only a small number of iterations is needed to produce the final clusters, and their role is just to refine the initial clustering. This makes our algorithm faster and more precise than with random initialization.

Second, we consider the question of object similarity. To perform clustering it is necessary to define a measure that describes how similar objects are. A similarity measure is often contrasted with a distance measure, since the objects are observed in an N-dimensional space: maximal similarity between two objects has a value of 1, which means that the distance between them is minimal, that is, equal to 0. There are different kinds of distance measures, such as Euclidean, Manhattan or cosine, but we will see that they are not very useful in our case. The reason is that we are comparing time series that can be shifted in time, or skewed, and distance measures like Euclidean do not tolerate this. That is why we turn to another kind of distance measure that considers the shapes of two signals. The measure is called Dynamic Time Warping (DTW) [4], and it is able to find the optimal alignment between two signals. To show this, we observe two very similar time series, presented in Figure 4. It should be noted that one signal is actually a modification of the other, created by shifting the first signal by a certain time interval. The figure presents a comparison of the Euclidean, cosine and DTW distances of the two signals.

It can be seen that the Euclidean distance is very large, and so is the cosine distance. DTW, on the other hand, gives a distance equal to zero. This implies that DTW is immune to the phenomenon of shift, no matter how big it is. This could be very useful in our case, since different conditions hold while performing functional tests. We could interpret this shift in the following way: the counter-clockwise rotation was started later in the case of the second signal than in the case of the first signal, so every following step in the test (for example, the centrifuge being started) also gets shifted. But this does not mean that something is wrong with the other test; the values are normal, they are just shifted in time.
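For reference, a minimal unconstrained O(n·m) dynamic-programming implementation of the DTW distance between two univariate series is sketched below; a production version would typically add a warping window or lower bounds for speed, and the class name is illustrative.

/** Minimal dynamic-programming DTW distance between two univariate time series. */
final class Distance {

    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                cost[i][j] = Double.POSITIVE_INFINITY;
            }
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);          // local point-wise cost
                cost[i][j] = d + Math.min(cost[i - 1][j],           // insertion
                                 Math.min(cost[i][j - 1],           // deletion
                                          cost[i - 1][j - 1]));     // match
            }
        }
        return cost[n][m];
    }
}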


Figure 4. Comparison of two signals – red signal is shifted even more

V. RESULTS

In this section we present the main conclusion from the performance test.

We initially considered the Power parameter and took a small sample to validate our methods. We used the variant of K-medoids described above and obtained five clusters, three of which were singleton clusters. A singleton cluster is a cluster that contains only one test. We consider these clusters anomalous, since the number of objects they contain is very small, so they differ from the rest of the dataset. The medoids that were produced are presented in Figure 5, while the clusters are presented in Figures 6 to 10.

By observing the shapes of the signals contained in the sample, we can conclude that there really are two groups of signals. At the same time, we can notice that the signals belonging to the singleton clusters have a different shape than the signals in the non-singleton clusters. We must emphasize that our primary goal is not to find anomalies while clustering, since this can be a long-running operation, but to generate a model that can be used in real time to detect anomalies. Even so, we may detect suspicious tests in the dataset, as in our example, so the best solution is to report them and remove them from the dataset, for safety reasons.

The initial sample was good for validating the approach and the implementation, but afterwards all the tests provided by Whirlpool were analyzed. The dataset currently provided contains about 15,000 functional tests, but a much larger amount of data is expected (an amount that demands a Hadoop cluster for processing, i.e., for learning what represents normal behavior).

Figure 5. Generated medoids

Figure 6. First cluster


Figure 8. First anomaly

Figure 9. Second anomaly

Figure 10. Third anomaly

The clustering results for the whole dataset are presented in Figures 11 to 14. Again, there are clusters of normal behavior and there is a cluster of potential anomalies (the cluster presented in Figure 12).

Based on the results, we conclude that our algorithm represents a good solution for the problem of detecting anomalies in the functional tests provided by Whirlpool. We even got confirmation that we were able to detect tests that represent a problem that really exists in one of the Whirlpool facilities (Figure 11 shows an example of such tests), which they are aware of. We developed a solution that is able to compare test series based on their shape and to cluster tests accordingly. We also used that similarity and the produced clusters to determine which tests look unusual compared to the normal examples found in the dataset.



Figure 12. Second cluster (large dataset)


Figure 14. Fourth cluster (large dataset)

VI. CONCLUSION

The paper elaborates on using big data-driven clustering for the efficient discovery of real-time anomalies in a process and their root-cause analysis. Our approach extends traditional clustering algorithms (like k-means) with methods for better understanding the nature of the clusters, and provides a very efficient big data realization. We argue that this approach paves the way for a new generation of quality management tools based on big data analytics that will extend traditional statistical process control and empower Lean Six Sigma through big data processing. The proposed approach has been applied to improving process control for washing machine tests in a Whirlpool factory in Italy, and the results are very promising. We have started a large-scale case study on the presented washing machine functional tests that should prove the feasibility of the approach for a production environment.

ACKNOWLEDGMENT

This work is partly funded by the European Commission projects FI PPP FITMAN8“Future Internet Technologies for MANufacturing” (604674) and FP7 STREP ProaSense9 “The Proactive Sensing Enterprise” (612329).

8 http://www.fitman-fi.eu/
9 http://www.proasense.eu/

REFERENCES

[1] S. J. Qin, G. Cherry, R. Good, J. Wang, and C. A. Harrison, "Semiconductor manufacturing process control and monitoring: a fab-wide framework," J. Process Control, vol. 16, pp. 179–191, 2006.

[2] http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf

[3] A. A. Paterlini, M. A. Nascimento, and C. Traina Jr., "Using Pivots to Speed-Up k-Medoids Clustering," Journal of Information and Data Management, vol. 2, no. 2, 2011.

[4] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf

[5] https://www.math.uwaterloo.ca/~cswamy/papers/kmeansfnl.pdf

[6] http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

[7] Black, A. W.; P. Taylor: Automatically clustering similar units for unit selection in speech synthesis. In: Proc. Eurospeech ’97
