A Wafer Map Yield Prediction Based on Machine Learning for Productivity Enhancement

Sung-Ju Jang, Jong-Seong Kim, Tae-Woo Kim, Hyun-Jin Lee, SeungBum Ko

Abstract—Manufacturing productivity in the semiconductor industry is a key factor in determining the competitiveness of manufacturers. One of the most effective ways to enhance productivity is to evaluate and optimize the productivity of wafer maps prior to production. The productivity of a wafer map is evaluated in advance by considering various factors affecting wafer productivity, such as gross dies, shot counts, lithography throughput, mask field occupancy (MFO), and prices. Manufacturing process information is not determined at the initial wafer map design stage, so predicting the yield of new wafer maps before fabrication is a difficult challenge. However, a yield prediction model is required to precisely evaluate the productivity of new wafer maps, because the yield is directly related to the productivity and the design of a wafer map affects the yield. In this paper, we propose a novel yield prediction model based on deep learning algorithms. Our approach exploits spatial relationships among the positions of dies on a wafer and die-level yield variations collected from a wafer test, without process parameters. By modeling these spatial features, the accuracy of yield prediction significantly increased. Furthermore, experimental results showed that the proposed yield model and approach help to design wafer maps with up to 8.59% higher productivity.

Index Terms—yield modeling, wafer map, wafer productivity, semiconductor manufacturing, machine learning

This work was supported by Samsung Electronics Co., Ltd. This paper is an extended and updated version of one previously published in the Proceedings of ASMC 2018. (Corresponding author: Sung-Ju Jang.)

The authors are with the Design Technology Team, Samsung Electronics Company Ltd., Hwaseong 18448, South Korea (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

I. INTRODUCTION

PRODUCTIVITY enhancement has been a constant challenge in memory manufacturing for improving profitability. Memory manufacturers try to increase productivity with various methods such as technology innovation, process optimization, yield improvement, and defect elimination. One of the most effective solutions for enhancing productivity without additional capital investment is to design a wafer map for maximum productivity. A wafer map, also known as a photo shot map, is made by placing each die on a wafer. Fig. 1 shows an example of a photo shot map. The design of wafer maps differs for each product because die sizes differ between products, even when the same technology is used. Depending on its design, a wafer map may affect various factors of productivity such as gross dies, shot counts, lithography throughput, mask field occupancy (MFO), and yield [1]–[3]. Therefore, overall productivity can be improved by designing the most productive wafer map, considering all of these factors, before manufacturing. Among them, however, yield prediction for a new wafer map at the design stage is a difficult challenge, because the process parameters commonly used in yield prediction models are not yet determined.

Yield is a key indicator in evaluating manufacturing performance and directly relates to productivity [4]. Accurate yield prediction models in memory manufacturing are very important for improving productivity, customer satisfaction, and profitability [5]. Although many yield models for wafer maps have been proposed, most of them focus on predicting the yields of wafers using process or testing parameters in the manufacturing stage. Dong et al. proposed a yield prediction model incorporating spatial defect clusters and functional test data based on a fused LASSO algorithm and predicted final die yields with a logistic regression model [6]. O’Donoghue et al. observed that failed-die distributions on wafer maps follow a probability distribution irrespective of product and can be modeled not only by radial wafer regions but also by dividing the wafer map into segment regions [7].

Among preceding works, the most attractive method is the radial yield degradation (RYD) model [8], which is based on radial defects. By modeling yield distributions according to the die positions of wafer maps of various products, it is possible to predict the yields of dies on a new wafer map. Although it is a one-dimensional approach that considers only yields and the distances from die positions to the center of a wafer map, the RYD model has shown the potential to predict the yield of a newly designed wafer map without process data. Nonetheless, when calculating the distance from a die to the center of the wafer, the RYD model is sensitive to where the reference point of the die position is set: the inner, center, or outer point of the die. This matters because the model performance depends on the reference point. In addition, as more training data is used to improve accuracy and avoid over-fitting, it becomes more challenging to construct the model using only a simple regression method. Therefore, efficiently modeling yield distributions as a function of die position, regardless of the reference point, is important for predicting the yield of newly designed wafer maps.

We propose a machine-learning-based yield model that predicts the yield of new wafer maps before fabrication using spatial die features, without process parameters. As a result of yield enhancement efforts, the yield distributions of wafer maps in a stable condition are similar [8], even though die features and wafer map designs differ. Thus, by learning the yield distributions of previous products with machine learning algorithms, the proposed model can predict the yield of new wafer maps. Rather than using a single reference point, the proposed approach exploits the geometric features of each die on a wafer map. In our previous study [9], the features of a die were decomposed into nine spatial features: the width and height, the distance from the center of the wafer, and six geometric coordinates of the die on the wafer map. Extending that study, we improve the prediction accuracy using only five spatial die features, namely the distance from the center of the wafer and four geometric coordinates of the die on the wafer map, with deep neural network (DNN) architectures. The proposed methodology uses die yield data of various products, collected from a wafer test after fabrication and presented in the form of a wafer bin map (WBM), as shown in Fig. 2. The DNN-based model is trained efficiently by exploiting correlations between the spatial features of dies and die-level yield variations on the wafer map, and it accurately predicts the yield of new wafer maps that consist of new die sizes and new die positions. Furthermore, our model can contribute to designing a wafer map with optimal productivity from various wafer map possibilities.

II. RELATED WORK

A. Wafer Yield Models

In the semiconductor manufacturing industry, yield is a key metric of manufacturing performance and crucially affects the profitability of the enterprise. Yields are influenced by various factors such as abnormal equipment, unstable processes, defective circuits, and physical spoil. Various yield prediction methodologies were introduced for reducing the root causes of yield degradation and enhancing productivity. Semiconductor yield models have been widely used in the scheduling, optimization, and control of the fabrication process [10]. Generally, yields are classified into line yield, die yield, and final yield [5]. The die yield is the ratio of dies on the wafer map that pass a functional test, and die yield failures are divided into two categories: functional failure and parametric failure [4], [11]. Die yield models for improving wafer map productivity have received noticeable attention as an important research topic as technology and processes evolve. However, most of them are used to solve issues in the manufacturing stage by utilizing process data, for example detection of abnormal manufacturing processes, analysis of low-yield wafers, defect reduction, and improvement of final die yields [11].

At the wafer map design phase, i.e., before manufacturing, only a few specifications are decided, such as the die size and the die positions on the wafer map of the new product. Process parameters and test parameters are determined after a wafer map has been fixed. The numerous yield prediction models that rely on process parameters or test data from fabrication therefore cannot be used at the early wafer map design stage, and only a few studies are applicable at this phase. Ferris-Prabhu et al. observed radial defects, seen as yield degradation at the edge of the wafer, and showed that they are not caused by design, technology, or random defects and that they occur across all products [8]. Teet et al. also focused on radial defects, emphasizing the relationship between radial degradation and yield at the periphery of the wafer map. Their study presented a simplified model for the radial yield loss according to the die size and showed the relationship between yields and distances from the center of the wafer map [11].

These models are based on the RYD, a one-dimensional approach that considers the distance from each die location to the center of a wafer map. The accuracy of models using a one-dimensional approach depends on the reference point chosen on the die: the inner, center, or outer point. However, because the wafer has a two-dimensional circular structure, we focus on modeling the entire wafer map and the full geometric features of each die on it. That is, we model the two-dimensional spatial design features of dies and the distribution of failed dies across a wafer. The related work concerning the yield and the productivity of wafer maps is discussed in the next subsection.

Fig. 2. Example of a wafer bin map. The BIN code is 1 if the die passes the wafer test and 0 if it fails; every die in the wafer bin map carries pass or fail information.

B. Yield and Productivity of Wafer Maps

Productivity and yield are inseparable in memory manufacturing. Since the productivity of wafers improves as the yield increases, the yield is directly related to the productivity, and the wafer productivity can be expressed as [4]:

$$\mathrm{productivity}_W = \#\mathrm{Die} \times \mathrm{Yield} \qquad (1)$$

where #Die is the number of dies on a wafer map and Yield is the yield of wafers. Additionally, Chien et al. proposed an expression of the wafer productivity that considers wafer exposure performance and likewise shows that yield is directly related to the productivity [2]. The wafer productivity formula proposed by Chien et al. is as follows:

$$\mathrm{productivity}_W = \frac{\mathrm{DieArea}}{\mathrm{WaferArea}} \times \mathrm{Yield} \qquad (2)$$

where DieArea is the gross die area on the wafer map, WaferArea is the whole wafer area, and Yield is the yield of wafers.

Wafer maps, or photo shot maps, are made by placing dies on a wafer in a regular grid. Wafer maps affect various factors of productivity such as gross dies, shot counts, throughput of lithography equipment, and mask field occupancy (MFO). The design of a wafer map is important because it cannot be changed after fabrication starts. Therefore, the wafer map is designed considering these various factors. In recent years, the ROI-based wafer productivity (RBP) model, which is based on return on investment (ROI), was proposed by Kim et al. [3]. The RBP model estimates the productivity of a wafer map using the lithography throughput, the number of shots, yields, the number of gross dies, selling prices, and the fixed cost as parameters. The formula of the RBP model is as follows:

$$W_{prod} = \frac{TP \times (\#\mathrm{Die} \times \mathrm{Yield} \times PP - WP) - FC \times \#\mathrm{Shots}}{WP \times TP + FC \times \#\mathrm{Shots}} \qquad (3)$$

where TP is the lithography throughput, #Die is the number of gross dies, Yield is the yield of a wafer map, PP is the price of a product, WP is the price of a wafer, FC is the fixed cost, and #Shots is the number of shots. Kim et al. also proposed a wafer map design optimization method based on a differential evolution algorithm and showed that the wafer map with optimal productivity can be found within a reasonable time. It was a meaningful study that can be used both when designing a wafer map from various die size possibilities and when determining a wafer map design for a specific die size. However, the RBP model simply estimates the productivity of wafer maps using a fixed yield of 0.8 regardless of the design of a wafer map. Yields of wafer maps are undoubtedly an important factor for productivity evaluation. Because different wafer map designs with the same die size do not have equal yields, it is important to predict the yield for each wafer map design. Likewise, when two wafer maps have the same productivity factors but different designs, the yield of each wafer map, which is a key factor, must be predicted in order to select the more productive one.
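To make the role of the yield term concrete, the sketch below evaluates an ROI-style productivity of the form (gain − cost) / cost, with gain = TP × #Die × Yield × PP and cost = WP × TP + FC × #Shots, which is one reading of (3). The function name and every numeric input are hypothetical placeholders chosen for illustration; they are not values from [3].

```python
def rbp_productivity(tp, n_die, yield_, pp, wp, fc, n_shots):
    """ROI-style wafer productivity, one reading of Eq. (3).

    tp: lithography throughput, n_die: gross dies, yield_: wafer map yield,
    pp: product price, wp: wafer price, fc: fixed cost, n_shots: shot count.
    """
    cost = wp * tp + fc * n_shots          # denominator of Eq. (3)
    if cost <= 0:
        raise ValueError("total cost must be positive")
    gain = tp * n_die * yield_ * pp        # revenue term
    # (gain - cost) equals TP*(#Die*Yield*PP - WP) - FC*#Shots
    return (gain - cost) / cost


# Hypothetical inputs: the same map scored with a fixed yield of 0.8
# and with a (made-up) predicted yield of 0.9.
fixed = rbp_productivity(tp=100, n_die=1000, yield_=0.8, pp=2, wp=500, fc=10, n_shots=100)
predicted = rbp_productivity(tp=100, n_die=1000, yield_=0.9, pp=2, wp=500, fc=10, n_shots=100)
print(fixed, predicted)
```

Because the yield multiplies the whole revenue term, replacing the fixed value with a map-specific prediction can change the ranking of candidate wafer maps, which is the motivation for the yield model proposed here.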

Consequently, the yield is one of the most important factors for the productivity evaluation of wafer maps. Accurately predicting the yield of wafer maps leads to precise productivity evaluation and proper selection of wafer maps from various wafer map possibilities.

III. WAFER MAP YIELD MODELING

The idea of the proposed model is to model the yield of wafer maps of previous products for predicting the yield of new wafer maps. In order to predict the yield of new wafer maps at the wafer map design phase, the model should be constructed only with limited information such as spatial design features of the wafer map, unlike conventional yield prediction models. Also, the target is to create a more accurate model considering the two-dimensional spatial wafer structure instead of the conventional one-dimensional approach.

In this section, we introduce the main features and structures of the deep learning algorithms used in the proposed yield prediction model. Subsequently, we explain the input variables of the proposed model and the pre-processing applied to the training and test data.

A. Deep Neural Networks

Deep neural networks (DNNs) are machine learning models that stack layers of neurons into a hierarchical network [12]. Thanks to their multi-layer architecture, DNNs are able to learn complex non-linear relationships between the input and output variables. They also run relatively fast compared to other non-linear models [13]. Generally, a neural network (NN) is constructed with an input layer, a hidden layer, and an output layer. DNNs have two or more hidden layers between the input and output layers, and adjacent layers are fully connected. Fig. 3 shows general NN structures. Unlike generalized regression models, DNNs perform automatic feature extraction using the back-propagation algorithm without user intervention.

Fig. 3. The structure of the neural network (NN).

With a simple linear regression method, it is hard to identify correlations between the spatial features of dies and die-level yield variations on the wafer map. To solve this problem and fully exploit these spatial features and die-level yield distributions, we employ DNNs as the learning algorithm for the yield prediction model of wafer maps. DNNs are capable of approximating the non-linearity of these complex relationships without user intervention and can be trained effectively on large data sets using the back-propagation algorithm.

In recent years, various deep learning algorithms have been developed, such as the deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). We use DNNs, which are more suitable for implementing the proposed yield model. DNNs are known to perform well in principle, but they suffer from issues such as vanishing gradients, over-fitting, and slow training. To resolve these issues, ReLU, PReLU, and Maxout were introduced as effective activation functions, and dropout was introduced to prevent over-fitting [14]. In addition, the batch normalization (BN) method was proposed to alleviate the vanishing gradient problem [15]; it accelerates learning by stabilizing the entire training process through the removal of internal covariate shift. This method works by adding a batch normalization layer before the input to a hidden layer and using the output of the batch normalization layer as the input to the activation function. The BN method helps improve the performance of the model and is used in various deep learning models. Furthermore, various optimizers have been introduced to optimize neural networks, such as AdaGrad, RMSProp, and Adam, which combines the advantages of the AdaGrad and RMSProp optimization methods [16]. As a result of these constant efforts, deep learning models are widely used in automatic speech recognition, image recognition, recommendation systems, natural language processing, etc. The performance of a DNN-based model depends on its hyperparameters. Thus, we implement various models to find the optimal architecture for our problem while tuning the number of hidden layers, the number of nodes in each layer, the optimizer, and the initialization method. Detailed hyperparameter tuning options and results for designing the optimal model architecture are discussed in Section IV.A.

B. Input Variables

In this subsection, we discuss the variables describing die features on a wafer map: the position of a die and the yield of a die. The spatial features serve as the input variables of the proposed model, and the die-level yield serves as the training target, so that the trained model can predict the yield of each die. The input variables include only spatial design features of a wafer map, as mentioned above, since the process parameters and test parameters that can influence yields are unknown at the early design stage of a wafer map. Fig. 4 shows the spatial features of dies on the wafer map that we use. They consist of five spatial features: the distance from the center of the wafer and four geometric coordinates of a die on the wafer map. In our previous study [9], all features available at the wafer map design stage were expressed as nine spatial features: the width, the height, the distance from the center of the wafer map, and six geometric coordinates of a die.

However, we use five input variables because the four geometric coordinates already encode the die size; dropping the redundant features reduces the training time and the risk of over-fitting and improves inference accuracy. Thus, unlike the RYD model, which uses only one feature (the distance from a die to the center of the wafer), our proposed model is trained on the die-level yield with five spatial features of a die as input variables, and it predicts the die-level yield from those five features.
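As a minimal sketch of how five spatial features might be derived for one die, the code below assumes that the four geometric coordinates are the die's bounding-box edges (x_min, y_min, x_max, y_max) in a frame whose origin is the wafer center, and that the distance is measured to the die's center. These conventions and all names are assumptions made for illustration; Fig. 4 defines the features actually used.

```python
import math
from typing import NamedTuple


class DieFeatures(NamedTuple):
    x_min: float   # left edge of the die (assumed coordinate 1)
    y_min: float   # bottom edge of the die (assumed coordinate 2)
    x_max: float   # right edge of the die (assumed coordinate 3)
    y_max: float   # top edge of the die (assumed coordinate 4)
    dist: float    # distance from wafer center to die center


def die_features(x_min: float, y_min: float, width: float, height: float) -> DieFeatures:
    """Build five spatial features for a die; the wafer center is the origin."""
    x_max = x_min + width
    y_max = y_min + height
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    return DieFeatures(x_min, y_min, x_max, y_max, math.hypot(center_x, center_y))


f = die_features(x_min=3.0, y_min=4.0, width=6.0, height=8.0)
# Width and height are recoverable from the coordinates, so they need
# not be separate inputs.
print(f.x_max - f.x_min, f.y_max - f.y_min, f.dist)
```

Under these assumptions the die size is implicit in the corner coordinates, which is why separate width and height inputs carry no additional information.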

C. Data Pre-processing

In this subsection, we discuss the pre-processing of the yield data used as training data for our model. To improve prediction accuracy and avoid over-fitting, the proposed model is trained on a large amount of yield data from various products. Training can be difficult if the yield deviation differs from product to product in the training data. Thus, it is necessary to normalize the yield data of all products for effective training. The die-level yield data are acquired from a wafer test after fabrication and are expressed in the form of a WBM, as shown in Fig. 2. A WBM holds pass or fail information for each die, which is a functional test result and is the crucial information for starting to track process problems.

Fig. 4. The spatial die features of the proposed method: for the i-th die $d_i$, four geometric coordinates of the die and the distance of the die from the center of the wafer, $d_i x_j$.

To exploit spatial yield distributions on the wafer maps of various products, we first calculate the average yield for each die position in each product. The yield of each die is calculated as follows:

$$\mathrm{Yield}(D_{i,k}) = \frac{1}{N}\sum_{n=1}^{N} Y^{(n)}_{wafertest,i,k} \qquad (4)$$

where $D_{i,k}$ is the i-th die on the wafer of the k-th product, N is the number of wafers of the k-th product, and $Y^{(n)}_{wafertest,i,k}$ is the wafer test result of the i-th die on the n-th wafer of the k-th product, which is 1 for pass and 0 for fail.

Secondly, the average yield of each product is needed for calculating the normalized yield; it is the average of the yields of all dies on a wafer. The average yield of a product is derived as follows:

$$\mathrm{Yield}_k = \frac{1}{M}\sum_{i=1}^{M}\mathrm{Yield}(D_{i,k}) \qquad (5)$$

where M is the number of dies on the wafer map of the k-th product.

Lastly, the yield of each die on the wafer map of a product is normalized by dividing it by the average yield of that product, and this pre-processing is executed individually for each product. The normalized yield of the i-th die of the k-th product is calculated as follows:

$$\mathrm{NormalizedYield}(D_{i,k}) = \frac{\mathrm{MeasuredYield}(D_{i,k})}{\frac{1}{M}\sum_{j=1}^{M}\mathrm{MeasuredYield}(D_{j,k})} \qquad (6)$$

where MeasuredYield is the die-level yield from the wafer test, i.e., the quantity computed in (4). Finally, a training sample is expressed as follows:

$$\left(d_i x_{1,k},\ d_i x_{2,k},\ \cdots,\ d_i x_{4,k},\ d_i x_{5,k},\ \mathrm{NormalizedYield}(D_{i,k})\right) \qquad (7)$$

and the data of all products are combined into the training data.
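The three pre-processing steps can be sketched in a few lines of plain Python. The example assumes that each wafer bin map of a product is given as a flat list of 0/1 results with one entry per die position and that every wafer of the product shares the same die layout; the function name and the toy data are hypothetical.

```python
def normalize_product(wbms):
    """Per-product pre-processing following Eqs. (4)-(6).

    wbms: list of wafer bin maps for ONE product; each map is a list with
    1 (pass) or 0 (fail) per die position, same length for every wafer.
    Returns (die_yield, product_yield, normalized_yield).
    """
    n_wafers = len(wbms)
    n_dies = len(wbms[0])
    if any(len(w) != n_dies for w in wbms):
        raise ValueError("all wafers of a product must share one die layout")

    # Eq. (4): pass rate of each die position, averaged over wafers.
    die_yield = [sum(w[i] for w in wbms) / n_wafers for i in range(n_dies)]

    # Eq. (5): average of the die-level yields over all die positions.
    product_yield = sum(die_yield) / n_dies
    if product_yield == 0:
        raise ValueError("product yield is zero; cannot normalize")

    # Eq. (6): die-level yield relative to the product average.
    normalized = [y / product_yield for y in die_yield]
    return die_yield, product_yield, normalized


# Toy data: 4 wafers of a product with 4 die positions each.
wbms = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
die_yield, product_yield, normalized = normalize_product(wbms)
print(die_yield, product_yield, normalized)
```

Each training sample in (7) would then pair the five spatial features of a die with its entry in `normalized`. After normalization the values of every product average to 1, which puts products with different absolute yield levels on a common scale.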

IV. EXPERIMENTAL DESIGN

A. Yield Prediction Model

To demonstrate that the proposed approach accurately predicts the yields of new wafer maps, we conducted experiments with real-world yield data collected from a wafer test after fabrication. Yield data from a total of 100,000 wafers of five products were used to train our model. The model was tested on pre-processed yield data from 3,000 wafers of two products. We implemented several architectures with various hyperparameter tuning options to find the optimal architecture for the proposed approach. Using the batch normalization method as the default option to improve the performance of the model, we designed fully connected neural networks with 6, 9, 12, and 15 hidden layers. Each hidden layer consisted of 20, 30, 40, or 50 nodes.

The proposed model was trained under the following conditions. The hidden and output layers of the network used ReLU as the activation function, and each hidden layer was followed by a batch normalization layer in order to reduce internal covariate shift in the network [14]. In addition, all the weights of the network were initialized with the Xavier or He initialization method for efficient training. To train the network, back-propagation was conducted using the Adam [16], RMSProp, and stochastic gradient descent (SGD) optimizers with learning rates of 0.001, 0.0001, and 0.00001 and a mini-batch size of 1000. In addition, to verify the effectiveness of the proposed input variables, we also constructed a model with nine spatial features of a die based on [9]. All the experiments were executed with GPU-accelerated TensorFlow 1.14 in Python.
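For orientation, the forward pass of one configuration from this search space (five inputs, 12 hidden layers of 40 nodes, one output, as in Fig. 5) can be sketched without any deep learning framework. This is an untrained, standard-library illustration, not the TensorFlow implementation used in the experiments: the layer order (dense, batch normalization, ReLU), the Xavier uniform limit, the omission of learnable batch-normalization scale and shift, and all function names are assumptions.

```python
import math
import random


def xavier_uniform(n_in, n_out, rng):
    """Xavier (Glorot) uniform init: U(-limit, limit), limit = sqrt(6/(n_in+n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]


def dense(batch, weights, bias):
    """Fully connected layer applied to every row of the batch."""
    return [[sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
             for j in range(len(bias))] for x in batch]


def batch_norm(batch, eps=1e-5):
    """Normalize each unit over the mini-batch (scale/shift parameters omitted)."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, means, variances)]
            for row in batch]


def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


def build_network(n_in=5, n_hidden=12, width=40, seed=0):
    """Return a list of (weights, bias) pairs: 12 hidden layers plus the output layer."""
    rng = random.Random(seed)
    sizes = [n_in] + [width] * n_hidden + [1]
    return [(xavier_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]


def forward(network, batch):
    """Predict one yield value per input row (untrained weights)."""
    h = batch
    for weights, bias in network[:-1]:
        h = relu(batch_norm(dense(h, weights, bias)))
    weights, bias = network[-1]
    return [row[0] for row in relu(dense(h, weights, bias))]


network = build_network()
# Hypothetical mini-batch of 8 dies, each with five spatial features.
batch = [[0.1 * i + j for j in range(5)] for i in range(8)]
predictions = forward(network, batch)
print(len(network), len(predictions))
```

Training with Adam, RMSProp, or SGD and the listed learning rates would be layered on top of such a forward pass by a framework; the sketch only fixes the shape of the computation.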

The experimental conditions are summarized as follows:

• # of hidden layers: 6, 9, 12, 15
• # of nodes in each hidden layer: 20, 30, 40, 50
• Initializer: He and Xavier
• Learning rate: 0.001, 0.0001, 0.00001
• Optimizer: SGD, RMSProp, Adam
• Activation function: ReLU
• GPU-accelerated TensorFlow: 1.14
• Training data: 100,000 wafers from 5 products
• Test data: 3,000 wafers from 2 products

To compare the performance of the proposed model, we implemented a multivariate polynomial regression (MPR) model, a random forest regression (RFR) model, and a support vector regression (SVR) model. The spatially oriented yield model using multivariate polynomial regression was used to compare the prediction accuracy of yield variations. SVR is a regression method that adds an ε-insensitive loss function to the support vector machine (SVM), which is known for good performance on classification problems, so that it can solve regression problems [20]. In these experiments, we used the kernel function with the highest performance among the linear, polynomial, and Gaussian RBF kernels for the comparison model.

Fig. 5. The fully connected neural network with one input layer, 12 hidden layers, 40 nodes in each hidden layer, and one output layer. Input values are spatial features of a die. Output is the predicted yield $y_{d_i}$ for the die. Our model uses the Xavier initialization method, the Adam optimizer, the adaptive learning rate, and the root mean square loss function for improving convergence speed.

TABLE I.
THE COMPARISON RESULTS OF THE PERFORMANCE BETWEEN PREDICTION MODELS

              Test Product A                      Test Product B
Models        RMSE    R2      Pearson Corr.       RMSE    R2      Pearson Corr.
MPR           0.092   0.389   0.639               0.077   0.385   0.658
RFR           0.066   0.731   0.857               0.068   0.652   0.861
SVR           0.043   0.792   0.880               0.049   0.779   0.920
DNNs-5        0.009   0.997   0.999               0.011   0.993   0.996
DNNs-9 [9]    0.009   0.994   0.996               0.011   0.982   0.989

MPR: Multivariate Polynomial Regression, RFR: Random Forest Regression, SVR: Support Vector Regression

Random forest is an ensemble learning method used for classification and regression analysis. An RFR model is an ensemble regressor that builds multiple decision trees, each from a randomly selected subset of training samples and variables, and is known to perform well. The comparison models used the same input variables as the proposed model.

To summarize, we construct various DNN models to find the optimal architecture of the proposed method and compare its performance against regression models based on machine learning algorithms.

B. Performance Measurement

The prediction accuracy of the models was evaluated in terms of the root mean square error (RMSE), R-squared, and the Pearson correlation coefficient. The RMSE is frequently used as a measure of accuracy; here it compares the actual yields and the predicted yields of the test product and is defined as the square root of the mean squared error between them. The smaller the RMSE value, the more accurate the yield prediction of the model. Second, R-squared, the coefficient of determination, indicates how much of the variance in the actual yields is explained by the regression model. Lastly, the Pearson correlation coefficient was used to estimate the strength and direction of the linear relationship between actual yields and predicted yields by die position. A correlation of 1 means a perfect increasing linear relationship, and -1 means a perfect decreasing one. In this experiment, the closer the correlation is to 1, the more accurately the model captures and predicts the yield distribution according to die positions.
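The three measures can be written out directly. The sketch below uses the common textbook definitions (R-squared as 1 − SS_res / SS_tot), which are assumed here because the text does not spell out the exact formulas; the sample values are made up.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted yields."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson(actual, predicted):
    """Pearson correlation coefficient between the two sequences."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up normalized die yields and predictions, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(actual, predicted), r_squared(actual, predicted), pearson(actual, predicted))
```

Note that a high Pearson correlation alone does not imply a low RMSE: predictions that are a scaled or shifted copy of the actual yields correlate perfectly yet can be far off in absolute terms, which is why the three measures are reported together.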

C. Wafer Map Optimization

In order to verify that our model is useful for the productivity evaluation of new wafer maps and helps to design a wafer map with optimal productivity, we apply our proposed model to the WPO model, which uses the differential evolution algorithm proposed by Kim et al. [3]. The WPO model, as mentioned in Section II.B, finds a wafer map with maximized productivity, i.e., maximized ROI, considering gross dies, shot counts, lithography throughput, prices, and a fixed yield. It differs from the typical wafer map design approach, which maximizes the number of gross dies on the wafer (MGP). The yield of the wafer map was fixed at 0.8 in the WPO model, but we modified it to use our proposed yield model (WPOY) so that the productivity of wafer maps is evaluated considering yield variations according to the die positions and die sizes of a wafer map. When the ROI of candidate wafer maps was calculated during the search for the optimal wafer map, it was based on the proposed yield prediction model. The program for designing a wafer map was executed under the same conditions as the experiments in [3], and we also chose die areas in the range of 20 mm² to 200 mm², which are equivalent conditions.
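The difference between the two design strategies can be illustrated with a toy selection step. The sketch below is not the differential evolution search of [3]: it simply picks the best map from a given candidate list, and it uses the expected number of good dies from (1) as a stand-in objective instead of the full ROI. The class, the function names, and the candidate values are hypothetical.

```python
from typing import Callable, NamedTuple, Sequence


class CandidateMap(NamedTuple):
    name: str
    gross_dies: int
    shots: int
    predicted_yield: float   # output of a yield model for this map


def select_mgp(candidates: Sequence[CandidateMap]) -> CandidateMap:
    """MGP-style choice: maximize the number of gross dies only."""
    return max(candidates, key=lambda c: c.gross_dies)


def select_by_productivity(
    candidates: Sequence[CandidateMap],
    productivity: Callable[[CandidateMap], float],
) -> CandidateMap:
    """Yield-aware choice: maximize a productivity objective per map."""
    return max(candidates, key=productivity)


def expected_good_dies(c: CandidateMap) -> float:
    """Stand-in objective from Eq. (1): #Die x Yield."""
    return c.gross_dies * c.predicted_yield


# Hypothetical candidates: map A has more gross dies, map B a better yield.
candidates = [
    CandidateMap("A", gross_dies=630, shots=118, predicted_yield=0.87),
    CandidateMap("B", gross_dies=621, shots=114, predicted_yield=0.89),
]
print(select_mgp(candidates).name)
print(select_by_productivity(candidates, expected_good_dies).name)
```

With a map-specific yield the two strategies can disagree, whereas a fixed yield of 0.8 would make the stand-in objective rank maps purely by gross dies.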

V. EXPERIMENT RESULTS

A. Yield Prediction Model

To find the optimal architecture for our approach, we tested 72 architectures with the hyperparameter options described in Section IV.A. Based on the experimental results, we selected the optimal architecture, which is shown in Fig. 5. The proposed model consisted of 12 fully connected layers trained with a back-propagation algorithm, and it used the Xavier initialization method, the Adam optimizer, a learning rate of 0.0001, and the root mean square loss function for improving convergence speed. The input layer of the proposed model consisted of five nodes, the output layer has one node (the predicted yield), and each hidden layer consisted of 40 nodes. Input values are spatial features of a die. Output is the predicted yield, $y_{d_i}$, for a die of the wafer map.

TABLE II.
THE COMPARISON RESULTS OF PRODUCTIVITY BETWEEN MGP WITH DBY MODEL AND RBP WITH DBY MODEL IN VARIOUS DIE FEATURES

                        MGPY                                   WPOY
Size    AR    MFO    #GD   #Shots  Norm. Y  Productivity   #GD   #Shots  Norm. Y  Productivity   Enhancement
(mm2)         (%)                                                                                (%)
20      0.7   74.59  3271  119     0.908    22.805         3268  118     0.909    22.836         3.080
        1.0   81.59  3275  108     0.912    23.370         3273  108     0.912    23.375         0.510
        1.2   83.92  3271  107     0.914    23.432         3267  105     0.914    23.511         7.990
        1.43  83.92  3270  105     0.915    23.544         3269  105     0.916    23.560         1.670
        1.5   97.90  3271  92      0.915    24.109         3268  91      0.916    24.156         4.640
50      0.7   87.41  1281  98      0.910    8.680          1279  98      0.913    8.697          1.676
        1.0   69.93  1284  122     0.920    8.418          1277  119     0.925    8.469          5.093
        1.2   93.24  1281  92      0.923    8.929          1274  87      0.927    8.997          6.740
        1.43  69.93  1281  122     0.921    8.412          1277  118     0.926    8.498          8.599
        1.5   69.93  1284  119     0.921    8.481          1283  119     0.922    8.483          0.183
100     0.7   69.93  630   118     0.870    3.403          621   114     0.886    3.445          4.238
        1.0   69.93  626   117     0.895    3.470          625   116     0.896    3.512          4.269
        1.2   69.93  628   119     0.888    3.471          624   116     0.899    3.520          4.880
        1.43  69.93  630   118     0.883    3.465          621   113     0.899    3.511          4.630
        1.5   69.93  626   118     0.892    3.465          627   118     0.891    3.489          2.358
200     0.7   46.62  303   157     0.794    0.813          302   159     0.809    0.841          2.825
        1.0   46.62  304   162     0.809    0.839          300   161     0.833    0.880          4.122
        1.2   93.24  304   84      0.804    1.081          302   85      0.830    1.131          4.936
        1.43  46.62  304   160     0.812    0.850          303   157     0.825    0.884          3.338
        1.5   46.62  304   158     0.808    0.849          300   156     0.827    0.872          2.313

Table I compares the accuracy of the models on the test data. The proposed model (DNNs-5) was superior to the MPR, RFR, and SVR comparison models on every measure for both test products. The DNNs-5 model, which uses five input variables, matched the RMSE of DNNs-9, which was introduced in our previous study [9], and outperformed it in R-squared and Pearson correlation. This indicates that the proposed model learned the die yield distributions according to die positions well and avoided over-fitting while improving performance. Because the test products used in the experiment have different die sizes and aspect ratios, the results show high accuracy regardless of die size and aspect ratio.

B. Wafer Optimization Results

Table II shows the experimental results of the productivity enhancement. To evaluate our modified wafer productivity model (WPOY), which integrates the proposed yield prediction model, we compared it with the maximum gross die productivity model combined with the same yield prediction model (MGPY); maximizing gross dies is a traditional wafer map design strategy. WPOY shows up to 8.59% higher productivity than MGPY, i.e., profit per cost increases by up to 8.59% with the optimal wafer map of the WPOY model. This result indicates that our proposed model captures the spatial features on the wafer map and the actual die-level yield data collected from a wafer test more effectively than the comparison regression models based on machine learning. Thus, it can be used as a yield prediction model for pre-evaluating the productivity of wafer maps when designing a new wafer map.
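The difference between the two design strategies compared above can be sketched as two selection rules over the same set of candidate wafer maps. This is a simplified illustration under stated assumptions: the `CandidateMap` type, the function names, and the candidate values are hypothetical and are not taken from Table II, and the productivity score is treated as already computed with a predicted yield. The simple relative difference used in `improvement_percent` is an assumption and may not match how the improvement figures in the paper were derived.

```python
from dataclasses import dataclass


@dataclass
class CandidateMap:
    name: str
    gross_die: int
    productivity: float  # assumed to be pre-evaluated with a predicted yield


def select_mgpy(candidates):
    """Traditional strategy: pick the map with the most gross dies."""
    return max(candidates, key=lambda c: c.gross_die)


def select_wpoy(candidates):
    """Productivity-oriented strategy: pick the map with the highest productivity."""
    return max(candidates, key=lambda c: c.productivity)


def improvement_percent(baseline, optimized):
    """Relative productivity gain of `optimized` over `baseline`, in percent."""
    return 100.0 * (optimized.productivity - baseline.productivity) / baseline.productivity


# Hypothetical candidates for one product; not data from the paper.
candidates = [
    CandidateMap("map_a", gross_die=1000, productivity=10.0),
    CandidateMap("map_b", gross_die=990, productivity=10.5),
    CandidateMap("map_c", gross_die=980, productivity=10.2),
]

mgpy = select_mgpy(candidates)
wpoy = select_wpoy(candidates)
gain = improvement_percent(mgpy, wpoy)
```

In this toy set the map with the most gross dies is not the most productive one, which is the situation in which a productivity-oriented selection pays off.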

VI. CONCLUSION

To improve manufacturing productivity, the productivity of a wafer map is pre-evaluated at the wafer map design stage by considering the various factors that affect it. From a manufacturing point of view, the yield is directly related to productivity and the design of the wafer map affects the yield, so yield must be considered to accurately evaluate the productivity of new wafer maps. Various yield models have been studied for predicting the yield of a wafer map. However, predicting the yield of a newly designed wafer map is difficult because the process parameters commonly used in yield prediction models are not yet determined. In this paper, we proposed a yield prediction model for pre-evaluating the productivity of new wafer maps before manufacturing. The proposed approach exploits spatial relationships among the positions of dies, the distances from dies to the center of the wafer, and die-level yield variations collected from a wafer test. We adopted fully-connected deep neural networks to train effectively on massive actual yield data without user intervention and to accurately predict the yield of wafer maps. The proposed method improves yield prediction accuracy while using only five spatial die features. Furthermore, pre-evaluating productivity with our yield prediction model helps to design a wafer map with up to 8.59% higher productivity. Consequently, our approach accurately predicts the yield of new wafer maps with new die sizes and positions. It is an effective way to improve the productivity of wafer maps at an early design stage, and it contributes to selecting the wafer map with optimal productivity from the many possible wafer maps.


REFERENCES

[1] D. K. De Vries, “Investigation of gross die per wafer formulas,” IEEE Transactions on Semiconductor Manufacturing, vol. 18, no. 1, pp. 136–139, 2005.

[2] C. F. Chien, C. Y. Hsu, and K. H. Chang, “Overall Wafer Effectiveness (OWE): A novel industry standard for semiconductor ecosystem as a whole,” Computers and Industrial Engineering, vol. 65, no. 1, pp. 117–127, 2013.

[3] J.-S. Kim, C. W. Ahn, T.-W. Kim, H.-J. Lee, and J.-B. Lee, “A market-oriented wafer map optimization methodology using Differential Evolution to maximize wafer productivity,” in 28th Advanced Semiconductor Manufacturing Conference (ASMC), 2017, pp. 393–398.

[4] W. Maly, “Prospects for WSI: A Manufacturing Perspective,” Computer, vol. 25, no. 4, pp. 58–65, 1992.

[5] N. Kumar, K. Kennedy, K. Gildersleeve, R. Abelson, C. M. Mastrangelo, and D. C. Montgomery, “A review of yield modelling techniques for semiconductor manufacturing,” International Journal of Production Research, vol. 44, no. 23, pp. 5019–5036, 2006.

[6] H. Dong, N. Chen, and K. Wang, “Wafer yield prediction using derived spatial variables,” Quality and Reliability Engineering International, vol. 33, no. 8, pp. 2327–2342, 2017.

[7] G. O'Donoghue and C. A. Gómez-Uribe, “A Statistical Analysis of the Number of Failing Chips Distribution,” IEEE Transactions on Semiconductor Manufacturing, vol. 21, no. 3, pp. 342–351, 2008.

[8] A. V. Ferris-Prabhu, L. D. Smith, H. A. Bonges, and J. K. Paulsen, “Radial Yield Variations in Semiconductor Wafers,” IEEE Circuits and Devices Magazine, vol. 3, no. 2, pp. 42–47, 1987.

[9] S.-J. Jang et al., “A wafer map yield model based on deep learning for wafer productivity enhancement,” in Proc. 28th SEMI ASMC, Saratoga Springs, NY, USA, 2018, pp. 29–34.

[10] C. H. Stapper, “Fact and fiction in yield modeling,” Microelectronics Journal, vol. 20, no. 1–2, pp. 129–151, 1989.

[11] S. P. Cunningham, C. J. Spanos, and K. Voros, “Semiconductor Yield Improvement: Results and Best Practices,” IEEE Transactions on Semiconductor Manufacturing, vol. 8, no. 2, pp. 103–109, 1995.

[12] D. Teets, “A model for radial yield degradation as a function of chip size,” IEEE Transactions on Semiconductor Manufacturing, vol. 9, no. 3, pp. 467–471, 1996.

[13] J. Schmidhuber, “Deep Learning in Neural Networks: An Overview,” Neural Networks, vol. 61, pp. 85–117, 2015.

[14] D. M. Bates and D. G. Watts, Nonlinear Regression Analysis and Its Applications. New York, NY, USA: Wiley, 1988.

[15] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.

[16] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, 2015, pp. 448–456.

[17] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, pp. 1–15, 2014.

[18] J. Kim, S. Jang, T. Kim, H. Lee, and J. Lee, “A Productivity-oriented Wafer Map Optimization using Yield Model based on Machine Learning,” IEEE Transactions on Semiconductor Manufacturing, 2018.

[19] S. Karsoliya, “Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture,” International Journal of Engineering Trends and Technology, vol. 3, no. 6, pp. 714–717, 2012.

[20] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems, vol. 9, M. Mozer et al., Eds., 1997, pp. 155–161.