Verification and Evaluation of Exploratory Statistical Methods in the Analysis of Relative Data in a Sanitation Industry in the State of Goiás with the Application of Exhaust Production


Isis Juliane Arantes Granja

Chemist and Master's student in Production and Systems Engineering at the Pontifícia Universidade Católica de Goiás (PUC-GO);

E-mail: [email protected]

José Elmo De Menezes

PhD in Statistics from the Universidade de São Paulo (2005). E-mail: [email protected]

Abstract

This research analyzes the application of exploratory statistical methods, in particular principal component analysis (PCA) and hierarchical cluster analysis (HCA), to data on production management in a chemical industry of household cleaning products. The cleaning products industries, also known as sanitizing industries, are experiencing strong market growth and, consequently, increasing competitiveness, which forces them to improve their productive performance. The twelve months of the year 2016 were taken as samples. The results were analyzed through graphs and statistical tables, which revealed three large groups of samples. In this way, a methodology to support the implementation of lean production was consolidated, aimed at improving productivity and performance in an industry of the sanitizing sector in the State of Goiás.

Keywords: Exploratory Statistical Methods, Lean Production, Sanitation.

Introduction

The chemical industry of cleaning products, known as the sanitizing industry, has grown year after year. Its market consequently becomes more competitive, so the production process has to keep up with this growth in sales, and companies are forced to develop new tools for process control, cost reduction in production and reduction of waste generation. However, most of these industries face production, technological and socioeconomic problems, all of which are important factors for performance and success in organizations.

Production planning and control (PCP) functions as a tool that helps solve these problems by aiding the search for strategies. According to Corrêa and Gianesi (1996), "the PCP of companies has the function of planning, scheduling and controlling production, defining what should be produced, when, in what quantity and what resources to be used." Thus, it is fundamental to structure the department so that production planning and control respect the capacity limits of the resources while meeting demand (STEFANELL, 2010). The growing quest to improve the efficiency of industrial processes, and the increasing investments by companies to achieve this goal, have made quality a key factor in all areas, be they products or services. This growing emphasis on efficiency in search of quality is due to the fact that consumer behavior has changed considerably, with quality becoming a determining factor in the decision to purchase products and services. As a direct consequence, companies need to adapt through continuous improvement of their production processes. Quality must become an essential part of the business strategy, ensuring competitiveness in the global market. In this context, quality is being incorporated by companies that seek to establish a relationship of reliability and credibility with their suppliers and consumers (COSTA, 1998).

To make this a reality, several instruments are used to improve the efficiency of productive processes. Among them, Statistical Process Control (SPC) is used in many industries, not merely as a loose set of statistical techniques, but as an effective tool for analyzing, planning, deciding and executing in order to achieve the proposed objectives (EPPRECHT, 1998). Multivariate statistical analysis methods can be useful tools in the search for efficient productive processes, since they provide a first synthesis of the information with respect to measures of position and dispersion of the data (KONRATH, 2002). When the objective of the study is the simultaneous description of more than two variables, it becomes necessary to use multivariate statistical methods. In this sense, we can distinguish methods involving exploratory data analysis and experimental planning (CRIVISQUI, 1993).

Multivariate data analysis methods have largely demonstrated their effectiveness in the study of complex data sets. They are called multidimensional methods, as opposed to methods of descriptive statistics that treat no more than one or two variables at a time. They therefore allow several variables to be compared at once, which is far richer than examining each variable separately. The simplified representations of large data tables that these methods produce have proven to be a remarkable instrument of synthesis. In the evaluation process driven by statistical process control, several reports are generated and analyzed jointly by factory supervision and industry management for discussion of the results. Based on the analysis of the data generated during the statistical control, decisions can be taken to optimize the efficiency of the productive processes (ESCOFIER; PAGÈS, 1992).

The PCP covers several areas of the production process, namely stocks of raw materials, finished products and sales. Production requires a fixed daily schedule, and sales in turn need a stock that covers all the orders issued and shipped for delivery to the consumer. Given this flow between production and sales, the PCP must check the demand forecast and set production accordingly, in order to avoid stock shortages, make the productive process reliable and give competitiveness to the industry. The industries that have implemented lean production use its tools to continually improve their processes, identify wastes and seek to eliminate them. The seven wastes identified in lean production are: overproduction, unnecessary movements, waiting in the process, transportation, defects, excess inventory and improper processing. Eliminating them aims at increasing efficiency, reducing costs and improving customer response time, as well as improving quality, profitability and external image (VERRIER et al., apud BERGMILLER and MCCRIGHT, 2009). However, the products of the cleaning segment have low added value, so large sales volumes are needed for a reasonable profit margin, hence the importance of tools that help reduce production costs (PAIVA et al., 2013). When the behavior of the Brazilian consumer is analyzed by demographic region, it is found that in 2013 the consumer of the Center-West region of Brazil had the highest consumption of cleaning products (ABIPLA apud KANTAR WORLDPANEL, 2013).

Production administration (PA) is consolidated as a pragmatic way to solve the real problems related to manufacturing processes. It has a high degree of importance in management decisions, but the company does not exist for the purpose of controlling and planning production.

Organizations as a whole, driven by economic concepts, effectively exist to generate profits and increase their capital. This effective planning and control translates into "being competitive" [...] and services, some organizations will succeed and others will not. The great difference between those who succeed and those who do not lies in the greater or lesser capacity of each to offer what most interests the demanding markets, expressing its competitiveness in relation to the competition. "Being competitive" is the ability of the organization to outperform the competition in those aspects of performance that the targeted market niches value most (Corrêa, Gianesi and Caon, 2009).

Principal component analysis is described as a method for producing linear combinations of the variables X1, X2, ..., Xp, for which data are available, with the purpose of summarizing the main aspects of the variation in the X variables through the variation of a small number of these linear combinations. The linear combinations are the principal components. They take the form Z = a1X1 + a2X2 + ... + apXp, with the restriction that a1² + a2² + ... + ap² = 1 (HAIR et al., 2005). The first linear combination is the first principal component, which has the largest possible variance. The second principal component has the largest possible variance while being uncorrelated with the first component. The other principal components are defined similarly, with the i-th principal component having the largest possible variance given that it is uncorrelated with the first i-1 principal components.
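The definition above can be written compactly for all p components at once. The double index on the coefficients (a_{ij} for the weight of variable X_j in component Z_i) is a notation assumed here for illustration; the text uses a single index for one generic component.

```latex
Z_i = a_{i1} X_1 + a_{i2} X_2 + \dots + a_{ip} X_p,
\qquad \sum_{j=1}^{p} a_{ij}^{2} = 1,
\qquad i = 1, \dots, p

\operatorname{Var}(Z_1) \ge \operatorname{Var}(Z_2) \ge \dots \ge \operatorname{Var}(Z_p),
\qquad \operatorname{Cov}(Z_i, Z_k) = 0 \quad \text{for } i \ne k
```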

The principal components are calculated by finding the eigenvalues and eigenvectors of the sample covariance matrix of the X variables, usually after the X variables have been standardized to have zero mean and unit variance, so that the covariance matrix is also the correlation matrix of the X variables (MANLY, 2008). If the analysis is performed using the correlation matrix, then the sum of the eigenvalues is equal to p, the number of X variables. For further analysis only the first few principal components are used, namely those whose variances add up to a high percentage (for example, 80% or more) of the sum of the variances of all p components. Alternatively, if the analysis is performed on the correlation matrix, the principal components with variances larger than one can be used, because their variances are larger than those of the individual standardized X variables.

There are several reasons for carrying out a cluster analysis. These include defining the true underlying groups and finding a small number of objects (one per group) that covers the full range of conditions of a larger set of objects. Two types of grouping are described. One results in a dendrogram of similarities. The other involves an iterative partitioning procedure to find the best set of n groups for a data set, starting with arbitrary groups and improving them by moving individuals between them (HAIR et al., 2005). There is a variety of agglomerative hierarchical clustering algorithms, among them those based on nearest-neighbor distance, furthest-neighbor distance and group means.

Agglomerative methods begin with each individual in a group of its own and gradually merge the groups until a single group remains. Divisive hierarchical methods, although not used as frequently as agglomerative methods, begin with all objects in one group and gradually split it until each object is in a group of its own. Clusters with unusual shapes can be difficult to detect. Among the measures of distance between objects, the Euclidean distance stands out, and the need to standardize the variables must also be considered (MANLY, 2008).
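The agglomerative procedure described above can be sketched in a few lines. This is a minimal illustration, not the routine used in the study: the function names, the three linkage options (nearest neighbor, furthest neighbor, group average) and the sample points are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters given as lists of point indices."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbor
        return min(dists)
    if linkage == "complete":    # furthest neighbor
        return max(dists)
    if linkage == "average":     # group average
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Hypothetical standardized observations (six objects, two variables).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (3.2, 2.9), (8.0, 8.0)]
for method in ("single", "complete", "average"):
    print(method, agglomerate(points, 3, method))
```

For well-separated points such as these, the three linkage rules lead to the same three groups; they tend to differ when clusters are elongated or overlap.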

Procedures

Data collection was carried out directly, using the company's commercial software. The data were collected continuously and systematically from January to December 2016. Once obtained, the data were carefully analyzed descriptively. A 12x10 data matrix was assembled, with 12 samples and 10 quantitative variables related to the production of household sanitizers in the studied period. Then, using the STATISTICA software, the data were standardized so that the various variables carry equal weight.

We then analyzed the results through the exploratory statistical methods PCA and HCA. For the pre-treatment and analysis of the data, the STATISTICA software, version 7.0, was used in the Windows operating environment. The data were sampled monthly in the period between the first day of January 2016 (01.01.2016) and the last day of December 2016 (12.31.2016). The data were collected from the database that the company made available for the study and cover the fiscal year of 2016.

After the data collection, carried out through the company's management information software, we built a table with some descriptive statistics of the variables: mean, median, maximum, minimum and coefficient of variation (CV).
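As an illustration of the descriptive statistics listed above, the sketch below computes them for one variable. The monthly values are hypothetical (they are not the company's data), and the use of the sample standard deviation (n - 1 denominator) for the CV is an assumption, since the estimator is not stated.

```python
import statistics

def describe(values):
    """Return mean, median, maximum, minimum and CV (in %) of a variable."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    std = statistics.stdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "cv_percent": 100.0 * std / mean,  # CV = standard deviation / mean
    }

# Hypothetical monthly values for a single variable (12 samples).
monthly_units = [1800, 2100, 1950, 2300, 2050, 2200,
                 2600, 2400, 2000, 1900, 1700, 2150]

summary = describe(monthly_units)
print(summary)
```

Because the CV divides the standard deviation by the mean, it allows the dispersion of variables measured on very different scales to be compared.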

Results

Table 1 summarizes some descriptive statistics of the studied variables.

Table 1: Descriptive statistics of the variables

VARIABLES   AVERAGE     MAXIMUM    MINIMUM     CV
Q1          2459.7      5729       1553        43.90%
Q2          2193.7      2781       1779        12.25%
Q3          661.4       4162       0           170.14%
Q4          3769        5534       2699        25.11%
Q5          345.1       601        230         31.55%
Q6          1331.4      2686       865         35.67%
Q7          75.1        125        51          24.76%
N           28.1        34         22          13.95%
E           1581        1797       1305.19     10.71%
F           687451.9    1005851    515754.7    18.20%

Source: Author, 2016.

The variable Q3 presents an exceptionally large coefficient of variation, since this type of packaging was introduced in the company only in 2016, as required by law, which makes it an important variable for the data analysis. The variable Q1 also stands out from the others, with a very large variation. On the other hand, the variables Q2, E and N presented the smallest coefficients of variation and therefore a smaller contribution to the total variance of the data. The multivariate analysis was performed on the data matrix composed of 10 variables and 12 samples (12x10), with the data previously standardized. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) were applied to the standardized data matrix. Considering that in the present study all variables (Q1, Q2, Q3, Q4, Q5, Q6, Q7, N, E, F) are equally important for discriminating the samples, we chose autoscaling as the pre-processing method: the results for a same variable (column) had the mean value subtracted and were divided by the standard deviation of the set of results obtained for that variable. This pre-processing method is indicated when the data have very different orders of magnitude. Data were analyzed using the HCA and PCA techniques. The autoscaled data matrix for the 2016 samples is shown in Figure 1.
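The autoscaling step described above can be sketched as follows. The small matrix is hypothetical and only mimics variables with very different orders of magnitude; the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator the software applied.

```python
import statistics

def autoscale(matrix):
    """Autoscale each column: subtract its mean and divide by its std."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    # A constant column would have zero deviation and cannot be autoscaled.
    stds = [statistics.stdev(col) for col in columns]
    return [
        [(row[j] - means[j]) / stds[j] for j in range(n_cols)]
        for row in matrix
    ]

# Hypothetical 4 x 2 matrix: rows are samples, columns are variables
# measured on very different scales.
data = [
    [25.0, 600000.0],
    [30.0, 700000.0],
    [28.0, 650000.0],
    [33.0, 900000.0],
]

scaled = autoscale(data)
for j in range(2):
    column = [row[j] for row in scaled]
    print(round(statistics.mean(column), 6), round(statistics.stdev(column), 6))
```

After autoscaling, every column has zero mean and unit standard deviation, so no variable dominates the analysis simply because of its units.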

Principal component analysis was performed on the data matrix composed of 10 variables and 12 samples for the year 2016. Among the 10 principal components generated, the first three described approximately 89.9% of the total data variance. Figure 2 shows the principal components generated and the variance explained by each.
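The percentages reported in Figure 2 can be used to check the cumulative variance and the two retention rules mentioned earlier (a cumulative threshold such as 80%, and eigenvalues larger than one). The sketch below is only an illustration: the variable names and the choice of 80% as threshold are assumptions, and the eigenvalues are recovered from the percentages under the assumption that the analysis used the correlation matrix, so that they sum to p = 10.

```python
from itertools import accumulate

# Percentages of variance explained by PC1..PC10, as reported in Figure 2.
explained = [55.64, 24.56, 9.71, 4.35, 2.99, 1.28, 0.87, 0.49, 0.11, 0.00]
p = len(explained)  # number of standardized variables

cumulative = list(accumulate(explained))

# With a correlation matrix the eigenvalues sum to p, so each eigenvalue
# is the explained fraction multiplied by p.
eigenvalues = [pct / 100.0 * p for pct in explained]

# Rule 1: smallest number of components reaching the cumulative threshold.
threshold = 80.0
n_by_threshold = next(i + 1 for i, c in enumerate(cumulative) if c >= threshold)

# Rule 2: keep the components whose eigenvalue exceeds one.
n_by_eigenvalue = sum(1 for ev in eigenvalues if ev > 1.0)

print([round(c, 2) for c in cumulative[:3]])
print(n_by_threshold, n_by_eigenvalue)
```

With these figures the first two components accumulate about 80.2% and the first three about 89.9% of the variance, and both rules retain two components.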

Figure 2: Graph of the 10 principal components generated, indicating the percentage of variance that each one explains (horizontal axis: principal component; vertical axis: explained variance). The percentages shown are 55.64%, 24.56%, 9.71%, 4.35%, 2.99%, 1.28%, 0.87%, 0.49%, 0.11% and 0.00% for components 1 to 10.

The weights of the variables in each principal component are shown in Table 2.

Table 2: Weights of the variables in the principal components PC1 and PC2

VARIABLES   PC1       PC2
Q1          -0.940     0.042
Q2          -0.432     0.454
Q3           0.906     0.236
Q4          -0.781     0.395
Q5          -0.503     0.684
Q6          -0.945     0.097
Q7          -0.957     0.056
N           -0.331    -0.819
E           -0.224    -0.893
F           -0.915    -0.295

Source: Author, 2016

Note that the high weights correspond exactly to the variables indicated as correlated in Table 2.

To facilitate the identification of the variables with greater importance in the linear combination of each component, only the weights greater than 0.600 or less than -0.600 are treated as high. PC1 explains 55.6% of the total data variance. The largest negative weights in this component are those of the variables Q1, Q3, Q4, Q6, Q7 and F, and the smallest negative weights are those of the variables N and E. This principal component represents the overall performance of the company, considering that the variable gross billing (F) has a large weight in this component. Very negative values of this component indicate a large billing combined with a large production of products in the following packages: box with 2 units of 5-liter gallons, 25-liter bottles, 50-liter bottles, 200-liter brass drums and 240-liter packages. Less negative values of this component indicate a small production of products in these packages and a small turnover. So, from the point of view of production management, a large negative value for this component would be desirable.

PC2 explains 24.6% of the total data variance. The largest negative weights in this component are those of the variables N and E, and the largest positive weight is that of the variable Q5. This component contrasts the number of employees involved in the production process (N) and the cost of electric power (E), taken together as operating cost, with the manufacture of products in 200-liter bottles (Q5). Negative values of this component indicate a high operating cost associated with a small production of 200-liter cylinders. Positive values indicate a large production of 200-liter cylinders with a small operating cost, which is a desirable situation for the company. The production manager could then monitor the value of PC2 by establishing an acceptable value for this component, since it reflects the balance between operating cost and the manufacture of 200-liter cylinders.

Figure 3 shows the graph of the first two principal components. Together, the scores of principal component 1 and principal component 2 explain approximately 80.2% of the total variance of the data.

The sample of the month of JUL presented the highest values for the variables Q1, Q3, Q4, Q6, Q7 and F and is located in the most negative part of PC1. The samples of FEV, MAR, MAI, JUN and AGO presented intermediate values for these variables and are located in the intermediate part of PC1. The months of JAN, ABR, SET, OUT, NOV and DEZ presented the lowest values for these variables and are in the positive part of PC1. This division of the months into groups can be related to the rainy or dry season, which decisively influences the company's revenues.

The period from October to March comprises the months of notably greater rainfall, and the months from May to October are the driest. The samples of FEV, MAR, MAI, JUN and AGO presented the highest values for PC2, and the months of JUN, AGO, SET, OUT, NOV and DEZ presented the lowest. Months with positive values for PC2 indicate small operating costs with a large production of 200-liter cylinders, and months with negative values for PC2 indicate high operating costs with a small production of 200-liter cylinders. MAR and MAI presented negative values for PC1 and positive values for PC2, being the months with the best balance between gross billing and [...] for PC1 and negative for PC2, indicating the worst months in terms of gross revenue and operating cost. The inclusion of the months of NOV and DEZ in this group is justified, as it is in these months that the first and second installments of the thirteenth salary are paid. The month of FEV stands out from the others because it presents the lowest operating cost, given that it has fewer days of actual production, being a shorter month that also includes the carnival holidays.
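The weights listed in Table 2 can be cross-checked against the explained variances in Figure 2. The sketch below assumes that the weights are loadings, that is, correlations between each standardized variable and each component; under that assumption the sum of the squared loadings of a component equals its eigenvalue, and dividing by the number of variables gives the explained fraction. The function names and the dictionary layout are illustrative only.

```python
# Weights of the ten variables on (PC1, PC2), as listed in Table 2.
loadings = {
    "Q1": (-0.940, 0.042),
    "Q2": (-0.432, 0.454),
    "Q3": (0.906, 0.236),
    "Q4": (-0.781, 0.395),
    "Q5": (-0.503, 0.684),
    "Q6": (-0.945, 0.097),
    "Q7": (-0.957, 0.056),
    "N": (-0.331, -0.819),
    "E": (-0.224, -0.893),
    "F": (-0.915, -0.295),
}
p = len(loadings)  # number of variables

def explained_percent(component):
    """Sum of squared loadings of a component divided by p, in percent."""
    return 100.0 * sum(w[component] ** 2 for w in loadings.values()) / p

def dominant(component, cutoff=0.600):
    """Variables whose absolute weight exceeds the cutoff."""
    return [name for name, w in loadings.items() if abs(w[component]) > cutoff]

print(round(explained_percent(0), 2), dominant(0))
print(round(explained_percent(1), 2), dominant(1))
```

Under this assumption the sums give roughly 55.6% for PC1 and 24.6% for PC2, in line with Figure 2, and the 0.600 cutoff selects Q1, Q3, Q4, Q6, Q7 and F for PC1 and Q5, N and E for PC2.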

Figure 3: Graph of the scores of principal component 1 (PC1: 55.64%) versus principal component 2 (PC2: 24.56%) for the twelve monthly samples (JAN to DEZ), with Groups 1, 2 and 3 marked.

Figure 4: Loadings chart of the first principal component (PC1: 55.64%) versus the second principal component (PC2: 24.56%) for the variables Q1 to Q7, N, E and F of the 12x10 data matrix, based on the correlation matrix.

Hierarchical cluster analysis was applied to the original matrix, without pre-processing. HCA was performed with the objective of classifying the studied samples (months of the year 2016), verifying possible similarities and dissimilarities between the groups.

The Euclidean distance was used as a measure of similarity and the Ward method was used to delimit the groups.
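A minimal sketch of Ward's criterion with Euclidean geometry is given below, as an illustration of the idea rather than the routine of the statistical package used in the study. At each step it merges the two clusters whose union causes the smallest increase in the within-cluster sum of squares, which for clusters A and B with centroids c_A and c_B equals |A||B| / (|A| + |B|) times the squared Euclidean distance between c_A and c_B. The sample labels and coordinates are hypothetical.

```python
def centroid(cluster, points):
    """Mean vector of the points whose indices are in the cluster."""
    n = len(cluster)
    dims = len(points[0])
    return [sum(points[i][k] for i in cluster) / n for k in range(dims)]

def ward_cost(c1, c2, points):
    """Increase in within-cluster sum of squares if c1 and c2 are merged."""
    m1, m2 = centroid(c1, points), centroid(c2, points)
    sq_dist = sum((x - y) ** 2 for x, y in zip(m1, m2))
    return len(c1) * len(c2) / (len(c1) + len(c2)) * sq_dist

def ward_clusters(points, labels, n_clusters):
    """Agglomerate with Ward's criterion until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = [
            (ward_cost(clusters[a], clusters[b], points), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(pairs)  # cheapest merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]

# Hypothetical samples described by two variables.
labels = ["A", "B", "C", "D", "E", "F"]
points = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0), (4.0, 4.2), (4.1, 3.9), (9.0, 9.5)]

groups = ward_clusters(points, labels, 3)
print(groups)
```

Because the criterion is built on squared Euclidean distances, a variable with a much larger numerical range than the others contributes most of the distance when the data are not standardized.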

The hierarchical cluster analysis suggests the existence of three groups. The months of JAN, FEV, ABR, SET, OUT, NOV and DEZ constitute one group, the months of MAR, MAI, JUN and AGO another group, and JUL a group apart. July is a singular month, combining school holidays and the dry season, and is atypical in the company's gross billing. The months of the first group coincide with the rainy season, which negatively influences the company's revenues: few people wash something that will soon be dirty again, considering that the company's main segment is the sale of cleaning and hygiene products, including those for vehicles. The months of MAR, MAI, JUN and AGO are months of lower rainfall, which favors the company's revenues. It can also be verified that the months of Group 2 constitute a more homogeneous group, that is, they are more similar to each other, whereas the months of Group 1 constitute a more heterogeneous group, with less similarity to each other. Figure 5 presents the dendrogram obtained from the 12x10 data matrix for the months of January to December 2016.

The classification of the months into groups can support production planning measures and marketing strategies, so that the months of each group may be the target of similar strategies. This approach can help optimize the resources and manpower related to production planning.

Figure 5: Dendrogram obtained from the analysis of the 12x10 data matrix for the months of January to December 2016 (axis: linkage distance, from 0 to 7E5; leaf order: JUL, JUN, MAI, AGO, MAR, OUT, DEZ, SET, NOV, ABR, FEV, JAN; Groups 1, 2 and 3 marked).

The results obtained with HCA complement those obtained by PCA, providing an overview of all samples and how they resemble each other. On the other hand, PCA allowed a better interpretation of the data in the formation of the clusters.

Conclusion

The interpretation of the original data without statistical treatment is considerably complicated. This work highlights the importance of multivariate analysis methods for the treatment of data on the production of household cleaning products. Principal component analysis showed the common and discrepant characteristics of the different samples (months of the year), which are important for adopting efficient production management measures but are hardly seen directly in the original data matrix. Hierarchical cluster analysis complemented the principal component analysis, offering another way to visualize the similarities and differences between the months studied. Analyzing the data jointly through the PCA and HCA techniques made it possible to classify all the monthly samples of 2016, showing that this type of analysis provides quick and efficient information on the similarity between samples through graphic visualization. The multivariate analysis applied to the production data of the household sanitizing industry allowed the extraction of information that would not be obtainable from univariate analysis. This information can be of crucial importance in assisting the production manager to adopt measures aimed at increasing production while optimizing operating costs.

References

[1] BAUMANN, R. O Brasil e a economia global. São Paulo: Campus, 1996.

[2] CAMPOS, S. A história do sabão. Available at: 16/05/2003, <http://www.drashirleydecampos.com.br/noticias/21027>. Accessed on: 18 Oct. 2009.

[3] CHEMELLO, E. Uma molécula “dupla personalidade???”. Available at: 27/02/2004, <http://www.ucs.br/ccet/defq/naeq/material_didatico/textos_interativos_27.htm>. Accessed on: 15 Oct. 2009.

[4] CORRÊA, L.; GIANESI, G. N.; CAON, M. Planejamento, programação e controle da produção MRP II / ERP: conceitos, uso e implantação. 5. ed., 3rd reprint. São Paulo: Atlas, 2009.

[5] COSTA, A. Gráficos de controle X para processos robustos. Gestão e Produção, v. 5, n. 3, p. 259-271, Dec. 1998.

[6] CRIVISQUI, M. Análisis factorial de correspondencias: un instrumento de investigación en ciencias sociales. Asunción: Ed. Laboratorio de Informática Social, Universidad Catolica de Asuncion, 1993. 302 p.

[7] EPPRECHT, E.; SANTOS, A. Um método simples para o projeto ótimo de gráficos de X. Gestão e Produção, v. 5, n. 3, p. 206-220, Dec. 1998.

[8] ERDMANN, R. Administração da produção: planejamento, programação e controle. Florianópolis: Papa Livro, 2000.

[9] ESCOFIER, B.; PAGÈS, J. Análisis factoriales simples y múltiples: objetivos, métodos e interpretación. Bilbao: Ed. Universidad Del Pais Vasco, 1992.

[10] FLEURY, A.; HUMPREY, J. (1993). Human resources and the diffusion and adaptation of new quality methods in Brazilian manufacturing. Brighton, Institute of Development Studies, Research Report n. 24.

[11] HAIR JR., J. F. et al. Análise multivariada de dados. Translated by A. S. Sant'Anna and A. C. Neto. 5. ed. Porto Alegre: Bookman, 2005.

[12] KONRATH, A. Decomposição da estatística do gráfico de controle multivariado T2 de Hotelling por meio de algorítimo computacional. Dissertation (Master's in Production Engineering) - Programa de Pós-Graduação em Engenharia de Produção, UFSC, Florianópolis, 2002.

[13] MANLY, B. J. Métodos estatísticos multivariados: uma introdução. Translated by Sara Ianda Carmona. 3. ed. Porto Alegre: Bookman, 2008.

[14] MATTA, A. Sabão... Available at: 01/11/2005, <http:/74.125.93.132/search?q=cachê:Ijd03ZeliaEJ:www.ccmn.ufrj.br/curso/trabalhos/pdf/quím ica-trabalhos/experimentação_quimica/experim-quim1 >. Accessed on: 15 Oct. 2009.

[15] RUSVEL, A. História do Sabão. Available at: 19/09/2008, <http://www.srcoronado.com/smf/index.php?topic=9118.msg74444#msg74444>. Accessed on: 23 Oct. 2009.

[16] SLACK, N. et al. Administração da produção. 1. ed., 12th reprint. São Paulo: Atlas, 2010.

[17] TOMAZELA, J. Esgoto é causa da espuma no Tietê em Salto. Available at: 02/06/2009, <http://estadao.com.br/noticias/cidades,esgoto-e-causa-da-espuma-no-tiete-em-salto-diz-cetesb,199494,0.htm>. Accessed on: 16 Oct. 2009.

[18] TUBINO, D. Manual de planejamento e controle da produção. São Paulo: Atlas, 1997.

[19] VIEIRA, T. Relatório de fosfato – analise industrial. Available at: 04/10/2008, <http://www.ebah.com.br/relatorio-fosfato-em-refrigerante-analise-instrumental-doc-a7848.html>. Accessed on: 20 Oct. 2009.

[20] VOLLMANN, T. et al. Sistemas de planejamento e controle da produção para gerenciamento da cadeia de suprimentos. 5. ed. Porto Alegre: Bookman, 2006.