2017 International Conference on Computer Science and Application Engineering (CSAE 2017) ISBN: 978-1-60595-505-6
Tacit Knowledge Mining of Formula Based on Double Decision
Attribute Concept Lattices
Xi-zheng Zhang, Yue-yue Cai* and Wen Luo
Business School of Hunan University, Yuelu District, Changsha City, Hunan Province, China
ABSTRACT
This study uses the concept lattice model to construct the formal context of formula attributes and extracts association rules among the attributes of the objects, in particular the tacit knowledge linking the component dosage of a formula to two decision attributes: the comprehensive score and the cost. Taking a ferment formula as an example, the key components and dosages that affect the effect and cost of the formula were mined, and the formulas that perform well in both functionality and economy were selected. The results support rapid decision-making on factor selection and quantitative configuration when developing new formulas.
INTRODUCTION
specific case. Because relational databases in the real world include decision attributes, and because a decision formal context allows a more specific interpretation of the induced implications to meet users' different requirements, the decision formal context, an extension of the formal context, is often used to carry out special decision analysis through the concept lattice.
A formula contains a great deal of tacit knowledge, including component types, component dosages, the preparation procedure and so on. The component dosage is closely correlated with the performance and economic indexes of the formula, so analyzing the relationship between them is important for reducing the cost of a formula and improving the performance of its product. Traditional research on component dosage knowledge mostly relies on experiments to optimize the formula. For example, the studies in [21, 22] used single-factor experiments and orthogonal experiments to optimize a black element yogurt ice cream formula and a chicken essence formula, respectively. In [23], a regression equation was established from the experimental results, and the relationship between the proportion of each component and the indexes was studied to find the optimal formula combination. In [24], the Analytic Hierarchy Process (AHP) was used to evaluate fragrance formulas, which provided a basis for selecting a fragrance formula rationally. In [25], a multi-attribute group decision-making model for lubricating oil formulas was built on experimental data. Such experimental methods can reveal the pros and cons of a number of specific formulas, but they are not practical for large volumes of formula data. In addition, past formula research mostly considered a variety of performance indicators, while economic indicators such as cost were not taken into account.
In this paper, we propose double decision attribute concept lattices that consider both the effect and the cost of a formula, on the basis of the decision formal context theory in [18]. Taking formulas with different component dosages as the research objects, and the components, performance indicators and economic indicators of the formula as the research attributes, we construct decision attribute concept lattices between formula components and comprehensive score, and between components and cost, respectively. Association rules are then extracted to uncover the tacit knowledge hidden among component dosage, comprehensive score and cost. In this way, hidden knowledge is made explicit and can help business decision makers develop new product formulas.
DOUBLE DECISION ATTRIBUTE CONCEPT LATTICES OF FORMULA
In this section, we review some basic notions related to formal concept analysis and decision formal context and put forward the double decision attribute concept lattices of formula.
$(x,a) \in I$ indicates that the object $x$ has the attribute $a$, and $(x,a) \notin I$ means the opposite.
Definition 2 ([26]) On the basis of a formal context, a formal concept $(X,M)$ is obtained, where $X \subseteq U$ and $M \subseteq A$. A formal concept $(X,M)$ must satisfy the following conditions: $X^{*} = \{a \in A \mid \forall x \in X, (x,a) \in I\}$, $M^{*} = \{x \in U \mid \forall a \in M, (x,a) \in I\}$, $X^{*} = M$ and $M^{*} = X$. In this case, $X$ and $M$ are called the extent and intent of $(X,M)$, respectively.
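The two conditions in Definition 2 can be checked mechanically. The following is a minimal Python sketch, assuming a small hypothetical context: the objects `x1` to `x3`, the attributes `a` to `c` and all function names are illustrative and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context: each object maps to the set of attributes it has,
# i.e. (x, a) is in the relation I exactly when a is in CONTEXT[x].
CONTEXT: Dict[str, Set[str]] = {
    "x1": {"a", "b"},
    "x2": {"a", "c"},
    "x3": {"a", "b", "c"},
}
ATTRIBUTES: Set[str] = {"a", "b", "c"}


def shared_attributes(objects: Iterable[str]) -> FrozenSet[str]:
    """Attributes possessed by every object in the given set."""
    result = set(ATTRIBUTES)
    for x in objects:
        result &= CONTEXT[x]
    return frozenset(result)


def shared_objects(attributes: Iterable[str]) -> FrozenSet[str]:
    """Objects that possess every attribute in the given set."""
    wanted = set(attributes)
    return frozenset(x for x, has in CONTEXT.items() if wanted <= has)


def is_formal_concept(objects: Iterable[str], attributes: Iterable[str]) -> bool:
    """True when (objects, attributes) is an (extent, intent) pair."""
    objects, attributes = frozenset(objects), frozenset(attributes)
    return (shared_attributes(objects) == attributes
            and shared_objects(attributes) == objects)
```

In this toy context the pair ({x1, x3}, {a, b}) is a formal concept, whereas ({x1}, {a, b}) is not, because x3 also has both attributes.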
Definition 3 ([26]) If $(X_1,M_1)$ and $(X_2,M_2)$ are two formal concepts of $(U,A,I)$ and $(X_1,M_1) \le (X_2,M_2) :\Leftrightarrow X_1 \subseteq X_2 \ (\Leftrightarrow M_2 \subseteq M_1)$, then $(X_1,M_1)$ is called a sub-concept of $(X_2,M_2)$ and $(X_2,M_2)$ is called a super-concept of $(X_1,M_1)$. The "$\le$" denotes the partial order between concepts. The partially ordered set of all the formal concepts of the formal context is called the concept lattice of $(U,A,I)$.
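To make the notion of a concept lattice concrete, the sketch below enumerates all formal concepts of a tiny context by brute force and tests the sub-concept order. It is only an illustration under stated assumptions: the context `toy` is hypothetical, the function names are our own, and the enumeration is exponential in the number of attributes, unlike the dedicated lattice-construction algorithms used in practice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def all_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Enumerate every formal concept by closing each attribute subset.

    Exponential in the number of attributes; only suitable for tiny contexts.
    """
    attributes = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            # Objects having all attributes of the subset.
            extent = frozenset(x for x, has in context.items() if set(subset) <= has)
            # Attributes shared by all of those objects.
            intent = frozenset(a for a in attributes
                               if all(a in context[x] for x in extent))
            found.add((extent, intent))
    return sorted(found, key=lambda c: (len(c[0]), sorted(c[0])))


def is_subconcept(c1: Concept, c2: Concept) -> bool:
    """(X1, M1) <= (X2, M2) iff X1 is a subset of X2."""
    return c1[0] <= c2[0]


# Hypothetical toy context, not taken from the paper.
toy = {"x1": {"a", "b"}, "x2": {"a", "c"}, "x3": {"a", "b", "c"}}
lattice = all_concepts(toy)
```

For this toy context the lattice has four concepts, with ({x3}, {a, b, c}) at the bottom and ({x1, x2, x3}, {a}) at the top.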
Definition 4 ([18]) A decision formal context is a quintuple $(U,A,I,D,J)$, where $(U,A,I)$ and $(U,D,J)$ with $A \cap D = \emptyset$ are two formal contexts. Here, $A$ and $D$ are called the conditional attribute set and the decision attribute set of $(U,A,I,D,J)$, respectively.
On the basis of the above definitions, the decision formal context of a formula is defined as follows.
Definition 5 The decision formal context of a formula is recorded as a quintuple $K = (U,C,I,D,J)$, where $U = \{X_1, X_2, \ldots, X_m\}$ is a set of objects and $m$ represents the number of formula types to be studied. $C = \{a_{11}, a_{12}, \ldots, a_{1k}, a_{21}, a_{22}, \ldots, a_{2k}, \ldots, a_{j1}, a_{j2}, \ldots, a_{jk}\}$ is a set of condition attributes, where $j$ represents the number of formula components. For each condition attribute, exactly one of its sub-attributes $\{a_{i1}, a_{i2}, \ldots, a_{ik}\}$ takes the value 1 (1 means that the object has this attribute); the others are 0. $I$ is a binary relation on $U \times C$.
$D = \{d_{11}, d_{12}, \ldots, d_{1k}, d_{21}, d_{22}, \ldots, d_{2k}\}$ is a set of double decision attributes (the comprehensive score and the cost of the formula), where $k$ represents the number of sub-attributes of a formula decision attribute. Similarly, exactly one of the sub-attributes $\{d_{i1}, d_{i2}, \ldots, d_{ik}\}$ of each decision attribute takes the value 1. $J$ is a binary relation on $U \times D$.
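The "exactly one sub-attribute equals 1" requirement of Definition 5 is a one-hot encoding of graded attributes. The sketch below is one possible way to build such a row; the helper `one_hot_row` and its argument names are hypothetical, and the example grades are read off the first row of TABLE I (A2, B1, C1, D1, E2, Z1).

```python
from typing import Dict


def one_hot_row(levels: Dict[str, int], num_levels: Dict[str, int]) -> Dict[str, int]:
    """Expand graded attributes into binary sub-attributes.

    levels[attr] is the 1-based grade of the object on attribute attr;
    num_levels[attr] is how many grades that attribute has.
    """
    row = {}
    for attr, grade in levels.items():
        if not 1 <= grade <= num_levels[attr]:
            raise ValueError(f"grade {grade} out of range for attribute {attr}")
        for s in range(1, num_levels[attr] + 1):
            # Only the sub-attribute matching the grade is set to 1.
            row[f"{attr}{s}"] = 1 if s == grade else 0
    return row


# Object 1 of TABLE I: condition grades A2, B1, C1, D1, E2 and decision grade Z1.
grades = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 2, "Z": 1}
sizes = {name: 2 for name in grades}  # two grades per attribute in TABLE I
row1 = one_hot_row(grades, sizes)
```

Passing a larger value in `num_levels` for a given attribute (for example three grades for cost) yields three sub-attributes for it, again with exactly one set to 1.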
For the formula concept, each formula with a different component dosage is an object. The $j$ kinds of components represent $j$ conditional attributes. The formula's comprehensive score and cost are the double decision attributes $(d_1,d_2)$. Here, the comprehensive score of the formula is divided into two groups: high and low $(d_{11},d_{12})$,
condition attribute of the formula, we can get the set of objects with high and low score respectively. Likewise, based on the concept lattice formed between the cost and the conditional attributes of formula, the set of objects with high, medium and low cost can be obtained respectively. In this way, all formulas can be classified into different groups according to the corresponding indicators. A general rule of component dosage of formula can be identified.
ASSOCIATION RULES MINING OF FORMULA
After a simple classification of formulas based on the double decision attribute concept lattices, the potential rules among formula components, comprehensive score and cost need to be mined further. In knowledge discovery, association rules are very useful for uncovering the potential relationships and valuable knowledge that exist between attributes [29]. Many scholars have studied association rule mining based on concept lattices [30-33].
Definition 6 Let $I = \{i_1, i_2, \ldots, i_n\}$ be an item set and $D$ the set of all transactions. A transaction $T$ is a subset of $I$, $T \subseteq I$, and each transaction is identified by a unique identifier TID. An association rule can be described in the form $R: X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. $X$ is called the condition of the rule and $Y$ is called the result of the rule.
Definition 7 The support of $R$ can be formalized as $\mathrm{Support}(R) = P(X \cup Y) = \mathrm{number}(X \cup Y) / \mathrm{number}(D)$, where $\mathrm{number}(X \cup Y)$ is the number of transactions in the transaction set that contain both $X$ and $Y$, and $\mathrm{number}(D)$ is the number of all transactions in the transaction set. The support of $R$ describes the importance of the rule. The confidence of the association rule $R$ can be formalized as $\mathrm{Confidence}(R) = P(Y \mid X) = P(X \cup Y) / P(X)$. It describes the reliability of the rule.
Definition 8 Let min_sup denote the minimum support threshold and min_conf the minimum confidence threshold that an association rule must satisfy. In this paper, association rules whose support and confidence are larger than min_sup and min_conf, respectively, are called strong association rules.
In the process of mining the association rules of formula components, we can set the min_sup and the min_conf according to the two indicators defined as above. Using the extraction algorithm in literature [33], we can extract the non-redundant association rules between the component dosage and the comprehensive score, the component dosage and the cost of formula respectively.
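As an illustration of Definitions 6 to 8, the sketch below computes support and confidence over the ten objects listed in TABLE I, treating each object as a transaction whose items are the sub-attributes marked 1. This is only a sketch under stated assumptions: the function names are our own, the computation follows Definition 7 literally and is not the extraction algorithm of [33], and because it uses the ten-object excerpt rather than all 25 objects, the values differ from those reported for the full data set.

```python
from typing import FrozenSet, List, Set

# The ten objects of TABLE I, written as the sub-attributes each object has.
TRANSACTIONS: List[FrozenSet[str]] = [frozenset(t.split()) for t in [
    "A2 B1 C1 D1 E2 Z1",  # object 1
    "A2 B1 C1 D2 E1 Z1",  # object 2
    "A2 B2 C1 D1 E1 Z2",  # object 3
    "A2 B1 C2 D1 E1 Z1",  # object 4
    "A1 B1 C2 D2 E1 Z1",  # object 5
    "A1 B1 C1 D2 E1 Z2",  # object 6
    "A1 B2 C1 D2 E1 Z2",  # object 7
    "A1 B2 C1 D1 E2 Z1",  # object 8
    "A1 B1 C1 D2 E2 Z1",  # object 9
    "A1 B1 C2 D1 E2 Z1",  # object 10
]]


def support(items: Set[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in TRANSACTIONS if items <= t) / len(TRANSACTIONS)


def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Joint support divided by antecedent support (antecedent must occur)."""
    return support(antecedent | consequent) / support(antecedent)


def is_strong(antecedent: Set[str], consequent: Set[str],
              min_sup: float, min_conf: float) -> bool:
    """Strong rule in the sense of Definition 8 (both thresholds exceeded)."""
    return (support(antecedent | consequent) > min_sup
            and confidence(antecedent, consequent) > min_conf)
```

On this excerpt, the rule A2 => Z1 holds for three of the four objects that have A2, so its support is 0.3 and its confidence is 0.75.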
The kinds of key component aj and component dosage that influence the
APPLICATION STUDY
Taking the ferment formula as an example, we constructed the decision attribute concept lattices between the ferment component and comprehensive score, the ferment component and cost, respectively. On the basis of decision attribute concept lattices, the corresponding association rules were extracted and the hidden knowledge of the ferment formula components was dug out.
Data Sources and Processing Method
The data in [23] were used, with a cost column added, to obtain the original data shown in TABLE I in the appendix. The original data consist of 25 formula objects with different component dosages. Aloe vera, Chinese wolfberry, hawthorn, Chinese yam and Coix seed serve as the 5 condition attributes, and the comprehensive score and cost as the two decision attributes.
First, the original data were processed with a fuzzy measure [34]. The components and the comprehensive score of the ferment were each divided into two categories, high and low, and the cost attribute was divided into three categories: low, medium and high. The corresponding membership functions are shown in Figure 1.
Figure 1. Membership function diagram.
TABLE II in the appendix shows the attribute-object table of the 25 data groups after fuzzification. The mean value of each column was taken as the threshold for that column, and the values larger than the threshold are displayed in bold in the table. In TABLE II of the appendix, the bold values were taken as 1 and all other values as 0, where "1" indicates that the object has the attribute and "0" indicates that it does not.
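The thresholding step described above can be sketched in a few lines. The membership degrees below are hypothetical placeholders, not the appendix data, and the function name is our own; the only rule taken from the text is that a value larger than its column mean becomes 1 and every other value becomes 0.

```python
from typing import List


def binarize_by_column_mean(rows: List[List[float]]) -> List[List[int]]:
    """Return 1 where a value is strictly larger than its column mean, else 0."""
    if not rows:
        return []
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[1 if value > mean else 0 for value, mean in zip(row, means)]
            for row in rows]


# Hypothetical membership degrees for three objects and two sub-attributes.
memberships = [
    [0.2, 0.8],
    [0.6, 0.4],
    [0.7, 0.3],
]
binary = binarize_by_column_mean(memberships)
```

Both column means are 0.5 here, so the first object keeps only the second sub-attribute and the other two objects keep only the first.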
The decision formal context formed by A-Aloe vera (%), B-Chinese wolfberry (%), C-hawthorn (%), D-Chinese yam (%), E-Coix seed (%) and the comprehensive score (denoted by Z) is shown in TABLE I. Similarly, the decision formal context of the components and the cost can be obtained (see TABLE III in the appendix). Only part of the decision formal context is listed here because of space constraints.
Knowledge Generation Based on Decision Concept Lattice
TABLE I. THE DECISION FORMAL CONTEXT BETWEEN THE FERMENT COMPONENT AND COMPREHENSIVE SCORE.
Object  A1  A2  B1  B2  C1  C2  D1  D2  E1  E2  Z1  Z2
1       0   1   1   0   1   0   1   0   0   1   1   0
2       0   1   1   0   1   0   0   1   1   0   1   0
3       0   1   0   1   1   0   1   0   1   0   0   1
4       0   1   1   0   0   1   1   0   1   0   1   0
5       1   0   1   0   0   1   0   1   1   0   1   0
6       1   0   1   0   1   0   0   1   1   0   0   1
7       1   0   0   1   1   0   0   1   1   0   0   1
8       1   0   0   1   1   0   1   0   0   1   1   0
9       1   0   1   0   1   0   0   1   0   1   1   0
10      1   0   1   0   0   1   1   0   0   1   1   0
Figure 2. The concept lattice of each component and the comprehensive score.
Using the concept lattice tool ConExp, the concepts of high comprehensive score (Z2) and low comprehensive score (Z1) were obtained, respectively.
Figure 3. The concept lattice of decision attributes Z2.
The process of generating a concept lattice is a process of conceptual clustering. Therefore, we can obtain the object set with a high comprehensive score from Figure 3: {3,6,7,11,12,17,18,22,23,24}. The object set with a low comprehensive score can be obtained in the same way: {1,2,4,5,8,9,10,13,14,15,16,19,20,21,25}. In this way, the formulas were divided into two categories, Z1 and Z2: a low comprehensive score (Z1) means that the formula is poor, and a high comprehensive score (Z2) means that the formula is good.
A total of 148 concepts were generated from 25 objects and 13 attributes. Similarly, the object set of low cost {5,10,11,12,13}, the object set of medium cost {1,4,6,8,9,15,16,17,18,20,22,24,25} and the object set of high cost {2,3,7,14,19,21,23} can be obtained (Figure 2 in the appendix). Correspondingly, the formulas were divided into three categories: low cost (W1), medium cost (W2) and high cost (W3).
Extract the Association Rules
By setting the support threshold to 0.2 and the confidence threshold to 0.7, we extracted 72 strong association rules from Figure 1. Since we wanted the key components and component dosages that influence the effect of the formula, we needed the strong association rules whose antecedent is the component dosage and whose consequent is the comprehensive score. The filtered association rules are shown in TABLE II.
TABLE II. MIN_SUP=0.20, MIN_CONF=0.70.
No.  Rules             Support  Confidence
1    B1 => Z1          0.60     73%
2    C1 E2 => Z1       0.36     78%
3    A2 => Z1          0.32     88%
4    A1 E1 => Z2       0.32     88%
5    A1 B2 => Z2       0.32     88%
6    B1 D2 => Z1       0.24     83%
7    B2 E1 => Z2       0.24     83%
8    A1 C2 D1 => Z2    0.24     83%
9    B1 C1 E2 => Z1    0.20     100%
TABLE III. MIN_SUP=0.15, MIN_CONF=0.60.
No.  Rules             Support  Confidence
1    A1 B1 => W1       0.36     89%
2    A1 B1 C2 => W1    0.20     100%
3    A1 B1 D1 => W1    0.20     100%
4    A1 B1 E1 => W1    0.16     100%
5    A2 B1 D1 => W2    0.16     100%
6    A2 C1 E1 => W3    0.16     100%
7    B2 C1 E1 => W3    0.16     100%
8    A1 C1 E2 => W2    0.28     86%
9    A1 B1 E2 => W1    0.20     80%
10   A1 B2 D1 => W2    0.20     80%
TABLE II lists nine association rules. Rules 1, 2, 3, 6 and 9 relate the components of the ferment (i.e., the condition attributes) to the decision attribute "low comprehensive score". Rule 1 (low content of Chinese wolfberry => low comprehensive score) and rule 3 (high content of aloe vera => low comprehensive score) show that a low comprehensive score is mainly driven by a low content of Chinese wolfberry and a high content of aloe vera. From these rules we can also derive a ferment formula with a low comprehensive score: high content of aloe vera, low content of Chinese wolfberry, low content of hawthorn, high content of yam and high content of coix seed. Rules 4, 5, 7 and 8 relate the components of the ferment to the decision attribute "high comprehensive score". Taken together, they yield a ferment formula with a high comprehensive score: low content of aloe vera, high content of Chinese wolfberry, high content of hawthorn, low content of yam and low content of coix seed. Objects 17 and 22 meet these requirements, and the original data table shows that their comprehensive scores are the highest, which indicates that the extracted association rules agree with the actual situation.
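The two formula profiles described above can be reproduced by taking the union of the antecedents of the rules in TABLE II that share a consequent. The sketch below does exactly that; the function names are our own, and the reading of the sub-attribute indices (1 for low content, 2 for high content) is an assumption based on how rules 1 and 3 are described in the text.

```python
from typing import List, Set, Tuple

# (antecedent sub-attributes, consequent) for the nine rules of TABLE II.
RULES: List[Tuple[Set[str], str]] = [
    ({"B1"}, "Z1"),
    ({"C1", "E2"}, "Z1"),
    ({"A2"}, "Z1"),
    ({"A1", "E1"}, "Z2"),
    ({"A1", "B2"}, "Z2"),
    ({"B1", "D2"}, "Z1"),
    ({"B2", "E1"}, "Z2"),
    ({"A1", "C2", "D1"}, "Z2"),
    ({"B1", "C1", "E2"}, "Z1"),
]


def profile(consequent: str) -> Set[str]:
    """Union of the antecedents of all rules that conclude `consequent`."""
    merged: Set[str] = set()
    for antecedent, result in RULES:
        if result == consequent:
            merged |= antecedent
    return merged


def is_consistent(sub_attributes: Set[str]) -> bool:
    """True when no component (first letter) appears with two different levels."""
    components = [name[0] for name in sub_attributes]
    return len(components) == len(set(components))


low_score_profile = profile("Z1")   # A2, B1, C1, D2, E2
high_score_profile = profile("Z2")  # A1, B2, C2, D1, E1
```

Both unions assign exactly one level to each of the five components, which is why the rules can be combined into a single formula profile for each score category.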
association rules of which the antecedent rule was the component dosage and the consequent rule was the comprehensive score. Here we give the filtered association rules, as shown in TABLE III.
TABLE III lists 10 association rules. Rule 1 indicates that when the contents of aloe vera and Chinese wolfberry are low, the cost is low. The confidence of rule 2 is 100%; comparing it with rule 1 shows that the content of hawthorn has little effect on the cost. Comparing rule 3 with rule 9 shows that the content of Chinese wolfberry has a great effect on the cost, which rules 7 and 10 also confirm. The confidence of rule 5 is 100%; comparing it with rule 3 shows that the content of aloe vera has a significant impact on the cost, which rule 6 also supports. Combining rule 8 with the previous conclusions (the content of hawthorn has little effect on the cost and the content of aloe vera has a significant impact on the cost), we can see that the content of coix seed has a significant impact on the cost. However, rule 9 shows that the impact of the content of coix seed on the cost is smaller than that of Chinese wolfberry. In summary, the contents of aloe vera and Chinese wolfberry have a great impact on the cost, the content of coix seed has a medium impact on the cost, and the content of yam has little effect on the cost.
Tacit Knowledge Mining on Ferment Formula Component Dosage Based on Concept Lattice Association Rules
From the above mining and analysis of the concept lattice and association rules between the formula components and the double decision attributes (comprehensive score and cost), and taking the formula effect and the cost into account, the results in TABLE IV were obtained through the classification of 25 formulas.
TABLE IV. FORMULA SELECTION BASED ON THE COMPREHENSIVE SCORE AND THE COST.
Category                                    Formula
High comprehensive score and low cost       11,12
High comprehensive score and medium cost    6,17,18,22,24
High comprehensive score and high cost      3,7,23
Low comprehensive score and low cost        5,9,10,13
Low comprehensive score and medium cost     1,4,8,9,15,16,20,25
Low comprehensive score and high cost       2,14,19,21
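A cross-classification of this kind can be obtained by intersecting the score groups with the cost groups. The sketch below uses the object sets exactly as they are listed in the text for the comprehensive score and the cost; the dictionary and function names are our own.

```python
from typing import Dict, List, Set, Tuple

# Object sets as listed in the text for the two decision attributes.
SCORE: Dict[str, Set[int]] = {
    "high": {3, 6, 7, 11, 12, 17, 18, 22, 23, 24},
    "low": {1, 2, 4, 5, 8, 9, 10, 13, 14, 15, 16, 19, 20, 21, 25},
}
COST: Dict[str, Set[int]] = {
    "low": {5, 10, 11, 12, 13},
    "medium": {1, 4, 6, 8, 9, 15, 16, 17, 18, 20, 22, 24, 25},
    "high": {2, 3, 7, 14, 19, 21, 23},
}


def cross_classify(score: Dict[str, Set[int]],
                   cost: Dict[str, Set[int]]) -> Dict[Tuple[str, str], List[int]]:
    """Intersect every score group with every cost group."""
    return {(s, c): sorted(score[s] & cost[c]) for s in score for c in cost}


groups = cross_classify(SCORE, COST)
for (s, c), members in groups.items():
    print(f"{s} comprehensive score and {c} cost: {members}")
```

Because both groupings partition the 25 objects, each object lands in exactly one cell; with the sets as listed, object 9 falls in the low-score, medium-cost cell.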
between the ferment component and the cost. Enterprises should therefore avoid this kind of formula when developing new ferment formulas.
CONCLUSIONS
In this paper, concept lattice theory is applied to the study of formula components for the first time. We constructed double decision attribute concept lattices among the components, the effect and the cost of the ferment formula, and extracted the non-redundant association rules. Considering the two decision attributes, the comprehensive score and the cost of the formula, we mined the key components and dosages that influence the effect and cost of the formula. We selected the formulas that perform well in both function and economy and obtained the component usage rules of the excellent formulas. Finally, the validity of the method was verified with the ferment example, which demonstrates the analysis of a double decision attribute problem. The limitations of this study are that it cannot accurately determine the specific dosage of each formula from a quantitative point of view, and that it only mines the tacit knowledge contained in the formula components, without taking into account the influence of the kinds of components and the preparation procedure on the effect and cost of the formula. Future research should consider all of these aspects for deeper mining.
ACKNOWLEDGEMENT
The article was supported by the project "Knowledge allocation, integration and innovation research of enterprise knowledge network within cross-border integration" of the National Natural Science Foundation of China (Grant No. 71571066).
REFERENCES
1. Wille, R. 1982. “Restructuring lattice theory: an approach based on hierarchies of concepts,” in I. Rival (Ed.), Ordered Sets, Reidel, Dordrecht, Boston, pp. 445-470.
2. LI Jinhai, MEI Changlin and LV Yuejin. 2011. “A heuristic knowledge-reduction method for decision formal contexts,” Computers and Mathematics with Applications, 61(4): 1096-1106.
3. Li J., Mei C. and Kumar C. A., et al. 2013. “On rule acquisition in decision formal contexts,” International Journal of Machine Learning & Cybernetics, 4(6):721-731.
4. Stumme G., Wille R. and Wille U. 1998. “Conceptual Knowledge Discovery in Databases Using Formal Concept Analysis Methods,” in European Symposium on Principles of Data Mining and Knowledge Discovery. Springer-Verlag, pp. 450-458.
5. Rudolf Wille. 2002. “Why can concept lattices support knowledge discovery in databases?” Journal of Experimental & Theoretical Artificial Intelligence, 14(2-3):81-92.
7. Yang S., Ding S. and Cai S., et al. 2008. “An algorithm of constructing concept lattices for CAT with cognitive diagnosis,” Knowledge-Based Systems, 21(8):852-855.
8. Xu W. and Li W. 2016. “Granular Computing Approach to Two-Way Learning Based on Formal Concept Analysis in Fuzzy Datasets,” IEEE Transactions on Cybernetics, 46(2):366.
9. Wang Y. H. and Yang L. 2011. “Application of data mining based on concept lattice in inventory management,” Application Research of Computers, (05):1745-1747.
10. Gao J., Qu J. and Xin Z. 2015. “Towards on the MOOCs Knowledge Discovery Based on Concept Lattice,” International Journal of Multimedia & Ubiquitous Engineering, 10(5):287-296.
11. Zhang W. X. and Qiu G. F. 2005. Uncertain Decision Making Based on Rough Sets. Tsinghua University Press, Beijing. 2005.
12. Wei Ling, Qi Jianjun and Zhang Wenxiu. 2008. “Attribute reduction theory of concept lattice based on decision formal contexts,” Science China: Information Sciences, 38(2): 195-208.
13. Kaishe, Yanhui and Zhai, et al. 2007. “Study of Decision Implications Based on Formal Concept Analysis,” International Journal of General Systems, 36(2):533-542.
14. Wu W. Z., Leung Y. and Mi J. S. 2009. “Granular Computing and Knowledge Reduction in Formal Contexts,” IEEE Transactions on Knowledge & Data Engineering, 21(10):1461-1474.
15. Li Jinhai, Mei Changlin and Lv Yuejin. 2011. “Knowledge reduction in decision formal contexts,” Knowledge-Based Systems, 24 (5): 709-715.
16. Jinhai Li, Changlin Mei and Yuejin Lv. 2012. “Knowledge reduction in formal decision contexts based on an order-preserving mapping,” International Journal of General Systems, 41(2):143-161.
17. Wei Ling, Qi Jianjun and Zhang Wenxiu. 2008. “Attribute reduction theory of concept lattice based on decision formal contexts,” Science China: Information Sciences, 38(2): 195-208.
18. Li Jinhai, Mei Changlin, Aswani Kumar C and Zhang Xiao. 2013. “On rule acquisition in decision formal contexts,” International Journal of Machine Learning and Cybernetics, 4 (6): 721-731.
19. Shao M. W., Leung Y. and Wu W. Z. 2014. “Rule acquisition and complexity reduction in formal decision contexts,” International Journal of Approximate Reasoning, 55(1):259-274.
20. Li Jinhai, Aswani Kumar C., Mei Changlin and Wang Xizhao. 2017. “Comparison of reduction in formal decision contexts,” International Journal of Approximate Reasoning, 80: 100-122.
21. Shao Hu, Zhu Xiao and Pang Xiao. 2016. “Optimization of processing Technology and formula of Black Yoghurt Ice-cream,” The Food Industry, (02):81-84.
22. Ji Yanqing, Zhao Lankun and Ji Guanghong. 2016. “Study on Formula Optimization of Chicken Extract,” China Condiment, (07):42-45.
23. Yang Jingjuan, Li Na and Zhao Shenglan. 2016. “Research on the formula optimization of a ferment,” China Brewing, (01):95-99.
24. Huang Xiaoping, Xu Fuyuan and Xiao Zuobing. 2005. “The Application of AHP in the Selection of Flavor Formula,” Flavour Fragrance Cosmetics, (02):41-43.
26. Ganter B. 1996. “Formal Concept Analysis: Mathematical Foundations,” Springer-Verlag New York, Inc. 1996.
27. Xie Zhipeng and Liu Zongtian. 2002. “A Fast Incremental Algorithm for Building Concept Lattice,” Chinese Journal of Computers, (05):490-496.
28. Hu Keyun, Lu Yuchang and Shi Chunyi. 2000. “Advances in concept lattice and its application,”Journal of Tsinghua University (Natural Science Edition), (09): 77-81.
29. J. Wang, Q.H. Dang and Z.J. Wu. 2011. “Mining Model of Fuzzy Association Rules and its Application,” in Calciner. Process Automation Instrumentation, 219-220(8):904-907.
30. Hu X. G., Wang D. X. and Liu X. P. 2005. “The analysis on model of association rules mining based on concept lattice and Apriori algorithm,” in International Conference on Machine Learning and Cybernetics, IEEE, 3:1620-1624.
31. Dai S. and Li N. 2012. “A Research about mining association rules based on Quantitative Concept Lattice,” in International Conference on Transportation, Mechanical, and Electrical Engineering, IEEE: 1337-1340.
32. Liu B., Hua Q. I. and Shen F. 2016. “An Improved Algorithm of Association Rules Mining Based on Concept Lattice,” Geomatics World, pp.64-68.
33. Wang J., Li S. and Feng X. 2016. “Extraction of non-redundant association rules from concept lattices based on IsoFCA system,” in International Conference on Computer Science and Network Technology, IEEE, pp.479-484.