Computational methods to analyze and compare assembly work instructions

CHAPTER 2: FRAME OF REFERENCE

2.5 Computational methods to analyze and compare assembly work instructions

Text similarity is required for a variety of applications ranging from artificial intelligence to information retrieval [59–62]. There are several methods of text comparison; Achananuparp and colleagues [60] provide a brief overview of some of them. In this research, we consider three broad approaches to measuring text similarity:

• Pattern matching based on frequencies

• Vector representations of texts using corpora

Word Overlap and the Jaccard method belong to the first of the three broad approaches, Term Frequency – Inverse Document Frequency (TF-IDF) belongs to the second, and Latent Semantic Analysis (LSA) belongs to the third. In this research, Word Overlap, Jaccard and TF-IDF are implemented in MATLAB, while an implementation of LSA (http://lsa.colorado.edu/) [63] is used to perform document-to-document similarity checks on assembly work instructions.

2.5.1 Word Overlap

This method of computing text similarity is relatively simple. The Word Overlap similarity score is determined by counting the number of terms that are common to both the query text and the database text, and then dividing this count by the number of words in the query text. It is hypothesized that this method is sensitive to synonymy and polysemy of words, because terms are matched only by their surface form. It is also hypothesized that if this method is used for retrieval of assembly work instructions, there will be a large number of false positives. These hypotheses will be tested as a part of this research. Mathematically, Word Overlap is represented as:

$$\text{Word Overlap} = \frac{|A \cap B|}{|A|} \qquad (3)$$
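As a minimal sketch of Equation (3), the following Python snippet treats each text as a set of lowercase word tokens. This is not the MATLAB implementation used in this research; the tokenizer, the function names and the example sentences are assumptions introduced purely for illustration, as is the choice to count distinct terms rather than repeated ones.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of its distinct word tokens.

    Assumption: words are runs of letters or digits; the source does not
    specify a tokenization scheme.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(query, database_text):
    """Share of distinct query terms that also occur in the database text."""
    a = tokenize(query)
    b = tokenize(database_text)
    if not a:
        return 0.0  # assumption: an empty query scores zero
    return len(a & b) / len(a)


# Hypothetical work instructions, not taken from the research data.
query = "Install the bolt on the left bracket"
candidate = "Attach the bolt to the left bracket"

# 4 of the 6 distinct query terms are shared, so the score is 4/6.
print(word_overlap(query, candidate))
```

In this example "install" and "attach" are not matched even though they describe the same action, which is the kind of sensitivity to synonymy hypothesized above. The score is also asymmetric: swapping the query and the database text changes the denominator.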

2.5.2 Jaccard Algorithm

This method is similar to the Word Overlap method. It computes the intersection (the common words) and the union of the words of the two texts being compared. The ratio of the former to the latter is the Jaccard similarity score, computed as:

$$\text{Jaccard Similarity} = \frac{|A \cap B|}{|A \cup B|} \qquad (4)$$

where A is the query text and B is the database text.
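The Jaccard score can be sketched in Python in the same way. Again, this is an illustrative sketch rather than the MATLAB implementation used in this research; the tokenizer, the function names and the example sentences are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of its distinct word tokens.

    Assumption: words are runs of letters or digits.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(query, database_text):
    """Size of the intersection of the two word sets divided by their union."""
    a = tokenize(query)
    b = tokenize(database_text)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty texts score zero
    return len(a & b) / len(union)


# Hypothetical work instructions, not taken from the research data.
query = "Install the bolt on the left bracket"
candidate = "Attach the bolt to the left bracket"

# 4 shared terms out of 8 distinct terms overall.
print(jaccard_similarity(query, candidate))
```

Unlike Word Overlap, the score is symmetric: it does not depend on which of the two texts is treated as the query.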

2.5.3 Term Frequency – Inverse Document Frequency (TF – IDF)

This method of evaluating text similarity relies on all the documents in the database. The size of the database and the accuracy of the scores are proportionally related. This measure of similarity takes into account the frequency of word occurrences in both the query text and the database text, as well as the frequency of word occurrence across all documents in the database. The version of TF-IDF used in this research is:

$$\text{TF-IDF Similarity} = \sum_{a \in A \cap B} \log\left(tf_{a,A} + 1\right) \, \log\left(tf_{a,B} + 1\right) \, \log\left(\frac{N + 1}{df_{a} + 0.5}\right) \qquad (5)$$

where $tf_{a,A}$ is the number of times term $a$ appears in $A$, $tf_{a,B}$ is the number of times term $a$ appears in $B$, $df_{a}$ is the number of documents in which term $a$ appears, and $N$ is the total number of documents [62].
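A minimal Python sketch of this measure follows, summing over the terms shared by the query and the database text and weighting each by log(tf + 1) in both texts and by log((N + 1) / (df + 0.5)) across the database. It is not the MATLAB implementation used in this research. The tokenizer, the function names and the three-document database are assumptions, and the natural logarithm is assumed because the source does not state the base.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase word tokens (assumption: runs of letters or digits)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def tfidf_similarity(query, database_text, database):
    """Sum the weighted contributions of terms shared by both texts.

    `database` is the list of all documents, used for N and df.
    """
    tf_a = term_counts(query)
    tf_b = term_counts(database_text)
    n_docs = len(database)
    doc_terms = [set(term_counts(doc)) for doc in database]

    score = 0.0
    for term in tf_a.keys() & tf_b.keys():
        # Number of database documents that contain the term.
        df = sum(1 for terms in doc_terms if term in terms)
        score += (
            math.log(tf_a[term] + 1)
            * math.log(tf_b[term] + 1)
            * math.log((n_docs + 1) / (df + 0.5))
        )
    return score


# Hypothetical database of work instructions, for illustration only.
database = [
    "Attach the bolt to the left bracket",
    "Torque the bolt to spec",
    "Inspect the weld on the frame",
]
query = "Install the bolt on the left bracket"

scores = [tfidf_similarity(query, doc, database) for doc in database]
for doc, score in zip(database, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy database the first document scores highest, because it shares the rare terms "left" and "bracket" with the query, while a term such as "the", which appears in every document, contributes very little.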

2.5.4 Latent Semantic Analysis

Text-based information retrieval is challenging, especially when the information to be mined and retrieved is authored in unstructured, free text. In other words, when text is authored in a subjective manner with a high degree of variability (in structure and language) among authors, computational retrieval becomes challenging. The diversity with which information can be expressed within a language is large. Latent Semantic Analysis (LSA) may be used for mining and retrieving similar text from databases [63]. LSA represents the query and database entities as matrices and performs singular value decomposition. The text entities (from query and database) are then represented as vectors in n-dimensional space, and the similarity between the two is represented by the cosine of the angle between the vectors [63]. This method is independent of word order and does not require the use of formatted corpora. Its only two requirements are tokenized units of text and a large database of text to compare against (the topic space) [63]. These units may be words, phrases, sentences or documents. LSA has been used for information retrieval by previous researchers [63,64]. In this research, an implementation of LSA (http://lsa.colorado.edu/) is used to perform similarity checks on assembly work instructions [64]. The generic topic space chosen for this comparison is “General reading up to 1st year college”. The ability of this LSA configuration to mimic human interpretation of assembly work instruction similarity is assessed.
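The decomposition and cosine comparison described above can be written compactly as follows. The equations used by the hosted LSA implementation are not given here, so this is the commonly used textbook formulation, offered only as a sketch. All symbols are assumptions introduced for illustration: $X$ is the term-by-unit matrix built from the topic space, $k$ is the number of retained dimensions, $q$ and $d$ are the term-count vectors of the query and database texts, and the projection step (often called folding-in) is one usual way of placing new texts in the reduced space.

```latex
% Truncated singular value decomposition of the topic-space matrix
% (assumed notation: X is terms x text units, k retained dimensions).
\begin{align*}
X &\approx U_k \Sigma_k V_k^{T} \\
% Projection of the query and database term vectors into the k-dimensional space.
\hat{q} &= \Sigma_k^{-1} U_k^{T} q, \qquad \hat{d} = \Sigma_k^{-1} U_k^{T} d \\
% Similarity as the cosine of the angle between the projected vectors.
\operatorname{sim}(q, d) &= \cos\theta
  = \frac{\hat{q} \cdot \hat{d}}{\lVert \hat{q} \rVert \, \lVert \hat{d} \rVert}
\end{align*}
```

Because the comparison is made in the reduced space rather than on raw term counts, two texts can receive a high score without sharing any words, provided their terms tend to occur in similar contexts within the topic space.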

Assembly work instructions, when authored in free text, can lead to inconsistencies in assembly time estimates and, consequently, in assembly line balancing results. For an enterprise that has several manufacturing locations producing similar products, there is a need for consistent assembly line planning to ensure efficient quality control. The strong relationship between standardized procedures and quality control is discussed by Berger [13]. The results from this research will enable the retrieval of similar assembly work instructions and thus enable continuous improvement. This comparison and retrieval of assembly work instructions will allow assembly process planners to compare their assembly processes to those of other manufacturing locations. This will lead to incremental improvements and improved standardization of process plans [13]. In the context of this research, there are a few challenges:

• Assembly work instructions are not always grammatically correct.

• Work instruction authors use free text, leading to variations in process descriptions for the same product.

• Work instructions are generated by authors in several different locations around the world.

• The level of detail of work instructions may vary across authors.

The need to improve communication of process design knowledge, through assembly work instructions, has been established. Globalized manufacturing requires communication of assembly work instructions among the manufacturing facilities in order to ensure continuous improvement. Assembly work instructions are authored in free text at locations across the world, and may not be grammatically consistent. The performance of existing text similarity comparison methods while evaluating assembly work instruction similarity needs verification. This verification is conducted in this research.