TKIM was chosen because it directly measures practical knowledge and learning acquired on the job (Sternberg et al., 1995; Wagner, 1985) and is associated with individual performance (Wagner & Sternberg, 1987; Sternberg & Horvath, 1999). In TKIM scoring, an expert profile of successful managers was created as a quality baseline against which the other employees were compared (Almeida, 1994).
There are several techniques for calculating a tacit knowledge score. Sternberg and Grigorenko (2001) and Hedlund et al. (1998) recommended three: (a) correlating subjects’ ratings with an index of group membership; (b) examining the degree to which participants’ responses conform to professional rules of thumb; or (c) computing difference or agreement scores between subjects’ ratings and an expert prototype. The present study employed the third technique, determining difference or agreement scores between respondents’ ratings and those of experts. This technique was adopted because it is accurate and reliable in the managerial context, as confirmed by Wagner (1987) and Armstrong and Mahmud (2008).
The calculation of the tacit knowledge score involved the following:
1. Rating scores of the expert, novice and typical groups (Armstrong & Mahmud, 2008) were identified, taking into account Wagner’s (1987) observation that tacit knowledge scores are affected by individual differences, because studying tacit knowledge involves the deviation of rating scores between the expert group and the other groups.
2. The mean and standard deviation of each respondent’s ratings on the TKIM were calculated.
3. The mean and standard deviation values were used to transform the rating scores into standard TKIM scores, adopting Wagner’s (1987) standardised transformation formula with a standard deviation of 1.5. The transformation gave every subject an equal standard deviation of ratings across items. The analysis of TKIM scores focused on the level of agreement between experts’ and non-experts’ choices. After the transformation, the corresponding mean of the experts’ group was subtracted from each transformed TKIM response.
4. There are no right or wrong answers to the items in the test. The interpretation of a non-expert’s score does not depend directly on how low or high the participant rated the items, but on how low or high his or her normalised score of deviation from agreement with the experts was. The TKIM score is a deviation score: the smaller the deviation, the stronger the agreement with the experts (Colonia-Willner, 1998: 49).
5. Specific equations were adopted from Armstrong and Mahmud (2008), following the TKIM techniques of Wagner (1987), Menkes (2002) and Forsythe et al. (1998), as follows:
\[ ztk_{ij} = \frac{X_{ij} - \bar{X}_{i}}{sd_{i}} \times 1.5 \]
where
i = 1, ..., 308 (subjects); j = 1, ..., 91 (items);
X_{ij} = rating given by subject i to item j;
\bar{X}_{i} = mean of subject i’s ratings across the response items;
sd_{i} = standard deviation of subject i’s ratings across the response items.
6. This equation was used to produce the standardised score for each subject on each item. The experts’ mean score on the item was then subtracted from this standardised score and the absolute value of the difference was taken, using the equation in step 7.
7. Difference TK score = | corrected TK rating – experts’ mean TK |:
\[ dtk_{ij} = \left| ztk_{ij} - xtk_{j} \right| \]
where
i = 1, ..., 308; j = 1, ..., 91;
dtk_{ij} = difference TK score; ztk_{ij} = corrected TK rating;
xtk_{j} = experts’ mean tacit knowledge rating on item j.
8. The values for each item in the inventory were then summated and averaged to produce scores for managing self, managing tasks, managing others and overall tacit knowledge. This approach followed Sternberg and Grigorenko’s (2001) technique of computing a profile match or difference score between participants’ ratings and an expert prototype. Averaging was necessary to obtain meaningful comparisons because the subscales did not contain the same number of items.
9. The present study differed slightly from Colonia-Willner (1998), who computed the TKIM deviation score by adding the squared z score (the squared difference between a non-expert’s rating and the experts’ mean rating) for each item and dividing the sum by the number of strategy items in the area (a D²/N formula). Cianciolo et al. (2006), in turn, calculated the TKIM difference score as the squared Mahalanobis distance (D²) to obtain a standardised distance.
10. Squaring the difference scores removes their polarity, but it has been argued that squaring tends to inflate the values and affects further calculations. The present study therefore followed the recommendation to use absolute values when comparing experts’ and novices’ scores (Kerr, 1991; Mahmud, 2006).
11. Wagner’s (1987) scoring method, which transforms the raw tacit knowledge ratings and identifies their deviation from the expert profile, has been recognised for allowing meaningful comparisons between groups (Sternberg et al., 1995).
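The scoring procedure in steps 5 to 8 can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names, the toy ratings and the subscale item groupings are hypothetical. The sketch also assumes that the sample standard deviation is used and that the expert profile is the item-wise mean of the experts' standardised ratings. The last function shows the D²/N alternative described in step 9, for comparison only.

```python
from statistics import mean, stdev

SCALE_SD = 1.5  # target standard deviation named in the text


def standardise(ratings, scale_sd=SCALE_SD):
    """Rescale one subject's ratings to mean 0 and standard deviation scale_sd."""
    m = mean(ratings)
    sd = stdev(ratings)  # assumption: sample SD; fails if all ratings are identical
    return [(x - m) / sd * scale_sd for x in ratings]


def expert_profile(expert_ratings):
    """Item-wise mean of the experts' standardised ratings (assumed definition)."""
    standardised = [standardise(r) for r in expert_ratings]
    return [mean(column) for column in zip(*standardised)]


def absolute_deviations(ratings, profile):
    """Absolute difference between a subject's standardised ratings and the profile."""
    return [abs(z - e) for z, e in zip(standardise(ratings), profile)]


def subscale_score(deviations, item_indices):
    """Mean absolute deviation over the items of one subscale."""
    return mean([deviations[j] for j in item_indices])


def mean_squared_deviation(deviations, item_indices):
    """D^2/N alternative: mean of squared deviations over the items of one subscale."""
    return mean([deviations[j] ** 2 for j in item_indices])


# Hypothetical toy data: two experts and one respondent rating six items.
experts = [[7, 6, 2, 5, 1, 4], [6, 7, 1, 5, 2, 3]]
respondent = [4, 6, 5, 3, 2, 7]

# Hypothetical item groupings; the real inventory has its own subscale keys.
subscales = {
    "managing self": [0, 1],
    "managing tasks": [2, 3, 4],
    "overall": [0, 1, 2, 3, 4, 5],
}

profile = expert_profile(experts)
deviations = absolute_deviations(respondent, profile)
scores = {name: subscale_score(deviations, items) for name, items in subscales.items()}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Because each subject's ratings are standardised first, a respondent whose ratings are a positive linear rescaling of an expert's ratings receives the same standardised pattern, so the score reflects agreement in the pattern of ratings rather than how high or low the items were rated. Averaging rather than summing keeps subscales with different numbers of items comparable.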
6.5 Goodness of Measures
Validity and reliability are the two main criteria used to assess the goodness of measures (Sekaran, 2000); they are also known as the psychometric characteristics of an instrument (Punch, 2005: 95). The reliability and validity of the constructs were therefore assessed using the summated scales derived from factor analysis. Further details are given in the sections below.
6.5.1 Validity
Validity is the extent to which a measurement tool accurately measures what it is supposed to measure (Hair et al., 2010; Punch, 2005). Assessing validity ensures that the scale conforms to the concept definition, is unidimensional and has an appropriate level of reliability. A scale’s validity therefore has to be examined before further analysis. Punch (2005) and Sekaran (2000) distinguished three types of validity: content and face validity, criterion-related validity and construct validity. Content validity concerns whether the full content of a conceptual definition is represented. A factor is considered to have content validity if the literature provides theoretical support that the items included in each summated factor representatively sample the domain of the concept it is intended to measure (Taylor & Wright, 2004). The discussion in the preceding literature review shows the origin of each construct in the relevant literature. The purpose of content validity is to specify the content of a
definition, and to develop indicators that sample from all areas of content in the definition. Face validity, on the other hand, subjectively assesses the correspondence between individual items and the concept through ratings in a pilot test with sub-populations. The objective is to ensure that the selected scale items meet theoretical assumptions and practical understanding (Hair et al., 2010).
Criterion-related validity indicates whether a measured construct behaves as theory predicts when compared with another measure of the same construct in which the researcher has confidence. The two types of criterion validity are concurrent validity and predictive validity. Concurrent validity is criterion validity assessed at the present time, while predictive validity is criterion validity assessed against a criterion measured later. Because this study is cross-sectional, concurrent validity was adopted. As seen from the literature review, numerous relationships between variables are expected. These expected correlations were used in considering criterion-related validity.
Construct validity concerns the extent to which a measure confirms theoretical expectations. It evaluates a measure in a given theoretical context and therefore shows relationships with other constructs that can be predicted and interpreted within that context. Construct validity is assessed in two ways: convergent validity and discriminant validity. Convergent validity assesses whether a scale correlates with other measures of the same construct, while discriminant validity assesses whether the scale is distinct from other constructs (Hair et al., 2010). Hence, factor analysis and correlation matrix analysis were performed to assess the convergent and discriminant validity of the data.
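The logic of reading a correlation matrix for convergent and discriminant validity can be sketched as follows. This is an illustration only, not the analysis performed in the study. The item scores, construct labels and helper names are hypothetical, and comparing average within-construct and between-construct correlations is one simple heuristic assumed here for clarity.

```python
from itertools import combinations, product
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_within(items):
    """Average correlation between all pairs of items of one construct."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_between(items_a, items_b):
    """Average correlation between items of two different constructs."""
    pairs = list(product(items_a, items_b))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores of five respondents on two items per construct.
construct_a = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5]]
construct_b = [[3, 1, 4, 2, 3], [4, 1, 5, 2, 3]]

within_a = mean_within(construct_a)
within_b = mean_within(construct_b)
between = mean_between(construct_a, construct_b)

# High within-construct correlations suggest convergence; low correlations
# between constructs suggest that the scales are distinct.
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

In this toy example the items of each construct correlate strongly with one another and only weakly with the items of the other construct, which is the pattern that convergent and discriminant validity respectively require.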
Factor analysis is a well-established tool for assessing the construct adequacy of a measuring device (Cooper & Schindler, 2003). All the data collected for the predictive variable were included in the validity analysis because the responses contained no inconsistencies that required data to be excluded. Regarding the sample size for factor analysis, Comrey and Lee (1992) suggest that 100 = poor, 200 = fair, 300 = good, 500 = very good, 1000 or more = excellent. Factor analysis was carried out with data collected from 308 subjects. This is an acceptable number according to Hair et al. (1998), Meyers et al. (2006), Coakes and Steed (2003) and Bartlett et al. (2001), for
conducting factor analysis. However, this study did not meet the minimum ratio of subjects to items, which is five subjects per item according to Coakes and Steed (2003), ten subjects per item according to Meyers et al. (2006) and twenty subjects per item according to Hair et al. (1998). With 151 items to analyse, a sample size of 308 is less than satisfactory for conducting a single analysis. For this reason, a separate factor analysis was performed for each of the interval scales measured. The validity and reliability of the three constructs, namely knowledge sharing practices, managerial tacit knowledge and personality traits, were examined. The following sections discuss in detail the construct validity (factor analysis) of the study variables.
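The sample-size reasoning above can be checked with a short calculation. The sketch below uses only the figures quoted in the text (308 subjects, 151 items, and the cited thresholds). The function names are hypothetical, and placing a sample size in the highest band whose threshold it reaches is an assumption about how the Comrey and Lee (1992) labels apply to intermediate values.

```python
N_SUBJECTS = 308
N_ITEMS = 151

# Minimum subjects per item, as cited in the text.
MIN_SUBJECTS_PER_ITEM = {
    "Coakes and Steed (2003)": 5,
    "Meyers et al. (2006)": 10,
    "Hair et al. (1998)": 20,
}

# Comrey and Lee (1992) labels, highest threshold first.
COMREY_LEE_BANDS = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
]


def sample_size_band(n):
    """Label of the highest band whose threshold n reaches (assumed reading)."""
    for threshold, label in COMREY_LEE_BANDS:
        if n >= threshold:
            return label
    return "below the lowest listed threshold"


def subjects_per_item(n_subjects, n_items):
    """Ratio of subjects to items in a single factor analysis."""
    return n_subjects / n_items


def max_items_supported(n_subjects, min_ratio):
    """Largest number of items that still satisfies a minimum ratio."""
    return n_subjects // min_ratio


ratio = subjects_per_item(N_SUBJECTS, N_ITEMS)
print(f"Overall sample size: {sample_size_band(N_SUBJECTS)}")
print(f"Subjects per item in a single analysis: {ratio:.2f}")
for source, minimum in MIN_SUBJECTS_PER_ITEM.items():
    limit = max_items_supported(N_SUBJECTS, minimum)
    print(f"{source}: needs {minimum} per item, so at most {limit} items per analysis")
```

With roughly two subjects per item, a single analysis of all 151 items falls below each of the cited ratios, which is why analysing the scales separately, with fewer items per analysis, brings the ratio closer to the recommended minimums.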