QA Distiller - Comparative Analysis with QA Tools

5 QUANTITATIVE PHASE: DATA ANALYSIS

5.6 Comparative Analysis with QA Tools

5.6.2 QA Distiller

Figure 5.18 Screenshot of the QA Function in QA Distiller

A QA check on TM A using QA Distiller 8.5 returns detailed results.

Inconsistencies are graded in severity, and the number of occurrences of a repeated inconsistency is provided (see yellow diamond shape in figure 5.18).

The program presents inconsistencies in a spreadsheet format up to a total of 10,000 (see blue diamond shape in figure 5.18). As this QA check exceeded the limit, no total number was given. Double‐clicking on an error message in QA Distiller brings up the internal X‐Editor tool, wherein the user may edit or delete segments (see figure 5.19).

Figure 5.19 Screenshot of the X‐Editor Window in QA Distiller

The program appeared to be efficient at finding segment‐level inconsistencies in categories 2 and 3, giving each error a type such as ‘inconsistent translation’, ‘consecutive spaces’, ‘leading/trailing spaces’, and ‘capitalisation’. The results also contained a large number of false positives. It incorrectly identified 270 ‘corrupt character’ errors due to an accented character in a repeated term appearing (correctly) in the German TT and also misidentified 201 number value errors. An example is in the translation of:

Options for ends allow you to select dot diameters which are from two to ten times the line thickness.

This was translated as

In den Listen Anfang und Ende unter Ausführung der Enden können Sie Punktdurchmesser auswählen, die das Zwei‐ bis Zehnfache der Linienstärke betragen.

Numbers that are spelled out or written with punctuation often appear to be misidentified, as in the example provided. Other false positives of this type included ‘two’ translated as ‘both’, and problems with the use of the term ‘3d’.
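The segment‐level error types named above can be illustrated with a minimal sketch. This is not QA Distiller's implementation, which is not documented here; the function names and the sample data are invented for illustration, and the checks simply compare surface strings.

```python
import re
from collections import defaultdict


def whitespace_issues(segment):
    """Return the whitespace error types found in one segment."""
    issues = []
    if re.search(r" {2,}", segment):
        issues.append("consecutive spaces")
    if segment != segment.strip():
        issues.append("leading/trailing spaces")
    return issues


def inconsistent_translations(units):
    """Map each repeated source segment to its differing target segments."""
    targets = defaultdict(set)
    for source, target in units:
        targets[source].add(target)
    # Keep only sources that were translated in more than one way.
    return {src: sorted(tgts) for src, tgts in targets.items() if len(tgts) > 1}


# Invented toy data, not taken from the TMs in this study.
sample_units = [
    ("Open the file.", "Öffnen Sie die Datei."),
    ("Open the file.", "Datei öffnen."),
    ("Save the drawing.", "Speichern Sie  die Zeichnung. "),
]

print(inconsistent_translations(sample_units))
for _, target in sample_units:
    print(repr(target), whitespace_issues(target))
```

Because a check of this kind operates on whole segments, it can report that two translations of a repeated source segment differ, but not which sub‐segment element (noun, verb, punctuation) accounts for the difference.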

Both of these tools are clearly useful and may provide valuable information about TMs. However, this section demonstrates the contribution of this study in measuring inconsistency at a sub‐segment level and in using a more detailed categorisation. Neither tool provides sub‐segment categorisation of inconsistencies or the frequency of category 1 TUs, although both calculations are difficult to automate. Both tools also returned a number of false positives in their error reports. Nonetheless, they quickly provide a snapshot of the state of consistency in a TM and, in the case of QA Distiller, error detail and a straightforward maintenance interface. The additional functionality is reflected in the cost of the tools: Xbench is free of charge at the time of writing (as it is under beta test), whereas QA Distiller licenses currently cost from €249 to €2500.

5.7 Summary

This chapter demonstrated the practical application of the first phase of the mixed methods study as set out in Chapter 4. It contained an analysis of each TM, presented in turn and subdivided into the four categories of repeated segments.

Section 5.6 contains the results of a QA check using two current commercial QA tools so that the results of an automated QA scan may be compared with the results of the current study. The findings in this chapter begin to answer our research questions as specified in section 4.2. Addressing question 1, we can say that, in this case study, TMs are not consistent. Across the four TM corpora, the rate of introduced inconsistency as shown in table 5.15 represents lost leverage and time spent editing previously accepted translations. As such, all of our stated general assumptions about TM (consistency, cost savings, and time savings) are affected by introduced inconsistency. Section 6.5 gives interview participants’ opinions as to whether this inconsistency is found more generally in TMs.

                                                      TM A    TM B    TM C    TM D
Category 3 TUs                                         390     239     826    1713
Category 4 TUs                                        6674    4263   18343   25541
Total TUs with repeated ST segments (Category 3+4)    7064    4502   19169   27254
Percentage of TUs with introduced inconsistency       5.5%    5.3%    4.3%    6.3%

Table 5.15 Introduced Inconsistency in all TMs
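The percentages in table 5.15 follow from dividing the category 3 count by the total of categories 3 and 4. A minimal sketch of that arithmetic, with the counts copied from the table and the variable and function names chosen here for illustration:

```python
# Counts copied from table 5.15; the dictionary layout is an illustrative choice.
category_3 = {"TM A": 390, "TM B": 239, "TM C": 826, "TM D": 1713}
category_4 = {"TM A": 6674, "TM B": 4263, "TM C": 18343, "TM D": 25541}


def introduced_inconsistency_rate(cat3, cat4):
    """Percentage of TUs with repeated ST segments that fall in category 3."""
    return 100 * cat3 / (cat3 + cat4)


rates = {
    tm: round(introduced_inconsistency_rate(category_3[tm], category_4[tm]), 1)
    for tm in category_3
}
for tm, rate in rates.items():
    print(f"{tm}: {rate}%")
```

Rounded to one decimal place, the computed rates match the final row of the table.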

Each TM corpus in this study shows a high proportion of introduced noun or term inconsistency in category 3. Many noun inconsistencies demonstrate influence from the source language, and different translation decisions have been propagated throughout the TM, such as the alternation between レイヤ 'laya' and 画層 'gasou' [layer] from TM D (example 5.53), or between the alternated whole phrases 'in der Befehlszeile' [in the command line] and 'an der Eingabeaufforderung' [at the command prompt] in TM C (example 5.31). Verb and punctuation changes are also common throughout the corpora. These sub‐segment inconsistencies are included in interview questions in the following chapter and discussed further in section 6.5.

Comments and markings, possibly as an indicator that the translator should revisit or review a segment, have been propagated in TMs C and D. In each of the TM corpora there appears to be a lack of clarity as to whether ST punctuation and formatting should be replicated or replaced by that native to the TT (see examples 5.48 and 5.54), leaving a combination of both in the TM data. The English‐to‐Japanese TM data (B and D) in particular also show evidence of explicitation in the TT. These further questions are also to be addressed in the qualitative study, contained in chapter 6.

Our results also show that TM source texts are not consistent. By comparing category 1 and 2 results, we can see the rate at which minor ST inconsistencies were corrected (or, inversely, further inconsistencies propagated) between the four TM corpora in table 5.16.

                                                      TM A    TM B    TM C    TM D
Category 1 TUs (inconsistent → inconsistent)           370      65     995    1980
Category 2 TUs (inconsistent → consistent)             613     914    2077    1801
Total TUs with minor inconsistency in ST segments      983     979    3072    3781
Percentage of those TUs made consistent              62.4%   93.4%   67.6%   47.6%

Table 5.16 Minor Inconsistency in all TMs
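The final row of table 5.16 is the category 2 count as a share of all TUs with minor ST inconsistency, and its complement is the share in which inconsistency was propagated. A short sketch of both calculations, using the counts from the table and names chosen here for illustration:

```python
# Counts copied from table 5.16; names are illustrative.
category_1 = {"TM A": 370, "TM B": 65, "TM C": 995, "TM D": 1980}
category_2 = {"TM A": 613, "TM B": 914, "TM C": 2077, "TM D": 1801}


def share(part, other):
    """Percentage that `part` represents of `part + other`."""
    return 100 * part / (part + other)


# Share of TUs whose minor ST inconsistency was made consistent in the TT.
made_consistent = {
    tm: round(share(category_2[tm], category_1[tm]), 1) for tm in category_1
}
# Inverse view: share of TUs in which the inconsistency was carried over.
propagated = {
    tm: round(share(category_1[tm], category_2[tm]), 1) for tm in category_1
}
print(made_consistent)
print(propagated)
```

Seen from the inverse side, TM D is the only corpus in which more than half of these TUs carried the inconsistency over into the TT.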

A benefit of TM may be seen from the number of TUs in category 2, which contain consistency introduced in the TM translation process. English‐to‐Japanese TM B aside, many of the TUs with inconsistent ST evinced further inconsistency in the aligned TT. ST segments featuring inconsistent letter case or extra trailing spaces are often aligned with TT segments containing further inconsistency; for example, 60% of TUs with inconsistent ST letter case in TM D were associated with further TT inconsistency. However, when the number of inconsistent ST segments aligned with TT segments featuring further inconsistencies was tested for correlation using SPSS, the result was not statistically significant, so a clear correlation between the rate of inconsistent ST and introduced inconsistency in the TT could not be established³⁵. Our interviewees’ experiences of source text inconsistency will be discussed in section 6.3.
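The values reported in footnote 35 can be checked under one assumption that this section does not state explicitly: that the correlation was computed over four data points, one per TM. With n = 4 the test has two degrees of freedom, and the two‐tailed p‐value of Pearson's r has a simple closed form. The sketch below is not the SPSS procedure; the function name is chosen here for illustration.

```python
import math


def two_tailed_p_from_r(r, n=4):
    """Two-tailed p-value for Pearson's r, closed form for n = 4 (df = 2) only."""
    if n != 4:
        raise ValueError("this closed form covers only n = 4 (two degrees of freedom)")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    # Test statistic for a correlation coefficient.
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # CDF of Student's t with 2 df: F(t) = 1/2 + t / (2 * sqrt(2 + t^2)).
    cdf = 0.5 + t / (2 * math.sqrt(2 + t * t))
    return 2 * (1 - cdf)


p = two_tailed_p_from_r(0.945)  # r as reported in footnote 35
print(round(p, 3))
```

For two degrees of freedom the expression reduces to p = 1 − |r|, so r = .945 gives p = .055, which agrees with the footnote. Under this assumption, even a very strong correlation cannot reach the .05 threshold unless |r| exceeds .95, which reflects how little evidence four data points can provide.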

Our second research question asks how consistency can be identified and measured in TM data. The methodology described in chapter 3 and the results in the current chapter offer one possible way of identifying and measuring consistency. These results show that the typology and categorisation operationalised in section 4.4 brought quantifiable results from the TMs studied.

Chapter 6 will offer our interviewees’ opinions as to whether these phenomena appear more widely in TMs.

35 A strong but statistically insignificant correlation was found between source and target inconsistencies where r = .945, p = .055.
