3 PILOT STUDY

3.5 Issues Arising from the Pilot Study

A number of methodological issues arose in the pilot study. The manual search for inconsistencies within Paraconc proved laborious, particularly when searching among tags, and the method would not have scaled up easily to a larger sample of TM data. Some elements of the process could be automated to remove tags and to improve the speed and measurement reliability of a study without compromising accuracy. The search for commonly occurring word clusters in Wordsmith and the follow‐up search for those clusters in Paraconc were unreliable methods of capturing all repetitions in the ST, although they succeeded in the initial aim of discovering whether TT inconsistencies were present in the TM. A better method was nonetheless required for the main study.
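The kind of automation described above can be illustrated with a minimal Python sketch. It is not the tooling used in the pilot or main study. It assumes that the TM has already been exported to a list of (ST, TT) segment pairs and that tags are simple angle‐bracket markup; the names strip_tags and find_inconsistent_sources are hypothetical. Grouping TT segments by their tag‐free ST segment finds every repeated ST segment directly, rather than relying on word clusters.

```python
import re
from collections import defaultdict

# Assumption: tags are angle-bracket markup such as <b> or <ph id="1"/>.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(segment):
    """Delete inline tags, then collapse runs of whitespace."""
    return " ".join(TAG_PATTERN.sub("", segment).split())


def find_inconsistent_sources(pairs):
    """Map each repeated ST segment to its distinct TT segments.

    Only ST segments with more than one distinct TT are returned.
    """
    targets_by_source = defaultdict(list)
    for source, target in pairs:
        clean_source = strip_tags(source)
        clean_target = strip_tags(target)
        if clean_target not in targets_by_source[clean_source]:
            targets_by_source[clean_source].append(clean_target)
    return {
        source: targets
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Segments taken from example 3.4 in this section; the <b> tags are
# invented here purely to exercise the tag removal.
pairs = [
    ("This chapter includes the following topics:",
     "Dieses Kapitel enthält folgende Themen:"),
    ("This <b>chapter</b> includes the following topics:",
     "In diesem Kapitel werden folgende Themen behandelt:"),
]
inconsistent = find_inconsistent_sources(pairs)
```

Deleting tags outright, rather than replacing them with a space, is a design choice of this sketch: it keeps punctuation attached to the preceding word but would merge two words separated only by a tag.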

As discussed later in section 4.3.2, a single case study forms a poor basis for generalisation. A decision was thus made to carry out the main study on more than one set of data, to show that the methodology was replicable. It was decided to use English/Japanese data in addition to English/German, to show that the methodology can be applied to non‐European languages. Japanese was chosen on the basis of the researcher’s language competence.

At the data analysis stage of the pilot study, inconsistencies were counted based on the number of types at TT segment level. In example 3.2, where one ST segment is translated as two different TT segments, we counted two inconsistencies. However, while translating interactively using this TM, the translator would presumably have received either 3.2.1t or 3.2.2t as a suggested match. If the suggested segment is considered the ‘first’ or ‘master’ segment, the translator's decision to change it to the second TT translation should count as one inconsistency. For this reason we changed the system of counting for the main study, assigning one TT segment the status of the master segment in cases where there were multiple types of TT translated from a single ST segment. This is further explained in section 4.4.
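The difference between the two systems of counting can be shown with a short Python sketch. It is an illustration only: the function names are hypothetical, and treating the first TT segment encountered as the master is an assumption made here; the rule actually used in the main study is set out in section 4.4.

```python
def count_by_types(target_variants):
    """Pilot-study count: each distinct TT type is one inconsistency.

    A ST segment with only one TT type has no inconsistency.
    """
    distinct = set(target_variants)
    return len(distinct) if len(distinct) > 1 else 0


def count_against_master(target_variants):
    """Master-segment count: each TT type other than the master is one.

    Assumption: the first TT segment encountered is the master.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    distinct = list(dict.fromkeys(target_variants))
    return len(distinct) - 1 if distinct else 0


# The labels stand in for the two TT segments of example 3.2.
variants = ["3.2.1t", "3.2.2t"]
pilot_count = count_by_types(variants)         # 2
master_count = count_against_master(variants)  # 1
```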

Inconsistencies were largely categorised according to part‐of‐speech, as it became clear while searching for inconsistencies that many translations of a single segment differed only by a noun or verb. Categories for inconsistent word order and inconsistent punctuation were added as these became apparent from the data. This pattern continued into the main study. Although this categorisation seemed successful, some inconsistent segments fell into more than one category, for example:

3.4s This chapter includes the following topics:

3.4.1t Dieses Kapitel enthält folgende Themen:¹²

3.4s This chapter includes the following topics:

3.4.2t In diesem Kapitel werden folgende Themen behandelt:¹³

In segments such as 3.4.1t and 3.4.2t, with verb and prepositional changes, counting a single inconsistency would not give the full picture. Making each adjustment takes time for a translator, which is salient because TM tools are intended to save time and money. As a result, it was decided that inconsistencies would be measured at segment level and again at sub‐segment level in the main study, counting each inconsistency found within the segment. Full details of the methods of counting and categories for the main study can be found in section 4.4. It quickly became apparent from examples such as 3.1, and from the addition of ST segmentation inconsistency as a cause of TT inconsistency in the 2008 TM, that measuring TT inconsistencies alone did not provide a full picture of the level of consistency within a TM. As a result, the four categories of consistency in section 4.4 were chosen for the main study.
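As a rough illustration of counting at sub‐segment level, the following Python sketch compares the two TT segments of example 3.4 token by token using the standard‐library difflib module. It is a crude approximation under stated assumptions: the tokeniser and the function names are hypothetical, and a token‐level diff does not assign the part‐of‐speech categories used in this study, which required manual analysis.

```python
import re
from difflib import SequenceMatcher


def tokenize(segment):
    """Split a segment into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", segment)


def sub_segment_differences(master, variant):
    """List the token-level changes needed to turn master into variant.

    Each item is (operation, master_tokens, variant_tokens), where the
    operation is 'replace', 'delete' or 'insert'.
    """
    master_tokens = tokenize(master)
    variant_tokens = tokenize(variant)
    matcher = SequenceMatcher(a=master_tokens, b=variant_tokens,
                              autojunk=False)
    return [
        (tag, master_tokens[i1:i2], variant_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# TT segments 3.4.1t and 3.4.2t, without their footnote markers.
differences = sub_segment_differences(
    "Dieses Kapitel enthält folgende Themen:",
    "In diesem Kapitel werden folgende Themen behandelt:",
)
for operation, old, new in differences:
    print(operation, old, new)
```

At segment level this pair is a single inconsistency; the sketch reports several separate changes within it, which is the distinction the sub‐segment count is intended to capture.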

The pilot study was an exploratory project and as such could not be expected to explain the reasons behind the prevalence of certain categories of inconsistency. It was felt that, for the main study, explanatory data provided by professionals with experience of TM would be valuable so that inconsistencies may be minimised in future. This was addressed in the main study by choosing a mixed methods research design, as outlined in the following chapter. The pilot study and the issues raised within it proved valuable as a framework to be improved upon in the methodology used in the main study.

3.6 Summary

This chapter described a pilot study that was carried out in 2009 in preparation for the main study, the methodology for which follows in Chapter 4. The pilot study was an initial attempt to check for consistency in two English‐to‐German TMs created during software updates. The methodology involved converting the data from a proprietary format, extracting segments, and searching for commonly occurring word clusters. By searching for these word clusters using a bilingual concordance tool and sorting the results alphabetically, it was possible to see whether segments had been repeated and to check for introduced inconsistency.

The pilot study found inconsistencies, mostly of noun phrase and word order, in the data. Fewer inconsistencies were found in the second TM, possibly as a result of a translation process that involved controlling the source text and using a translation management system. The study did, however, show that there were inconsistencies in the TMs that could be measured and categorised using an improved and less laborious methodology. It was decided to use a replicable quantitative approach for the main study, supplemented by a series of follow‐up qualitative interviews, to update the methods of counting, and to increase the scope to search for source text inconsistencies.

In document Measuring consistency in translation memories: a mixed-methods case study (Page 73-77)