Stage 1 consists of five elements that prepare the translation for annotation and establish the values by which the annotation will be evaluated. These elements do not need to be completed in a specific order, but they typically include the following:
Reviewing the agreed-on translation specifications: Translation specifications provide details about a translation project that are used during the translation process, as well as the requirements against which a translation is checked during evaluation. Basic specifications include the type of text involved, the intended audience, and the purpose of the target text. Evaluators must review the agreed-on specifications and keep them accessible during evaluation.
Verifying (or selecting/creating) a metric: Often, the metric by which the evaluation will be performed is created by translation stakeholders prior to evaluation and retained for reuse. In such cases, the evaluator verifies the metric before continuing to the other tasks.
In the instances in which a metric is not already available, evaluators must select or create a metric that is compatible with the translation specifications. The creation of a metric involves the selection of a subset of translation error types. Later, during Stage 2, evaluators will assign an error type from the metric to each error instance they identify.
In the example used here, there are three severity levels that impact the evaluation: minor, major, and critical. Severity levels and their associated penalty points are applied in Stage 2.
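The penalty values themselves are not listed at this point in the document, but the totals computed later in this example are consistent with common MQM-style weights of 1 point per minor error and 5 points per major error; the critical weight below is purely illustrative, since a critical error triggers an automatic "fail". A minimal sketch of such a mapping:

```python
# Assumed severity weights: minor = 1 and major = 5 match the penalty
# totals used later in this example; the critical value is illustrative,
# since any critical error leads to an automatic "fail" rating.
SEVERITY_PENALTIES = {"minor": 1, "major": 5, "critical": 10}

def penalty_points(severity: str) -> int:
    """Penalty points for a single error instance of the given severity."""
    return SEVERITY_PENALTIES[severity]
```
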
In the MQM Error Typology, the top-level error types are called dimensions because they refer to different dimensions of translation quality. Under each of these seven dimensions there are a number of error subtypes. Evaluators need training and guidance in distinguishing between different error types and severity levels, both within the same dimension and across dimensions.
The following dimensions and error subtypes were selected for the Elections metric:
- Accuracy
  - Mistranslation
  - Addition
  - Omission
- Linguistic Conventions in the Target Text
  - Grammar
  - Punctuation
  - Spelling
- Style
  - Organizational style
  - Unidiomatic style
  - Awkward style
Assigning the Threshold Value: At some point, translation stakeholders also set (or verify, if established beforehand) their quality score threshold, ideally based on careful analysis of previous translation product performance and risk assessment. A translation product receives a “pass” rating if its Overall Quality Score is equal to or above the Threshold Value, and a “fail” rating if its score is below the Threshold Value. Like the translation specifications and the metric, the Threshold Value is often set at the beginning of a translation project by stakeholders. The Threshold Value for the Elections project was set at 80, but this value should not be viewed as a recommendation for other projects.
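The threshold comparison described above can be sketched as follows; the function name is hypothetical, and 80 is the project-specific value from the Elections example:

```python
THRESHOLD_VALUE = 80  # project-specific; the value set for the Elections project

def rating(overall_quality_score: float, threshold: float = THRESHOLD_VALUE) -> str:
    """Return "pass" if the score meets or exceeds the threshold, else "fail"."""
    return "pass" if overall_quality_score >= threshold else "fail"
```

For example, a score of exactly 80 yields a “pass”, while any score below 80 yields a “fail”.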
Preparing the source text and target text for evaluation: This stage assumes that the source text and target text are segmented and aligned into segment pairs, often called translation units (TUs). Usually, TUs have already been segmented and aligned by a CAT tool. Below are the source text and target text for the Elections text, divided into TUs:
| TU | Source Text (French) | Target Text (English) |
| --- | --- | --- |
| 1 | Le Parlement européen a adopté, le 11 novembre 2015, une résolution sur la réforme de la loi électorale de l’Union européenne. | On November 11th, 2015, the European Parliament adopted a resolution on the reform of the laws of the European Union. |
| 2 | Plusieurs principes ont alors été retenus: | Several principles were retained: |
| 3 | (1) l’organisation des élections sur la base d’un scrutin de liste ou d’un vote unique transférable de type proportionnel; | (1) conducting elections on the basis of proportional representation; using a list system or a single transferable ballot system; |
| 4 | (2) la suppression du cumul de tout mandat national avec celui de député européen; | (2) prohibiting the cumulation of any national office with one as Member of the European Parliament; |
| 5 | (3) la liberté pour les États membres de constituer des circonscriptions au niveau national; | (3) upholding the freedom of Member States to draw up constituencies at national level; |
Determining the Evaluation Word Count: The final element in Stage 1 is determining the word count of the source text, which is usually generated by a CAT tool. The number of words in the source text is known as the Evaluation Word Count and will be used later in Stage 3. On long-term projects, the Evaluation Word Count is often established during the quality planning stage by translation stakeholders and is incorporated during this stage. In the current draft of ISO 5060, another TQE standard, the Evaluation Word Count is based on the target text rather than the source text. A discussion of whether the Evaluation Word Count should be based on the source text or the target text is beyond the scope of this Introduction.
The Evaluation Word Count for the Elections text is 74. Although the Evaluation Word Count of the Elections text is much smaller than a sample that most translation stakeholders might choose to evaluate, the shorter text allows for its use as a manageable example in this document.
Building on the completion of tasks in Stage 1, annotation of individual errors in the translation can begin. Typically, errors are recorded in a scorecard. There are several types of scorecards, ranging from Excel spreadsheets to specialized online applications to features of CAT tools.
During the error annotation stage, evaluators check the target text against the source text to examine whether the content of source and target texts correspond adequately in terms of terminology, accuracy, and other specified translation quality dimensions, based on the translation specifications.
When an evaluator annotates an error, they identify the error, mark its location in the target text, assign it an error type and severity level, and optionally comment on the error. Penalty points are assigned to each error instance.
There are many ways to assign penalty points. In this example, the penalty points for an error instance are determined solely by the severity level. All error types have the same weight.
While an evaluator conducts their evaluation, the scorecard can begin calculating the Absolute Penalty Total by adding up the penalty points assigned to all the errors identified so far during annotation.
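This running total can be sketched as a simple sum over the severities recorded so far; the weights are assumptions (1 point per minor and 5 per major error) chosen to match the totals in this example:

```python
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}  # assumed weights

def absolute_penalty_total(severities):
    """Sum penalty points over all error instances annotated so far."""
    return sum(SEVERITY_WEIGHTS[s] for s in severities)

# The six errors annotated in the Elections example: three minor, three major.
total = absolute_penalty_total(["minor", "major", "major", "minor", "major", "minor"])
print(total)  # 18
```
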
Below is the annotation record for the Elections text and a prose explanation discussing the errors previously annotated by an evaluator in the context of the European Parliament:
TU 1 | Le Parlement européen a adopté, le 11 novembre 2015, une résolution sur la réforme de la loi électorale de l’Union européenne. | On November 11th, 2015, the European Parliament adopted a resolution on the reform of the laws of the European Union. |
In TU 1, there are two errors. The first is a minor Organizational Style error, as the style guide for this project requires that the date be formatted “11 November 2015”, not “November 11th, 2015”. The second error in this segment is a major Omission error because the target text omits the word “electoral” when describing the types of laws mentioned in the EU resolution. There is also the question of whether the English text should read “electoral laws of the European Union” or “European Union electoral law”, but that has not been annotated in this example.
TU 2 | Plusieurs principes ont alors été retenus: | Several principles were retained: |
The error marked in TU 2 is a major Mistranslation error, noting that “retained” is not an acceptable translation of “retenus”. It is a false cognate. In this context, an acceptable target word would be “included”.
TU 3 | (1) l’organisation des élections sur la base d’un scrutin de liste ou d’un vote unique transférable de type proportionnel; | (1) conducting elections on the basis of proportional representation; using a list system or a single transferable ballot system; |
TU 3 presents a minor Punctuation error. The semicolon after “representation” should have been a comma.
TU 4 | (2) la suppression du cumul de tout mandat national avec celui de député européen; | (2) prohibiting the cumulation of any national office with one as Member of the European Parliament; |
Although it accurately conveys the intended meaning of the source text in TU 4, the target text is unwieldy because of the literal word-for-word translation of the source material. This error was marked as a major Unidiomatic Style error by the evaluator who offered an idiomatic alternative: “prohibiting individuals from holding office as a member of a national parliament and as a Member of the European Parliament at the same time”.
TU 5 | (3) la liberté pour les États membres de constituer des circonscriptions au niveau national; | (3) upholding the freedom of Member States to draw up constituencies at national level; |
In TU 5, the evaluator marked the phrase “draw up” as a minor Awkward Style error because it is unclear in this context, noting that “establish” would be a better alternative.
This annotation of the Elections text yields an Absolute Penalty Total of 18. How this number was obtained will be explained in the next section, where formulas are included.
As noted, in practice, a scorecard typically calculates the Absolute Penalty Total automatically and incrementally during annotation. This total is used along with the Evaluation Word Count to calculate the Overall Quality Score. During the final stage, this Overall Quality Score is then compared against the Threshold Value to determine a pass/fail rating.
To receive a “pass” rating, a translation product must, first, contain no critical errors: by definition, a critical error typically renders the text unusable for the purpose established in the specifications, so a single critical error automatically results in a “fail” rating. Second, the translation product must achieve an Overall Quality Score greater than or equal to the Threshold Value established for the project.
The annotation for the Elections text cited above results in an Overall Quality Score of 75.7. Since the Overall Quality Score of 75.7 is below the Threshold Value of 80, this evaluation would assign a “fail” rating to the Elections text.
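The document defers its formulas to a later section, but the reported score is reproduced by a common MQM-style formulation, Overall Quality Score = (1 − Absolute Penalty Total / Evaluation Word Count) × 100. A sketch under that assumption:

```python
absolute_penalty_total = 18  # from the Elections annotation above
evaluation_word_count = 74   # Elections source-text word count
threshold_value = 80         # set by the project stakeholders

# Assumed formula: the per-word penalty rate subtracted from a perfect 100.
overall_quality_score = (1 - absolute_penalty_total / evaluation_word_count) * 100
print(round(overall_quality_score, 1))  # 75.7

verdict = "pass" if overall_quality_score >= threshold_value else "fail"
print(verdict)  # fail
```
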
Once annotation is complete and a rating has been assigned, translation stakeholders can take action in response to an evaluation. Often, the TQE workflow has already defined the purpose of the evaluation. If the evaluation is being used to judge whether the translated text meets the needs of the target audience, stakeholders must, at a minimum, decide whether to accept or reject the translation. If, however, the annotation is being used for product and process assessment and feedback for system improvement, action is at the discretion of the stakeholders.
In either case (evaluation or assessment), stakeholders have several other options, such as those listed below:
- Decide whether someone will make corrections in the target text.
- Identify root causes of the translation errors.
- Identify and implement process improvements to reduce the likelihood of the same errors in future translations.
Additionally, stakeholders must decide whether and with whom to share details of the evaluation, taking into account privacy agreements and fairness principles.
Stakeholders can also establish other action items depending on their needs.