Repeated Errors

There are frequently situations where the same error occurs several times throughout the evaluated sample (e.g. an incorrect capitalization, a wrong term, a misspelled name). Such error instances are often referred to as “repeated errors”.

There are two ways of dealing with such errors: either you treat each instance on its own merits and penalize every occurrence, or you penalize the error only once, at its first occurrence.

There are a number of arguments to justify both approaches.

However, whether you apply one method or the other has an obvious impact on quality measurements. Therefore, during the preparatory stage, when designing the metric, it is important that implementers decide how to handle repeated errors, and that this decision is made clear to the evaluators.


Scenario 1

Implementers decide that the concept of repeated errors shall not be taken into account at all, and that all errors detected in the evaluated content are assigned the corresponding error type and the appropriate severity level. This means that every error instance is penalized and contributes to the final score calculation.

Scenario 2

Implementers decide that the concept of repeated errors shall be taken into consideration. While each instance of the detected error is typically annotated for correction or follow-up purposes, only the first instance of the error is penalized and contributes to the calculation of the final quality score. This scenario requires that the implementers take some further decisions:

  • Whether the concept of repeated errors applies to identical segments, strings, or sentences only, or whether it applies to the same type of error regardless of whether it appears in identical or different segments, strings, or sentences.
  • Which types of errors are eligible for being treated as repeated errors and which are not. Examples of errors that would typically qualify as repeated errors: no approved glossary is available and the translator consistently uses a term other than the one preferred by the buyer/client/product producer; or the source is ambiguous or lacks context and the translator’s consistent interpretation of it is therefore not entirely accurate. Examples of errors that would typically not qualify as repeated errors: errors resulting from negligence, violations of grammar rules (typos, punctuation), or ignoring the style guide or project-related instructions.
  • How to record the repeated errors: repeated minor errors can be annotated once with the severity level minor (with the argument that correcting the errors throughout the text is a quick search-and-replace action), or they can be recorded once with the severity level major (with the argument that correcting several occurrences is a bigger nuisance than just one). The instances of the error that are not penalized can be assigned the neutral severity, together with a label such as “repeated error”, so that they do not contribute to the penalty score but remain traceable. A scoring sketch follows this list.
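To make the difference between the two scenarios concrete, the minimal Python sketch below implements both counting rules. The severity weights (minor = 1, major = 5, neutral = 0), the error identifiers, and the function names are illustrative assumptions only, not values or interfaces defined by any particular metric.

    # Hypothetical severity weights; a real metric defines its own values.
    SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "neutral": 0}

    def penalty_scenario_1(annotations):
        """Scenario 1: every error instance is penalized."""
        return sum(SEVERITY_WEIGHTS[severity] for _, severity in annotations)

    def penalty_scenario_2(annotations, upgrade_repeated_to_major=False):
        """Scenario 2: only the first instance of each error is penalized;
        the remaining instances are treated as neutral (label "repeated
        error"), so they stay annotated but add nothing to the score."""
        counts = {}
        for error_id, _ in annotations:
            counts[error_id] = counts.get(error_id, 0) + 1
        seen, total = set(), 0
        for error_id, severity in annotations:
            if error_id in seen:
                continue  # repeated instance: neutral severity, no penalty
            seen.add(error_id)
            if upgrade_repeated_to_major and counts[error_id] > 1:
                # Option: record the single penalized instance as major, on the
                # argument that fixing several occurrences is a bigger nuisance.
                severity = "major"
            total += SEVERITY_WEIGHTS[severity]
        return total

    # Four instances of the same minor terminology error plus one major error:
    annotations = [("wrong-term", "minor")] * 4 + [("omission", "major")]
    print(penalty_scenario_1(annotations))                                  # 4 + 5 = 9
    print(penalty_scenario_2(annotations))                                  # 1 + 5 = 6
    print(penalty_scenario_2(annotations, upgrade_repeated_to_major=True))  # 5 + 5 = 10

Note how the neutral severity keeps every repeated instance in the annotation record for traceability while contributing zero to the penalty.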

Even though there are several options for how to treat repeated errors, the choice of scenario should take the evaluation goal and use case into account:

If repeated errors are not counted, the metric answers the question “how many unique errors were made”. This approach is more suitable for pre-delivery evaluation of human-translated content, when the evaluation also serves as feedback to the translator. The translator is often requested to implement the feedback, which means they have to correct every individual instance of the repeated error anyway.

For quality evaluation of the final product, though, it normally makes sense to count all occurrences of repeated errors. In the final delivery, from the user’s perspective, any error is an error, regardless of how many times it appears.

Moreover, it is taken for granted that repeated errors are corrected, regardless of how the quality evaluation metric is defined. This is simply good data management practice: MT engines cannot learn from feedback, only from training data, so any error left uncorrected in the data will persist.

The concept of repeated errors has proved very useful for identifying repetitive errors in MT output. Such errors can be addressed with rules applied in pre- and post-processing automation, particularly where eliminating the error in the training data would be cost-prohibitive. Generally, though, repeated errors are counted in TQE of final deliverables, as a measure of final quality.

The information and the examples in this section do not claim to be exhaustive; there is a wealth of use cases and practices out there. This section aims only to explain the basics of the concept of repeated errors and to encourage implementers to consider how to treat repeated errors in their specific use cases and implementation scenarios.

Training material and evaluation instructions should provide guidance not only on how to categorize errors with regard to error types and severity levels, but also on how to handle repeated errors, ideally with examples, to ensure consistent evaluation.