Values and Scores

Figure 1: Scorecard with Top-Level MQM Error Types

The scorecard can be confusing at first glance. It helps to understand the different values entered on the scorecard, how an evaluator uses the card, and how its formulas produce the Overall Quality Score.

Scorecard Model Calculation Values

The following values play major roles in the translation quality scorecards. The abbreviations listed here are sometimes used when discussing formal equations.

Evaluation Word Count / EWC

The evaluation word count is the word count of the text chosen for evaluation. The EWC can cover the complete text or just a sample selection of text segments. EWC is used in the calculation of the Overall Quality Score (OQS). Under the ASTM standard, the word count is usually based on the source content.
Note: ISO 5060 cites the option to use character counts instead of word counts, or to use line counts that assume a uniform number of characters per line. These approaches accommodate languages whose word counts can differ dramatically. ISO 5060 also focuses on counts of target-language content.

Reference Word Count / RWC

The reference word count is an “arbitrary number of words” in a hypothetical reference evaluation text. This uniform word count is used so that implementers can compare results across different projects. It is often set at 1000 because this value makes it easier to establish meaningful scores that are comparable and look like familiar percentages. Cell C17 in the scorecard in Figure 1 shows the value for the reference word count.

Maximum Score Value / MSV

The maximum score value of 100 is also an arbitrary value. It shifts the overall quality score into a range that is easier to understand by converting the score to a percentage-like value. Cell C19 in the scorecard shown in Figure 1 shows the maximum score value.

The MQM-Core Error Typology lists and classifies different kinds of errors, which are identified by error type names and definitions, and are organized hierarchically.
In the scorecard, error type names are listed in the Error Types column (Column B) and associated with Error Type Numbers (ET Nos).

The Error Type Numbers that appear in the first column of the scorecard are row numbers specific to each individual scorecard design. They have no other mnemonic reference and will probably vary from scorecard to scorecard depending on the metric design and the scoring model.

Once evaluators have identified an error instance, they assign an error type to the error instance and tally it in the scorecard cell for the appropriate severity level. The common scorecard example used in the standard features four levels: neutral, minor, major, and critical (cells C3–F3), which in this example have been pre-assigned to the values of 0, 1, 5, and 25. Other options are shown on the Scorecards Options page.

The severity penalty multipliers listed in cells C4–F4 of Figure 1 are the penalty values assigned to the severity levels shown in Row 3 of the scorecard. When an error instance is assigned to an error type and severity level, the number of errors recorded in that cell is automatically multiplied by the severity penalty multiplier for that column. If Error Type Weights (Column G) are included in a scorecard, each product is then multiplied by its respective error type weight. The sum of these values contributes to a factor that is then subtracted from the maximum score. Penalty point multipliers and error type penalty totals are both sometimes referred to as Penalty Points.

Assignment of an error instance to the neutral severity level (Column C) indicates that the evaluator considers that a different solution is warranted, but that the translator should not be penalized for an error. For instance, the root cause may be beyond the translator’s control, a termbase may have been incorrect or missing, the evaluator’s suggested change is only preferential, or the severity of the error does not warrant even minor severity.
This value can be used to flag items for fine-tuning feedback purposes. Implementers can also choose to use the “Neutral” designation to mark repeated errors, if these are only counted once (see https://themqm.org/resources/scorecards/repeatederrors/).

Assignment of an error instance to the minor severity level (Column D) indicates that an error instance has a limited impact on, for example, accuracy, stylistic quality, consistency, fluency, clarity, or general appeal of the content, but it does not seriously impede the usability, understandability, or reliability of the content for its intended purpose.

Assignment of an error instance to the major severity level (Column E) indicates that an error instance seriously affects the understandability, reliability, or usability of the content for its intended purpose or hinders the proper use of the product or service, for instance due to a significant loss or change in meaning or because the error appears in a highly visible or important part of the content.

Assignment of an error instance to the critical severity level (Column F) indicates that the error renders the entire content unfit for purpose or poses a risk of serious physical, financial, or reputational harm.

The error count appearing in any given cell (C6–F13 in Figure 1) reflects the total number of instances of that individual error type or subtype assigned to a given error severity level for a given translation evaluation.
The scorecard automatically multiplies the error count for each cell by its respective severity penalty multiplier to produce an intermediate product, which may or may not be displayed in a separate column in the scorecard.

The intermediate products described in the preceding entry are then added up for each individual error type row. The resulting sum is called the Error Type Penalty Total and is displayed in Figure 1 in cell range H6–H13.
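The row calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the row names, and the tallies are hypothetical and are not taken from Figure 1. Only the multipliers 0, 1, 5, and 25 come from the example scorecard.

```python
# Severity penalty multipliers from the example scorecard (0, 1, 5, 25).
SEVERITY_MULTIPLIERS = {"neutral": 0, "minor": 1, "major": 5, "critical": 25}


def error_type_penalty_total(counts, weight=1.0):
    """Sum (error count x severity multiplier) for one row, then apply the weight."""
    subtotal = sum(SEVERITY_MULTIPLIERS[level] * n for level, n in counts.items())
    return subtotal * weight


# Hypothetical tallies for two error type rows (illustrative values only).
accuracy = {"neutral": 1, "minor": 3, "major": 1, "critical": 0}
style = {"neutral": 0, "minor": 2, "major": 0, "critical": 0}

accuracy_etpt = error_type_penalty_total(accuracy)  # 1*0 + 3*1 + 1*5 + 0*25
style_etpt = error_type_penalty_total(style)        # 0*0 + 2*1 + 0*5 + 0*25
```

Because the neutral multiplier is 0, neutral annotations are recorded but never add to a row's total.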

Scores and Totals

The sum of all error type penalty totals for a given translation evaluation project yields the Absolute Penalty Total (Cell H14), which is used to calculate the other quality measures in the scoring model.

The per-word penalty total (Cell H16) is determined by dividing the absolute penalty total by the evaluation word count.

The first quality measure of a translation product is the Overall Normed Penalty Total (Cell H17). It represents the per-word penalty total relative to the reference word count, which in this example is determined by multiplying the per-word penalty total by 1000.

The Overall Quality Fraction (Cell H18), which can be used as a factor in calculating other scores, is determined by subtracting the per-word penalty total (the absolute penalty total divided by the evaluation word count) from 1.

The primary quality measure of a translation product, the Overall Quality Score (Cell H19), is most easily determined by multiplying the per-word penalty total by the maximum score value, usually 100, and subtracting this product from the maximum score value. This process shifts the score so that it resembles a percentage value.
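The chain of totals and scores can be traced with a short Python sketch. It assumes an unscaled, unweighted scorecard and assumes that the score is obtained as the maximum score value times (1 minus the per-word penalty total), so that an error-free text scores 100. The function name and the input numbers are hypothetical.

```python
def overall_quality_score(apt, ewc, rwc=1000, msv=100):
    """Derive the scorecard measures from an absolute penalty total (APT).

    apt: absolute penalty total, ewc: evaluation word count,
    rwc: reference word count, msv: maximum score value.
    """
    if ewc <= 0:
        raise ValueError("evaluation word count must be positive")
    pwpt = apt / ewc   # per-word penalty total
    onpt = pwpt * rwc  # overall normed penalty total
    oqf = 1 - pwpt     # overall quality fraction (assumed form)
    oqs = oqf * msv    # overall quality score
    return {"PWPT": pwpt, "ONPT": onpt, "OQF": oqf, "OQS": oqs}


# Hypothetical evaluation: 30 penalty points over 1500 evaluated words.
result = overall_quality_score(apt=30, ewc=1500)
```

With these illustrative inputs, the normed penalty total comes out at about 20 penalty points per 1000 words and the score at about 98.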

The Threshold Value (Cell H20) is a percentage-like value predetermined on the basis of stakeholder needs and expectations, text or content type, and practical experience. It sets the lower limit against which the quality of a translation is measured in order to assign a quality rating (Pass/Fail rating).

The Pass/Fail assessment decision to accept or reject an evaluated translation is based on the absence of critical errors and an Overall Quality Score equal to or exceeding the Threshold Value.

The Critical Error Count comprises the number of critical errors tallied during the annotation phase, whereby any value of one or more automatically triggers a Fail Rating, regardless of other score values.
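The decision rule described in the two entries above can be written as a single condition. The following Python sketch uses a hypothetical function name and hypothetical threshold values; it only restates the rule that a translation passes when it has no critical errors and its score meets the threshold.

```python
def passes(oqs, threshold, critical_error_count):
    """Return True only if there are no critical errors and OQS >= threshold."""
    return critical_error_count == 0 and oqs >= threshold


# Hypothetical cases with a threshold of 95.
clean_pass = passes(oqs=98.0, threshold=95.0, critical_error_count=0)
below_threshold = passes(oqs=94.9, threshold=95.0, critical_error_count=0)
critical_fail = passes(oqs=98.0, threshold=95.0, critical_error_count=1)
```

In the third case the score alone would pass, but the single critical error forces a Fail rating.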

Adjustment Values

Figure 1 shows an Error Type Weight (ETW) of 1.0 in each error type row (Column G), which has no effect on the value of the Error Type Penalty Total (ETPT). The function of the ETW is to weight individual error types more or less severely, depending on project specifications. For instance, in a demanding advertising text, the error type Style might be considered so important that style errors are weighted with a value of 1.1 or some other higher value in order to increase their impact on the final score calculation. In situations where the ETW is always 1 for all error types, this column could be eliminated.
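As a rough illustration of the effect of a weight, the Python sketch below compares one Style row with a weight of 1.0 and with the 1.1 mentioned above. The function name and the tallies are hypothetical, and the multipliers are the example values 0, 1, 5, and 25.

```python
# Severity multipliers in column order: neutral, minor, major, critical.
MULTIPLIERS = (0, 1, 5, 25)


def weighted_etpt(counts, etw=1.0):
    """Error type penalty total for one row, scaled by its error type weight."""
    return etw * sum(m * c for m, c in zip(MULTIPLIERS, counts))


# Hypothetical Style row: 2 minor errors and 1 major error.
style_counts = (0, 2, 1, 0)
unweighted = weighted_etpt(style_counts)         # 2*1 + 1*5 = 7 penalty points
weighted = weighted_etpt(style_counts, etw=1.1)  # roughly 7.7 penalty points
```

The weight changes only this row's contribution, so the other error types keep their original totals.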

The scorecard in Figure 1 indicates the value of 1 for a Scaling Parameter (C18). This is a real number value that is used as a multiplier to differentiate between overall normed penalty total values for different text types or projects with significantly different specifications. This value does not appear in any calculation row or column, but it is one factor in the equation used to calculate the Overall Quality Score. If there is no need or desire to distinguish the scores for various text types or projects, this value can be omitted.
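The exact position of the scaling parameter in the equation is not spelled out here. The Python sketch below therefore rests on an assumption: it multiplies the overall normed penalty total by the scaling parameter and derives the score from that scaled total. The function name and the input values are hypothetical.

```python
def scaled_scores(apt, ewc, sp=1.0, rwc=1000, msv=100):
    """Return (scaled normed penalty total, overall quality score).

    Assumption: the scaling parameter sp multiplies the normed penalty total,
    and the score is msv * (1 - scaled total / rwc).
    """
    pwpt = apt / ewc            # per-word penalty total
    onpt = pwpt * rwc * sp      # scaled overall normed penalty total
    oqs = msv * (1 - onpt / rwc)
    return onpt, oqs


# Same hypothetical evaluation scored with sp = 1 and with a stricter sp = 2.
default_onpt, default_oqs = scaled_scores(apt=30, ewc=1500)
strict_onpt, strict_oqs = scaled_scores(apt=30, ewc=1500, sp=2.0)
```

Under this assumption, a scaling parameter of 1 leaves the score unchanged, and a larger value lowers the score for the same set of errors.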

When the scorecard is used in contexts where it is desirable to display equations and equation values, abbreviated forms can be displayed in the scorecard, as shown in Figure 2.

Figure 2: Scorecard with Top-Level MQM Error Types, showing abbreviations and equations

The role of evaluators and the math functions of the scorecard are spelled out in the Detailed Process page.