About us

Is MQM 2.0 complete?

As of July 2021, MQM 2.0 is under development. Some components, such as the error typology, are mostly stable and widely implemented, while others are under active development and subject to change. Development continues within ASTM work item WK46396, and the results will feed into the standard that emerges from that work.

Is MQM for evaluating human translation or unedited MT?

MQM was designed to be useful in evaluating any type of unedited machine translation, postedited MT, and human translation of all kinds. The intention in creating it was to allow implementers to compare both MT and human translation on an equal and unbiased basis and to avoid the subjectivity of many current ranking approaches by focusing on specific errors. It was also intended to allow comparison of the types of errors that occur in both types of translation.

Should I use TAUS DQF instead of MQM 2.0?

If you’re using the TAUS DQF error typology, you are already using MQM 2.0! DQF is a subset of MQM designed to reflect the common needs of commercial translation and localization activities. In fact, the MQM committee recommends that most implementers use the DQF subset, which has been widely adopted by developers and language industry stakeholders. If it does not meet your needs, you can draw on the additional error types in the full MQM error typology.

MQM is so complex. Do you really expect people to use so many categories?

Absolutely not. MQM is a modular system that allows you to use just a few categories or as many as you need. For many production environments, the top-level “dimensions” may be sufficient, but if you are involved with testing systems or providing detailed feedback to translators, you may want to drill down and use more error types, such as those available through DQF. 
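
As a rough illustration of this modularity, the Python sketch below records errors at a detailed level and then rolls them up to their top-level dimensions, so the same annotations can serve both a coarse production view and a detailed feedback view. The `Dimension/Error type` path notation, the helper names, and the sample annotations are hypothetical choices for this example, not an official MQM data format.

```python
from collections import Counter

# Hypothetical annotations recorded as "Dimension/Error type" paths.
# The path notation and these sample entries are assumptions for illustration.
annotations = [
    "Accuracy/Mistranslation",
    "Accuracy/Omission",
    "Terminology/Inconsistent use of terminology",
    "Linguistic conventions/Grammar",
    "Accuracy/Mistranslation",
]


def roll_up(paths: list[str]) -> Counter:
    """Count errors per top-level dimension by keeping only the first path component."""
    return Counter(path.split("/", 1)[0] for path in paths)


def by_error_type(paths: list[str]) -> Counter:
    """Count errors at the most detailed level that was recorded."""
    return Counter(paths)


if __name__ == "__main__":
    print(roll_up(annotations))        # coarse, dimension-level view
    print(by_error_type(annotations))  # detailed, error-type-level view
```

Reporting at the dimension level only requires discarding the second path component; the annotations themselves do not have to change.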

If I'm using MQM 1.0, what do I need to do to update to the new version?

With the introduction of root cause processing, the Internationalization dimension was deprecated. In addition, there are a few changes to the error typology, such as the removal of “Improper exact TM match” as an error type. Consult the error typology for further details.

Is MQM just for translated texts?

No. MQM can also be used to evaluate source texts and report problems in them, although some of the error types (most notably those in the Accuracy dimension and some in the Terminology dimension) do not apply to this activity.

Can MQM be used to evaluate the quality of transcreation?

Yes, although translation specifications are particularly important in evaluating transcreation. If the output is based on a source text, MQM applies, but in cases that are more creative and less tied to a source text, some error types may not apply.

Can MQM be used for evaluating translators, evaluators, or language service providers?

Yes, insofar as it evaluates the translations they produce or, in the case of evaluators, the quality of their evaluations. It cannot address aspects beyond errors in content, such as professionalism or timely delivery. It also cannot evaluate processes directly, although the results may suggest ways to improve them.

What is the difference between analytic and holistic translation quality evaluation?

Analytic TQE focuses on the identification of specific errors, error type classification, severity level assignment, and (optionally) root cause analysis. By contrast, holistic TQE considers the text as a whole on the basis of one or more high-level criteria, rather than on identifying specific errors.
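
The following minimal Python sketch contrasts the two kinds of output. An analytic evaluation yields a list of individual error records, each with an error type, a severity level, and an optional root cause, whereas a holistic evaluation yields a single rating against a high-level criterion. The class and field names, the severity labels, and the 1 to 5 rating scale are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorAnnotation:
    """One error identified during analytic evaluation (hypothetical field names)."""
    segment_id: int
    error_type: str                   # e.g. "Accuracy/Omission"
    severity: str                     # assumed labels: "minor", "major", "critical"
    root_cause: Optional[str] = None  # root cause analysis is optional


@dataclass
class HolisticJudgment:
    """A holistic evaluation: one overall rating against a high-level criterion."""
    criterion: str
    rating: int                       # assumed 1-5 scale


def count_by_severity(errors: list[ErrorAnnotation]) -> Counter:
    """Summarize analytic results by severity level."""
    return Counter(e.severity for e in errors)


analytic = [
    ErrorAnnotation(1, "Accuracy/Omission", "major"),
    ErrorAnnotation(3, "Terminology/Inconsistent use of terminology", "minor",
                    root_cause="Pre-translation"),
]
holistic = HolisticJudgment(criterion="overall quality", rating=4)
```

The analytic records can be counted, filtered, and traced back to specific segments; the holistic judgment cannot be decomposed further.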

Why would anyone use MQM when it is so expensive to examine every translation in detail?

It is not expected that implementers evaluate all of their translated content with MQM. Most implementers evaluate a sample of the text and, as they establish processes that are capable of delivering quality results, they are able to scale this effort back considerably. If you implement MQM correctly as part of a systematic quality management program, the evaluation effort will decline over time.
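
Sampling can be as simple as drawing a random subset of segments for detailed review. The Python sketch below assumes the text has already been split into segments; the `fraction` parameter and the optional fixed seed (which lets a sample be reproduced later) are hypothetical choices, and real quality programs may prefer other sampling designs.

```python
import random
from typing import Optional


def sample_segments(segments: list[str], fraction: float,
                    seed: Optional[int] = None) -> list[str]:
    """Draw a simple random sample (without replacement) of segments for evaluation."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not segments:
        return []
    # Evaluate at least one segment, even for very small texts.
    k = max(1, round(len(segments) * fraction))
    rng = random.Random(seed)  # seeded generator so the sample is reproducible
    return rng.sample(segments, k)
```

Lowering `fraction` as processes prove capable of delivering quality results is one way the evaluation effort can be scaled back.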

What is the difference between MQM and automated MT evaluation metrics such as BLEU or METEOR?

MQM requires more effort than automated MT evaluation metrics, but it shows why translations receive the scores they do and provides actionable guidance on how to improve them. Automated metrics such as BLEU and METEOR typically produce a single number, with no indication of why a translation scored as it did, how to improve it, or whether the output is suitable for a given purpose. They also cannot be used in production environments, because they require human reference translations. More recently, quality estimation tools such as QuEst++ have shown promise for cases where references are not available, but human evaluation remains the state of the art.

What is the difference between "analytic TQE" and LQA?

The acronym “LQA” is often used inconsistently across the industry and may refer to a variety of activities related to localized content. If it is used to refer to examining translated content and identifying errors, it is the same thing as analytic translation quality evaluation.

What is root cause processing (a new feature in MQM 2.0)?

MQM 2.0 defines a typology of translation root causes. Root causes are broken into two top-level categories. Pre-translation root causes refer to problems outside of the translator’s control. Production root causes refer to human errors introduced by translators, revisors, and reviewers.

Root causes identified by reviewers can be used to organize and filter the errors considered during quality analysis. Identifying patterns and root causes serves as the basis for continuous improvement of translation quality processes.
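
The sketch below shows, in Python, how reviewer-assigned root causes might be used to organize and filter errors. It uses only the two top-level root cause categories named above (pre-translation and production); the error types, the data, and the helper names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical reviewed errors as (error_type, root_cause) pairs.
# Only the two top-level root cause categories are used; the data is invented.
errors = [
    ("Accuracy/Mistranslation", "Production"),
    ("Terminology/Inconsistent use of terminology", "Pre-translation"),
    ("Accuracy/Omission", "Production"),
    ("Linguistic conventions/Grammar", "Production"),
]


def group_by_root_cause(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Organize errors into buckets keyed by their root cause."""
    groups: dict[str, list[str]] = defaultdict(list)
    for error_type, root_cause in items:
        groups[root_cause].append(error_type)
    return dict(groups)


def filter_by_root_cause(items: list[tuple[str, str]], root_cause: str) -> list[str]:
    """Keep only the errors attributed to a single root cause."""
    return [error_type for error_type, cause in items if cause == root_cause]
```

Filtering on the pre-translation category, for example, collects the problems that lie outside the translator's control and are therefore candidates for feedback further upstream.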

Can implementers use MQM to provide feedback on the source content?

Yes. Root cause processing provides a mechanism for reporting problems in the source content, which allows implementers to drive quality improvements upstream to the source.

What is the difference between the LISA QA Model and MQM?

The LISA QA Model was a framework and software product for evaluating quality using a single metric. It was widely implemented in the translation industry but was not modular. MQM builds on the legacy of the LISA QA Model but updates it to support multiple metric definitions, each associated with a set of translation specifications. Users of the LISA QA Model, which is no longer maintained, are encouraged to update their processes to use MQM.

Can you use MQM for multimedia?

Yes. As of July 2021, the committee is adding error types that cover problems found in multimedia translation contexts.

Can you use MQM for spoken language services?

Not unless the spoken language is converted to written form. For example, some implementers have used MQM to evaluate the accuracy of speech-to-text services such as auto-transcription.

How can I influence the development of the forthcoming ASTM standard described on the homepage of this website?

If you wish to be involved, please contact Arle Lommel (arle.lommel@gmail.com). You can either provide feedback on MQM or join the committee that is responsible for MQM.