Synthesis lectures on human language technologies, 1947-4059 ; #39
Includes bibliographical references (pages 123-145).
1. Introduction
2. Quality estimation for MT at subsentence level -- 2.1 Introduction -- 2.2 Applications -- 2.3 Labels -- 2.4 Features -- 2.4.1 Word-level features -- 2.4.2 Phrase-level features -- 2.5 Architectures -- 2.5.1 Non-sequential approaches -- 2.5.2 Sequential approaches -- 2.5.3 APE-based approaches -- 2.6 Evaluation -- 2.7 State-of-the-art results -- 2.7.1 The predictor-estimator approach -- 2.7.2 Unbabel's hybrid approach -- 2.7.3 The APE-based approach
3. Quality estimation for MT at sentence level -- 3.1 Introduction -- 3.2 Applications -- 3.3 Labels -- 3.4 Features -- 3.4.1 Complexity features -- 3.4.2 Fluency features -- 3.4.3 Confidence features -- 3.4.4 Adequacy features -- 3.4.5 Pseudo-reference and back-translation features -- 3.4.6 Linguistically motivated features -- 3.5 Architectures -- 3.6 Evaluation -- 3.7 State-of-the-art results
4. Quality estimation for MT at document level -- 4.1 Introduction -- 4.2 Applications -- 4.3 Labels -- 4.3.1 Labels for evaluating gisting -- 4.3.2 Labels for measuring post-editing effort -- 4.4 Features -- 4.4.1 Complexity features -- 4.4.2 Fluency features -- 4.4.3 Adequacy features -- 4.4.4 Discourse-aware features -- 4.4.5 Word embedding features -- 4.4.6 Consensus and pseudo-reference features -- 4.5 Architectures -- 4.6 Evaluation -- 4.7 State-of-the-art results -- 4.7.1 Referential translation machines -- 4.7.2 Document embeddings -- 4.7.3 Best post-editing effort and gisting systems
5. Quality estimation for other applications -- 5.1 Text simplification -- 5.1.1 Applications -- 5.1.2 Labels -- 5.1.3 Features -- 5.1.4 Architectures -- 5.1.5 Evaluation -- 5.1.6 State-of-the-art results -- 5.2 Automatic text summarization -- 5.2.1 The summary assessment approach -- 5.2.2 The summary ranking approach -- 5.3 Grammatical error correction -- 5.3.1 The "there's no comparison" approach -- 5.3.2 Fluency and meaning preservation -- 5.4 Automatic speech recognition -- 5.5 Natural language generation -- 5.5.1 The QE ranking approach -- 5.5.2 QE for browse pages
6. Final remarks -- 6.1 Future directions -- 6.2 Resources and toolkits -- Bibliography -- Authors' biographies.
Many applications within natural language processing involve text-to-text transformations: given a text in natural language as input, a system is required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) comparing the system outputs against one or more reference outputs using evaluation metrics based on string matching, and (ii) building models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics face the challenge that a text-to-text system can produce many good-quality (and bad-quality) outputs for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), where systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to estimate how good or reliable the texts produced by an application are, without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs. This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as the application, since it accounts for most of the QE work done to date. It also briefly describes QE for several other applications, including text simplification, text summarization, grammatical error correction, and natural language generation.
9781681733753
Machine translating-- Evaluation.
FOREIGN LANGUAGE STUDY-- Multi-Language Phrasebooks.
LANGUAGE ARTS & DISCIPLINES-- Alphabets & Writing Systems.
LANGUAGE ARTS & DISCIPLINES-- Grammar & Punctuation.
LANGUAGE ARTS & DISCIPLINES-- Linguistics-- General.