N.L.P.
NATURAL LANGUAGE PROCESSING
Teacher: Lê Ngọc Tấn
Email:
[email protected]
Blog:
Industrial University of Ho Chi Minh City (Trường Đại học Công nghiệp Tp. HCM)
Khoa Công nghệ thông tin
(Faculty of Information Technology)
Chapter 5
Foundations of
Statistical Machine Translation
NLP. p.2
Introduction to
Statistical Machine Translation
Since 1950
DARPA projects
Statistical MT Systems
Three Problems in Statistical MT Systems
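The three problems are presumably the ones the following slides cover: the translation model, the language model, and decoding. A common way to tie them together is the noisy-channel formulation, sketched below. The symbols are assumptions not taken from the slides: f is the foreign input sentence and e is a candidate English output.

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)
```

Under this reading, P(f | e) is the translation model, P(e) is the language model, and the search for the arg max is the decoding problem.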
Translation Model
Goal of the translation model
Match foreign input to English output
Statistical modeling with the IBM models (up to IBM Model 4)
EM algorithm
Word alignment
Translation methods: word-based, phrase-based,
and syntax-based
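The link between the EM algorithm and word alignment can be illustrated with a minimal sketch. It uses IBM Model 1, not the IBM Model 4 named above, because Model 1 fits in a few lines. It also omits the NULL source word for brevity. The function name `train_ibm_model1` and the toy corpus are illustrative assumptions, not part of any toolkit.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] ~ P(f | e).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplification: no NULL word, no distortion or fertility model.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform distribution over foreign words.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of e being linked

        # E-step: distribute each foreign word over the English words
        # of its sentence, in proportion to the current t values.
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c

        # M-step: re-estimate t from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t

# Toy parallel corpus (hypothetical data).
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
print(round(t[("das", "the")], 3), round(t[("Haus", "the")], 3))
```

No alignments are given in the data, yet the probability mass for "the" shifts toward "das" because that pair co-occurs in two sentences. The most probable link for each foreign word can then be read off as a word alignment.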
Translation Model
The Machine Translation Pyramid
However, the best-performing SMT systems today
still operate near the bottom of the pyramid
Language model
Goal of the Language Model
Detect good English
What is Good English?
Standard technique: n-gram models (e.g., trigram)
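A trigram model can be sketched in a few lines to show how a language model "detects good English". This is only an illustration under stated assumptions: it uses add-one smoothing (production toolkits typically use more refined smoothing), and the function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter

def train_trigram_lm(sentences):
    """Count trigrams and their bigram histories over tokenized sentences."""
    trigrams, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            bigrams[tuple(padded[i - 2:i])] += 1
    return trigrams, bigrams, len(vocab)

def sentence_logprob(tokens, trigrams, bigrams, vocab_size):
    """Log-probability of a sentence under the add-one smoothed trigram model."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    logp = 0.0
    for i in range(2, len(padded)):
        history = tuple(padded[i - 2:i])
        trigram = tuple(padded[i - 2:i + 1])
        # Add-one smoothing keeps unseen trigrams from getting probability 0.
        p = (trigrams[trigram] + 1) / (bigrams[history] + vocab_size)
        logp += math.log(p)
    return logp

# Hypothetical training data.
train = [s.split() for s in
         ["the house is small", "the house is big", "the book is small"]]
tri, bi, v = train_trigram_lm(train)
good = sentence_logprob("the house is small".split(), tri, bi, v)
bad = sentence_logprob("small is house the".split(), tri, bi, v)
print(good > bad)
```

The fluent word order receives the higher score because its trigrams were observed in training, while the scrambled sentence relies entirely on smoothed counts.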
Decoding algorithm
Goal of the decoding algorithm
Put models to work, perform the actual translation
Greedy decoder
– Greedy Hill-Climbing [Germann, 2003]
• Start with gloss
• Improve probability with actions
• Use 2-step look-ahead to avoid some local optima
Beam Search Decoding
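The idea of beam search decoding can be sketched as follows. This is a deliberately simplified, monotone, word-by-word decoder: it has no reordering, no phrases, and no future-cost estimate, all of which full SMT decoders add. The function name, the translation options, and the bigram scores are hypothetical values chosen for illustration.

```python
import math

def beam_search_decode(foreign, options, lm_logprob, beam_size=3):
    """Translate left to right, keeping only the best `beam_size` hypotheses.

    foreign:    list of source tokens
    options:    dict: source token -> list of (english token, log P(f | e))
    lm_logprob: function (previous word, word) -> bigram log-probability
    """
    beam = [(0.0, ["<s>"])]  # (score so far, partial translation)
    for f in foreign:
        candidates = []
        for score, hyp in beam:
            for e, tm_logprob in options[f]:
                # Combine translation model and language model scores.
                new_score = score + tm_logprob + lm_logprob(hyp[-1], e)
                candidates.append((new_score, hyp + [e]))
        # Prune: keep only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    best_score, best_hyp = beam[0]
    return best_hyp[1:], best_score

# Hypothetical model scores.
options = {
    "das": [("the", math.log(0.6)), ("that", math.log(0.4))],
    "Haus": [("house", math.log(0.8)), ("home", math.log(0.2))],
}
bigram = {("<s>", "the"): 0.5, ("<s>", "that"): 0.2,
          ("the", "house"): 0.5, ("that", "house"): 0.1}

def lm_logprob(prev, word):
    return math.log(bigram.get((prev, word), 0.01))

print(beam_search_decode(["das", "Haus"], options, lm_logprob))
```

With `beam_size=1` this reduces to a purely greedy left-to-right search. A larger beam trades speed for a lower risk of pruning the best translation early.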
Other Decoding Methods
Finite State Transducers
– Well studied framework, many tools available
Integer Programming [Germann et al., 2001]
For string-to-tree models: parsing
– Use dynamic programming, similar to chart parsing
– Hypothesis space can be efficiently encoded in a forest structure
Tools for SMT
GIZA++
Website:
MOSES toolkit
Website:
– SRILM
– IRSTLM
– MOSES
– GIZA++
SMT Evaluations
BLEU – Bilingual Evaluation Understudy
Metrics
NIST – National Institute of Standards and Technology
TER – Translation Error Rate
PER – Position-independent Error Rate
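To make the first of these metrics concrete, here is a minimal sentence-level BLEU sketch. It assumes a single reference, n-grams up to length 4 with uniform weights, and no smoothing, so any missing n-gram order gives a score of 0. Standard BLEU is computed at corpus level and may use several references. The function names are illustrative.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    multiplied by a brevity penalty (single reference, no smoothing)."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty punishes candidates shorter than the reference.
    c_len, r_len = len(candidate), len(reference)
    penalty = 1.0 if c_len > r_len else math.exp(1.0 - r_len / c_len)
    return penalty * math.exp(log_precision)

reference = "the cat sat on the mat today".split()
candidate = "the cat sat on the mat now".split()
print(round(sentence_bleu(candidate, reference), 3))
```

Because BLEU rewards n-gram overlap with a reference, it serves as an automatic stand-in for human judgment. An identical candidate scores 1.0, and a candidate sharing no words with the reference scores 0.0.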