Natural Language Processing - Chapter 5: Foundation of Statistical Machine Translation


N.L.P. NATURAL LANGUAGE PROCESSING
Teacher: Lê Ngọc Tấn
Email: letan.dhcn@gmail.com
Blog:
Trường Đại học Công nghiệp Tp. HCM (Industrial University of Ho Chi Minh City)
Khoa Công nghệ thông tin (Faculty of Information Technology)

Chapter 5
Foundation of Statistical Machine Translation

Introduction to Statistical Machine Translation
• Since 1950
• DARPA projects
• Statistical MT Systems

Statistical MT Systems

Three Problems in Statistical MT Systems

Translation Model
• Goal of the translation model: match foreign input to English output
• Statistical modeling and IBM Model 4
• EM algorithm
• Word alignment
• Methods of translation: word-based, phrase-based and syntax-based

Translation Model (continued)
• The Machine Translation Pyramid
• However, the currently best-performing SMT systems are still crawling at the bottom of the pyramid

Language Model
• Goal of the language model: detect good English
• What is good English?
• Standard technique: n-gram model (trigram)

Decoding Algorithm
• Goal of the decoding algorithm: put the models to work and perform the actual translation
• Greedy decoder – Greedy Hill-Climbing [Germann, 2003]
  – Start with a gloss
  – Improve probability with actions
  – Use 2-step look-ahead to avoid some local optima
• Beam Search Decoding

Other Decoding Methods
• Finite State Transducers
  – Well-studied framework, many tools available
• Integer Programming [Germann et al., 2001]
• For String-to-Tree Model: Parsing
  – Use dynamic programming, similar to chart parsing
  – Hypothesis space can be efficiently encoded in a forest structure

Tools for SMT
• GIZA++
  Website:
• MOSES toolkit
  Website:
  – SRILM
  – IRSTLM
  – MOSES
  – GIZA++

SMT Evaluations
Metrics:
• BLEU – Bilingual Evaluation Understudy
• NIST – National Institute of Standards and Technology
• TER – Translation Error Rate
• PER – Position-independent Error Rate
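
The language model slide names the trigram model as the standard technique for detecting good English. The following is a minimal sketch of that idea, not the implementation used by any of the listed tools. The toy corpus, the function names `train_trigram_counts` and `log_prob`, and the choice of add-one smoothing are all assumptions made for illustration; toolkits such as SRILM and IRSTLM use more refined smoothing.

```python
from collections import Counter
import math


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts over tokenized sentences."""
    trigrams, contexts = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        # Pad so the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return trigrams, contexts, vocab


def log_prob(tokens, trigrams, contexts, vocab):
    """Log-probability of a sentence under an add-one-smoothed trigram model."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        # Add-one smoothing (an assumption here) keeps unseen trigrams non-zero.
        numerator = trigrams[context + (padded[i],)] + 1
        denominator = contexts[context] + len(vocab)
        total += math.log(numerator / denominator)
    return total


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the house is small".split(),
    "the house is big".split(),
    "the garden is small".split(),
]
trigrams, contexts, vocab = train_trigram_counts(corpus)
fluent = log_prob("the house is small".split(), trigrams, contexts, vocab)
scrambled = log_prob("small is house the".split(), trigrams, contexts, vocab)
```

Under these assumptions the fluent word order receives a higher log-probability than the scrambled one, which is the sense in which the language model "detects good English".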

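
Of the evaluation metrics listed, BLEU combines clipped n-gram precisions with a brevity penalty. The sketch below is a simplified single-sentence, single-reference, unsmoothed variant for illustration only; BLEU is normally computed over a whole test corpus, often with several references. The function names `ngrams` and `bleu` and the example sentences are hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Multiset of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (a simplification)."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matched == 0 or total == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize candidates no longer than the reference.
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions with uniform weights.
    return penalty * math.exp(log_precision_sum / max_n)


reference = "the cat is on the mat".split()
exact_score = bleu(reference, reference)
partial_score = bleu("the cat is on a mat".split(), reference)
degenerate_score = bleu("the the the".split(), reference)
```

An exact match scores 1, a near match scores strictly between 0 and 1, and a candidate with no matching bigram scores 0 in this unsmoothed form, which is one reason sentence-level use usually adds smoothing.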