Natural Language Processing - Chapter 2: Fundamental algorithms and mathematical models

Using statistical methods, probability theory is applied to several problems in NLP: determining the correlation between two different languages; determining the relation between text length and the number of words, in order to calculate the lexical diversity of a language; and extracting knowledge for automatic information retrieval and machine translation.


N.L.P. NATURAL LANGUAGE PROCESSING
Teacher: Lê Ngọc Tấn
Email: letan.dhcn@gmail.com
Trường Đại học Công nghiệp Tp. HCM (Industrial University of Ho Chi Minh City), Khoa Công nghệ thông tin (Faculty of Information Technology)

Chapter 2: Fundamental algorithms and mathematical models

Probability Theory and Bayes Theorems
– Concepts in probability
– Bayes theorems
– Application of the probability theory in NLP

Concepts in Probability (1)

Concepts in Probability (2)

Normal distribution
– Normal distribution (the bell curve, Gaussian distribution)
– Formula
– Properties

Binomial distribution
– Binomial distribution
– Formula: $b(n, k; p) = \binom{n}{k}\, p^{k} (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k}$
– Properties

Conditional probability distribution
– Conditional distribution
– Formula: $p(y \mid x) = \frac{p(x, y)}{p(x)}$
– Properties

Bayes Theorems
What is a Naïve Bayes classifier?
– A probabilistic classifier based on applying Bayes' theorem with strong independence assumptions
The Naïve Bayes method is used to solve problems such as:
– Text classification, text categorization
– Spell checking, POS tagging
Bayes formula: $P(e \mid v) = \frac{P(e)\, P(v \mid e)}{P(v)}$

HMM – ME models
– The hidden Markov model (HMM) builds on the Markov chain, which A.A. Markov (Russia) applied to the analysis of text in 1913; the HMM itself was developed in the 1960s by L.E. Baum and colleagues
– Maximum Entropy Markov Models (MEMMs) are used for named entity recognition (NER)
What is entropy?
– A concept used widely across many scientific disciplines; roughly, a measure of disorder
– Entropy measures the degree of uncertainty of a probabilistic event
The HMM is used successfully in:
– Speech recognition
– POS tagging
– NER
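The formulas on the preceding slides can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the entropy function assumes Shannon's definition, $H = -\sum_i p_i \log_2 p_i$, which the slides describe only informally as a measure of uncertainty.

```python
import math


def binomial_pmf(n: int, k: int, p: float) -> float:
    """b(n, k; p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_posterior(prior_e: float, likelihood_v_given_e: float, evidence_v: float) -> float:
    """Bayes formula: P(e | v) = P(e) * P(v | e) / P(v)."""
    if evidence_v <= 0:
        raise ValueError("P(v) must be positive")
    return prior_e * likelihood_v_given_e / evidence_v


def entropy(probs) -> float:
    """Shannon entropy in bits (assumed definition): H = -sum(p * log2(p)).

    Zero-probability outcomes contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Probability of exactly 2 successes in 4 trials with p = 0.5.
    print(binomial_pmf(4, 2, 0.5))
    # Posterior for illustrative (made-up) values of P(e), P(v | e), P(v).
    print(bayes_posterior(0.01, 0.9, 0.05))
    # A fair coin is maximally uncertain; a certain event has no uncertainty.
    print(entropy([0.5, 0.5]), entropy([1.0]))
```

The fair-coin case illustrates the slide's point about uncertainty: entropy is highest when all outcomes are equally likely and drops to zero when one outcome is certain.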
What is a Log-linear Model?
– A mathematical model that takes the form of a function whose logarithm is a first-degree polynomial (a linear combination) of the model's parameters, which makes it possible to apply linear regression
– In statistical machine translation, the standard log-linear model combines three components:
  – The phrase translation table
  – The reordering model
  – The language model
– A popular optimization method for log-linear models is the maximum entropy (ME) approach

Application of probability in NLP
Using statistical methods, probability theory is applied to several problems in NLP:
– Determine the correlation between two different languages
– Determine the relation between text length and the number of words, in order to calculate the lexical diversity of a language
– Extract knowledge for automatic information retrieval and machine translation
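Returning to the log-linear model defined above, here is a minimal Python sketch of how component scores can be combined. It assumes each component (phrase translation table, reordering model, language model) supplies a log-probability for a candidate translation and that each has a tunable weight. The function names, feature names, weights and numeric values are all hypothetical and chosen only for illustration.

```python
import math


def log_linear_score(features: dict, weights: dict) -> float:
    """Weighted sum of feature values: the logarithm of the unnormalized model score."""
    return sum(weights[name] * value for name, value in features.items())


def log_linear_probs(candidates: dict, weights: dict) -> dict:
    """Normalize exp(score) over all candidates so the results sum to 1."""
    scores = {c: log_linear_score(f, weights) for c, f in candidates.items()}
    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = max(scores.values())
    exps = {c: math.exp(s - max_score) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}


if __name__ == "__main__":
    # Hypothetical weights for the three components.
    weights = {"phrase_table": 1.0, "reordering": 0.5, "language_model": 1.0}
    # Hypothetical log-probabilities for two candidate translations.
    candidates = {
        "candidate A": {
            "phrase_table": math.log(0.5),
            "reordering": math.log(0.8),
            "language_model": math.log(0.1),
        },
        "candidate B": {
            "phrase_table": math.log(0.4),
            "reordering": math.log(0.9),
            "language_model": math.log(0.3),
        },
    }
    probs = log_linear_probs(candidates, weights)
    best = max(probs, key=probs.get)
    print(probs, best)
```

Because the score is linear in the weights once logarithms are taken, adding another component only means adding one more feature and one more weight.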
