Natural Language Processing - Chapter 4: Computational Linguistics
Normalization
– To process a corpus into one standard format

Lemmatization
– To determine the lemma for a given word
– To group together the different inflected forms of a word so they can be analyzed as a single item

Tokenization
– To break a text into words, phrases, symbols, or other meaningful elements, called tokens (see the sketch below)
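A minimal sketch of these three steps in Python using the NLTK library (an assumption; the chapter does not name a specific toolkit). The sample sentence, the choice of lowercasing as the normalization step, and the noun part-of-speech tag passed to the lemmatizer are all illustrative choices, not part of the original material:

    import nltk
    from nltk.stem import WordNetLemmatizer

    # One-time data downloads (assumption: network access is available).
    nltk.download("punkt", quiet=True)
    nltk.download("wordnet", quiet=True)

    text = "The striped bats were hanging on their feet."

    # Normalization: map the raw text into one standard format (here, lowercasing).
    normalized = text.lower()

    # Tokenization: break the text into meaningful units (tokens).
    tokens = nltk.word_tokenize(normalized)

    # Lemmatization: determine the lemma for each word, so that inflected
    # forms are grouped and can be analyzed as a single item.
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(tok, pos="n") for tok in tokens]

    print(lemmas)  # e.g. 'bats' -> 'bat', 'feet' -> 'foot'

Note that the WordNet lemmatizer defaults to treating words as nouns; verbs such as "were" are only reduced to their lemma "be" when the correct part-of-speech tag is supplied.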