BSc: Natural Language Processing
- Course name: Natural Language Processing
- Code discipline:
- Subject area:
Short Description
The course covers classical and modern methods for processing and analyzing natural language texts. It aims to teach fundamental approaches to text analysis and to develop and consolidate skills in working with modern software tools for natural language processing.
Prerequisites
As an undergraduate-level course, it expects students to have a basic understanding of probability, linear algebra, programming in Python, and the basics of machine learning.
Prerequisite subjects
- CSE101 — Introduction to Programming I
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE206 — Probability And Statistics
- CSE302 — Introduction to Machine Learning
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Formal foundations of text analysis methods | |
Classical models of representation and analysis of text and applications | |
Neural network models for text analysis | |
Modern models based on the "Transformer" architecture | |
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
The course is about processing and modeling natural languages. In addition to frontal lectures, flipped classes and student project presentations will be organized. During lab sessions the working language is Python. The primary framework for deep learning is PyTorch; use of TensorFlow and Keras is possible, and use of Docker is highly appreciated.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should know ...
- Fundamental approaches to text analysis;
- Various natural language processing algorithms;
- Ways to measure the performance of NLP systems;
- Popular software tools for natural language processing.
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- describe and explain the difference between formal and natural languages;
- describe and explain classical methods used for text analysis;
- describe and explain neural network architectures used for text analysis;
- describe and explain the differences between neural network architectures for text analysis;
- describe and explain modern architectures based on the Transformer.
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- apply machine learning methods to solve text processing problems;
- apply methods for assessing the quality of NLP systems;
- apply deep learning algorithms to solve text processing problems.
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 90-100 | - |
B. Good | 75-89 | - |
C. Satisfactory | 60-74 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Final Exam | 30 |
Final project | 30 |
Assignments | 30 |
Lab Participation / Quizzes | 10 |
Recommendations for students on how to succeed in the course
The following scheme of preparation for classes is recommended:
- Review the lecture notes.
- Review the materials of the seminar (practical) classes.
- In case of difficulty, formulate questions for the teacher.
To prepare for classes, it is recommended to use the listed resources and the additional literature.
Resources, literature and reference materials
Open access resources
- Indurkhya, Nitin, and Fred J. Damerau, eds. Handbook of Natural Language Processing, Second Edition. Chapman & Hall/CRC Machine Learning & Pattern Recognition, 2010.
- Clark, Alexander, Chris Fox, and Shalom Lappin, eds. The handbook of computational linguistics and natural language processing. John Wiley & Sons, 2013.
- Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed.)
- Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed. O'Reilly Media, 2019.
Additional literature
- Osinga, Douwe. Deep Learning Cookbook: Practical Recipes to Get Started Quickly. O'Reilly Media, 2018.
- Nikolenko, S., Kadurin, A., and Arkhangelskaya, E. Глубокое обучение [Deep Learning; in Russian]. St. Petersburg: Piter, 2018.
- Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing.
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 |
---|---|---|---|
Development of individual parts of software product code | ✓ | ✓ | ✓ |
Homework and group projects | ✓ | ✓ | ✓ |
Midterm evaluation | ✓ | ✓ | ✓ |
Testing (written or computer based) | ✓ | ✓ | ✓ |
Discussions | ✓ | ✓ | ✓ |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Explain the Chomsky hierarchy | Yes |
Question | What is a language model? | Yes |
Question | What is perplexity in language modeling? | Yes |
Question | What is smoothing and why is it used in language modeling? | No |
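For the perplexity and smoothing questions above, a minimal sketch may help as a reference point: a bigram language model with add-one (Laplace) smoothing on a toy corpus. The corpus and variable names are purely illustrative, not taken from the course materials.

```python
import math
from collections import Counter

# Toy corpus; in practice this would be a large tokenized text.
corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
tokens = [t for sent in corpus for t in sent.split()]
bigrams = [b for sent in corpus for b in zip(sent.split(), sent.split()[1:])]

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)
V = len(unigram_counts)  # vocabulary size

def bigram_prob(w_prev, w):
    # Add-one (Laplace) smoothing: no bigram gets zero probability.
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + V)

def perplexity(sentence):
    words = sentence.split()
    log_prob = sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))
    n = len(words) - 1  # number of predicted tokens
    return math.exp(-log_prob / n)

print(perplexity("<s> the cat sat </s>"))  # low: all bigrams seen in training
print(perplexity("<s> the dog ran </s>"))  # higher: unseen bigrams, rescued by smoothing
```

Without smoothing, the second sentence would receive probability zero and infinite perplexity; smoothing is exactly what makes the model assign finite perplexity to unseen events.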
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | Explain the MaxEnt principle in language modeling | Yes |
Question | Describe examples of NLP applications and approaches to assessing their quality | Yes |
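For the MaxEnt question, the standard result is worth stating: among all distributions consistent with the observed feature expectations, the maximum-entropy one has a log-linear form. The notation below is the conventional one, assumed rather than quoted from the course slides.

```latex
% Maximum-entropy (log-linear) model for P(y | x):
P(y \mid x) = \frac{1}{Z(x)} \exp\!\Big(\sum_{k} \lambda_k f_k(x, y)\Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big(\sum_{k} \lambda_k f_k(x, y')\Big)
```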
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Why use an encoder-decoder model rather than a regular RNN for automatic translation? | Yes |
Question | How can variable-length input sequences be handled with an RNN? | Yes |
Question | What is beam search and why is it used? | Yes |
Question | Describe the components of the encoder-decoder model | Yes |
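As a reference for the beam search question, here is a minimal sketch of beam search over a toy next-token distribution. The hard-coded `NEXT` table is a stand-in for a real decoder (e.g., an encoder-decoder RNN or Transformer); all names and probabilities are illustrative.

```python
import math

# Toy conditional distribution P(next_token | last_token); a stand-in
# for the softmax output of a real decoder.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a":   {"cat": 0.3, "dog": 0.3, "</s>": 0.4},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_size=2, max_len=5):
    # Each hypothesis is (log_prob, tokens); start from the start symbol.
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, toks in beams:
            if toks[-1] == "</s>":          # finished hypothesis: keep as-is
                candidates.append((score, toks))
                continue
            for tok, p in NEXT[toks[-1]].items():
                candidates.append((score + math.log(p), toks + [tok]))
        # Keep only the top-k hypotheses; this pruning is what makes beam
        # search tractable compared to exact (exponentially large) search.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beams

for score, toks in beam_search():
    print(round(score, 3), " ".join(toks))
```

Greedy decoding is the special case with beam size 1; larger beams trade computation for a better approximation of the most probable output sequence.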
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | What is the most important layer in the Transformer architecture? What is its purpose? | Yes |
Question | Describe the architecture of BERT and the training process for this model. | Yes |
Question | Describe the architecture of GPT (version 2 or 3) and the training process for this model. | Yes |
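For the BERT and GPT questions, the core pre-training objectives can each be summarized in one line; the notation below is the standard one, assumed rather than taken from the course materials.

```latex
% GPT: autoregressive (left-to-right) language modeling
\mathcal{L}_{\text{GPT}} = -\sum_{i=1}^{n} \log P_\theta(w_i \mid w_1, \dots, w_{i-1})

% BERT: masked language modeling over a randomly chosen mask set M
\mathcal{L}_{\text{BERT}} = -\sum_{i \in M} \log P_\theta(w_i \mid w_{\setminus M})
```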
Final assessment
Section 1
- Suppose you run a shift-reduce dependency parser with a standard arc system on a sentence of length n. How many shift operations are needed?
- Describe the main differences between formal languages (such as logic or programming languages) and natural languages (such as Russian).
- Give an equation to find the most likely sequence of part-of-speech (POS) tags that can be used by a stochastic POS tagger, assuming a bigram model.
- Explain what is meant by the terms smoothing and backoff in the context of the stochastic POS tagger model.
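The POS-tagging exercise above has a standard closed form; under a bigram (HMM) model the most likely tag sequence is, in the usual notation:

```latex
\hat{t}_{1:n} = \operatorname*{arg\,max}_{t_{1:n}}
\prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

The product of emission probabilities P(w_i | t_i) and tag-transition probabilities P(t_i | t_{i-1}) is maximized efficiently with the Viterbi algorithm.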
Section 2
- Suppose you classify text based on a bag of words in a document. The raw input is a single line containing the text of the entire document. Describe in one or two sentences the pipeline from the raw input to the feature vector.
- Suppose you have a neural network that fits the training data exactly (i.e., overfits). Describe two ways to address this problem.
- Compare different methods for text processing (decision trees, hidden Markov models, support vector machines).
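For the bag-of-words exercise above, a minimal sketch of the pipeline from raw input to feature vector, using scikit-learn; the sample document is illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Raw input: a single line containing the text of the entire document.
raw_document = "The cat sat on the mat. The cat slept."

# Pipeline: lowercase -> tokenize -> count token occurrences -> feature vector.
vectorizer = CountVectorizer(lowercase=True)
features = vectorizer.fit_transform([raw_document])

print(vectorizer.get_feature_names_out())  # vocabulary (feature names)
print(features.toarray())                  # bag-of-words count vector
```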
Section 3
- How would you classify different RNN architectures?
- You train a neural network with the Adam optimizer and observe the negative log probability on the training set over epochs. Instead of decreasing, it seems to fluctuate around where it started. What could you add to your training routine to fix this?
- Describe the reason why the negative sampling skipgram model learns faster than the base skipgram model.
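The negative-sampling question contrasts two objectives; the key difference is that the base skip-gram softmax normalizes over the full vocabulary V on every update, while negative sampling touches only one positive and k sampled negative words. In the standard word2vec notation:

```latex
% Base skip-gram: softmax over the whole vocabulary (cost ~ |V| per update)
P(c \mid w) = \frac{\exp(v_c^\top v_w)}{\sum_{c' \in V} \exp(v_{c'}^\top v_w)}

% Negative sampling: binary objective over 1 positive and k sampled negatives
\log \sigma(v_c^\top v_w)
+ \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n}\big[\log \sigma(-v_{c_i}^\top v_w)\big]
```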
Section 4
- Compare BERT architectures with GPT (version 2 or 3). Explain the disadvantages and advantages of each.
- What contextual word representations do you know?
- What is model distillation? How is it performed?
- Explain the attention mechanism.
- What is self-attention?
- Explain the Transformer architecture.
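Since the course labs use PyTorch, a minimal sketch of the scaled dot-product self-attention at the heart of the Transformer may serve as a reference for the last three questions. This is a single head with no masking; the dimensions and weight initialization are illustrative, not a definitive implementation.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (no masking)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                       # project to Q, K, V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # scaled similarities
    weights = torch.softmax(scores, dim=-1)  # each position attends to all positions
    return weights @ v                       # weighted sum of value vectors

# Illustrative dimensions: a sequence of 5 tokens, model width 16.
seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)
w_q = torch.randn(d_model, d_model)
w_k = torch.randn(d_model, d_model)
w_v = torch.randn(d_model, d_model)

out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 16]): one contextualized vector per token
```

A full Transformer stacks several such heads in parallel (multi-head attention), followed by position-wise feed-forward layers, residual connections, and layer normalization.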