Natural Language Processing
- Course name: Natural Language Processing
- Code discipline:
- Subject area:
Short Description
The course covers classical and modern methods of processing and analyzing natural language texts. It aims to teach fundamental approaches to text analysis and to develop and consolidate skills in working with modern software tools for natural language processing.
Prerequisites
As this is an undergraduate-level course, students are expected to have a basic understanding of probability, linear algebra, programming in Python, and the basics of machine learning.
Prerequisite subjects
- CSE101 — Introduction to Programming I
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE206 — Probability And Statistics
- CSE302 — Introduction to Machine Learning
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Formal foundations of text analysis methods | |
Classical models of representation and analysis of text and applications | |
Neural network models for text analysis | |
Modern models based on the "Transformer" architecture | |
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
The course is about processing and modeling natural languages. In addition to frontal lectures, flipped classes and student project presentations will be organized. The working language during lab sessions is Python, and the primary deep learning framework is PyTorch; using TensorFlow and Keras is possible, and using Docker is highly appreciated.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should know ...
- Fundamental approaches to text analysis;
- Various natural language processing algorithms;
- Ways to measure the performance of NLP systems;
- Popular software tools for natural language processing.
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- describe and explain the difference between formal and natural languages;
- describe and explain classical methods used for text analysis;
- describe and explain neural network architectures used for text analysis;
- describe and explain the differences between neural network architectures for text analysis;
- describe and explain modern architectures based on the Transformer.
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- apply machine learning methods to solve text processing problems;
- apply methods for assessing the quality of NLP systems;
- apply deep learning algorithms to solve text processing problems.
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 90-100 | - |
B. Good | 75-89 | - |
C. Satisfactory | 60-74 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Final Exam | 30 |
Final project | 30 |
Assignments | 30 |
Lab Participation / Quizzes | 10 |
Recommendations for students on how to succeed in the course
Students are advised to follow this scheme when preparing for classes:
- Review the lecture notes.
- Review the materials from seminar (practical) classes.
- If difficulties arise, formulate questions for the instructor.
To prepare for classes, it is recommended to use the listed resources and additional literature.
Resources, literature and reference materials
Open access resources
- Indurkhya, Nitin, and Fred J. Damerau, eds. Handbook of Natural Language Processing, 2nd ed. Chapman & Hall/CRC Machine Learning & Pattern Recognition, 2010.
- Clark, Alexander, Chris Fox, and Shalom Lappin, eds. The Handbook of Computational Linguistics and Natural Language Processing. John Wiley & Sons, 2013.
- Jurafsky, Dan, and James H. Martin. Speech and Language Processing, 3rd ed. draft.
- Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed. O'Reilly Media, 2019.
Additional literature
- Osinga, Douwe. Deep Learning Cookbook: Practical Recipes to Get Started Quickly. O'Reilly Media, 2018.
- Nikolenko, S., Kadurin, A., and Arkhangelskaya, E. Deep Learning (Глубокое обучение). St. Petersburg: Piter, 2018.
- Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing.
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 |
---|---|---|---|
Development of individual parts of software product code | 1 | 1 | 1 |
Homework and group projects | 1 | 1 | 1 |
Midterm evaluation | 1 | 1 | 1 |
Testing (written or computer based) | 1 | 1 | 1 |
Discussions | 1 | 1 | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Explain the Chomsky hierarchy | Yes |
Question | What is a language model? | Yes |
Question | What is perplexity in language modeling? (see the formula after this table) | Yes |
Question | What is smoothing and why is it used in language modeling? | No |
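As a reference point for the perplexity question, the standard definition (textbook notation, not course-specific): for a model assigning probability P(w_1, ..., w_N) to a test sequence of N tokens,

```latex
\mathrm{PPL}(w_1,\dots,w_N) \;=\; P(w_1,\dots,w_N)^{-1/N}
\;=\; \exp\!\Big(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\dots,w_{i-1})\Big)
```

Lower perplexity means the model assigns higher probability to the held-out text.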
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | Explain the MaxEnt principle in language modeling (see the formula after this table) | Yes |
Question | Describe examples of NLP applications and approaches to assessing their quality | Yes |
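For orientation on the MaxEnt question, the principle in its textbook form (standard notation, not the course's own): among all distributions that match the empirical expectations of a set of feature functions f_j, choose the one with maximum entropy,

```latex
p^{*} \;=\; \arg\max_{p}\; H(p) \;=\; \arg\max_{p}\Big(-\sum_{x} p(x)\log p(x)\Big)
\quad \text{subject to} \quad \mathbb{E}_{p}[f_j] = \mathbb{E}_{\tilde p}[f_j],\; j = 1,\dots,k.
```

The solution has log-linear form p(x) proportional to exp(sum_j lambda_j f_j(x)), which is why MaxEnt models are also called log-linear models.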
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Why use an encoder-decoder model rather than a plain RNN for machine translation? | Yes |
Question | How do you handle variable-length input sequences with an RNN? | Yes |
Question | What is beam search and why is it used? (see the sketch after this table) | Yes |
Question | Describe the components of the encoder-decoder model | Yes |
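For the beam search question above, a minimal sketch in plain Python. The `score_fn` callback and the toy bigram table are hypothetical stand-ins for a trained decoder's next-token log-probabilities; this illustrates the pruning idea, not a production decoder.

```python
import math
from typing import Callable

def beam_search(score_fn: Callable[[list[str]], dict[str, float]],
                beam_width: int = 3,
                max_len: int = 10,
                eos: str = "</s>") -> list[str]:
    """Keep the beam_width best partial hypotheses at every step.

    score_fn maps a token prefix to a dict of next-token
    log-probabilities; it stands in for one decoder step.
    """
    beams = [(0.0, [])]  # (total log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            if tokens and tokens[-1] == eos:
                candidates.append((logp, tokens))  # finished: carry over unchanged
                continue
            for tok, tok_logp in score_fn(tokens).items():
                candidates.append((logp + tok_logp, tokens + [tok]))
        # Prune: keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(t and t[-1] == eos for _, t in beams):
            break
    return beams[0][1]

# Toy usage: a fixed bigram table stands in for a trained model.
table = {
    (): {"the": math.log(0.6), "a": math.log(0.4)},
    ("the",): {"cat": math.log(0.7), "dog": math.log(0.3)},
    ("a",): {"cat": math.log(0.5), "dog": math.log(0.5)},
}
end = {"</s>": 0.0}  # unseen contexts end the sentence
print(beam_search(lambda t: table.get(tuple(t[-1:]), end)))
# -> ['the', 'cat', '</s>']
```

The point of the beam: a greedy decoder keeps one hypothesis and can lock in an early mistake, while the beam keeps several and ranks them by total log-probability.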
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | What is the most important layer in the Transformer architecture? What is its purpose? (see the formula after this table) | Yes |
Question | Describe the architecture of BERT and the training process for this model. | Yes |
Question | Describe the architecture of GPT (version 2 or 3) and the training process for this model. | Yes |
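As a reference for the Transformer questions above, these questions typically center on scaled dot-product attention as defined by Vaswani et al. in "Attention Is All You Need" (2017):

```latex
\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\Big(\frac{QK^{\top}}{\sqrt{d_k}}\Big)\, V
```

where Q, K, V are the query, key, and value matrices and d_k is the key dimension.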
Final assessment
Section 1
- Suppose you run a shift-reduce dependency parser with the arc-standard transition system on a sentence of length n. How many shift operations are needed?
- Describe the main differences between formal languages (such as logic or programming languages) and natural languages (such as Russian).
- Give an equation for finding the most likely sequence of part-of-speech (POS) tags that could be used by a stochastic POS tagger, assuming a bigram model (see the formula after this list).
- Explain what is meant by the terms smoothing and backoff in the context of a stochastic POS tagger.
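For the bigram POS-tagging question above, the standard HMM formulation (as in Jurafsky and Martin; given as a reference, not the course's exact notation):

```latex
\hat{t}_{1:n} \;=\; \arg\max_{t_{1:n}} P(t_{1:n}\mid w_{1:n})
\;\approx\; \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

The approximation combines per-word emission probabilities P(w_i | t_i) with bigram tag-transition probabilities P(t_i | t_{i-1}).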
Section 2
- Suppose you classify text based on a bag of words in a document. The raw input is a single string containing the text of the entire document. Describe in one or two sentences the pipeline from the raw input to the feature vector (see the sketch after this list).
- Suppose you have a neural network that fits the training data exactly (i.e., overfits). Describe two ways to address this problem.
- Compare different methods for text processing (decision trees, hidden Markov models, support vector machines).
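For the pipeline question above, a compact sketch using scikit-learn's CountVectorizer (a real scikit-learn class; the toy documents are made up): tokenize and lowercase the raw string, build a vocabulary, then map each document to a vector of token counts.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Raw input: one string per document.
docs = ["The cat sat on the mat.", "Dogs and cats play outside."]

# Pipeline: tokenize and lowercase, build a vocabulary,
# then map each document to a vector of token counts.
vectorizer = CountVectorizer(lowercase=True)
X = vectorizer.fit_transform(docs)          # sparse matrix, shape (n_docs, |vocab|)

print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.toarray())                          # bag-of-words feature vectors
```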
Section 3
- How would you classify different RNN architectures?
- You train a neural network with the Adam optimizer and observe the negative log probability on the training set over epochs. Instead of decreasing, it seems to fluctuate around where it started. What could you add to your training routine to fix this?
- Describe the reason why the negative-sampling skip-gram model learns faster than the base skip-gram model (see the objective after this list).
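For the negative-sampling question above, the skip-gram with negative sampling objective from Mikolov et al. (2013), maximized for each pair of input word w_I and observed context word w_O:

```latex
\log \sigma\big(v'^{\top}_{w_O} v_{w_I}\big)
\;+\; \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\big[\log \sigma\big(-v'^{\top}_{w_i} v_{w_I}\big)\big]
```

Each update touches only the k sampled negative words plus the one positive word, instead of the full softmax over the vocabulary, which is the source of the speedup.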
Section 4
- Compare the BERT architecture with GPT (version 2 or 3). Explain the disadvantages and advantages of each.
- What contextual word representations do you know?
- What is model distillation? How is it performed?
- Explain the attention mechanism.
- What is self-attention? (see the PyTorch sketch after this list)
- Explain the Transformer architecture.
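Since the course's primary framework is PyTorch, here is a minimal single-head self-attention sketch for the last two questions (the class name and dimensions are illustrative, not from the course materials):

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention:
    softmax(Q K^T / sqrt(d)) V, with learned Q/K/V projections."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        weights = torch.softmax(scores, dim=-1)                   # attention distribution per query
        return weights @ v                                        # weighted sum of values

# Toy usage: a batch of 2 sequences, 5 tokens each, d_model = 16.
attn = SelfAttention(d_model=16)
out = attn(torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 5, 16])
```

The full Transformer stacks this layer (in multi-head form) with residual connections, layer normalization, and position-wise feed-forward networks.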