Difference between revisions of "BSc: Natural Language Processing"


Latest revision as of 19:09, 10 January 2023

Natural Language Processing

  • Course name: Natural Language Processing
  • Code discipline:
  • Subject area:

Short Description

The course covers classical and modern methods for processing and analyzing natural language texts. It aims to teach fundamental approaches to text analysis and to develop and consolidate skills in working with modern software tools for natural language processing.

Prerequisites

As this is an undergraduate-level course, students are expected to have a basic understanding of probability, linear algebra, programming in Python, and the basics of machine learning.

Prerequisite subjects

  • CSE101 — Introduction to Programming I
  • CSE202 — Analytical Geometry and Linear Algebra I
  • CSE206 — Probability And Statistics
  • CSE302 — Introduction to Machine Learning

Prerequisite topics

Course Topics

Course Sections and Topics
Section Topics within the section
Formal foundations of text analysis methods
  1. Fundamentals of the theory of formal languages
  2. Statistical language modeling
  3. Theory of parsing
Classical models of representation and analysis of text and applications
  1. Models Based on Entropy Maximization (MaxEnt)
  2. Decision trees in text processing. Markov models. Support Vector Machines in Text Classification Problems
  3. Applications: information extraction, question-answer systems, text generation, machine translation
  4. Quality assessment of NLP systems
Neural network models for text analysis
  1. Vector representations of words. word2vec model, contextual vector representations
  2. Architectures based on convolutional networks
  3. Architectures based on recurrent networks
  4. Encoder-decoder architecture
  5. Attention mechanism
Modern models based on the "Transformer" architecture
  1. The Transformer architecture
  2. Self-attention mechanism
  3. Pre-trained language models. BERT. GPT

Intended Learning Outcomes (ILOs)

What is the main purpose of this course?

The course is about processing and modeling natural languages. In addition to traditional lectures, flipped classes and student project presentations will be organized. The working language of the lab sessions is Python, and the primary deep learning framework is PyTorch. TensorFlow and Keras may also be used, and the use of Docker is strongly encouraged.

ILOs defined at three levels

Level 1: What concepts should a student know/remember/explain?

By the end of the course, the students should know ...

  • Fundamental approaches to text analysis;
  • Various natural language processing algorithms;
  • Ways to measure the performance of NLP systems;
  • Popular software tools for natural language processing.

Level 2: What basic practical skills should a student be able to perform?

By the end of the course, the students should be able to ...

  • describe and explain the difference between formal and natural languages;
  • describe and explain classical methods used for text analysis;
  • describe and explain neural network architectures used for text analysis;
  • describe and explain the differences between neural network architectures for text analysis;
  • describe and explain modern architectures based on the Transformer.

Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?

By the end of the course, the students should be able to ...

  • apply machine learning methods to solve text processing problems;
  • apply methods for assessing the quality of NLP systems;
  • apply deep learning algorithms to solve text processing problems.

Grading

Course grading range

Grade | Range | Description of performance
A. Excellent | 90-100 | -
B. Good | 75-89 | -
C. Satisfactory | 60-74 | -
D. Poor | 0-59 | -

Course activities and grading breakdown

Activity Type | Percentage of the overall course grade
Final Exam | 30
Final project | 30
Assignments | 30
Lab Participation / Quizzes | 10

Recommendations for students on how to succeed in the course

Students are advised to prepare for classes as follows:

  • Review the lecture notes.
  • Work through the materials of the seminar (practical) classes.
  • If difficulties arise, formulate questions for the teacher.

The listed resources and additional literature are recommended for class preparation.

Resources, literature and reference materials

Open access resources

  • Handbook of Natural Language Processing, 2nd ed. Chapman & Hall/CRC Machine Learning & Pattern Recognition, 2010.
  • Clark, Alexander, Chris Fox, and Shalom Lappin, eds. The Handbook of Computational Linguistics and Natural Language Processing. John Wiley & Sons, 2013.
  • Jurafsky, Dan, and James H. Martin. Speech and Language Processing, 3rd ed.
  • Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed. O'Reilly Media, 2019.

Additional literature

  • Osinga, Douwe. Deep Learning Cookbook: Practical Recipes to Get Started Quickly. O'Reilly Media, 2018.
  • Nikolenko S., Kadurin A., Arkhangelskaya E. Deep Learning (in Russian). St. Petersburg: Piter, 2018.
  • Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing

Closed access resources

Software and tools used within the course

Teaching Methodology: Methods, techniques, & activities

Activities and Teaching Methods

Activities within each section
Learning Activities | Section 1 | Section 2 | Section 3
Development of individual parts of software product code | 1 | 1 | 1
Homework and group projects | 1 | 1 | 1
Midterm evaluation | 1 | 1 | 1
Testing (written or computer based) | 1 | 1 | 1
Discussions | 1 | 1 | 1

Formative Assessment and Course Activities

Ongoing performance assessment

Section 1

Activity Type | Content | Is Graded?
Question | Explain the Chomsky hierarchy | 1
Question | What is a language model? | 1
Question | What is perplexity in language modeling? | 1
Question | What is smoothing and why is smoothing used in language modeling? | 0
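
To illustrate the language-modeling questions above, the following minimal sketch estimates a bigram model with add-one (Laplace) smoothing and computes perplexity. The toy corpus, the function names, and the choice of add-one smoothing are assumptions made for illustration; other smoothing schemes may be used in the course.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])  # every token that has a successor
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev); unseen bigrams get non-zero mass."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentences, unigrams, bigrams, vocab_size):
    """Exponentiated average negative log-probability per predicted token."""
    log_prob, n_predictions = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded[:-1], padded[1:]):
            log_prob += math.log(bigram_prob(prev, word, unigrams, bigrams, vocab_size))
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)


# Hypothetical toy corpus, used only for illustration.
train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram_counts(train)
vocab = {w for s in train for w in s} | {"</s>"}  # tokens that can be predicted
vocab_size = len(vocab)

seen = perplexity([["the", "cat", "sat"]], unigrams, bigrams, vocab_size)
unseen = perplexity([["sat", "the", "dog", "cat"]], unigrams, bigrams, vocab_size)
```

Without smoothing, the second sentence would contain zero-probability bigrams and its perplexity would be undefined; with smoothing it is finite but higher than that of the sentence seen in training.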

Section 2

Activity Type | Content | Is Graded?
Question | Explain the MaxEnt principle in language modeling | 1
Question | Describe examples of NLP applications and approaches to assessing their quality | 1

Section 3

Activity Type | Content | Is Graded?
Question | Why use an encoder-decoder model rather than a regular RNN for automatic translation? | 1
Question | How can variable-length input sequences be handled with an RNN? | 1
Question | What is beam search and why is it used? | 1
Question | Describe the components of the encoder-decoder model | 1
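
The beam search question above can be made concrete with a small sketch. The scoring table `TABLE` and the function `toy_model` are hypothetical stand-ins for a trained decoder, and the sketch omits length normalization, which practical systems often add.

```python
import math


def beam_search(next_log_probs, start, end, beam_width, max_len):
    """Keep the beam_width best partial sequences at each decoding step.

    next_log_probs(sequence) must return {token: log-probability} for the
    next token given the sequence so far.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, log_p in next_log_probs(seq).items():
                candidates.append((seq + [token], score + log_p))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            # Hypotheses that emit the end token leave the beam.
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda item: item[1])


# Hypothetical next-token distributions, conditioned only on the last token.
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "</s>": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}


def toy_model(seq):
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}


greedy_seq, greedy_score = beam_search(toy_model, "<s>", "</s>", 1, 5)
best_seq, best_score = beam_search(toy_model, "<s>", "</s>", 2, 5)
```

With a beam width of 1 the search is greedy and commits to "a" (total probability 0.3), while a width of 2 keeps "b" alive long enough to find the more probable sequence through "z" (0.36).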

Section 4

Activity Type | Content | Is Graded?
Question | What is the most important layer in the Transformer architecture? What is its purpose? | 1
Question | Describe the architecture of BERT and the training process for this model. | 1
Question | Describe the architecture of GPT (version 2 or 3) and the training process for this model. | 1

Final assessment

Section 1

  1. Suppose you run a shift-reduce dependency parser with the arc-standard transition system on a sentence of length n. How many shift operations are needed?
  2. Describe the main differences between formal languages (such as logic or programming languages) and natural languages (such as Russian).
  3. Give an equation for finding the most likely sequence of part-of-speech (POS) tags, as used by a stochastic POS tagger, assuming a bigram model.
  4. Explain what is meant by the terms smoothing and backoff in the context of the stochastic POS tagger model.
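
For orientation on question 3, one common textbook formulation of the bigram tagging objective is sketched below. It assumes a hidden Markov model in which each word depends only on its own tag and each tag only on the previous tag; the symbols w_i (words), t_i (tags), and the sentence-start tag t_0 are notation introduced here, not taken from the course materials.

```latex
\hat{t}_{1:n}
  = \operatorname*{arg\,max}_{t_{1:n}} P(t_{1:n} \mid w_{1:n})
  \approx \operatorname*{arg\,max}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```

Smoothing and backoff (question 4) concern how the two factors are estimated when a word-tag pair or tag bigram was rarely or never observed in training.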

Section 2

  1. Suppose you classify text based on a bag of words in a document. The raw input is a single line containing the text of the entire document. Describe in one or two sentences the pipeline from the raw input to the feature vector.
  2. Suppose you have a neural network that fits the training data exactly (i.e., it overfits). Describe two ways to address this problem.
  3. Compare different methods for text processing (decision trees, hidden Markov models, support vector machines).
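
As a rough illustration of the bag-of-words pipeline mentioned in question 1, the sketch below goes from a raw string to a count vector. The tokenization rule (lowercasing, alphanumeric tokens only), the toy corpus, and all function names are assumptions chosen for brevity, not a prescribed solution.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return a count vector; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical toy corpus, used only for illustration.
corpus = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary = build_vocabulary(corpus)
features = bag_of_words("The cat saw the cat!", vocabulary)
```

The resulting vector has one entry per vocabulary word and discards word order, which is the defining property of the representation.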

Section 3

  1. How would you classify different RNN architectures?
  2. You train a neural network with the Adam optimizer and observe the negative log probability on the training set over epochs. Instead of decreasing, it seems to fluctuate around where it started. What could you add to your training routine to fix this?
  3. Explain why the skip-gram model with negative sampling learns faster than the base skip-gram model.

Section 4

  1. Compare the BERT architecture with that of GPT (version 2 or 3). Explain the advantages and disadvantages of each.
  2. What contextual representations do you know for words?
  3. What is model distillation? How is it performed?
  4. Explain the attention mechanism.
  5. What is self-attention?
  6. Explain the Transformer architecture.
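
As a companion to the attention and self-attention questions, here is a minimal pure-Python sketch of scaled dot-product attention. It deliberately omits the learned query, key, and value projections and the multiple heads of a real Transformer layer, so the token vectors serve directly as queries, keys, and values; the input vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the values.

    Weights are softmax(q . k / sqrt(d_k)) over all keys.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


# Self-attention: the same (hypothetical) token vectors act as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` is a probability distribution over the input positions, so every output vector is a context-dependent mixture of all token vectors.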