BSc: Natural Language Processing

Natural Language Processing

  • Course name: Natural Language Processing
  • Code discipline:
  • Subject area:

Short Description

The course covers classical and modern methods of processing and analyzing natural language texts. It aims to teach fundamental approaches to text analysis and to develop and consolidate skills in working with modern software tools for natural language processing.

Prerequisites

As an undergraduate-level course, it expects students to have a basic understanding of probability, linear algebra, programming in Python, and the basics of machine learning.

Prerequisite subjects

  • CSE101 — Introduction to Programming I
  • CSE202 — Analytical Geometry and Linear Algebra I
  • CSE206 — Probability And Statistics
  • CSE302 — Introduction to Machine Learning

Prerequisite topics

Course Topics

Course Sections and Topics
Section 1: Formal foundations of text analysis methods
  1. Fundamentals of the theory of formal languages
  2. Statistical language modeling
  3. Theory of parsing
Section 2: Classical models of representation and analysis of text and applications
  1. Models based on entropy maximization (MaxEnt)
  2. Decision trees in text processing; Markov models; support vector machines in text classification problems
  3. Applications: information extraction, question answering systems, text generation, machine translation
  4. Quality assessment of NLP systems
Section 3: Neural network models for text analysis
  1. Vector representations of words: the word2vec model, contextual vector representations
  2. Architectures based on convolutional networks
  3. Architectures based on recurrent networks
  4. Encoder-decoder architecture
  5. Attention mechanism
Section 4: Modern models based on the Transformer architecture
  1. The Transformer architecture
  2. Self-attention mechanism
  3. Pre-trained language models: BERT, GPT

Intended Learning Outcomes (ILOs)

What is the main purpose of this course?

The course is about processing and modeling natural languages. In addition to traditional lectures, flipped classes and student project presentations will be organized. During lab sessions, the working language is Python. The primary framework for deep learning is PyTorch. Usage of TensorFlow and Keras is possible, and usage of Docker is highly appreciated.
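
Since labs rely on Python and PyTorch, a minimal environment check such as the following sketch (assuming PyTorch is already installed) can save time in the first session:

  import torch

  # Report the installed PyTorch version and pick a computation device.
  print("PyTorch version:", torch.__version__)
  device = "cuda" if torch.cuda.is_available() else "cpu"
  print("Device:", device)

  # Tiny sanity check: one forward pass through a linear layer.
  layer = torch.nn.Linear(4, 2).to(device)
  x = torch.randn(1, 4, device=device)
  print(layer(x).shape)  # expected: torch.Size([1, 2])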

ILOs defined at three levels

Level 1: What concepts should a student know/remember/explain?

By the end of the course, the students should know ...

  • Fundamental approaches to text analysis;
  • Various natural language processing algorithms;
  • Ways to measure the performance of NLP systems;
  • Popular software tools for natural language processing.

Level 2: What basic practical skills should a student be able to perform?

By the end of the course, the students should be able to ...

  • describe and explain the difference between formal and natural languages;
  • describe and explain classical methods used for text analysis;
  • describe and explain neural network architectures used for text analysis;
  • describe and explain the differences between neural network architectures for text analysis;
  • describe and explain modern architectures based on the Transformer.

Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?

By the end of the course, the students should be able to ...

  • apply machine learning methods for solving text processing problems;
  • apply methods for assessing the quality of NLP systems;
  • apply deep learning algorithms for solving text processing problems.

Grading

Course grading range

Grade | Range | Description of performance
A. Excellent | 90-100 | -
B. Good | 75-89 | -
C. Satisfactory | 60-74 | -
D. Poor | 0-59 | -

Course activities and grading breakdown

Activity Type | Percentage of the overall course grade
Final Exam | 30
Final project | 30
Assignments | 30
Lab Participation / Quizzes | 10

Recommendations for students on how to succeed in the course

Students are recommended to follow this scheme when preparing for classes:

  • Review the lecture notes.
  • Review the materials of seminar (practical) classes.
  • In case of difficulty, formulate questions for the instructor.

To prepare for classes, it is recommended to use the resources listed below and the additional literature.

Resources, literature and reference materials

Open access resources

  • Indurkhya, N., and Damerau, F. J. (eds.). Handbook of Natural Language Processing, Second Edition. Chapman & Hall/CRC Machine Learning & Pattern Recognition, 2010.
  • Clark, A., Fox, C., and Lappin, S. (eds.). The Handbook of Computational Linguistics and Natural Language Processing. John Wiley & Sons, 2013.
  • Jurafsky, D., and Martin, J. H. Speech and Language Processing, 3rd ed. (draft).
  • Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed. O'Reilly Media, 2019.

Additional literature

  • Osinga, D. Deep Learning Cookbook: Practical Recipes to Get Started Quickly. O'Reilly Media, 2018.
  • Nikolenko, S., Kadurin, A., and Arkhangelskaya, E. Deep Learning (Глубокое обучение). St. Petersburg: Piter, 2018.
  • Goldberg, Y. A Primer on Neural Network Models for Natural Language Processing.

Closed access resources

Software and tools used within the course

Teaching Methodology: Methods, techniques, & activities

Activities and Teaching Methods

Activities within each section
Learning Activities | Section 1 | Section 2 | Section 3
Development of individual parts of software product code | 1 | 1 | 1
Homework and group projects | 1 | 1 | 1
Midterm evaluation | 1 | 1 | 1
Testing (written or computer based) | 1 | 1 | 1
Discussions | 1 | 1 | 1

Formative Assessment and Course Activities

Ongoing performance assessment

Section 1

Activity Type | Content | Is Graded?
Question | Explain the Chomsky hierarchy. | Yes
Question | What is a language model? | Yes
Question | What is perplexity in language modeling? | Yes
Question | What is smoothing, and why is it used in language modeling? | No
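
To make these questions concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing and a perplexity computation; the toy corpus and sentence markers are hypothetical and only illustrate the mechanics:

  import math
  from collections import Counter

  # Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
  corpus = [["<s>", "the", "cat", "sat", "</s>"],
            ["<s>", "the", "dog", "sat", "</s>"]]

  unigrams = Counter(w for sent in corpus for w in sent)
  bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
  vocab_size = len(unigrams)

  def bigram_prob(prev, word):
      # Add-one smoothing: every bigram gets a pseudo-count of 1,
      # so unseen bigrams still receive non-zero probability.
      return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

  def perplexity(sentence):
      # Perplexity is the exponential of the average negative
      # log-probability per predicted token (here, per bigram).
      log_prob = sum(math.log(bigram_prob(a, b))
                     for a, b in zip(sentence, sentence[1:]))
      return math.exp(-log_prob / (len(sentence) - 1))

  print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))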

Section 2

Activity Type | Content | Is Graded?
Question | Explain the MaxEnt principle in language modeling. | Yes
Question | Describe examples of NLP applications and approaches to assessing their quality. | Yes
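
A MaxEnt classifier for text is equivalent to (multinomial) logistic regression over text features; a minimal scikit-learn sketch with hypothetical toy data, including a standard quality measure (F1), might look like this:

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import f1_score
  from sklearn.pipeline import make_pipeline

  # Hypothetical toy data: binary sentiment labels.
  texts = ["great movie", "awful film", "loved it", "hated it"]
  labels = [1, 0, 1, 0]

  # Bag-of-words features + logistic regression (a MaxEnt classifier).
  model = make_pipeline(CountVectorizer(), LogisticRegression())
  model.fit(texts, labels)

  # Quality assessment: F1 score (here on the training data, for brevity).
  print("F1:", f1_score(labels, model.predict(texts)))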

Section 3

Activity Type | Content | Is Graded?
Question | Why use an encoder-decoder model rather than a regular RNN for automatic translation? | Yes
Question | How to handle variable-length input sequences with an RNN? | Yes
Question | What is beam search and why is it used? | Yes
Question | Describe the components of the encoder-decoder model. | Yes
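
For the variable-length question above, the usual PyTorch approach is padding plus packing, so the RNN ignores the padded positions; a minimal sketch with hypothetical dimensions:

  import torch
  from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

  # Three hypothetical sequences of different lengths (embedding dim 5).
  seqs = [torch.randn(4, 5), torch.randn(2, 5), torch.randn(3, 5)]
  lengths = torch.tensor([4, 2, 3])

  # Pad to a common length, then pack so the RNN skips the padding.
  padded = pad_sequence(seqs, batch_first=True)            # shape (3, 4, 5)
  packed = pack_padded_sequence(padded, lengths,
                                batch_first=True, enforce_sorted=False)

  rnn = torch.nn.GRU(input_size=5, hidden_size=8, batch_first=True)
  _, h_n = rnn(packed)   # h_n: last hidden state of each (unpadded) sequence
  print(h_n.shape)       # expected: torch.Size([1, 3, 8])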

Section 4

Activity Type | Content | Is Graded?
Question | What is the most important layer in the Transformer architecture? What is its purpose? | Yes
Question | Describe the architecture of BERT and the training process for this model. | Yes
Question | Describe the architecture of GPT (version 2 or 3) and the training process for this model. | Yes
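
These questions revolve around self-attention; a minimal sketch of single-head scaled dot-product self-attention in PyTorch, with hypothetical dimensions:

  import math
  import torch

  def self_attention(x, w_q, w_k, w_v):
      # x: (batch, seq_len, d_model); w_*: (d_model, d_k) projections.
      q, k, v = x @ w_q, x @ w_k, x @ w_v
      # Each position attends to every position in the same sequence.
      scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
      weights = torch.softmax(scores, dim=-1)
      return weights @ v

  d_model, d_k = 16, 8
  x = torch.randn(2, 5, d_model)        # hypothetical batch of 2, length 5
  w_q, w_k, w_v = [torch.randn(d_model, d_k) for _ in range(3)]
  print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 5, 8])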

Final assessment

Section 1

  1. Suppose you run a dependency parser (shift-reduce) with a standard arc system for a sentence of length n. How many shift operations are needed?
  2. Describe the main differences between formal languages (such as logic or programming languages) and natural languages (such as Russian).
  3. Give an equation to find the most likely sequence of part-of-speech (POS) tags that can be used by a stochastic POS tagger, assuming a bigram model (a standard form is given after this list).
  4. Explain what is meant by the terms smoothing and backoff in the context of the stochastic POS tagger model.
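
For reference, a standard form of the equation requested in item 3, under the bigram (first-order hidden Markov model) assumption for tags t_1, ..., t_n and words w_1, ..., w_n:

  \hat{t}_{1:n} = \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})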

Section 2

  1. Suppose you classify text based on a bag of words in a document. The raw input is a single line containing the text of the entire document. Describe in one or two sentences the pipeline from the raw input to the feature vector.
  2. Suppose you have a neural network that fits the training data exactly (i.e., it overfits). Describe two ways to solve this problem.
  3. Compare different methods for text processing (decision trees, hidden Markov models, support vector machines).

Section 3

  1. How would you classify different RNN architectures?
  2. You train a neural network with the Adam optimizer and observe the negative log probability on the training set over epochs. Instead of decreasing, it seems to fluctuate around where it started. What could you add to your training routine to fix this?
  3. Describe the reason why the negative-sampling skipgram model learns faster than the base skipgram model (see the sketch after this list).
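
For item 3, a sketch of the negative-sampling objective for a single (center, context) pair in PyTorch; the indices and sizes are hypothetical, and the key point is that only 1 + k rows of the output embedding matrix receive gradients, rather than all vocab_size rows as in the softmax-based skipgram:

  import torch
  import torch.nn.functional as F

  vocab_size, dim, k = 10_000, 100, 5   # hypothetical sizes; k = negatives
  in_emb = torch.randn(vocab_size, dim, requires_grad=True)   # center vectors
  out_emb = torch.randn(vocab_size, dim, requires_grad=True)  # context vectors

  center, context = 42, 1337                       # hypothetical word indices
  negatives = torch.randint(0, vocab_size, (k,))   # drawn from a noise distribution

  v_c = in_emb[center]
  pos = F.logsigmoid(out_emb[context] @ v_c)             # pull the true pair together
  neg = F.logsigmoid(-(out_emb[negatives] @ v_c)).sum()  # push negatives away
  loss = -(pos + neg)   # gradients touch only 1 + k output rows
  loss.backward()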

Section 4

  1. Compare the BERT architecture with GPT (version 2 or 3). Explain the advantages and disadvantages of each.
  2. What contextual word representations do you know?
  3. What is model distillation? How is it performed?
  4. Explain the attention mechanism.
  5. What is self-attention?
  6. Explain the Transformer architecture.