BSc: Natural Language Processing

From IU

Revision as of 19:56, 29 December 2022

Natural Language Processing

  • Course name: Natural Language Processing
  • Code discipline:
  • Subject area:

Short Description

The course covers classical and modern methods for processing and analyzing natural language texts. It aims to teach fundamental approaches to text analysis and to develop and consolidate skills in working with modern software tools for natural language processing.

Prerequisites

Prerequisite subjects

  • CSE302 — Introduction to Machine Learning

Course Topics

Course Sections and Topics

Formal foundations of text analysis methods:
  1. Fundamentals of the theory of formal languages
  2. Statistical language modeling
  3. Theory of parsing
Classical models of text representation and analysis, and their applications:
  1. Models based on entropy maximization (MaxEnt)
  2. Decision trees in text processing; Markov models; support vector machines in text classification problems
  3. Applications: information extraction, question answering systems, text generation, machine translation
  4. Quality assessment of NLP systems
Neural network models for text analysis:
  1. Vector representations of words: the word2vec model and contextual vector representations
  2. Architectures based on convolutional networks
  3. Architectures based on recurrent networks
  4. Encoder-decoder architecture
  5. Attention mechanism
Modern models based on the Transformer architecture:
  1. The Transformer architecture
  2. Self-attention mechanism
  3. Pre-trained language models: BERT, GPT
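Statistical language modeling, listed in the first section, assigns probabilities to word sequences. Below is a minimal sketch of a bigram model with add-one (Laplace) smoothing in plain Python. The toy corpus, the `<s>`/`</s>` boundary tokens and the function names are illustrative assumptions, not course material. Practical systems typically use better smoothing methods or neural models.

```python
from collections import Counter
from math import prod


def train_bigram(sentences):
    """Count history unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that is followed by another one
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent (previous, next) pairs
    # Words that can be predicted: the observed words plus the end marker </s>.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab


def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))


def sentence_prob(sentence, unigrams, bigrams, vocab):
    """Probability of a whole sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return prod(bigram_prob(p, w, unigrams, bigrams, vocab) for p, w in zip(tokens, tokens[1:]))


# Toy corpus (made-up data for illustration only).
corpus = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams, vocab = train_bigram(corpus)
print(bigram_prob("<s>", "the", unigrams, bigrams, vocab))  # (3 + 1) / (3 + 6) = 4/9
print(sentence_prob("the cat sat", unigrams, bigrams, vocab)
      > sentence_prob("the dog ran", unigrams, bigrams, vocab))  # True
```

Smoothing keeps unseen pairs such as ("dog", "ran") from getting zero probability. For every history, the smoothed probabilities over the vocabulary still sum to 1.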

Intended Learning Outcomes (ILOs)

What is the main purpose of this course?

The course is about processing and modeling natural languages. In addition to traditional lectures, flipped classes and student project presentations will be organized. The working language in lab sessions is Python, and the primary deep learning framework is PyTorch. TensorFlow and Keras may also be used, and the use of Docker is highly encouraged.

ILOs defined at three levels

Level 1: What concepts should a student know/remember/explain?

By the end of the course, students should know:

  • Fundamental approaches to text analysis;
  • Various natural language processing algorithms;
  • Ways to measure the performance of NLP systems;
  • Popular software tools for natural language processing.

Level 2: What basic practical skills should a student be able to perform?

By the end of the course, students should be able to:

  • describe and explain the difference between formal and natural languages;
  • describe and explain classical methods used for text analysis;
  • describe and explain neural network architectures used for text analysis;
  • describe and explain the differences between neural network architectures for text analysis;
  • describe and explain modern architectures based on the Transformer.
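Self-attention is the core of the Transformer: each token weighs every other token when it computes its new representation. The sketch below implements single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, using plain Python lists so each step is visible. The toy inputs and the identity projection matrices are assumptions chosen for illustration; in a trained Transformer, W_q, W_k and W_v are learned. Multi-head attention, masking and positional encodings are omitted.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]  # row i: how much token i attends to each token
    return matmul(weights, v), weights


# Three toy "token embeddings" of dimension 2 (illustrative values only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection matrices
output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row is a weighted average of the value vectors. Dividing by sqrt(d_k) keeps the dot products from growing with the dimension and saturating the softmax.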

Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?

By the end of the course, students should be able to:

  • apply machine learning methods to solve text processing problems;
  • apply methods for assessing the quality of NLP systems;
  • apply deep learning algorithms to solve text processing problems.
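Assessing the quality of an NLP classifier often starts with precision, recall and F1 for a chosen positive class. Below is a minimal sketch. The spam/ham labels and predictions are made-up toy data, and the course may also cover task-specific metrics that are not shown here.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold labels and system predictions (illustrative data only): tp=2, fp=1, fn=1.
gold = ["spam", "ham", "spam", "spam", "ham"]
predicted = ["spam", "spam", "spam", "ham", "ham"]
print(precision_recall_f1(gold, predicted, positive="spam"))
```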

Grading

Course grading range

Grade             Range     Description of performance
A. Excellent      90-100    -
B. Good           75-89     -
C. Satisfactory   60-74     -
D. Poor           0-59      -

Course activities and grading breakdown

Activity type                  Percentage of the overall course grade
Midterm                        30
Final project                  40
Assignments                    20
Lab participation / quizzes    10
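The weights above combine into the overall course grade as a weighted sum, which is then mapped onto the course grading range. The sketch below assumes that every activity is scored on a 0-100 scale and that the total is compared against the lower bound of each band without rounding; the syllabus specifies neither detail. The activity keys are hypothetical names.

```python
# Hypothetical activity keys; weights (in percent) follow the grading breakdown table.
WEIGHTS = {"midterm": 30, "final_project": 40, "assignments": 20, "lab_quizzes": 10}

# Lower bound of each band in the course grading range, highest first.
GRADE_BANDS = [(90, "A. Excellent"), (75, "B. Good"), (60, "C. Satisfactory"), (0, "D. Poor")]


def overall_score(scores):
    """Weighted sum of per-activity scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a 0-100 total to a grade band (assumption: no rounding is applied)."""
    if not 0 <= total <= 100:
        raise ValueError(f"total must be in [0, 100], got {total}")
    for lower_bound, grade in GRADE_BANDS:
        if total >= lower_bound:
            return grade


# Example student (made-up scores).
scores = {"midterm": 80, "final_project": 90, "assignments": 70, "lab_quizzes": 100}
total = overall_score(scores)  # (30*80 + 40*90 + 20*70 + 10*100) / 100 = 84.0
print(total, letter_grade(total))  # 84.0 B. Good
```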

Recommendations for students on how to succeed in the course

Students are advised to prepare for classes as follows:

  • Review the lecture notes.
  • Work through the materials of the seminars (practical classes).
  • If you have difficulties, formulate questions for the instructor.

To prepare for classes, use the resources listed below as well as additional literature.

Resources, literature and reference materials

Open access resources

  • Clark, Alexander, Chris Fox, and Shalom Lappin, eds. The Handbook of Computational Linguistics and Natural Language Processing. John Wiley & Sons, 2013.
  • Jurafsky, Dan, and James H. Martin. Speech and Language Processing, 3rd ed.
  • Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd ed. O'Reilly Media, 2019.

Additional literature

  • Osinga, Douwe. Deep Learning Cookbook: Practical Recipes to Get Started Quickly. O'Reilly Media, 2018.
  • Nikolenko, S., A. Kadurin, and E. Arkhangelskaya. Deep Learning (Глубокое обучение, in Russian). St. Petersburg: Piter, 2018.
  • Goldberg, Yoav. A Primer on Neural Network Models for Natural Language Processing.

Closed access resources

Software and tools used within the course

Teaching Methodology: Methods, techniques, & activities

Activities and Teaching Methods

Activities within each section
Learning activity                                           Section 1   Section 2   Section 3
Development of individual parts of software product code    1           1           1
Homework and group projects                                 1           1           1
Midterm evaluation                                          1           1           1
Testing (written or computer-based)                         1           1           1
Discussions                                                 1           1           1

Formative Assessment and Course Activities

Ongoing performance assessment

Section 1

Activity type   Content   Is graded?
Question        ?         1
Question        ?         1

Section 2

Activity type   Content   Is graded?
Question        ?         1
Question        ?         1

Section 3

Activity type   Content   Is graded?
Question        ?         1
Question        ?         1

Final assessment

Section 1

  1. ?

Section 2

  1. ?

Section 3

  1. ?

Section 4

  1. ?

The retake exam

Section 1

Section 2

Section 3

Section 4