BSc: Introduction To Machine Learning
Introduction to Machine Learning
- Course name: Introduction to Machine Learning
- Course number: R-01
Course characteristics
Key concepts of the class
- Machine learning paradigms
- Machine learning approaches and algorithms
What is the purpose of this course?
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. The purpose of this course is therefore to provide students with an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments and using them to solve real-world data science problems.
Prerequisites
Students will benefit from prior familiarity with the following topics in mathematics and programming.
Maths:
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE204 — Analytical Geometry and Linear Algebra II
- CSE201 — Mathematical Analysis I
- CSE203 — Mathematical Analysis II
- CSE206 — Probability And Statistics
Programming:
- CSE117 — Data Structures and Algorithms: python, numpy, basic object-oriented concepts, memory management.
For a more concrete identification of subtopics, please see chapters 2, 3, and 4 of (1), which list and describe the mathematical subtopics essential for machine learning students. In addition, students are strongly advised to gain a basic understanding of descriptive statistics and data distributions, statistical hypothesis testing, data sampling and resampling, and experimental design techniques.
(1) Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press.
Course Objectives Based on Bloom’s Taxonomy
What should a student remember at the end of the course?
By the end of the course, the students should be able to recognize and define
- Different learning paradigms
- A wide variety of learning approaches and algorithms
- Various learning settings
- Performance metrics
- Popular machine learning software tools
What should a student be able to understand at the end of the course?
By the end of the course, the students should be able to describe and explain (with examples)
- Difference between different learning paradigms
- Difference between classification and regression
- Concepts from learning theory (bias/variance tradeoff, large margins, etc.)
- Kernel methods
- Regularization
- Ensemble Learning
- Neural networks and deep learning
What should a student be able to apply at the end of the course?
By the end of the course, the students should be able to apply
- Classification approaches to solve supervised learning problems
- Clustering approaches to solve unsupervised learning problems
- Ensemble learning to improve a model’s performance
- Regularization to improve a model’s generalization
- Deep learning algorithms to solve real-world problems
Course evaluation
| | Proposed points | |
|---|---|---|
| Labs/seminar classes | 20 | 0 |
| Interim performance assessment | 30 | 40 |
| Exams | 50 | 60 |
If necessary, please indicate freely your course’s features in terms of students’ performance assessment: None
Grades range
| Grade | Proposed range |
|---|---|
| A. Excellent | 90-100 |
| B. Good | 75-89 |
| C. Satisfactory | 60-74 |
| D. Poor | 0-59 |
If necessary, please indicate freely your course’s grading features: The semester starts with the default range shown in the table above, but the boundaries may shift slightly (usually downward) depending on how the semester progresses.
Resources and reference material
- G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer, 2013.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
- Tom M. Mitchell. Machine Learning. McGraw Hill, 1997.
- Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Course Sections
The main sections of the course and the approximate hour distribution between them are as follows:
| Section | Section Title | Teaching Hours |
|---|---|---|
| 1 | Supervised Learning | 24 |
| 2 | Decision Trees and Ensemble Learning | 8 |
| 3 | Unsupervised Learning | 8 |
| 4 | Deep Learning | 12 |
Section 1
Section title:
Supervised Learning
Topics covered in this section:
- Introduction to Machine Learning
- Derivatives and Cost Function
- Data Pre-processing
- Linear Regression
- Multiple Linear Regression
- Gradient Descent
- Polynomial Regression
- Bias-variance Tradeoff
- Difference between classification and regression
- Logistic Regression
- Naive Bayes
- KNN
- Confusion Matrix
- Performance Metrics
- Regularization
- Hyperplane Based Classification
- Perceptron Learning Algorithm
- Max-Margin Classification
- Support Vector Machines
- Slack Variables
- Lagrangian Support Vector Machines
- Kernel Trick
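Several of the topics above (cost function, linear regression, gradient descent) fit in a short sketch. The following is illustrative only, using synthetic data and hypothetical names, not material from the course itself:

```python
import numpy as np

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Append a column of ones so the intercept is learned as an extra weight
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

def mse_cost(w, Xb, y):
    """Mean squared error cost J(w) = (1/2n) * ||Xb w - y||^2."""
    r = Xb @ w - y
    return 0.5 * np.mean(r ** 2)

# Batch gradient descent on the MSE cost
w = np.zeros(2)
lr = 0.5
for _ in range(500):
    grad = Xb.T @ (Xb @ w - y) / len(y)  # gradient of J(w)
    w -= lr * grad

# w is now close to the true parameters [3.0, 2.0]
```

The same loop structure carries over to logistic regression by swapping the cost and gradient.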
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- Is it true that in simple linear regression, the R² statistic and the squared correlation between X and Y are identical?
- What are the two assumptions that the linear regression model makes about the error terms?
- Fit a regression model to a given data problem, and support your choice of the model.
- In a list of given tasks, choose which are regression and which are classification tasks.
- In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net?
- What is the perceptron learning algorithm? Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case.
- What is a max-margin classifier?
- Explain the role of slack variables in SVMs.
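The perceptron update asked about above can be sketched in a few lines. This is a rough illustration on synthetic, linearly separable data (the names and the margin filter are illustrative choices, not part of the course):

```python
import numpy as np

# Toy linearly separable data: label = sign(x1 + x2). Points very close to
# the boundary are removed so the algorithm converges quickly.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
X = X[np.abs(X[:, 0] + X[:, 1]) > 0.2]
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# Rosenblatt's update: on a misclassified point, w <- w + y_i x_i, b <- b + y_i
w = np.zeros(2)
b = 0.0
for _ in range(100):                    # epochs
    mistakes = 0
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:      # misclassified (or on the boundary)
            w += yi * xi
            b += yi
            mistakes += 1
    if mistakes == 0:                   # converged: all points separated
        break

accuracy = np.mean(np.sign(X @ w + b) == y)
```

On separable data the loop is guaranteed to terminate; SVMs replace this mistake-driven rule with an explicit max-margin objective.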
Typical questions for seminar classes (labs) within this section
- Implement various regression models to solve different regression problems.
- Describe the difference between different types of regression models, their pros and cons, etc.
- Implement various classification models to solve different classification problems.
- Describe the difference between logistic regression and naive Bayes.
- Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems.
- Solve a given optimization problem using the Lagrange multiplier method.
Test questions for final assessment in this section
- What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant?
- Given a fitted regression model to a dataset, interpret its coefficients.
- Explain which regression model would be a better fit to model the relationship between response and predictor in a given data.
- If the number of training examples goes to infinity, how will it affect the bias and variance of a classification model?
- Given a two-dimensional classification problem, determine whether a linear boundary can be estimated using logistic regression with regularization.
- Explain which classification model would be a better fit for a given classification problem.
- Consider the Leave-one-out-CV error of standard two-class SVM. Argue that under a given value of slack variable, a given mathematical statement is either correct or incorrect.
- How does the choice of slack variable affect the bias-variance tradeoff in SVM?
- Explain which Kernel would be a better fit to be used in SVM for a given data.
Section 2
Section title:
Decision Trees and Ensemble Methods
Topics covered in this section:
- Decision Trees
- Bagging
- Boosting
- Random Forest
- Adaboost
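Bagging and random forests both rest on the bootstrap, which can be demonstrated in a few lines. A minimal sketch (the sample size and names are illustrative):

```python
import numpy as np

# The bootstrap: draw n points with replacement from an n-point dataset.
# On average about 1 - 1/e ≈ 63.2% of the original points appear in each
# bootstrap sample; the rest are "out-of-bag" and can be used for validation.
rng = np.random.default_rng(0)
n = 10_000
idx = rng.integers(0, n, size=n)        # one bootstrap sample (indices)

unique_frac = len(np.unique(idx)) / n   # fraction of points drawn at least once
oob_frac = 1.0 - unique_frac            # out-of-bag fraction, ≈ 1/e ≈ 0.368
```

Bagging fits one tree per bootstrap sample and averages (or majority-votes) their predictions, which reduces variance relative to a single deep tree.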
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- What are pros and cons of decision trees over other classification models?
- Explain how tree-pruning works.
- What is the purpose of ensemble learning?
- What is a bootstrap, and what is its role in Ensemble learning?
Typical questions for seminar classes (labs) within this section
- Implement different variants of decision trees to solve different classification problems.
- Solve a given classification problem using an ensemble classifier.
- Implement Adaboost for a given problem.
Test questions for final assessment in this section
- When a decision tree is grown to full depth, how does it affect the tree’s bias and variance, and its response to noisy data?
- Argue if an ensemble model would be a better choice for a given classification problem or not.
- Given a particular iteration of boosting and other important information, calculate the weights of the Adaboost classifier.
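The AdaBoost weight calculation asked about above follows one standard formula; a small sketch with illustrative error values:

```python
import math

# AdaBoost assigns each weak learner a weight based on its weighted error:
#   alpha = 0.5 * ln((1 - err) / err)
# (one common formulation; the numbers below are illustrative)
def adaboost_alpha(err):
    return 0.5 * math.log((1.0 - err) / err)

# A learner barely better than chance gets a small weight...
a_weak = adaboost_alpha(0.45)
# ...while a strong learner gets a much larger one.
a_strong = adaboost_alpha(0.10)
```

Note that a learner with error exactly 0.5 (pure chance on the weighted sample) receives weight zero, and learners worse than chance receive negative weight.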
Section 3
Section title:
Unsupervised Learning
Topics covered in this section:
- K-means Clustering
- K-means++
- Hierarchical Clustering
- DBSCAN
- Mean-shift
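Lloyd's algorithm for K-means (the first topic above) alternates an assignment step and an update step. A minimal sketch on two synthetic blobs; the deterministic initialization from two given points is an illustrative choice (k-means++ would instead pick well-spread starting centers):

```python
import numpy as np

# Two well-separated synthetic clusters (illustrative data)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(-3, 0), scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=(3, 0), scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

def kmeans(X, k, init, iters=50):
    centers = init.astype(float).copy()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

# Initialize with one point from each blob (first and last row)
centers, labels = kmeans(X, k=2, init=X[[0, -1]])
```

This loop implicitly minimizes the within-cluster sum of squared distances, which is exactly the objective asked about in the final-assessment questions below.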
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- Which implicit or explicit objective function does K-means implement?
- Explain the difference between k-means and k-means++.
- What is single-linkage and what are its pros and cons?
- Explain how DBSCAN works.
Typical questions for seminar classes (labs) within this section
- Implement different clustering algorithms to solve different clustering problems.
- Implement mean-shift for video tracking.
Test questions for final assessment in this section
- K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize?
- Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method?
- What are the characteristics of noise points in DBSCAN?
Section 4
Section title:
Deep Learning
Topics covered in this section:
- Artificial Neural Networks
- Back-propagation
- Convolutional Neural Networks
- Autoencoder
- Variational Autoencoder
- Generative Adversarial Networks
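Back-propagation, listed above, is just the chain rule applied layer by layer, and its correctness can be checked numerically. A sketch on a one-hidden-layer network with illustrative shapes and names (a standard gradient check, not course code):

```python
import numpy as np

# Tiny regression network: 3 inputs -> 4 tanh units -> 1 output
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 samples, 3 features
y = rng.normal(size=(8, 1))          # regression targets

W1 = rng.normal(size=(3, 4)) * 0.5
W2 = rng.normal(size=(4, 1)) * 0.5

def forward(W1, W2):
    h = np.tanh(X @ W1)              # hidden activations
    out = h @ W2                     # network output
    loss = 0.5 * np.mean((out - y) ** 2)
    return h, out, loss

# Backward pass: chain rule applied by hand
h, out, loss = forward(W1, W2)
d_out = (out - y) / len(X)           # dL/d_out
gW2 = h.T @ d_out                    # dL/dW2
d_h = d_out @ W2.T * (1 - h ** 2)    # back through tanh (tanh' = 1 - tanh^2)
gW1 = X.T @ d_h                      # dL/dW1

# Numerical gradient for one entry of W1, to verify the analytic gradient
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
num = (forward(W1p, W2)[2] - forward(W1m, W2)[2]) / (2 * eps)
```

If the analytic and numerical gradients disagree, the backward pass has a bug; this check is a standard debugging step before training any network.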
What forms of evaluation were used to test students’ performance in this section?
| Form of evaluation | Yes/No |
|---|---|
| Development of individual parts of software product code | Yes |
| Homework and group projects | Yes |
| Midterm evaluation | Yes |
| Testing (written or computer based) | Yes |
| Reports | No |
| Essays | No |
| Oral polls | No |
| Discussions | Yes |
Typical questions for ongoing performance evaluation within this section
- What is a fully connected feed-forward ANN?
- Explain different hyperparameters of CNNs.
- Calculate KL-divergence between two probability distributions.
- What is a generative model and how is it different from a discriminative model?
Typical questions for seminar classes (labs) within this section
- Implement different types of ANNs to solve different classification problems.
- Calculate KL-divergence between two probability distributions.
- Implement different generative models for different problems.
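The KL-divergence calculation that recurs in this section's questions is a one-liner for discrete distributions. A sketch with illustrative example distributions:

```python
import numpy as np

# KL divergence between discrete distributions p and q (natural log):
#   KL(p || q) = sum_i p_i * log(p_i / q_i)
def kl_divergence(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)           # generally != kl_pq: KL is asymmetric
```

KL(p || p) is exactly zero, and the asymmetry is why the VAE objective fixes a particular direction (the approximate posterior against the prior).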
Test questions for final assessment in this section
- Explain what ReLU is, what its different variants are, and what their pros and cons are.
- Calculate the number of parameters to be learned during training in a CNN, given all important information.
- Explain how a VAE can be used as a generative model.
Exams and retake planning
Exam
Exams will be paper-based and will be conducted in the form of problem solving, where the problems will be similar to those mentioned above and will be based on the contents taught in lecture slides, lecture discussions (including whiteboard materials), lab materials, reading materials (including the textbooks), etc. Students will be given 1-3 hours to complete the exam.
Retake 1
The first retake will be conducted in the same form as the final exam. The passing threshold of the retake exam will be 5% higher than the passing threshold of the course.
Retake 2
The second retake will be conducted in the same form as the final exam. The passing threshold of the retake exam will be 5% higher than the passing threshold of the course.