BSc: Introduction to Machine Learning
Introduction to Machine Learning
- Course name: Introduction to Machine Learning
- Code discipline: R-01
- Subject area:
Short Description
This course covers the following concepts: machine learning paradigms, approaches, and algorithms.
Prerequisites
Prerequisite subjects
- CSE202 — Analytical Geometry and Linear Algebra I
- CSE204 — Analytical Geometry and Linear Algebra II
- CSE201 — Mathematical Analysis I
- CSE203 — Mathematical Analysis II
- CSE206 — Probability And Statistics
- CSE117 — Data Structures and Algorithms: Python, NumPy, basic object-oriented concepts, memory management.
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Supervised Learning |
|
Decision Trees and Ensemble Methods |
|
Unsupervised Learning |
|
Deep Learning |
|
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
There is a growing business need for individuals skilled in artificial intelligence, data analytics, and machine learning. The purpose of this course is therefore to give students an intensive treatment of a cross-section of the key elements of machine learning, with an emphasis on implementing them in modern programming environments and using them to solve real-world data science problems.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to ...
- Describe the different learning paradigms
- Explain a wide variety of learning approaches and algorithms
- Describe various learning settings
- Define common performance metrics
- Name popular machine learning software tools
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- Explain the difference between learning paradigms
- Distinguish between classification and regression
- Explain concepts from learning theory (bias/variance tradeoffs, large margins, etc.)
- Use kernel methods
- Apply regularization
- Use ensemble learning
- Use neural networks or deep learning
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- Apply classification approaches to solve supervised learning problems
- Apply clustering approaches to solve unsupervised learning problems
- Use ensemble learning to improve a model’s performance
- Use regularization to improve a model’s generalization
- Apply deep learning algorithms to solve real-world problems
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 90-100 | - |
B. Good | 75-89 | - |
C. Satisfactory | 60-74 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Labs/seminar classes | 0 |
Interim performance assessment | 40 |
Exams | 60 |
Recommendations for students on how to succeed in the course
Resources, literature and reference materials
Open access resources
- G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer, 2013.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2011.
- T. M. Mitchell. Machine Learning. McGraw-Hill.
- C. M. Bishop. Pattern Recognition and Machine Learning. Springer.
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 | Section 4 |
---|---|---|---|---|
Development of individual parts of software product code | 1 | 1 | 1 | 1 |
Homework and group projects | 1 | 1 | 1 | 1 |
Midterm evaluation | 1 | 1 | 1 | 1 |
Testing (written or computer based) | 1 | 1 | 1 | 1 |
Discussions | 1 | 1 | 1 | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Is it true that in simple linear regression the $R^2$ statistic and the squared correlation between X and Y are identical? | 1 |
Question | What are the two assumptions that the linear regression model makes about the error terms? | 1 |
Question | Fit a regression model to a given data problem, and support your choice of the model. | 1 |
Question | In a list of given tasks, choose which are regression and which are classification tasks. | 1 |
Question | In a given graphical model of binary random variables, how many parameters are needed to define the Conditional Probability Distributions for this Bayes Net? | 1 |
Question | Write the mathematical form of the minimization objective of Rosenblatt’s perceptron learning algorithm for a two-dimensional case. | 1 |
Question | What is the perceptron learning algorithm? | 1 |
Question | Write the mathematical form of its minimization objective for a two-dimensional case. | 1 |
Question | What is a max-margin classifier? | 1 |
Question | Explain the role of slack variables in an SVM. | 1 |
Question | How would you implement various regression models to solve different regression problems? | 0 |
Question | Describe the difference between different types of regression models, their pros and cons, etc. | 0 |
Question | Implement various classification models to solve different classification problems. | 0 |
Question | Describe the difference between logistic regression and naive Bayes. | 0 |
Question | Implement the perceptron learning algorithm, SVMs, and their variants to solve different classification problems. | 0 |
Question | Solve a given optimization problem using the Lagrange multiplier method. | 0 |
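The Section 1 question about simple linear regression and the squared correlation between X and Y can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming a least-squares fit with an intercept; the data values and all function names are made up for this example and are not course material.

```python
import math

def mean(values):
    return sum(values) / len(values)

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y ~ b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(x, y):
    """R^2 = 1 - RSS / TSS for the fitted line."""
    b0, b1 = fit_simple_linear_regression(x, y)
    y_bar = mean(y)
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - rss / tss

def correlation(x, y):
    """Sample Pearson correlation between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = r_squared(x, y)
r = correlation(x, y)
print(r2, r ** 2)  # the two values agree up to floating-point error
```

The agreement holds for a single predictor with an intercept; with several predictors, the squared correlation between any one predictor and the response is in general no longer equal to R^2.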
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | What are the pros and cons of decision trees compared with other classification models? | 1 |
Question | Explain how tree-pruning works. | 1 |
Question | What is the purpose of ensemble learning? | 1 |
Question | What is a bootstrap, and what is its role in Ensemble learning? | 1 |
Question | Explain the role of slack variables in an SVM. | 1 |
Question | Implement different variants of decision trees to solve different classification problems. | 0 |
Question | Solve a given classification problem using an ensemble classifier. | 0 |
Question | Implement AdaBoost for a given problem. | 0 |
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Which implicit or explicit objective function does K-means implement? | 1 |
Question | Explain the difference between k-means and k-means++. | 1 |
Question | What is single-linkage and what are its pros and cons? | 1 |
Question | Explain how DBSCAN works. | 1 |
Question | Implement different clustering algorithms to solve different clustering problems. | 0 |
Question | Implement Mean-shift for video tracking. | 0 |
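The Section 3 questions on K-means can be explored with a small experiment. The sketch below, in plain Python, runs Lloyd's algorithm on a made-up set of 2-D points from hand-picked starting centers and records the within-cluster sum of squared distances after each iteration. The data, the starting centers, and all function names are illustrative assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]

def update(points, labels, centers):
    """Update step: move each center to the mean of its assigned points."""
    new_centers = []
    for k, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old_center)  # keep the center of an empty cluster
            continue
        dim = len(old_center)
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centers

def wcss(points, labels, centers):
    """Within-cluster sum of squared distances to the assigned centers."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))

def kmeans(points, centers, iterations=10):
    history = []
    labels = assign(points, centers)
    for _ in range(iterations):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
        history.append(wcss(points, labels, centers))
    return centers, labels, history

# Made-up points and starting centers, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centers, labels, history = kmeans(points, [(1.0, 1.0), (5.0, 7.0)])
print(history)  # the recorded values never increase from one iteration to the next
```

Both steps can only lower or preserve the recorded quantity, which is why the sequence is non-increasing; the algorithm still depends on its starting centers and may stop at a local rather than a global minimum.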
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | What is a fully connected feed-forward ANN? | 1 |
Question | Explain different hyperparameters of CNNs. | 1 |
Question | Calculate KL-divergence between two probability distributions. | 1 |
Question | What is a generative model and how is it different from a discriminative model? | 1 |
Question | Implement different types of ANNs to solve different classification problems. | 0 |
Question | Calculate KL-divergence between two probability distributions. | 0 |
Question | Implement different generative models for different problems. | 0 |
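For the Section 4 task on KL-divergence, a minimal sketch for two discrete distributions over the same outcomes is given below. It assumes natural logarithms (so the result is in nats) and uses the usual convention that terms with p_i = 0 contribute nothing; the function name and the two example distributions are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

# Made-up distributions, for illustration only.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # KL(p || q)
print(kl_divergence(q, p))  # KL(q || p), generally a different value
```

Comparing the two printed values shows that KL-divergence is not symmetric, so it is not a distance metric in the strict sense.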
Final assessment
Section 1
- What does it mean for the standard least squares coefficient estimates of linear regression to be scale equivariant?
- Given a regression model fitted to a dataset, interpret its coefficients.
- Explain which regression model would be a better fit to model the relationship between response and predictor in a given dataset.
- If the number of training examples goes to infinity, how will it affect the bias and variance of a classification model?
- Given a two-dimensional classification problem, determine whether a linear boundary can be estimated using logistic regression with regularization.
- Explain which classification model would be a better fit for a given classification problem.
- Consider the leave-one-out CV error of a standard two-class SVM. Argue whether, under a given value of the slack variable, a given mathematical statement is correct or incorrect.
- How does the choice of slack variable affect the bias-variance tradeoff in an SVM?
- Explain which kernel would be a better fit for an SVM on a given dataset.
Section 2
- When a decision tree is grown to full depth, how does this affect the tree’s bias and variance, and its response to noisy data?
- Argue whether an ensemble model would be a better choice for a given classification problem.
- Given a particular iteration of boosting and other important information, calculate the weights of the AdaBoost classifier.
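The weight calculation in the last question above can be sketched for a single boosting round. The code below assumes one common formulation of discrete AdaBoost with labels in {-1, +1}, where the classifier weight is alpha = 0.5 * ln((1 - err) / err); other variants scale alpha differently. The function name and the example inputs are made up for illustration.

```python
import math

def adaboost_round(weights, labels, predictions):
    """One round of discrete AdaBoost with labels and predictions in {-1, +1}.

    Returns the weighted error, the classifier weight alpha, and the
    renormalized sample weights for the next round.
    """
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified points (y * h = -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return err, alpha, [w / total for w in updated]

# Made-up round: five equally weighted points, the fourth is misclassified.
weights = [0.2, 0.2, 0.2, 0.2, 0.2]
labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, 1, 1]

err, alpha, new_weights = adaboost_round(weights, labels, predictions)
print(err, alpha, new_weights)
```

In this example the weighted error is 0.2, so alpha is ln 2 (about 0.693), and after renormalization the single misclassified point carries half of the total weight.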
Section 3
- K-Means does not explicitly use a fitness function. What are the characteristics of the solutions that K-Means finds? Which fitness function does it implicitly minimize?
- Suppose we clustered a set of N data points using two different specified clustering algorithms. In both cases we obtained 5 clusters and in both cases the centers of the clusters are exactly the same. Can 3 points that are assigned to different clusters in one method be assigned to the same cluster in the other method?
- What are the characteristics of noise points in DBSCAN?
Section 4
- Explain what ReLU is, what its variants are, and what their pros and cons are.
- Calculate the number of parameters to be learned during training in a CNN, given all important information.
- Explain how a VAE can be used as a generative model.
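The parameter-counting question above reduces to simple arithmetic once the layer shapes are known. The sketch below assumes standard (ungrouped) 2-D convolutions and fully connected layers, each with a bias term, and pooling layers with no learnable parameters. The LeNet-style layer sizes are a hypothetical example chosen for illustration, and the helper names are not from any library.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable parameters of a standard 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def dense_params(in_features, out_features, bias=True):
    """Learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Hypothetical LeNet-style network; pooling layers add no parameters.
layers = [
    ("conv1", conv2d_params(5, 5, 1, 6)),    # (5*5*1 + 1) * 6   = 156
    ("conv2", conv2d_params(5, 5, 6, 16)),   # (5*5*6 + 1) * 16  = 2416
    ("fc1", dense_params(16 * 5 * 5, 120)),  # (400 + 1) * 120   = 48120
    ("fc2", dense_params(120, 84)),          # (120 + 1) * 84    = 10164
    ("fc3", dense_params(84, 10)),           # (84 + 1) * 10     = 850
]

total = sum(count for _, count in layers)
for name, count in layers:
    print(name, count)
print("total", total)  # 61706
```

Note that the count for a convolution layer does not depend on the spatial size of its input, whereas the first fully connected layer depends on the size of the flattened feature map that feeds it.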
The retake exam
Section 1
Section 2
Section 3
Section 4