Difference between revisions of "MSc: Computer Vision"

(Created page with "= Computer Vision = * <span>'''Course name:'''</span> Computer Vision * <span>'''Course number:'''</span> R-03 * <span>'''Area of instruction:'''</span> Computer Science and...")
 
Line 3: Line 3:
 
* <span>'''Course name:'''</span> Computer Vision
 
* <span>'''Course name:'''</span> Computer Vision
 
* <span>'''Course number:'''</span> R-03
 
* <span>'''Course number:'''</span> R-03
* <span>'''Area of instruction:'''</span> Computer Science and Engineering
 
   
  +
== Course Characteristics ==
== Administrative details ==
 
   
  +
=== Key concepts of the class ===
* <span>'''Faculty:'''</span> Computer Science and Engineering
 
* <span>'''Year of instruction:'''</span> 1st year of MSc
 
* <span>'''Semester of instruction:'''</span> 2nd semester
 
* <span>'''No. of Credits:'''</span> 5 ECTS
 
* <span>'''Total workload on average:'''</span> 180 hours overall
 
* <span>'''Frontal lecture hours:'''</span> 2 hours per week.
 
* <span>'''Frontal tutorial hours:'''</span> 0 hours per week.
 
* <span>'''Lab hours:'''</span> 2 hours per week.
 
* <span>'''Individual lab hours:'''</span> 2 hours per week.
 
* <span>'''Frequency:'''</span> weekly throughout the semester.
 
* <span>'''Grading mode:'''</span> letters: A, B, C, D.
 
   
  +
* Computer vision techniques
== Course outline ==
 
  +
* Classical and deep learning models
   
  +
=== What is the purpose of this course? ===
This course provides an intensive treatment of a cross-section of the key elements of computer vision, with an emphasis on implementing them in modern programming environments, and using them to solve real-world problems. The course will begin with the fundamentals of image processing and image filtering, but will quickly build to cover more advanced topics, including image segmentation, object detection and recognition, face detection, content-based image retrieval, artificial neural networks, convolutional neural networks, generative adversarial networks, image creation using GANs, and much more. A key focus of the course is on providing students with not only theory but also hands-on practice of building their computer vision applications.
 
   
  +
This course provides an intensive treatment of a cross-section of the key elements of computer vision, with an emphasis on implementing them in modern programming environments, and using them to solve real-world problems. The course will begin with the fundamentals of image processing and image filtering, but will quickly build to cover more advanced topics, including image segmentation, object detection and recognition, face detection, content-based image retrieval, artificial neural networks, convolutional neural networks, generative adversarial networks and much more. A key focus of the course is on providing students with not only theory but also hands-on practice of building their computer vision applications.
== Expected learning outcomes ==
 
   
  +
=== Course objectives based on Bloom’s taxonomy ===
* Apply knowledge of image acquisition, image processing, and image analysis to extract useful information from visual images.
 
* Design, implement, and document appropriate, effective, and efficient software solutions for a variety of real-world computer vision problems.
 
* Exploit standard computer vision software libraries in the development of these solutions.
 
   
  +
=== - What should a student remember at the end of the course? ===
== Required background knowledge ==
 
   
  +
By the end of the course, the students should be able to process the video
Solid knowledge of essential topics in the courses that are mentioned below.
 
   
  +
* Robots visual perception strategies
== Prerequisite courses ==
 
  +
* Significant exposure to real-world implementations
  +
* To develop research interest in the theory and application of computer vision
   
  +
=== - What should a student be able to understand at the end of the course? ===
Linear Algebra, Calculus, Probability &amp; Statistics, Computer Programming, Basic Machine Learning.
 
   
  +
By the end of the course, the students should be able to choose the correct computer vision model.
== Detailed topics covered in the course ==
 
   
  +
* Suitability of different computer vision models in different scenarios
* Introduction to Computer Vision, Image Acquisition, Basic Image Processing
 
  +
* Ability to choose the right model for the given task
* Kernels, Morphological operations, Smoothing and Blurring, Lightening and Color spaces
 
* Gradients, Edge Detection, Contours, Histograms, Labelling Connected Components
 
* Object Detection, Template Matching, Image Pyramids, Object Detection using HOG and SVM
 
* Image Classification, Common Machine Learning Algorithms for Image Classification
 
* Clustering, Bag of Visual Words, Image Pyramids, Image Classification Examples
 
* Face Detection
 
* Image Descriptors: Color channel statistics, Moments, Texture, HoGs
 
* Local features, Key point detectors, Local invariant descriptors, Binary Descriptors
 
* Video Processing: Moving object detection, background models
 
* Artificial Neural Nets and Convolutional Neural Nets
 
* Generative Adversarial Networks
 
* Case Studies: Research paper implementation and presentation
 
   
  +
=== - What should a student be able to apply at the end of the course? ===
== Textbook ==
 
   
  +
By the end of the course, the students should be able to deploy and developed models.
There is no specific text book for this course.
 
   
  +
* Hands on experience to implement different models to know inside behavior
== Reference material ==
 
  +
* Sufficient exposure to train and deploy model for the given task
  +
* Fine tune the deployed model in the real-world settings
   
  +
=== Course evaluation ===
  +
  +
{|
  +
|+ Course grade breakdown
  +
!
  +
!
  +
!align="center"| '''Proposed points'''
  +
|-
  +
| Labs/seminar classes
  +
| 20
  +
|align="center"| 20
  +
|-
  +
| Interim performance assessment
  +
| 30
  +
|align="center"| 50
  +
|-
  +
| Exams
  +
| 50
  +
|align="center"| 30
  +
|}
  +
  +
=== Grades range ===
  +
  +
{|
  +
|+ Course grading range
  +
!
  +
!
  +
!align="center"| '''Proposed range'''
  +
|-
  +
| A. Excellent
  +
| 90-100
  +
|align="center"| 90-100
  +
|-
  +
| B. Good
  +
| 75-89
  +
|align="center"| 75-89
  +
|-
  +
| C. Satisfactory
  +
| 60-74
  +
|align="center"| 60-74
  +
|-
  +
| D. Poor
  +
| 0-59
  +
|align="center"| 0-59
  +
|}
  +
  +
=== Resources and reference material ===
  +
  +
* Handouts supplied by the instructor
  +
* Materials from the interment and research papers shared by instructor
 
*
 
*
 
*
 
*
*
 
*
 
   
== Required computer resources ==
+
== Course Sections ==
  +
  +
The main sections of the course and approximate hour distribution between them is as follows:
  +
  +
{|
  +
|+ Course Sections
  +
!align="center"| '''Section'''
  +
! '''Section Title'''
  +
!align="center"| '''Teaching Hours'''
  +
|-
  +
|align="center"| 1
  +
| Image Acquisition and Basic Image Processing
  +
|align="center"| 8
  +
|-
  +
|align="center"| 2
  +
| Image Filtering and Binary Vision
  +
|align="center"| 8
  +
|-
  +
|align="center"| 3
  +
| Feature Extractors and Descriptors
  +
|align="center"| 16
  +
|-
  +
|align="center"| 4
  +
| Deep Learning models for computer vision
  +
|align="center"| 16
  +
|}
  +
  +
=== Section 1 ===
  +
  +
==== Section title: ====
  +
  +
Image Acquisition and Basic Image Processing
  +
  +
=== Topics covered in this section: ===
  +
  +
* Computer vision in action
  +
* The Human Vision System
  +
* Optical Illusions
  +
* Sampling and Quantization
  +
* Image Representation
  +
* Colour Spaces
  +
  +
=== What forms of evaluation were used to test students’ performance in this section? ===
  +
  +
<div class="tabular">
  +
  +
<span>|a|c|</span> &amp; '''Yes/No'''<br />
  +
Development of individual parts of software product code &amp; 1<br />
  +
Homework and group projects &amp; 1<br />
  +
Midterm evaluation &amp; 1<br />
  +
Testing (written or computer based) &amp; 1<br />
  +
Reports &amp; 0<br />
  +
Essays &amp; 0<br />
  +
Oral polls &amp; 0<br />
  +
Discussions &amp; 1<br />
  +
  +
  +
  +
</div>
  +
=== Typical questions for ongoing performance evaluation within this section ===
  +
  +
# What are the color spaces and where it’s used?
  +
# What are the primary and secondary colors?
  +
# How image is formed into computers?
  +
# How you will convert the RGB to grayscale images
  +
  +
=== Typical questions for seminar classes (labs) within this section ===
  +
  +
# Loading and plotting the images in python environment
  +
# Convertion of different color spaces
  +
# How you find the skin in the images based on the color space models
  +
# how to find red eye dot in face using color space models
  +
  +
=== Test questions for final assessment in this section ===
  +
  +
# How you can distinguish different color spaces?
  +
# Explain and provide the reason for the blind spot creation in human eye.
  +
# In what scenarios computer vision is better than human vision?
  +
# Write down different robotic application areas where computer vision is applied successfully.
  +
  +
=== Section 2 ===
  +
  +
==== Section title: ====
  +
  +
Image Filtering and Binary Vision
  +
  +
=== Topics covered in this section: ===
  +
  +
* Image noise
  +
* Convolutions and kernels
  +
* Smoothing and blurring
  +
* Thresholding and histograms
  +
* Morphological operations
  +
* Gradients and Edge detection
  +
  +
=== What forms of evaluation were used to test students’ performance in this section? ===
  +
  +
<div class="tabular">
  +
  +
<span>|a|c|</span> &amp; '''Yes/No'''<br />
  +
Development of individual parts of software product code &amp; 1<br />
  +
Homework and group projects &amp; 1<br />
  +
Midterm evaluation &amp; 1<br />
  +
Testing (written or computer based) &amp; 1<br />
  +
Reports &amp; 0<br />
  +
Essays &amp; 0<br />
  +
Oral polls &amp; 0<br />
  +
Discussions &amp; 1<br />
  +
  +
  +
  +
</div>
  +
=== Typical questions for ongoing performance evaluation within this section ===
  +
  +
# What are the challenges to perform histogram task?
  +
# Apply convolutional filter to calculate the response.
  +
# What kind of parameters are required to apply different image filters?
  +
# How you will compute the gradients of the image and its benefits?
  +
  +
=== Typical questions for seminar classes (labs) within this section ===
  +
  +
# Implement Otsu Method
  +
# Implement Sobel, Preweitt filters
  +
# Implement Canny edge detector
  +
# Perform analysis over the different filtering on the given images
  +
  +
=== Test questions for final assessment in this section ===
  +
  +
# Calculate the kernels for the given images
  +
# Explain the difference between different filters
  +
# What is image noise and how it contributes to make the computer vision task difficult?
  +
# Apply different combination of the filters to achieve the required output of the given image.
  +
  +
=== Section 3 ===
  +
  +
==== Section title: ====
  +
  +
Feature Extractors and Descriptors
  +
  +
==== Topics covered in this section: ====
  +
  +
* Histogram of Gradients (HoG)
  +
* Scale-invariant feature transform (SIFT)
  +
* Harris corner detector
  +
* Template matching
  +
* Bag of visual words
  +
* Face Detection and Recognition (Viola Johns)
  +
  +
=== What forms of evaluation were used to test students’ performance in this section? ===
  +
  +
<div class="tabular">
  +
  +
<span>|a|c|</span> &amp; '''Yes/No'''<br />
  +
Development of individual parts of software product code &amp; 1<br />
  +
Homework and group projects &amp; 1<br />
  +
Midterm evaluation &amp; 1<br />
  +
Testing (written or computer based) &amp; 1<br />
  +
Reports &amp; 0<br />
  +
Essays &amp; 0<br />
  +
Oral polls &amp; 0<br />
  +
Discussions &amp; 1<br />
  +
  +
  +
  +
</div>
  +
=== Typical questions for ongoing performance evaluation within this section ===
  +
  +
# How feature extractor works over the given image?
  +
# What is the difference between the feature extraction and descriptors?
  +
# Explain the examples of descriptors and feature extractors.
  +
# Write down the pros and cons of SIFT, HOG and Harris.
  +
  +
==== Typical questions for seminar classes (labs) within this section ====
  +
  +
# Implement template matching algorithm
  +
# Implement histogram of gradient using CV2 library
  +
# Implement of SIFT for the given task
  +
# Implement Harris corner detection
  +
# Analysis of different extractors for the given task
  +
  +
==== Test questions for final assessment in this section ====
  +
  +
# How you distinguish different feature extractors and descriptors?
  +
# What are the possible methods to detect the corners?
  +
# How corners are useful to help the robotic vision task?
  +
# How you will patch the different images to construct the map of the location?
  +
  +
=== Section 4 ===
  +
  +
==== Section title: ====
  +
  +
Deep learning models for computer vision
  +
  +
==== Topics covered in this section: ====
  +
  +
* You Only Look Once: Unified, Real-Time Object Detection (YOLO)
  +
* Generative Adversarial Networks (GAN)
  +
* Fully Convolutional Networks (FCN) for semantic segmentation
  +
* Multi Domain Network (MDNet) for object tracking
  +
* Generic Object Tracking Using Regression Networks (GOTURN) for object tracking
  +
  +
=== What forms of evaluation were used to test students’ performance in this section? ===
  +
  +
<div class="tabular">
  +
  +
<span>|a|c|</span> &amp; '''Yes/No'''<br />
  +
Development of individual parts of software product code &amp; 1<br />
  +
Homework and group projects &amp; 1<br />
  +
Midterm evaluation &amp; 1<br />
  +
Testing (written or computer based) &amp; 1<br />
  +
Reports &amp; 0<br />
  +
Essays &amp; 0<br />
  +
Oral polls &amp; 0<br />
  +
Discussions &amp; 1<br />
  +
  +
  +
  +
</div>
  +
=== Typical questions for ongoing performance evaluation within this section ===
  +
  +
# How classification task is different from detection task?
  +
# Explain the transfer learning mechanism for object detection task.
  +
# How many types of model exist for object tracking in videos.
  +
# Write down the pros and cons of YOLO, FCN and MDNet.
  +
  +
==== Typical questions for seminar classes (labs) within this section ====
   
  +
# Implement YOLO using transfer learning mechanism
Students should have laptops. A Mac or Window’s PC capable of running a scientific python development environment.
 
  +
# Implement GAN for MNIST dataset
  +
# Implement FCN and GOTURN
  +
# Analysis of different models for the given task
   
  +
==== Test questions for final assessment in this section ====
== Evaluation ==
 
   
  +
# What are the loss functions used in YOLO?
* Quizzes (40%)
 
  +
# What are the learnable parameters of FCN for semantic segmentation?
* Mid-term exam (15%)
 
  +
# How semantic segmentation is different from instance segmentation?
* Case Study (15%)
 
  +
# Write the application areas for object tracking in robotics.
* Final exam (30%)
 

Revision as of 14:24, 30 July 2021

Computer Vision

  • Course name: Computer Vision
  • Course number: R-03

Course Characteristics

Key concepts of the class

  • Computer vision techniques
  • Classical and deep learning models

What is the purpose of this course?

This course provides an intensive treatment of a cross-section of the key elements of computer vision, with an emphasis on implementing them in modern programming environments, and using them to solve real-world problems. The course will begin with the fundamentals of image processing and image filtering, but will quickly build to cover more advanced topics, including image segmentation, object detection and recognition, face detection, content-based image retrieval, artificial neural networks, convolutional neural networks, generative adversarial networks and much more. A key focus of the course is on providing students with not only theory but also hands-on practice in building their own computer vision applications.

Course objectives based on Bloom’s taxonomy

- What should a student remember at the end of the course?

By the end of the course, the students should be able to process video.

  • Robot visual perception strategies
  • Significant exposure to real-world implementations
  • Research interest in the theory and application of computer vision

- What should a student be able to understand at the end of the course?

By the end of the course, the students should be able to choose the correct computer vision model.

  • Suitability of different computer vision models in different scenarios
  • Ability to choose the right model for the given task

- What should a student be able to apply at the end of the course?

By the end of the course, the students should be able to develop and deploy models.

  • Hands-on experience implementing different models to understand their internal behavior
  • Sufficient exposure to training and deploying a model for a given task
  • Fine-tuning the deployed model in real-world settings

Course evaluation

Course grade breakdown
Assessment | Points | Proposed points
Labs/seminar classes | 20 | 20
Interim performance assessment | 30 | 50
Exams | 50 | 30

Grades range

Course grading range
Grade | Range | Proposed range
A. Excellent | 90-100 | 90-100
B. Good | 75-89 | 75-89
C. Satisfactory | 60-74 | 60-74
D. Poor | 0-59 | 0-59

Resources and reference material

  • Handouts supplied by the instructor
  • Materials from the Internet and research papers shared by the instructor

Course Sections

The main sections of the course and the approximate distribution of hours between them are as follows:

Course Sections
Section | Section Title | Teaching Hours
1 | Image Acquisition and Basic Image Processing | 8
2 | Image Filtering and Binary Vision | 8
3 | Feature Extractors and Descriptors | 16
4 | Deep Learning models for computer vision | 16

Section 1

Section title:

Image Acquisition and Basic Image Processing

Topics covered in this section:

  • Computer vision in action
  • The Human Vision System
  • Optical Illusions
  • Sampling and Quantization
  • Image Representation
  • Colour Spaces
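
As a minimal sketch of the sampling and quantization topic listed above, the following NumPy snippet reduces spatial resolution by keeping every second pixel and reduces intensity resolution by mapping 8-bit values onto a smaller number of grey levels. The function names and the assumption that the number of levels divides 256 evenly are illustrative choices, not part of the course material.

```python
import numpy as np

def downsample(image, factor=2):
    """Spatial sampling: keep every `factor`-th pixel along both axes."""
    return np.asarray(image)[::factor, ::factor]

def quantize(image, levels):
    """Uniformly requantize an 8-bit image to `levels` grey levels.

    Assumes `levels` divides 256 evenly (e.g. 2, 4, 8, 16).
    Each pixel is replaced by the midpoint of the bin it falls into.
    """
    image = np.asarray(image, dtype=np.uint8)
    step = 256 // levels
    return (image // step) * step + step // 2

grid = np.arange(16).reshape(4, 4)
sampled = downsample(grid)            # 2 x 2 array taken from rows/cols 0 and 2

row = np.array([[0, 63, 64, 255]], dtype=np.uint8)
coarse = quantize(row, levels=4)      # bins of width 64, midpoints 32, 96, 160, 224
```

Fewer levels make the bins wider, which is what produces visible banding in smooth image regions.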

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation | Yes/No
Development of individual parts of software product code | Yes
Homework and group projects | Yes
Midterm evaluation | Yes
Testing (written or computer based) | Yes
Reports | No
Essays | No
Oral polls | No
Discussions | Yes


Typical questions for ongoing performance evaluation within this section

  1. What are color spaces, and where are they used?
  2. What are the primary and secondary colors?
  3. How is an image formed and represented in a computer?
  4. How would you convert an RGB image to grayscale?

Typical questions for seminar classes (labs) within this section

  1. Loading and plotting images in a Python environment
  2. Conversion between different color spaces
  3. How do you find skin in images using color space models?
  4. How do you find the red-eye dot in a face using color space models?
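
The color-space exercises above can be approached along the following lines. This is only a sketch: the grayscale conversion uses the ITU-R BT.601 luma weights, while the `red_mask` helper and its thresholds (`min_red`, `margin`) are hypothetical values picked for illustration, not a validated skin or red-eye detector.

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(image):
    """Convert an H x W x 3 RGB array to an H x W float grayscale array."""
    image = np.asarray(image, dtype=np.float64)
    return image @ LUMA_WEIGHTS

def red_mask(image, min_red=150, margin=50):
    """Illustrative color threshold: True where the red channel dominates.

    `min_red` and `margin` are assumed values for this sketch only.
    """
    image = np.asarray(image, dtype=np.int32)
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return (r >= min_red) & (r - g >= margin) & (r - b >= margin)

# Tiny 1 x 3 test image: pure red, pure green, mid grey.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [128, 128, 128]]], dtype=np.uint8)
gray = rgb_to_gray(pixels)
mask = red_mask(pixels)
```

In practice, thresholds of this kind are usually easier to tune in a color space that separates chromaticity from brightness, such as HSV or YCbCr.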

Test questions for final assessment in this section

  1. How can you distinguish different color spaces?
  2. Explain the reason for the blind spot in the human eye.
  3. In what scenarios is computer vision better than human vision?
  4. List robotic application areas where computer vision has been applied successfully.

Section 2

Section title:

Image Filtering and Binary Vision

Topics covered in this section:

  • Image noise
  • Convolutions and kernels
  • Smoothing and blurring
  • Thresholding and histograms
  • Morphological operations
  • Gradients and Edge detection

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation | Yes/No
Development of individual parts of software product code | Yes
Homework and group projects | Yes
Midterm evaluation | Yes
Testing (written or computer based) | Yes
Reports | No
Essays | No
Oral polls | No
Discussions | Yes


Typical questions for ongoing performance evaluation within this section

  1. What are the challenges in performing a histogram task?
  2. Apply a convolution filter to calculate the response.
  3. What kind of parameters are required to apply different image filters?
  4. How would you compute the gradients of an image, and what are their benefits?
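
The questions on filter responses and image gradients can be illustrated with a small, unoptimized sketch. It implements a "valid" 2-D convolution with explicit loops and applies the standard 3 x 3 Sobel kernels; the function names and the step-edge test image are assumptions made for this example.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def convolve2d(image, kernel):
    """'Valid' 2-D convolution: the kernel is flipped and no padding is used."""
    image = np.asarray(image, dtype=np.float64)
    flipped = kernel[::-1, ::-1]
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def gradient_magnitude(image):
    """Combine horizontal and vertical Sobel responses into an edge strength."""
    gx = convolve2d(image, SOBEL_X)
    gy = convolve2d(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Vertical step edge: left half dark (0), right half bright (10).
step = np.array([[0, 0, 10, 10]] * 4)
magnitude = gradient_magnitude(step)
```

For this step edge the vertical response is zero everywhere, so the magnitude comes entirely from the horizontal Sobel kernel.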

Typical questions for seminar classes (labs) within this section

  1. Implement Otsu's method
  2. Implement Sobel and Prewitt filters
  3. Implement the Canny edge detector
  4. Analyze the effect of different filters on the given images
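
One possible starting point for the first lab task is sketched below. It computes Otsu's threshold for an 8-bit image by maximizing the between-class variance over all 256 candidate thresholds; the helper name and the small test image are assumptions for illustration.

```python
import numpy as np

def otsu_threshold(image):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    """
    values = np.asarray(image, dtype=np.uint8).ravel()
    hist = np.bincount(values, minlength=256)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)              # probability of the lower class for each t
    mu = np.cumsum(prob * levels)        # cumulative mean up to each t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0            # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

# Bimodal test image: seven dark pixels and five bright pixels.
image = np.array([[10, 12, 11, 200],
                  [13, 10, 210, 205],
                  [11, 12, 198, 202]], dtype=np.uint8)
t = otsu_threshold(image)
binary = image > t
```

Any threshold in the gap between the two intensity clusters separates them equally well; `np.argmax` simply returns the first such value.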

Test questions for final assessment in this section

  1. Calculate the kernels for the given images
  2. Explain the differences between different filters
  3. What is image noise, and how does it make computer vision tasks difficult?
  4. Apply different combinations of filters to achieve the required output for the given image.

Section 3

Section title:

Feature Extractors and Descriptors

Topics covered in this section:

  • Histogram of Oriented Gradients (HOG)
  • Scale-invariant feature transform (SIFT)
  • Harris corner detector
  • Template matching
  • Bag of visual words
  • Face Detection and Recognition (Viola-Jones)

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation | Yes/No
Development of individual parts of software product code | Yes
Homework and group projects | Yes
Midterm evaluation | Yes
Testing (written or computer based) | Yes
Reports | No
Essays | No
Oral polls | No
Discussions | Yes


Typical questions for ongoing performance evaluation within this section

  1. How does a feature extractor work on a given image?
  2. What is the difference between feature extractors and descriptors?
  3. Give examples of descriptors and feature extractors.
  4. Write down the pros and cons of SIFT, HOG and Harris.

Typical questions for seminar classes (labs) within this section

  1. Implement a template matching algorithm
  2. Implement a histogram of oriented gradients using the OpenCV (cv2) library
  3. Implement SIFT for the given task
  4. Implement Harris corner detection
  5. Analyze different extractors for the given task
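
For the template matching task, a brute-force sketch might look as follows. It scores every placement of the template by the sum of squared differences (SSD) and returns the best one; this is one of several possible similarity measures, and the function name and test data are assumptions for this example.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the template over the image and return the best match.

    Returns ((row, col), score), where (row, col) is the top-left corner of
    the window with the smallest sum of squared differences.
    """
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            diff = image[i:i + th, j:j + tw] - template
            score = float(np.sum(diff * diff))
            if score < best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score

# Place a 2 x 2 pattern inside an otherwise empty 6 x 6 image.
image = np.zeros((6, 6))
image[2:4, 3:5] = [[1, 2], [3, 4]]
template = np.array([[1, 2], [3, 4]])
position, score = match_template_ssd(image, template)
```

SSD is sensitive to brightness changes between template and image; normalized cross-correlation is a common alternative when lighting varies.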

Test questions for final assessment in this section

  1. How do you distinguish different feature extractors and descriptors?
  2. What are the possible methods to detect corners?
  3. How are corners useful in robotic vision tasks?
  4. How would you stitch different images together to construct a map of a location?
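
As one concrete answer to the corner detection question, the sketch below computes the Harris corner response R = det(M) - k * trace(M)^2 from image gradients summed over a small window. It is a simplified version that uses an unweighted box window instead of a Gaussian; the helper names, k = 0.04 and the window radius are assumed values.

```python
import numpy as np

def box_sum(values, radius=1):
    """Sum each pixel's (2*radius+1)^2 neighbourhood, zero-padded at the border."""
    padded = np.pad(values, radius, mode="constant")
    size = 2 * radius + 1
    out = np.zeros_like(values)
    for di in range(size):
        for dj in range(size):
            out += padded[di:di + values.shape[0], dj:dj + values.shape[1]]
    return out

def harris_response(image, k=0.04, radius=1):
    """Harris response: positive at corners, negative on edges, zero in flat areas."""
    image = np.asarray(image, dtype=np.float64)
    iy, ix = np.gradient(image)          # derivatives along rows and columns
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace ** 2

# Bright quadrant whose only corner lies at row 4, column 4.
image = np.zeros((9, 9))
image[4:, 4:] = 1.0
response = harris_response(image)
corner = np.unravel_index(np.argmax(response), response.shape)
```

A full detector would additionally threshold the response and apply non-maximum suppression to keep one point per corner.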

Section 4

Section title:

Deep learning models for computer vision

Topics covered in this section:

  • You Only Look Once: Unified, Real-Time Object Detection (YOLO)
  • Generative Adversarial Networks (GAN)
  • Fully Convolutional Networks (FCN) for semantic segmentation
  • Multi Domain Network (MDNet) for object tracking
  • Generic Object Tracking Using Regression Networks (GOTURN) for object tracking

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation | Yes/No
Development of individual parts of software product code | Yes
Homework and group projects | Yes
Midterm evaluation | Yes
Testing (written or computer based) | Yes
Reports | No
Essays | No
Oral polls | No
Discussions | Yes


Typical questions for ongoing performance evaluation within this section

  1. How is a classification task different from a detection task?
  2. Explain the transfer learning mechanism for the object detection task.
  3. How many types of models exist for object tracking in videos?
  4. Write down the pros and cons of YOLO, FCN and MDNet.
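
Unlike classification, detection also has to localize each object, so predicted and ground-truth bounding boxes must be compared. A measure commonly used for this, including in YOLO-style detectors, is intersection over union (IoU). The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples, which is a convention chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is an (x_min, y_min, x_max, y_max) tuple.
    """
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))    # intersection 1, union 7
same = iou((0, 0, 2, 2), (0, 0, 2, 2))       # identical boxes
apart = iou((0, 0, 1, 1), (2, 2, 3, 3))      # disjoint boxes
```

A prediction is typically counted as correct only when its IoU with a ground-truth box exceeds a chosen threshold.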

Typical questions for seminar classes (labs) within this section

  1. Implement YOLO using transfer learning
  2. Implement a GAN for the MNIST dataset
  3. Implement FCN and GOTURN
  4. Analyze different models for the given task

Test questions for final assessment in this section

  1. What are the loss functions used in YOLO?
  2. What are the learnable parameters of FCN for semantic segmentation?
  3. How is semantic segmentation different from instance segmentation?
  4. List the application areas of object tracking in robotics.