MSc: Advanced Statistics
Advanced Statistics
- Course name: Advanced Statistics
- Code discipline: DS-03
- Subject area:
Short Description
This is a course in advanced statistics with a view toward applications in data science. It is intended for master's students who want to expand their knowledge of the theoretical methods used in modern data science research. The course presents some of the key probabilistic methods and results that form an essential mathematical toolbox for a data scientist, with particular emphasis on random vectors, random matrices, and random projections. It teaches basic theoretical skills for the analysis of these objects, including concentration inequalities, covering and packing arguments, decoupling and symmetrization tricks, chaining and comparison techniques for stochastic processes, and combinatorial reasoning based on the VC dimension, among other topics.
Prerequisites
Prerequisite subjects
- CSE329 - Empirical Methods
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Concentration of sums of independent random variables |
|
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
The main purpose of this course is to present the fundamentals of inferential statistics to future software engineers and data scientists, on the one hand providing the scientific foundations of the discipline, and on the other anchoring the theoretical concepts in practices from the world of software development and engineering. The course covers the statistical analysis of data under limited assumptions about the distribution, with reference to testing hypotheses, measuring correlations, building samples, and performing regressions.
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to ...
- Remember the fundamentals of inferential statistics
- Remember the specifics and purpose of different hypothesis tests
- Distinguish between parametric and nonparametric tests
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- Apply the basic concepts of inferential statistics
- Apply the fundamental laws of statistics
- Formulate null and alternative hypotheses
- Carry out the hypothesis testing procedure
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- Understand the problems involved in statistically analysing data that are not normally distributed.
- Know the more recent computationally intensive techniques that help to describe samples and to infer properties of populations in the absence of normality.
- Identify situations where the data are on nominal scales, so that alternative techniques should be used, and act accordingly.
- Run experiments to evaluate hypotheses in situations with scarce, non-normally distributed data on different kinds of scales.
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 85-100 | - |
B. Good | 65-84 | - |
C. Satisfactory | 51-64 | - |
D. Poor | 0-50 | - |
There are four constraints for passing this course:
- You must attend all labs.
- You must submit all lab reports.
- You must have at least 50% on the Final Exam.
- You must have at least 50% overall.
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Quiz during each lecture (weekly evaluations) | 15 |
Lab classes (weekly evaluations) | 15 |
Midterm | 20 |
Final exam | 50 |
Recommendations for students on how to succeed in the course
- Watch the video lecture and read the lecture notes before coming to the onsite lectures and to the labs.
- Attend the onsite lectures.
- Ask questions and provide answers to the questions during the onsite lectures.
- Attend all of the labs and submit all of the lab reports.
- Prepare seriously for the midterm exam.
- Prepare seriously for the final exam.
Resources, literature and reference materials
Open access resources
- The lecture notes and the video lectures provided via Moodle are sufficient for passing this course with grade A.
Software and tools used within the course
- You can use any software of your choice to perform the lab tasks.
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 |
---|---|
Testing (written or computer based) | 1 |
Discussions | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Let X1, X2, ..., X10 be a random sample from a distribution whose probability density function is , otherwise 0). Based on the observed values 0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24, test the hypothesis H0: X ~ UNIF(0, 1) against H1: X ≁ UNIF(0, 1) at a significance level α = 0.1. | 1 |
Question | If X1,X2, ...,Xn is a random sample from a distribution with density function , otherwise 0), what is the maximum likelihood estimator of ? | 1 |
Question | Let X1,X2, ...,Xn be a random sample of size n from a distribution with a probability density function otherwise 0), where is a parameter. Using the maximum likelihood method find an estimator for the parameter . | 1 |
Question | Suppose you are told that the likelihood of at is given by 1/4. Is this the probability that ? Explain why or why not. | 1 |
Question | If X1,X2, ...,Xn is a random sample from a distribution with density function otherwise 0), then what is the maximum likelihood estimator of ? | 0 |
Question | Let X1, X2, ..., Xn be a random sample from a normal population with mean μ and variance σ². What are the maximum likelihood estimators of μ and σ²? | 0 |
Question | Suppose that you have the following data points: 0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09. Compute Dn to determine whether they come from the uniform distribution on [0, 0.5]. | 0 |
Question | The data on the heights of 12 infants are given below: 18.2, 21.4, 22.6, 17.4, 17.6, 16.7, 17.1, 21.4, 20.1, 17.9, 16.8, 23.1. Test the hypothesis that the data came from some normal population at a significance level α = 0.1. | 0 |
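
The two uniform goodness-of-fit questions above can be explored with a short script. The sketch below assumes that Dn denotes the Kolmogorov-Smirnov statistic, that is, the largest gap between the empirical distribution function and the hypothesised one. The table does not say this explicitly, and the function name `ks_statistic_uniform` is a hypothetical helper, not part of the course materials.

```python
def ks_statistic_uniform(sample, low=0.0, high=1.0):
    """Return the Kolmogorov-Smirnov statistic D_n against UNIF(low, high).

    Assumption: D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF
    of the sample and F is the CDF of the uniform distribution on [low, high].
    """
    xs = sorted(sample)
    n = len(xs)
    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        # Hypothesised CDF of UNIF(low, high), clamped to [0, 1].
        f = min(max((x - low) / (high - low), 0.0), 1.0)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so the gap has to be checked on both sides of the jump.
        d_n = max(d_n, i / n - f, f - (i - 1) / n)
    return d_n


# Observed values from the first question (tested against UNIF(0, 1)).
sample_unit = [0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24]
# Data points from the D_n question (tested against the uniform on [0, 0.5]).
sample_half = [0.36, 0.32, 0.10, 0.13, 0.45, 0.11, 0.12, 0.09]

d_unit = ks_statistic_uniform(sample_unit)
d_half = ks_statistic_uniform(sample_half, 0.0, 0.5)

print(f"D_10 against UNIF(0, 1):   {d_unit:.3f}")
print(f"D_8  against UNIF(0, 0.5): {d_half:.3f}")
```

Under these assumptions the script reports 0.250 for the first sample and 0.365 for the second. Each value is then compared with the tabulated Kolmogorov-Smirnov critical value for the corresponding sample size and significance level. The hypothesis of uniformity is rejected only if the statistic exceeds that value.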
Final assessment
The final assessment is in written form. You must have at least 50% on the final exam to pass the course.
The retake exam
The retake of the exam will be in oral form.