Empirical Methods

Course name: Empirical Methods
Course number: XYZ

Course Characteristics

Key concepts of the class

Goal-Question-Metric approach
Experimental design
Basics of statistics

What is the purpose of this course?

The main purpose of this course is to present the fundamentals of empirical methods and of statistics to future software engineers and data scientists. On the one hand, it provides the scientific foundations of these disciplines; on the other, it anchors the theoretical concepts in practices from software development and engineering. As a by-product, the course also refreshes the basics of statistics, laying the groundwork for more advanced statistics courses in the following semester(s) of study.

Prerequisites

This course will benefit from knowledge of fundamental arithmetic and polynomial calculus, as well as of logarithms and exponentiation. This whole playlist can also be helpful.

Empirical Methods: fundamental arithmetic and polynomial calculus, logarithms
Research Methods: English
Advanced Statistics: Empirical Methods

Course objectives based on Bloom’s taxonomy

- What should a student remember at the end of the course?

By the end of the course, the students should be able to:

Remember the fundamentals of statistics and probability theory
Remember the basic models for experimentation and quasi-experimentation
Remember the specifics and purpose of different measurement scales
Distinguish between a random variable and a random process
Explain the difference between correlation and causation

- What should a student be able to understand at the end of the course?

By the end of the course, the students should be able to understand:

the value of experimentation for software engineers and data scientists
the basic concepts of a hypothesis
the concept of correlation
the fundamental laws in statistics
the concept of the Goal-Question-Metric approach

- What should a student be able to apply at the end of the course?

By the end of the course, the students should be able to ...

Apply the Goal-Question-Metric approach in practice
Apply the fundamental principles of experimental design
Apply the reduction of an experimental design to a quasi-experimental design
Apply statistics and probability theory in practice
Apply hypothesis-testing techniques in software analysis

Course evaluation

The course has two major forms of evaluation:

a standard evaluation,
for very motivated students, an alternative form of evaluation.

The standard evaluation follows.

Course grade breakdown
		Points
Labs/seminar classes (weekly evaluations)	20 ¹
Interim performance assessment (class participation)	5
Midterm	30
Final exam	45

¹ Of these, 10 points are for class tests and 10 for lab tests. A missed test is graded 0; however, the 3 lowest grades are disregarded when computing the average of this component.
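As a minimal sketch of the footnote's rule, the Python function below averages a list of weekly test grades after counting each absence as 0 and discarding the 3 lowest grades. The function name `weekly_component`, the use of `None` to mark an absence, and the sample grades are hypothetical; the footnote does not say whether class and lab tests are averaged separately, so this sketch treats a single list.

```python
from statistics import mean


def weekly_component(scores, n_dropped=3):
    """Average weekly test grades after dropping the n_dropped lowest ones.

    Each entry of `scores` is a numeric grade, or None for an absence
    (hypothetical encoding). Absences count as 0 before dropping.
    """
    graded = [0 if s is None else s for s in scores]
    if len(graded) <= n_dropped:
        raise ValueError("need more tests than grades to drop")
    # Disregard the lowest grades, then average what remains.
    kept = sorted(graded)[n_dropped:]
    return mean(kept)


# Hypothetical weekly grades on a 0-10 scale; None marks an absence.
weekly = [8, 9, None, 7, 10, 6, 9, 8, 10, 9]
print(weekly_component(weekly))  # 9
```

Here the absence becomes a 0 and is one of the three grades dropped, so a single missed test does not affect the component.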

The alternative evaluation follows.

Course grade breakdown (alternative form). This form requires attendance at all lectures and an average test grade above 95% (excluding the 3 lowest grades).
		Points
Labs/seminar classes (weekly evaluations)	20 ¹
Interim performance assessment (class participation)	5
Midterm	5
Project	70 ²

¹ Of these, 10 points are for class tests and 10 for lab tests. A missed test is graded 0; however, the 3 lowest grades are disregarded when computing the average of this component.

² Requires a paper rigorously describing the individual experiment; the paper must be written incrementally in Overleaf.

In both cases, each component except the weekly reviews and tests is assessed on a 0-10 scale, where 6 is the minimum passing grade. Exceptional work is awarded a 10 cum laude, with a numeric value from 10 to 13 at the discretion of the instructor. The weekly reviews component is first graded weekly on a 0-2 scale, and the overall grade for this component is then assembled on a 0-10 scale.

The grading, though, is not a simple linear combination of the components above. In particular:

failing any part of the evaluation triggers a failure in the entire course;
if no component is failed, the final grade is computed as the weighted average of the components above, approximated upward at the second decimal digit and then rounded to the nearest integer.
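The grading rule can be sketched in Python as follows, using the weights of the standard breakdown. This is a minimal sketch under stated assumptions: every component is taken to be already expressed on the 0-10 scale with 6 as the passing threshold, and "approximated at the highest second digit" is read as rounding up to two decimals before rounding half-up to an integer. The names `final_grade` and `STANDARD_WEIGHTS` and the sample grades are hypothetical. `Decimal` is used so that the two rounding steps are exact.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

# Weights taken from the standard "Course grade breakdown" table.
STANDARD_WEIGHTS = {"labs": 20, "participation": 5, "midterm": 30, "final": 45}
# Assumption: the same 0-10 passing threshold applies to every component.
PASSING_GRADE = Decimal(6)


def final_grade(grades, weights=STANDARD_WEIGHTS):
    """Return the integer final grade, or None if any component is failed."""
    values = {name: Decimal(str(grades[name])) for name in weights}
    # Failing any single component fails the whole course.
    if any(v < PASSING_GRADE for v in values.values()):
        return None
    total_weight = Decimal(sum(weights.values()))
    average = sum(values[n] * weights[n] for n in weights) / total_weight
    # Assumed reading: round *up* at the second decimal digit...
    two_digits = average.quantize(Decimal("0.01"), rounding=ROUND_CEILING)
    # ...then round to the closest integer, halves going up.
    return int(two_digits.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(final_grade({"labs": 8, "participation": 6.9, "midterm": 8, "final": 7}))
# weighted average 7.495 -> 7.50 -> 8
print(final_grade({"labs": 9, "participation": 9, "midterm": 9, "final": 5.5}))
# None, because the final exam is below 6
```

The first example shows why the order of the two roundings matters: rounding 7.495 directly to an integer would give 7, whereas rounding up at the second digit first gives 8.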

Retakes

Retakes are run as a comprehensive oral exam in which the student is assessed on the knowledge acquired from the textbooks, the lectures, the labs, and the additional required reading material supplied by the instructor. During this oral exam, the student may be asked to solve exercises and to explain theoretical and practical aspects of the course.

Grades range

Course grading range
	Range
A. Excellent	95-100
B. Good	75-94
C. Satisfactory	55-74
D. Poor	0-54
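The grade-range table translates directly into a threshold lookup. A minimal sketch, assuming the course score has already been expressed on the 0-100 scale of the table (the syllabus does not state how the 0-10 component grades are converted to it); under this sketch a fractional score such as 94.5 falls into the lower band. The function name `letter_grade` is hypothetical.

```python
def letter_grade(score):
    """Map a 0-100 course score to the letter bands of the grading range table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 95:
        return "A"  # Excellent: 95-100
    if score >= 75:
        return "B"  # Good: 75-94
    if score >= 55:
        return "C"  # Satisfactory: 55-74
    return "D"      # Poor: 0-54


print([letter_grade(s) for s in (100, 95, 94, 75, 74, 55, 54, 0)])
# ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']
```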

Resources and reference material

Donald T. Campbell and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Rand McNally College Publishing, 1963
Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. Springer, New York, 2004. ISBN 978-1-4419-2322-6. doi: 10.1007/978-0-387-21736-9
Oliver Laitenberger and Dieter Rombach. (Quasi) Experimental Studies in Industrial Settings. In Lecture Notes on Empirical Software Engineering, pages 167–227. World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2003. ISBN 981-02-4914-4
Rini van Solingen and Egon Berghout. The Goal/Question/Metric Method: a practical guide for quality improvement of software development. The McGraw-Hill Companies, Cambridge, England, 1999. ISBN 077-709553-7.
Andrea Janes and Giancarlo Succi. Lean Software Development in Action. Springer, Heidelberg, Germany, 2014. ISBN 978-3-662-44178-7. doi: 10.1007/978-3-642-00503-9

Course Sections

The main sections of the course and the approximate distribution of hours between them are as follows:

Course Sections
Section	Section Title	Teaching Hours
1	Concept of Hypothesis Testing and Experimentation	12
2	Fundamentals of statistics	24

Section 1

Section title: Concept of measuring

Topics covered in this section:

Measurement: concept, definition and fundamentals of measurement
Goal-Question-Metric approach
Representational theory of measurement
Measurement scales and functions that can be applied to scales
Experimental designs
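The scale a metric lives on determines which statistics are meaningful for it. The Python sketch below encodes a common summary of Stevens' scale types, extended with the absolute scale used in software measurement texts (each scale admits the statistics of the weaker scales plus its own). It then shows why a mean is not meaningful on an ordinal scale: an order-preserving relabelling, which is an admissible transformation for ordinal data, flips the comparison of two group means but not that of their medians. The statistic lists, the function name `admissible_statistics`, and the toy data are illustrative assumptions, not an exhaustive catalogue.

```python
from statistics import mean, median

# Scales ordered from weakest to strongest.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio", "absolute"]
# Statistics that become meaningful at each scale (illustrative, not exhaustive).
NEW_STATISTICS = {
    "nominal": {"mode", "frequency"},
    "ordinal": {"median", "percentile"},
    "interval": {"mean", "standard deviation"},
    "ratio": {"geometric mean", "coefficient of variation"},
    "absolute": set(),
}


def admissible_statistics(scale):
    """Return every statistic considered meaningful on the given scale."""
    allowed = set()
    for s in SCALE_ORDER[: SCALE_ORDER.index(scale) + 1]:
        allowed |= NEW_STATISTICS[s]
    return allowed


# Ordinal data: an order-preserving relabelling is admissible...
group_a = [1, 1, 5]
group_b = [2, 2, 2]
relabel = {1: 1, 2: 4, 5: 5}  # strictly increasing, so order is preserved
a2 = [relabel[x] for x in group_a]
b2 = [relabel[x] for x in group_b]

# ...but it flips the comparison of means, while medians keep their order.
print(mean(group_a) > mean(group_b), mean(a2) > mean(b2))          # True False
print(median(group_a) < median(group_b), median(a2) < median(b2))  # True True
```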

What forms of evaluation were used to test students’ performance in this section?

	Yes/No
Development of individual parts of software product code	0
Homework and group projects	0
Midterm evaluation	1
Testing (written or computer based)	1
Reports	0
Essays	0
Oral polls	0
Discussions	1

Typical questions for ongoing performance evaluation within this section

What are the phases of GQM? How are they connected to each other? What are the steps of the GQM method?
What does SWOT mean?
What is measurement?
How can measurement help us understand, control, and improve the development process?
What does the Representation Condition mean?
What is a measurement scale?
What are the characteristics of a good measurement? What is the difference between validity and reliability?
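The representation condition asks that a measure map entities to numbers so that the empirical relations among the entities are both preserved by, and recovered from, the numerical relations. A minimal Python sketch for a single binary relation checks the condition "x is empirically taller than y if and only if M(x) > M(y)" over all ordered pairs. The function name, the people, the observed relation, and both candidate measures are hypothetical data chosen for illustration.

```python
from itertools import permutations


def satisfies_representation_condition(entities, empirically_greater, measure):
    """Check: (x, y) in the empirical relation  <=>  measure[x] > measure[y]."""
    for x, y in permutations(entities, 2):
        if ((x, y) in empirically_greater) != (measure[x] > measure[y]):
            return False
    return True


people = ["Ann", "Bob", "Cid"]
# Observed by direct comparison: who is taller than whom.
taller_than = {("Ann", "Bob"), ("Ann", "Cid"), ("Bob", "Cid")}

height_cm = {"Ann": 180, "Bob": 172, "Cid": 165}
shoe_size = {"Ann": 42, "Bob": 44, "Cid": 40}

print(satisfies_representation_condition(people, taller_than, height_cm))  # True
print(satisfies_representation_condition(people, taller_than, shoe_size))  # False
```

Shoe size fails because it ranks Bob above Ann, an order that is not observed empirically, so it is not a valid measure of height even if it is perfectly reliable.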

Typical questions for seminar classes (labs) within this section

What benefits does GQM provide to you as a Software Engineer / Data Scientist?
Imagine your goal is to “increase the availability of a software system”. Provide Questions and Metrics for this goal.
What are measurement Reliability and measurement Validity? What are the differences between the two? Provide an example of a reliable but invalid measurement and an example of a valid but unreliable measurement
Which Measurement Scales do you know? What are the differences between them? Provide examples for each of them.
Provide an example of the Representation Condition
Create metrics that measure your study progress; outline their properties in terms of subjective vs. objective, direct vs. indirect, etc.; detail concretely how you will collect them; and check them for reliability and validity
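A GQM model is a tree: one goal refined into questions, each answered by one or more metrics, where a metric may serve several questions. The Python sketch below models that tree with dataclasses. The goal fields loosely follow the usual GQM goal template (purpose, quality focus, object, viewpoint); the class and method names, the system, and every question and metric shown are hypothetical examples for the availability goal, not a model answer to the lab question.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    objective: bool  # True: measured by tools; False: based on human judgement


@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    quality_focus: str
    obj: str
    viewpoint: str
    questions: list[Question] = field(default_factory=list)

    def describe(self):
        return (f"{self.purpose} the {self.quality_focus} of {self.obj} "
                f"from the viewpoint of the {self.viewpoint}")

    def all_metrics(self):
        """Distinct metric names, collected top-down from the questions."""
        seen = []
        for question in self.questions:
            for metric in question.metrics:
                if metric.name not in seen:
                    seen.append(metric.name)
        return seen


outages = Metric("outages per month", objective=True)  # shared by two questions
goal = Goal("Improve", "availability", "the online shop backend", "operations team", [
    Question("How often does the system fail?", [outages, Metric("MTBF (hours)", True)]),
    Question("How quickly is service restored?", [Metric("MTTR (minutes)", True), outages]),
    Question("Do users perceive the system as available?",
             [Metric("user-reported availability (1-5)", False)]),
])

print(goal.describe())
print(goal.all_metrics())
```

Keeping metrics as shared objects rather than copies makes it visible when one data collection effort answers several questions, which is part of what keeps a GQM plan lean.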

Section 2

Section title: Fundamentals of statistics

Topics covered in this section:

Basic concepts of probability theory
Random variable and random process
Linear regression
Correlation and convolution
Moments and moment generating functions
Law of Large Numbers
Central Limit Theorem
Hypothesis testing
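To connect the regression and correlation topics, here is a small plain-Python sketch of the closed-form OLS estimates for one independent variable, b1 = S_xy / S_xx and b0 = ȳ − b1·x̄, together with Pearson's correlation coefficient r and the coefficient of determination R². The toy data are made up for illustration; the last line checks numerically that for simple linear regression with an intercept R² equals r². Only correlation in the statistical (Pearson) sense is shown here, not the correlation of functions.

```python
from math import sqrt


def centered_sums(x, y):
    """Return S_xy, S_xx, S_yy (sums of centred cross-products and squares)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy, sxx, syy


def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x."""
    sxy, sxx, _ = centered_sums(x, y)
    b1 = sxy / sxx
    b0 = sum(y) / len(y) - b1 * sum(x) / len(x)
    return b0, b1


def pearson(x, y):
    sxy, sxx, syy = centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y, b0, b1):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_simple(x, y)
r = pearson(x, y)
R2 = r_squared(x, y, b0, b1)
print(round(b0, 6), round(b1, 6), round(r ** 2, 6), round(R2, 6))  # 2.2 0.6 0.6 0.6
```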

What forms of evaluation were used to test students’ performance in this section?

	Yes/No
Development of individual parts of software product code	0
Homework and group projects	0
Midterm evaluation	1
Testing (written or computer based)	1
Reports	0
Essays	0
Oral polls	0
Discussions	1

Typical questions for ongoing performance evaluation within this section

Describe the three approaches to probability.
Write the fundamental theorem of algebra.
Write the general structure of the OLS equation for one variable.
Write the general structure of the OLS equation for the case of multiple independent variables.
What is the connection between the correlation coefficient and the coefficient of determination? What does each of them show?
State and prove the Law of Large Numbers
State and prove the Central Limit Theorem
Explain what H0 and H1 are in hypothesis testing
Explain the role of the Bonferroni inequality in hypothesis testing
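The Bonferroni inequality, P(A_1 ∪ … ∪ A_m) ≤ P(A_1) + … + P(A_m), is what justifies the Bonferroni correction: if each of m null hypotheses is tested at level α/m, the probability of at least one false rejection (the family-wise error rate) is at most α. A minimal Python sketch; the function name and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m, for m simultaneous tests.

    By the Bonferroni inequality, the family-wise error rate is then <= alpha.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


p_values = [0.001, 0.02, 0.04, 0.3]
print([p <= 0.05 for p in p_values])  # [True, True, True, False] without correction
print(bonferroni(p_values))           # [True, False, False, False] with alpha/m = 0.0125
```

The correction is conservative: it controls the family-wise error rate without any assumption on the dependence between tests, at the price of lower power.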

Typical questions for seminar classes (labs) within this section

Define and provide examples of sample space, events and probability measure
Write the formula for the coefficients of simple linear regression. Explain the mathematical procedure used to derive them, and derive them
Fully derive the coefficients of the OLS equation for multiple independent variables
Calculate the correlation between two functions and explain its meaning
Calculate the Pearson coefficient for the given functions
Write and prove Markov’s inequality. Write and prove Chebyshev’s inequality. How are these theorems related to the LLN?
Derive the MGF of the normal distribution
Provide a concrete example of a test, detailing both H0 and H1
State and prove the Bonferroni inequality
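Markov's and Chebyshev's inequalities yield the weak LLN directly: for the mean of n i.i.d. variables with variance σ², P(|X̄_n − μ| ≥ ε) ≤ σ²/(nε²), which tends to 0 as n grows. The simulation below, a sketch using only the standard library, compares that bound with the empirical tail frequency for Bernoulli(1/2) samples. The sample sizes, ε = 0.1, the number of trials, the seed, and the function names are arbitrary choices. Exact fractions are used in the deviation check so that floating-point rounding does not misclassify deviations exactly equal to ε.

```python
import random
from fractions import Fraction


def tail_frequency(n, eps, p, trials=1000, seed=0):
    """Empirical estimate of P(|mean_n - p| >= eps) for n Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    p_float = float(p)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < p_float for _ in range(n))
        # Exact rational arithmetic avoids spurious floating-point ties.
        if abs(Fraction(successes, n) - p) >= eps:
            hits += 1
    return hits / trials


def chebyshev_bound(n, eps, p):
    """Chebyshev: P(|mean_n - p| >= eps) <= p(1 - p) / (n * eps**2)."""
    return float(p * (1 - p) / (n * eps ** 2))


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, tail_frequency(n, eps, p), chebyshev_bound(n, eps, p))
```

For n = 10 the bound is 2.5 and therefore says nothing; it only becomes informative as n grows, and the empirical frequency typically stays well below it, which illustrates how loose Chebyshev's inequality is compared with the actual concentration of the sample mean.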

Test questions for final assessment in the course

State Fenton’s measurement theory and explain what the Representation Condition is
Define the steps needed to elaborate a GQM model
Describe the Taylor theorem
Describe the fundamental theorem of Algebra
Based on the Taylor theorem and the fundamental theorem of Algebra, explain whether the number of data points should be equal to, smaller than, or larger than the number of independent variables (features), and why
Explain how GQM can be used to define an appropriate number of variables
For the given function, compute its mean, mode, median, and standard deviation
Define OLS linear regression in the case of one and of multiple variables and derive its parameters
For the given two functions, compute their covariance and Pearson’s correlation coefficient, and describe the results
State and prove the weak and strong formulations of the LLN
State the Lindeberg–Lévy formulation of the CLT and prove it
Compute LOC, MCC, FI, and FO for the given code. Describe how to apply the MCC metric and the meaning of the FI/FO output
For the given module, compute the CK metrics for its classes
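For the MCC question, McCabe's cyclomatic complexity of a single function can be computed as the number of decision points plus 1. The sketch below counts decision points in Python source with the standard `ast` module. It is a simplified approximation, assuming the source contains one function: it counts `if`/`elif`, `for`, `while`, conditional expressions, `except` clauses, each extra operand of `and`/`or`, and comprehension filters, and it does not reproduce the exact rule set of any particular tool. The function name `cyclomatic_complexity` and the sample code are hypothetical.

```python
import ast

# Node types that each add one independent path (simplified rule set).
DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                  ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func_source):
    """Approximate McCabe complexity of one function: decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(func_source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two short-circuit branches.
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            decisions += len(node.ifs)  # each filter in a comprehension
    return decisions + 1


example = '''
def classify(x):
    if x < 0 and x != -1:
        return "negative"
    for i in range(x):
        if i % 2:
            continue
    return "done"
'''
print(cyclomatic_complexity(example))  # 5: two ifs, one "and", one for loop, plus 1
```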

MSc: Applied statistics and experiments in science and engineering
