MSc:MetricsAndEmpiricalMethods


Metrics and Empirical Methods for Software Engineers and Data Scientists

  • Course name: Metrics and Empirical Methods for Software Engineers and Data Scientists
  • Course number: XYZ

Course Characteristics

Key concepts of the class

  • Goal-Question-Metric approach
  • Basics of statistics
  • Measurement and metrics in software development

What is the purpose of this course?

The main purpose of this course is to present the fundamentals of metrics and empirical methods to future software engineers and data scientists. On the one hand, it provides the scientific foundations of the discipline; on the other, it anchors the theoretical concepts in practices from software development and engineering. As a side benefit, the course also refreshes the fundamentals of statistics, providing the basis for more advanced statistics courses in later semesters.

Course objectives based on Bloom’s taxonomy

- What should a student remember at the end of the course?

By the end of the course, the students should be able to:

  • Remember the fundamentals of statistics and probability theory
  • Remember the specifics and purpose of different measurement scales
  • Distinguish between a random variable and a random process
  • Explain the difference between correlation and causation
  • Remember existing object-oriented, functional and quality metrics

- What should a student be able to understand at the end of the course?

By the end of the course, the students should be able to understand:

  • the value of measurement for software engineers and data scientists
  • the basic concepts of measurement
  • the concept of correlation
  • the fundamental laws in statistics
  • the concept of the Goal-Question-Metric approach

- What should a student be able to apply at the end of the course?

By the end of the course, the students should be able to:

  • Apply the Goal-Question-Metric approach in practice
  • Apply statistics and probability theory in practice
  • Apply hypothesis testing techniques in software analysis
  • Analyze the software development process using fundamental size and complexity measures
  • Apply static code analysis using object-oriented metrics

Course evaluation

Course grade breakdown

Component        Default points   Proposed points
Weekly quizzes   ?                20
Personal GQM     ?                20
Midterm          ?                20
Final exam       ?                40

Grades range

Course grading range

Grade             Default range   Proposed range
A. Excellent      90-100          95-100
B. Good           75-89           75-94
C. Satisfactory   60-74           55-74
D. Poor           0-59            0-54

Resources and reference material

  • Norman Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous and Practical Approach. International Thomson Computer Press, London, UK, 2nd edition, 1997
  • Donald T. Campbell and Julian C. Stanley. Experimental and Quasi-Experimental Designs for Research. Rand McNally College Publishing, 1963
  • Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. Springer, New York, 2004. ISBN 978-1-4419-2322-6. doi: 10.1007/978-0-387-21736-9
  • Oliver Laitenberger and Dieter Rombach. Lecture Notes on Empirical Software Engineering. chapter (Quasi) Experimental Studies in Industrial Settings, pages 167–227. World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2003. ISBN 981-02-4914-4
  • Rini van Solingen and Egon Berghout. The Goal/Question/Metric Method: a practical guide for quality improvement of software development. The McGraw-Hill Companies, Cambridge, England, 1999. ISBN 077-709553-7.
  • Andrea Janes and Giancarlo Succi. Lean Software Development in Action. Springer, Heidelberg, Germany, 2014. ISBN 978-3-662-44178-7. doi: 10.1007/978-3-642-00503-9

Course Sections

The main sections of the course and the approximate distribution of hours between them are as follows:

Course Sections

Section   Section Title                        Teaching Hours
1         Concept of measuring                 12
2         Fundamentals of statistics           24
3         Measurement in software development  12

Section 1

Section title:

Concept of measuring

Topics covered in this section:

  • Measurement: concept, definition and fundamentals of measurement
  • Goal-Question-Metric approach
  • Representational theory of measurement
  • Measurement scales and functions that can be applied to scales

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation                                         Used (1 = yes, 0 = no)
Development of individual parts of software product code   0
Homework and group projects                                0
Midterm evaluation                                         1
Testing (written or computer based)                        1
Reports                                                    0
Essays                                                     0
Oral polls                                                 0
Discussions                                                1


Typical questions for ongoing performance evaluation within this section

  1. What are the phases of GQM? How are they connected to each other? What are the steps of the GQM method?
  2. What does SWOT mean?
  3. What is measurement?
  4. How can measurement help us to understand, control and improve the development process?
  5. What does the representation condition mean?
  6. What is a measurement scale?
  7. What are the characteristics of a good measurement? What is the difference between validity and reliability?

Typical questions for seminar classes (labs) within this section

  1. What benefits does GQM provide to you as a software engineer or data scientist?
  2. Imagine your goal is to "increase the availability of some software system". Provide questions and metrics for this goal.
  3. What are measurement reliability and measurement validity? What are the differences between the two? Provide an example of a reliable but invalid measurement and an example of a valid but unreliable measurement.
  4. Which measurement scales do you know? What are the differences between them? Provide examples for each of them.
  5. Provide an example of the representation condition.
  6. Create metrics that measure your study progress. Outline the properties of these metrics (subjective vs. objective, direct vs. indirect, etc.), detail concretely how you will collect them, and check them for reliability and validity.
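
As an illustration of the seminar exercise on the availability goal, the following is a minimal sketch of how a GQM tree could be written down and how one candidate metric could be computed. The class names, the three questions, the chosen metrics and the `availability` helper are all assumptions made for illustration; they are not part of the course material, and other decompositions of the goal are equally valid.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    unit: str


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: list = field(default_factory=list)


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observation window during which the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation window must be positive")
    return uptime_hours / total


# Hypothetical GQM tree for the availability goal (illustrative only).
goal = Goal(
    text="Increase availability of some software system",
    questions=[
        Question(
            text="How often is the system unavailable?",
            metrics=[Metric("number of outages per month", "count")],
        ),
        Question(
            text="How long does recovery take?",
            metrics=[Metric("mean time to repair", "hours")],
        ),
        Question(
            text="What fraction of time is the system usable?",
            metrics=[Metric("availability", "ratio in [0, 1]")],
        ),
    ],
)

# Print the tree as question -> metric pairs.
for question in goal.questions:
    for metric in question.metrics:
        print(f"{question.text} -> {metric.name} [{metric.unit}]")

# Example computation for a hypothetical 720-hour month with 1 hour of downtime.
print(availability(uptime_hours=719.0, downtime_hours=1.0))
```

Keeping goals, questions and metrics as separate levels makes it easy to check that every metric traces back to a question and every question to the goal.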

Test questions for final assessment in this section

Section 2

Section title:

Fundamentals of statistics

Topics covered in this section:

  • Basic concepts of probability theory
  • Random variable and random process
  • Linear regression
  • Correlation and convolution
  • Moments and moment generating functions
  • Law of Large Numbers
  • Central Limit Theorem
  • Hypothesis testing

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation                                         Used (1 = yes, 0 = no)
Development of individual parts of software product code   0
Homework and group projects                                0
Midterm evaluation                                         1
Testing (written or computer based)                        1
Reports                                                    0
Essays                                                     0
Oral polls                                                 0
Discussions                                                1


Typical questions for ongoing performance evaluation within this section

  1. Describe the three approaches to probability.
  2. State the fundamental theorem of algebra.
  3. Write the general structure of the OLS equation for one variable.
  4. Write the general structure of the OLS equation for the case of multiple independent variables.
  5. What is the connection between the correlation coefficient and the coefficient of determination? What does each of them show?
  6. State and prove the Law of Large Numbers
  7. State and prove the Central Limit Theorem
  8. Explain what H0 and H1 are in hypothesis testing
  9. Explain the role of the Bonferroni inequality in hypothesis testing
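
For reference, the OLS questions above can be anchored to the following standard forms. This is a sketch under the usual assumptions (an intercept is included, and in the matrix form the design matrix X has full column rank); the notation is chosen here for illustration and may differ from the notation used in the lectures.

```latex
% Simple linear regression (one independent variable)
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\]
\[
  \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                       {\sum_{i=1}^{n} (x_i - \bar{x})^2},
  \qquad
  \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]

% Multiple independent variables, matrix form (X has full column rank)
\[
  y = X\beta + \varepsilon, \qquad
  \hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
\]

% Simple regression with an intercept: the coefficient of determination
% equals the squared Pearson correlation coefficient
\[
  R^2 = r_{xy}^2
\]
```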

Typical questions for seminar classes (labs) within this section

  1. Define and provide examples of sample space, events and probability measure
  2. Write the formulas for the coefficients of simple linear regression. Explain the mathematical procedure used to derive them, and carry out the derivation.
  3. Fully derive the coefficients of the OLS equation for multiple independent variables
  4. Calculate the correlation between two functions and explain its meaning
  5. Calculate the Pearson coefficient for the given functions
  6. State and prove Markov’s inequality. State and prove Chebyshev’s inequality. How are these theorems related to the LLN?
  7. Derive the MGF of the normal distribution
  8. Provide a concrete example of a test, detailing both H0 and H1
  9. State and prove the Bonferroni inequality
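
The regression and correlation exercises above can be checked numerically. The sketch below computes the simple-regression coefficients and the Pearson coefficient from their textbook sums, and confirms on a small made-up data set that the squared Pearson coefficient matches the coefficient of determination. The function names and the sample data are assumptions chosen for illustration only.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def ols_simple(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination of the fitted simple regression."""
    intercept, slope = ols_simple(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_simple(xs, ys)
print("intercept:", intercept, "slope:", slope)
print("r^2:", pearson_r(xs, ys) ** 2, "R^2:", r_squared(xs, ys))
```

The code uses only the standard library so that each sum can be compared directly with the hand derivation.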

Test questions for final assessment in this section

Section 3

Section title:

Measurement in software development

Topics covered in this section:

  • Fundamental software measures of size and complexity
  • Halstead metrics
  • Object-oriented metrics
  • Function points
  • Quality metrics

What forms of evaluation were used to test students’ performance in this section?

Form of evaluation                                         Used (1 = yes, 0 = no)
Development of individual parts of software product code   0
Homework and group projects                                0
Midterm evaluation                                         0
Testing (written or computer based)                        1
Reports                                                    0
Essays                                                     0
Oral polls                                                 0
Discussions                                                1


Typical questions for ongoing performance evaluation within this section

  1. What are the advantages and disadvantages of the LOC measure?
  2. How can we measure the complexity of a given program module?
  3. What are the Halstead metrics, and what are their strengths and weaknesses?
  4. What is the Chidamber & Kemerer (CK) metrics suite?
  5. What is the purpose of high cohesion and low coupling in object-oriented programming?
  6. What are function points and functional size measurement?
  7. What is Mk II function point analysis?
  8. Which activities are included in the effort estimation of a software project?
  9. What are product and process quality metrics? What is the difference between them?
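
To make the Halstead question concrete, the sketch below derives the basic Halstead measures from the four base counts (distinct operators, distinct operands, total operators, total operands). It assumes the counts have already been obtained by hand or by a tokenizer; deciding what counts as an operator or an operand is language-specific and is not addressed here. The function name and the sample counts are illustrative assumptions.

```python
from math import log2


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Basic Halstead measures from the four base counts.

    n1: distinct operators, n2: distinct operands,
    N1: total operators,    N2: total operands.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Made-up counts, for illustration only.
print(halstead(n1=2, n2=2, N1=4, N2=4))
```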

Typical questions for seminar classes (labs) within this section

  1. For the given code example, compute its LOC, fan-in and fan-out metrics
  2. Analyze the cyclomatic complexity of the given code using four different approaches. Which one is the easiest for you and why?
  3. For the given code example, compute its Halstead metrics
  4. For the given code example, compute its CK metrics
  5. Describe the benefits of Function Point Analysis.
  6. Describe the difference between Mk II FPA and Albrecht’s FPA.
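
One of the approaches to cyclomatic complexity mentioned above is to count decision points and add one. The sketch below applies that approach to Python source using the standard `ast` module, together with a simple LOC count that skips blank and comment-only lines. It is an approximation under stated assumptions: only the node types listed in `BRANCH_NODES` and boolean operators are counted, and the helper names and the sample function are hypothetical.

```python
import ast

# Node types treated as decision points in this sketch (an assumption;
# tools differ in exactly which constructs they count).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as decision points + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            decisions += len(node.values) - 1
    return decisions + 1


def count_loc(source: str) -> int:
    """Count lines that are neither blank nor comment-only."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


SAMPLE = '''
def classify(n):
    # toy function used only for illustration
    if n < 0 and n % 2 == 0:
        return "negative even"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

print("LOC:", count_loc(SAMPLE))
print("cyclomatic complexity:", cyclomatic_complexity(SAMPLE))
```

For the sample, the decision points are two `if` statements, one `for` loop and one `and` operator, so the result can be cross-checked by drawing the control-flow graph by hand.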

Test questions for final assessment in the course

  1. State Fenton's measurement theory and explain what the representation condition is
  2. Define the steps needed to elaborate a GQM
  3. Describe the Taylor theorem
  4. Describe the fundamental theorem of algebra
  5. Based on the Taylor theorem and the fundamental theorem of algebra, explain whether the number of data points should be equal to, smaller than, or larger than the number of independent variables (features), and why
  6. Explain how GQM can be used to define an appropriate number of variables
  7. For the given function, compute its mean, mode, median and standard deviation
  8. Define OLS linear regression for the cases of one and multiple variables and derive the parameters
  9. For the given two functions, compute their covariance and Pearson’s correlation coefficient, and describe the results
  10. State and prove the weak and strong formulations of the LLN
  11. State the Lindeberg–Lévy formulation of the CLT and prove it
  12. Compute LOC, McCabe cyclomatic complexity (MCC), fan-in (FI) and fan-out (FO) for the given code. Describe how to apply the MCC metric and the meaning of the FI-FO output
  13. For the given module compute the CK metrics for its classes
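
As a pointer for the LLN question above, the following sketch shows how the weak law follows from Chebyshev's inequality. It assumes independent, identically distributed random variables with finite variance, which is a stronger assumption than the weak law strictly needs; the notation is chosen here for illustration.

```latex
% Setting: X_1, ..., X_n i.i.d. with E[X_i] = \mu and Var(X_i) = \sigma^2 < \infty
\[
  \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad
  \mathbb{E}[\bar{X}_n] = \mu, \qquad
  \operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}
\]

% Markov's inequality, for a nonnegative random variable Y and a > 0
\[
  \Pr(Y \ge a) \le \frac{\mathbb{E}[Y]}{a}
\]

% Chebyshev's inequality: apply Markov to Y = (\bar{X}_n - \mu)^2 with a = \varepsilon^2
\[
  \Pr\bigl(|\bar{X}_n - \mu| \ge \varepsilon\bigr)
  \le \frac{\operatorname{Var}(\bar{X}_n)}{\varepsilon^2}
  = \frac{\sigma^2}{n \varepsilon^2}
  \xrightarrow[n \to \infty]{} 0
  \qquad \text{for every } \varepsilon > 0
\]
```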