Information Retrieval
- Course name: Information Retrieval
- Code discipline: XYZ
- Subject area: Data Science; Computer systems organization; Information systems; Real-time systems; Information retrieval; World Wide Web
Short Description
Prerequisites
Prerequisite subjects
- CSE204 — Analytic Geometry And Linear Algebra II: matrix multiplication, matrix decomposition (SVD, ALS) and approximation (matrix norm), sparse matrix, stability of solution (decomposition), vector spaces, metric spaces, manifold, eigenvector and eigenvalue.
- CSE113 — Philosophy I - (Discrete Math and Logic): graphs, trees, binary trees, balanced trees, metric (proximity) graphs, diameter, clique, path, shortest path.
Prerequisite topics
Course Topics
Section | Topics within the section |
---|---|
Information retrieval basics | Introduction to IR, major concepts. Crawling and Web. Quality assessment. |
Text processing and indexing | Building inverted index for text documents. Boolean retrieval model. Language, tokenization, stemming, searching, scoring. Spellchecking and wildcard search. Suggest and query expansion. Language modelling. Topic modelling. |
Vector model and vector indexing | Vector model. Machine learning for vector embedding. Vector-based index structures. |
Advanced topics. Media processing | Image and video processing, understanding and indexing. Content-based image retrieval. Audio retrieval. Hum to search. Relevance feedback. |
Intended Learning Outcomes (ILOs)
What is the main purpose of this course?
ILOs defined at three levels
Level 1: What concepts should a student know/remember/explain?
By the end of the course, the students should be able to ...
- Describe the essential parts of a search engine and a recommender system,
- Explain the quality metrics of information retrieval systems,
- Describe contemporary approaches to semantic data analysis,
- Explain common indexing strategies.
Level 2: What basic practical skills should a student be able to perform?
By the end of the course, the students should be able to ...
- Design a recommender system from scratch,
- Evaluate the quality of a particular information retrieval system,
- Apply core ideas to system implementation and maintenance,
- Identify and fix problems in an information retrieval system.
Level 3: What complex comprehensive skills should a student be able to apply in real-life scenarios?
By the end of the course, the students should be able to ...
- Implement a proper index for an unstructured dataset,
- Plan quality measures for a new recommender service,
- Run initial data analysis and problem evaluation for a business task related to information retrieval.
Grading
Course grading range
Grade | Range | Description of performance |
---|---|---|
A. Excellent | 84-100 | - |
B. Good | 72-83 | - |
C. Satisfactory | 60-71 | - |
D. Poor | 0-59 | - |
Course activities and grading breakdown
Activity Type | Percentage of the overall course grade |
---|---|
Labs/seminar classes | 35 |
Interim performance assessment | 70 |
Exams | 0 |
Recommendations for students on how to succeed in the course
Resources, literature and reference materials
Open access resources
- Manning, Raghavan, Schütze, An Introduction to Information Retrieval, 2008, Cambridge University Press
- Baeza-Yates, Ribeiro-Neto, Modern Information Retrieval, 2011, Addison-Wesley
- Büttcher, Clarke, Cormack, Information Retrieval: Implementing and Evaluating Search Engines, 2010, MIT Press
Closed access resources
Software and tools used within the course
Teaching Methodology: Methods, techniques, & activities
Activities and Teaching Methods
Learning Activities | Section 1 | Section 2 | Section 3 | Section 4 |
---|---|---|---|---|
Development of individual parts of software product code | 1 | 1 | 1 | 1 |
Homework and group projects | 1 | 1 | 1 | 1 |
Testing (written or computer based) | 1 | 1 | 1 | 1 |
Formative Assessment and Course Activities
Ongoing performance assessment
Section 1
Activity Type | Content | Is Graded? |
---|---|---|
Question | Enumerate limitations for web crawling. | 1 |
Question | Propose a strategy for A/B testing. | 1 |
Question | Propose a recommender quality metric. | 1 |
Question | Implement the DCG metric (see the sketch after this table). | 1 |
Question | Discuss a relevance metric. | 1 |
Question | Crawl a website with respect to robots.txt. | 1 |
Question | What is a typical IR system architecture? | 0 |
Question | Show how to parse a dynamic web page. | 0 |
Question | Provide a framework to accept/reject A/B testing results. | 0 |
Question | Compute DCG for an example query against a random search engine. | 0 |
Question | Implement a metric for a recommender system. | 0 |
Question | Implement pFound. | 0 |
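
Several graded items above ask for metric implementations. Below is a minimal Python sketch of DCG and nDCG, assuming the common rel/log2(rank + 1) formulation; some courses and papers use the (2^rel - 1) gain instead, so treat the exact formula as an assumption:

```python
import math

def dcg(relevances, k=None):
    """DCG: graded relevance discounted by log2(rank + 1), with 1-based ranks."""
    rels = relevances[:k] if k is not None else relevances
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """nDCG: normalize by the DCG of the ideally re-ranked list."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades of the top-4 results, in ranked order (made-up example data).
print(dcg([3, 2, 0, 1]), ndcg([3, 2, 0, 1]))
```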
Section 2
Activity Type | Content | Is Graded? |
---|---|---|
Question | Build an inverted index for a text (see the sketch after this table). | 1 |
Question | Tokenize a text. | 1 |
Question | Implement a simple spellchecker. | 1 |
Question | Implement wildcard search. | 1 |
Question | Build an inverted index for a set of web pages. | 0 |
Question | Build a distribution of stems/lexemes for a text. | 0 |
Question | Choose and implement a case-insensitive index for a given text collection. | 0 |
Question | Choose and implement a semantic vector-based index for a given text collection. | 0 |
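
Several items above revolve around tokenization and inverted indexing. A self-contained sketch follows (toy regex tokenizer and Boolean AND over posting lists; a real pipeline adds stemming, stop-word removal, and posting-list compression):

```python
import re
from collections import defaultdict

def tokenize(text):
    # Toy tokenizer: lower-cased alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    # Map each term to the sorted list of document ids containing it.
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def boolean_and(index, query):
    # Boolean retrieval: intersect the posting lists of all query terms.
    sets = [set(index.get(term, ())) for term in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []

docs = ["Information retrieval is search.", "Search engines crawl the Web."]
index = build_inverted_index(docs)
print(boolean_and(index, "search"))  # -> [0, 1]
```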
Section 3
Activity Type | Content | Is Graded? |
---|---|---|
Question | Embed the text with an ML model. | 1 |
Question | Build a term-document matrix. | 1 |
Question | Build a semantic index for a dataset using Annoy (see the sketch after this table). | 1 |
Question | Build a kd-tree index for a given dataset. | 1 |
Question | Why do kd-trees work badly in a 100-dimensional environment? | 1 |
Question | What is the difference between a metric space and a vector space? | 1 |
Question | Choose and implement a persistent index for a given text collection. | 0 |
Question | Visualize a dataset for text classification. | 0 |
Question | Build (H)NSW index for a dataset. | 0 |
Question | Compare HNSW to Annoy index. | 0 |
Question | What metric space index structures do you know? | 0 |
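
For the Annoy item above, a minimal usage sketch, assuming the `annoy` package is installed (pip install annoy) and using random vectors as stand-ins for real document embeddings:

```python
import random
from annoy import AnnoyIndex

dim = 64                                  # embedding dimensionality (assumed)
index = AnnoyIndex(dim, "angular")        # angular distance ~ cosine similarity

for item_id in range(1000):               # index 1000 stand-in vectors
    index.add_item(item_id, [random.gauss(0, 1) for _ in range(dim)])

index.build(10)                           # 10 trees: more trees, better recall
query = [random.gauss(0, 1) for _ in range(dim)]
print(index.get_nns_by_vector(query, 5))  # ids of 5 approximate neighbours
```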
Section 4
Activity Type | Content | Is Graded? |
---|---|---|
Question | Extract semantic information from images. | 1 |
Question | Build an image hash (see the sketch after this table). | 1 |
Question | Build a spectral representation of a song. | 1 |
Question | What is relevance feedback? | 1 |
Question | Build a "search by color" feature. | 0 |
Question | Extract scenes from video. | 0 |
Question | Write a voice-controlled search. | 0 |
Question | Implement semantic search within an unlabelled image dataset. | 0 |
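
For the image-hash item above, a minimal average-hash (aHash) sketch, assuming Pillow is installed; production systems often prefer pHash, dHash, or learned embeddings:

```python
from PIL import Image

def average_hash(path, hash_size=8):
    # Classic aHash: shrink to hash_size x hash_size, grayscale,
    # then set a bit for every pixel brighter than the mean.
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return int("".join("1" if p > mean else "0" for p in pixels), 2)

def hamming(h1, h2):
    # Near-duplicate images yield hashes with a small Hamming distance.
    return bin(h1 ^ h2).count("1")

# Hypothetical file names, for illustration only:
# hamming(average_hash("a.jpg"), average_hash("b.jpg")) <= 5  ->  likely near-duplicates
```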
Final assessment
Section 1
- Implement a text crawler for a news site.
- What is SBS (side-by-side) and how is it used in search engines?
- Compare pFound with CTR and with DCG.
- Explain how A/B testing works.
- Describe the PageRank algorithm (a sketch follows below).
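
A minimal power-iteration sketch of PageRank for the exam item above (uniform teleportation, dangling-node mass spread uniformly; a real web graph needs sparse-matrix arithmetic):

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each node to the list of its out-links."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, outs in links.items():
            targets = outs if outs else nodes  # dangling node: spread everywhere
            for target in targets:
                new[target] += damping * rank[node] / len(targets)
        rank = new
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```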
Section 2
- Explain how (and why) KD-trees work.
- What are the weak points of an inverted index?
- Compare different text vectorization approaches (a TF-IDF sketch follows below).
- Compare tolerant retrieval to spellchecking.
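
For the vectorization item above, a toy TF-IDF sketch (whitespace tokenization, log IDF); contrast this with learned embeddings, which capture semantics but require training data:

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Rows are documents; columns follow the sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    n = len(docs)
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] / len(doc) * math.log(n / df[t]) for t in vocab])
    return vocab, rows

vocab, matrix = tfidf_matrix(["web search engine", "inverted index for search"])
print(vocab, matrix, sep="\n")
```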
Section 3
- Compare inverted index to HNSW in terms of speed and memory consumption.
- Choose the best index for a given dataset.
- Implement range search in a KD-tree (a sketch follows below).
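
For the range-search item, a radius query using SciPy's KDTree, assuming SciPy is available; an axis-aligned rectangular range search follows the same tree recursion with per-axis bound checks:

```python
import numpy as np
from scipy.spatial import KDTree

points = np.random.rand(1000, 2)                 # 1000 random 2-D points
tree = KDTree(points)
hits = tree.query_ball_point([0.5, 0.5], r=0.1)  # indices within radius 0.1
print(len(hits), "points inside the query ball")
```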
Section 4
- What are the approaches to image understanding?
- How to cluster a video into scenes and shots?
- How does speech-to-text technology work?
The retake exam
Section 1
Section 2
Section 3
Section 4