02-752
Advanced Algorithms for Biological Problems
Spring 2026
Course Schedule and Syllabus

Instructor: Yun William Yu, Associate Professor of Computational Biology
TAs: Aidan Zhang & Terry Ma
Classtime: Monday/Wednesday 5-6:30pm in POS 153

Open in new tab

Jump to contact chart   Jump to weekly schedule

Syllabus

Course description

This course will focus on advanced algorithms and algorithm design techniques for classes of algorithmic problems that frequently arise in biological contexts. The course presents both provably optimal algorithms as well as practical heuristics. Students will learn both proof techniques and procedures for empirical assessment of algorithmic performance. Algorithmic families that will be presented include efficient algorithms for manipulation of strings, graphs, trees, images, and other abstractions of biological data. Modern algorithmic techniques such as succinct data structures, amortized and probabilistic analysis of algorithms, sketching, and convex optimization will be applied to solving large-scale biological problems. Some tentative topics include networks for systems biology, game theory for modeling cancer and epidemiology, string algorithms for genomics, search indexes for biological databases, and phylogenetic trees, but topics will shift in response to developments in the field.

Audience

Our audience is students who want to learn algorithmic thinking and algorithmic topics that are relevant to modern biological analysis. We assume that many students will have seen classical bioinformatics algorithms for well established problems such as pairwise sequence alignment, muliple sequence alignment, sequence motifs, distance-based tree reconstruction (e.g. neighbor joining), sequence assembly, etc. A much smaller percentage will be familiar with recent algorithmic advances required for emerging computational problems associated with new biological assays and very large data sets.

Prerequisites

Equivalent of 15-210 ("Parallel & Sequential Data Structures and Algorithms") or 15-351 ("Algorithms and Advanced Data Structures).

This is NOT an "introductory" course. It is a Ph.D.-level course that focuses on the subset of algorithms that are of particular importance in biological applications for computer scientists. You may find it helpful to look over the topics covered in 15-351.

Objective

The objective of this course is to ensure that all CPCB students have:

Upon completion of this course, you should have:

Course Delivery

All lectures will be entirely in-person; there will NOT be any recordings made available. We will generally try to make pre-lecture notes from the instructor available, and also point to relevant readings for each lecture.

Online questions and discussions will be conducted via Piazza. General questions should be public to the rest of the class, but you may use private notes to communicate with just the teaching staff. As there are 3 of us on the teaching staff, please use private notes on Piazza instead of emailing us, as this way we can ensure that all of us get the message.

Reference texts

There is no textbook for the course. Each week, relevant papers will be posted. Algorithmic textbooks such as CLRS, Gusfield's Algorithms on Strings, Trees, and Sequences, Makinen et al.'s Genome-Scale Algorithm Design, and others may be helpful.


Assessment

Coursework will consist of (group) quizzes, participation, and two oral interviews (exams):

Each week will cover a different biological problem and a selection of the algorithms that can be employed to solve it. On lecture days, I will give a lecture going over the basics of the problem and relevant algorithms/data structures. You will then be assigned several papers to read in advance of the quiz/discussion day. On quiz/discussion days, class will start with a quiz assuming you have learned both the lecture material and carefully worked through the assigned papers. Then, there will be an in-class discussion/debate.

Group quizzes.
Quiz problems may ask you to develop an algorithm, prove the correctness of an algorithm, prove the running time of an algorithm, or compare and contrast several algorithms. To make the weekly quizzes a little less stressful, I will assign each of you to a pseudorandom group. Within your group, you may freely discuss the quiz and even turn in a joint response with all your names on it. If you disagree with your groups response, you may dissent and submit something different. You are allowed to write a partial dissent. Solutions should always be as short, clear, and concise as possible. We will be taking marks off for long, meandering solutions to otherwise short problems, even if all of the reasoning is technically correct. Brevity is the soul of wit.

Problems that ask for an algorithm should present: a clear English description, an argument that the algorithm is correct, and an analysis of the running time. Note: your goal is to explain the algorithm to a human, not a computer—as such, detailed pseudocode or source code is usually not the best way to explain an algorithm. Do not use pseudocode to obscure your answer.

Regrade requests should be made via Piazza within 1 week of the quiz being returned. Keep in mind that a regrade is exactly that, and there's not only no guarantee your score will go up, but it may go down if we spot additional errors while regrading. For this reason, regrade requests must have the consent of a strict majority of the group.

Participation.
For the in-class discussion/debate, we will announce the topics the lecture beforehand. However, you will not be assigned your position until the day of. You should come to class prepared to argue any of the possible positions. The format will begin with smaller group discussions (groups assigned by instructor). After the group discussions though, some fraction of you will be randomly chosen to be discussion leaders, representing your group's positions to the whole class.

Half of your participation score will come from actively participating in the small group discussions. This will be assessed by your peers on a simple rubric. To preserve anonymity, you will only be given averaged scores every several weeks, rather than every week. The other half of your participation score will come from the times you are randomly chosen to be a discussion leader, which will be evaluated by the instructor for how cogently you are able to defend your position and engage with the other discussion leaders on their positions.

Oral Interviews/Exams.
Once midway through the semester and once during the final exam period, the instructor will personally interview you. These exams will be 30 minutes each. During the interviews, the instructor will ask you questions similar to the ones from the quizzes, or perhaps even questions highlighting your responses on those quizzes. You should think of these as reminiscent of either a technical interview or an oral qualifying exam.

Class attendance.
On lecture days, there is no attendance or participation requirement, and no quizzes. You may choose to attend or not at your pleasure.

On the other hand, on discussion days, we do expect you to be present. However, we understand that you may have unavoidable conflicts during the semester (e.g. illness, family emergencies, conferences). Please let us know via private Piazza post as soon as you know about these conflicts, so that we know not to assign you as discussion leader on those days. So long as you are present for at least 75% of the discussion days, you will not receive any absence penalty on the participation scores.

Make-up quiz / quiz drops
Similarly, in keeping with Universal Design principles, we have built into the syllabus flexibility with respect to the quizzes. Any time before turning in the quiz, or if you happen to be ansent on the day of the quiz, you may choose to make your quiz take-home for an automatic 30 percentage point penalty. This choice must be made on an individual basis (not as part of the group). If you elect to make the quiz take-home, you must turn in the quiz by the following class. As a take-home quiz, you are allowed to use any and all resources (even including generative AI, classmates, etc.), but you must still handwrite your answers, using pen or pencil, on the quiz booklet provided.

Second, you may, at your discretion, drop up to three quiz scores. Each of these quiz scores will be replaced with your percentage score on the next oral exam. Please let us know via private Piazza post of any quiz drops before the exam, so that I can plan your exam accordingly.

Academic integrity

Group quizzes and oral exams are closed book. You may not use any outside resources of any kind. This will be strictly enforced.

See the university's policy on academic integrity. In part it reads "unauthorized assistance refers to the use of sources of support that have not been specifically authorized in this policy statement or by the course instructor(s) in the completion of academic work to be graded. Such sources of support may include but are not limited to advice or help provided by another individual, published or unpublished written sources, and electronic sources." You should be familiar with the policy in its entirety.

On the other hand, you are encouraged to use all resources available to you in preparation for the quizzes. Please do discuss the relevant algorithms and papers with your friends, your colleagues, or even with Gemini/ChatGPT. The goal is for you to understand the material for the quiz, and any resources that help you do that are welcome. Similarly, if you choose to take the 30 percentage point penalty and do a take-home quiz, you are also encouraged to use all resources available.

Accommodations for students with disabilities

If you have a disability and have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, we encourage you to contact them at access@andrew.cmu.edu.

Religious conflicts:

Absences for religious observances must be submitted by email to the instructor during the first two weeks of the semester.

Frequently Asked Questions

Is there some extra work I can do to improve my grade?

No, we cannot make exceptions to the course work and grading policy. If you are concerned about your grade, please see me or one of the TAs ASAP. There will be no exceptions to this policy during or after the class has completed. It is our duty to ensure that the class is fair for everyone.

Final note on class difficulty and struggling

If you've made it this far in the syllabus, I salute you. I did want to add a personal note though, which is that you should be aware that this is a hard class. We cover a lot of advanced material fast, and y'all come from a variety of backgrounds. Students who come from a biology background may find it necessary to do extra reading to get up to speed. So, don't be discouraged if you find yourself struggling in parts of the class; that's expected, and after all, it's my job to challenge you to learn cool new things. Having said that, good luck and have fun! -Prof. Yu

Contact information

Office Hours:

Canvas: https://canvas.cmu.edu/courses/52631

Piazza: https://piazza.com/cmu/spring2026/02752a

We strongly encourage contacting course staff via Piazza (you may post to instructors only); we cannot guarantee a timely response for any other medium.

If you have an issue that you wish to raise to just the professor (for example, a complaint about a TA), you may email directly to ywyu@cmu.edu or yuny@andrew.cmu.edu. However, any other general course-related correspondence should be via Piazza.

 

Schedule


Here is the rough schedule of lectures, relevant papers, as well as my prelecture notes as available.

"Weeks" are typically Monday/Wednesday, but occasionally will be three classes due to university holidays.

First half

Week 1: Read mapping

Lectures:
  • Mon, 2026-01-12: Lecture notes. Hash functions, hash tables, FM-index.
  • Wed, 2026-01-14: QuizQuiz. Readings - BWA, Bowtie 2, SNAP.
2026-01-19: MLK Day: No Classes

Week 2: Genome Assembly

  • Wed, 2026-01-21: Lecture. Overlap-Layout-Consensus. Reading - Celera Asembler.
  • Mon, 2026-01-26: Lecture. De Bruijn Graphs, String graphs. Reading - Velvet.
  • Wed, 2026-01-28: Quiz. Readings - LJA, HiCanu, HiFiASM.

Week 3: Metagenomics similarity / set sketching

  • Mon, 2026-02-02: Lecture. HLL, MinHash, Bloom filters.
  • Wed, 2026-02-04: Quiz. Readings - Mash, Dashing 2, sequence bloom filters.

Week 4: Genome comparison / string sketching / probabilistic analysis

  • Mon, 2026-02-09: Lecture. Minimizers, FracMinHash, Average-case analysis.
  • Wed, 2026-02-11: Quiz. Readings - Minimizer Jaccard bias, Skani, Irber FracMinHash paper.

Week 5: Gene expression quantification

  • Mon, 2026-02-16: Lecture.
  • Wed, 2026-02-18: Quiz.

Week 6: Biological network analysis

  • Mon, 2026-02-23: Lecture.
  • Wed, 2026-02-25: Quiz.

Oral Exams for First Half scheduled 1-on-1

Second half: tentative topics subject to change

2026-03-02 - 2026-03-06: Spring break: No Classes

Week 7: Clustering data and phylogenies

  • Mon, 2026-03-09: Lecture.
  • Wed, 2026-03-11: Quiz.

Week 8: Cell fate trajectories and Markov chains

  • Mon, 2026-03-16: Lecture.
  • Wed, 2026-03-18: Quiz.

Week 9: Evolutionary algorithms and game theory

  • Mon, 2026-03-23: Lecture.
  • Wed, 2026-03-25: Quiz.

Week 10: Optimization algorithms

  • Mon, 2026-03-30: Lecture.
  • Wed, 2026-04-01: Quiz.

Week 11: Molecular mechanics

  • Mon, 2026-04-06: Lecture.
  • Wed, 2026-04-08: Quiz.

Week 12: Dimensionality reduction

  • Mon, 2026-04-13: Lecture.
  • Wed, 2026-04-15: Quiz.

Week 13: Image analysis

  • Mon, 2026-04-20: Lecture.
  • Wed, 2026-04-22: Quiz.

Oral Exams scheduled 1-on-1 during finals period

Icons made by xnimrodx from www.flaticon.com.