Course material for the 2025 edition of the BioSB machine learning course
GitHub repository: pdmoerland/BioSB-MachineLearning-2025
A yearly course, part of the BioSB Research School
Course coordinator:
Lecturers
For more information about the course programme, please contact Perry Moerland; for more information about registration or logistics, please contact Petra Aarnoutse.
The course is aimed at PhD students with a background in bioinformatics, systems biology, computer science, the life sciences, or a related field. Participants from the private sector are also welcome. A working knowledge of basic statistics and linear algebra is assumed. Preparation material on statistics and linear algebra will be distributed before the course, for students who lack the required background to study in advance.
After following this course, students will have a good understanding of a wide range of machine learning techniques and will be able to recognize which method is most applicable to the data analysis problems they encounter in bioinformatics and systems biology applications.
Modern biology is a data-rich science, driven by our ability to measure the detailed molecular characteristics of cells, organs, and individuals at many different levels. Interpretation of these large-scale biological data requires the detection of statistical dependencies and patterns in order to establish useful models of complex biological systems. Techniques from machine learning are key in this endeavour. Examples are the visualization of single-cell RNA-seq data using dimensionality reduction methods, base calling for nanopore sequencing data using (recurrent) neural networks, and classification of high-throughput microscopy image data using convolutional neural networks.
In this one-week course, the foundations of machine learning will be laid out and commonly used methods for unsupervised (clustering, dimensionality reduction, visualization) and supervised (mainly classification) learning will be explained in detail. Methods will be illustrated using recent examples from the fields of systems biology and bioinformatics. Methods discussed in the lectures will be put into practice during the computer lab sessions. The course can optionally be extended with an assignment in which you write a 5-10 page report describing the analysis of a biological dataset using some of the methods taught in the course.
Registration closed on January 16, 2025.
The course fee includes all course material. Lecture slides, a computer lab manual and software required for the computer labs (MATLAB toolboxes) will be made available online.
Have a look at the following documents before the start of the course:
Material used during the lectures:
Material used during the computer labs:
To use the code and data, download the ZIP file, unpack everything into a single directory and run prstartup from the MATLAB command prompt. If you do not have access to a MATLAB campus license, install the 30-day free trial. When selecting toolboxes to install, select at least the Deep Learning, Optimization, and Statistics and Machine Learning toolboxes.
Participants requiring a certificate of successful completion (3 ECTS) should complete a final assignment. You will analyse a biological dataset (preferably one from your own practice) using some of the methods taught in the course, and write a short report (5-10 pages) on the results. If you have no dataset available, one will be provided. The report must be mailed to Perry Moerland no later than three weeks after the course has finished (February 14, 2025). We will strictly adhere to this deadline; if you require an extension, contact us well in advance. The report will be graded "fail" or "pass", with one possible resubmission. Those who choose not to complete the final assignment will receive a certificate of participation (1.5 ECTS).
Course classrooms (Monday-Thursday: L0-227, Friday: J1B-223) are located in the Amsterdam UMC, location Academic Medical Center (AMC; in Dutch: Academisch Medisch Centrum), Meibergdreef 9, Amsterdam.
Detailed travel information to the AMC can be found at https://www.amsterdamumc.nl/en/location-amc/address.htm. The metro/train station is about a 10-minute walk from the classrooms and can be reached in less than 30 minutes from Amsterdam Central Station and Schiphol Airport.
The course will run in the week of January 20-24, 2025.
Course days will generally have the following schedule:
On Friday January 24 there will be a session in which each participant gives a short presentation (5-10 minutes) on a machine learning problem they would like to solve using methods taught in the course, preferably using their own data.
Links to the slides will be added during the course.
Monday (January 20; L0-227, AMC) - Introduction, density estimation and classification
Lecturer Perry Moerland
Subjects Introduction to machine learning. Bayesian classification. Density estimation: histograms, nearest neighbour, Parzen. Parametric classifiers: (D)LDA, (D)QDA. Nonparametric classifiers: k-NN, Parzen. Discriminant analysis: logistic regression.
Slides Background
Tuesday (January 21; L0-227, AMC) - Classification (continued) and clustering
Lecturer Perry Moerland
Subjects Decision trees and random forests. Hierarchical clustering. Agglomerative clustering. Model-based clustering: mixtures-of-Gaussians, Expectation-Maximization.
Slides
Wednesday (January 22; L0-227, AMC) - Feature selection and extraction
Lecturer Lodewyk Wessels
Subjects Feature selection: criteria, search algorithms (forward, backward, branch & bound). Feature extraction: PCA, Fisher. Embeddings: MDS, t-SNE, UMAP, ViVAE. Sparse classifiers: Ridge, LASSO.
Slides
Thursday (January 23; L0-227, AMC) - Neural networks and support vector machines
Lecturer Marcel Reinders
Subjects Artificial neural networks. Support vector machines. Classifier ensembles. Complexity and regularisation.
Slides
At 5 pm there will be drinks, bites and a quiz at Miss Scarlett, a nice café a 5-minute walk from the AMC (in collaboration with YoungCB).
Friday (January 24; J1B-223, AMC)
9.00-10.30 Marcel Reinders. Deep learning: variational autoencoders, diffusion models. Slides
10.30-10.45 break
10.45-12.00 pitches
12.00-13.00 lunch
13.00-13.30 Tim Mocking. Computational assessment of measurable residual disease in acute myeloid leukemia using mixture models. Paper
13.30-16.00 pitches
Additional tools (not required for the course, but perhaps interesting):
Some good material for further reading: