# Overview of Statistical Machine Learning

Director: Nic Schraudolph (SML, NICTA and adjunct with CSL, RSISE)

The course is a general introduction to the methods and practice
of statistical machine learning.

## Pre-Requisites and Assumed Knowledge

A bachelor's degree in a relevant subject area;
confident use of a common programming language.

Mathematical training at the 2nd year undergraduate level,
including basic linear algebra and probability theory.

## Dates

- Registration: by 04 Apr 06
- Course Dates: 25 Apr to 01 Jun 06 (6 weeks)
- Lectures: Tue&Thu 10-12
- Tutorial/Exercise sessions: once a week, time and place TBD
- Assignments Due: by 09 Jun 06
- Notification: by 26 Jun 06

## Presenters

- Simon Guenter
- Nic Schraudolph
- Doug Aberdeen
- SVN Vishwanathan
- Alex Smola

(all SML, NICTA and adjunct with CSL, RSISE)

## Location

NICTA on Northbourne Ave., or RSISE on the ANU campus, depending on majority
of participants.

## Workload

- Weekly contact hours: 4h lecture, 2h tutorial
- Total contact hours: 24h lecture, 12h tutorial
- Assignments: 3 required, 5h each, 15h total
- Preparation/Reading: 1.5h per week, 9h total
- Total workload: 24 + 12 + 15 + 9 = 60h (3 units)

## Assessment

Only a pass or fail mark will be awarded. To pass the course, students
must obtain a pass mark on at least 3 of the (at least 4) assignments offered.

## Detailed Syllabus

**DRAFT** - subject to change at the discretion of the
course organizer.

- Bayesian Inference
  - frequentists vs. Bayesians
  - derivation of Bayes' Rule
  - use for inference
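
As a taste of the use of Bayes' Rule for inference, here is a minimal
sketch for a diagnostic-screening scenario; all probabilities below are
invented for illustration and are not taken from the assignment.

```python
# Hypothetical screening-test numbers (illustration only).
prior = 0.01          # P(disease)
sensitivity = 0.95    # P(positive | disease)
false_pos = 0.05      # P(positive | no disease)

# Bayes' Rule: P(disease | positive)
#   = P(positive | disease) P(disease) / P(positive)
evidence = sensitivity * prior + false_pos * (1 - prior)
posterior = sensitivity * prior / evidence
print(posterior)  # roughly 0.16: most positives are false alarms
```

Despite the accurate-looking test, the low prior keeps the posterior
small, which is the standard cautionary point of such exercises.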

Assignment 1 (theory): Ovarian Cancer Screening

Reading: Euro coin tosses (MacKay)

- Maximum Likelihood Modeling
  - regression, classification, density estimation
  - maximum likelihood loss functions

Reading: Maximum Likelihood -- Mixture of Gaussians (Schiele)

- Density Estimation
  - parametric vs. non-parametric
  - classification via density estimation
  - semi-parametric and mixture models
  - Expectation-Maximisation (EM) algorithm
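
The EM algorithm for a mixture of Gaussians can be sketched in a few
lines; this is a toy 1-D two-component version with made-up data and
starting values, not the course's reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two well-separated clusters at -2 and 3
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])   # initial means
var = np.array([1.0, 1.0])   # initial variances
pi = np.array([0.5, 0.5])    # initial mixing weights

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) \
              / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from responsibility-weighted data
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print(np.sort(mu))  # the means should approach -2 and 3
```

Each iteration provably does not decrease the data log-likelihood,
which is the key property covered in the tutorial reading below.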

Assignment 2 (programming): EM

Reading: A Gentle Tutorial of the EM Algorithm (pages 1-3)

- Least Squares Regression
  - linear vs. non-linear models
  - simple gradient descent
  - singular value decomposition
  - basis functions, generalized least squares
  - classification via regression
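
The singular value decomposition route to least squares can be sketched
as follows; the data are synthetic and the basis functions are just
[1, x], purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.01, 50)  # near-noiseless line

# Design matrix built from the basis functions [1, x]
A = np.column_stack([np.ones_like(x), x])

# Solve min ||A w - y||^2 via the pseudo-inverse:
#   A = U diag(s) V^T  =>  w = V diag(1/s) U^T y
U, s, Vt = np.linalg.svd(A, full_matrices=False)
w = Vt.T @ ((U.T @ y) / s)
print(w)  # close to the generating coefficients [2, 3]
```

Unlike solving the normal equations directly, the SVD exposes small
singular values, which is why it is the numerically preferred route.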

- Neural Networks
  - biological background
  - learning in neural networks
  - backpropagation algorithm
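
A minimal backpropagation sketch for a one-hidden-layer network with
tanh units and squared-error loss; the weights and data are random
stand-ins, and the analytic gradient is checked against a
finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=3)         # single input vector (made up)
t = 0.5                        # scalar target (made up)
W1 = rng.normal(size=(4, 3))   # input-to-hidden weights
w2 = rng.normal(size=4)        # hidden-to-output weights

def loss(W1):
    h = np.tanh(W1 @ x)
    return 0.5 * (w2 @ h - t) ** 2

# Forward pass
h = np.tanh(W1 @ x)
err = w2 @ h - t
# Backward pass: chain rule through output and hidden layers,
# using tanh'(a) = 1 - tanh(a)^2
dW1 = np.outer(err * w2 * (1 - h ** 2), x)

# Finite-difference check on one weight
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
num = (loss(W1p) - loss(W1)) / eps
print(abs(num - dW1[0, 0]))  # should be tiny
```

Such gradient checks are the standard way to debug a backpropagation
implementation before trusting it in training.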

Assignment 3 (programming): implement a neural network

- Classical (Batch) Optimization
  - Newton, quasi-Newton
  - conjugate gradient

Reading: Conjugate Gradient Without the Agonizing Pain (chapters 1-4)

- Stochastic (Online) Optimization
  - need for online learning
  - direct (gradient-free) methods
  - gradient step size adaptation

Assignment 4

- Overfitting, Validation, and Regularisation
  - empirical vs. true risk
  - cross-validation techniques
  - Ockham's razor, regularisation
  - minimum description length

- Reinforcement Learning (Doug Aberdeen)
  - dynamic programming
  - function approximation
  - simulation
  - policy-based methods
  - Tesauro's backgammon

Assignment 5 (programming): reinforcement learning

- Kernel Methods 1 (Alex Smola / SVN Vishwanathan)
- Kernel Methods 2 (Alex Smola / SVN Vishwanathan)

Assignment 6: kernel methods

10/05 - N. Schraudolph