Skip to content

Lecture Plan (11 Lectures)

Foundational (L01–L04)

L01: Setup + Debugging ✅ in progress

  • Compressed dev env setup (prereq covered basics)
  • Notebook hygiene and reproducibility
  • Defensive programming, debugging in VS Code + Jupyter
  • Adapt from: lectures_25/01 (setup), lectures_25/02 (debugging)

L02: SQL for Data Analysis

  • Core SQL: SELECT, JOIN, GROUP BY, window functions
  • SQLite + pandas/sqlalchemy integration
  • Adapt from: lectures_25/03 (was L03 last year)

L03: Larger-than-Memory Data

  • Polars: lazy vs eager evaluation
  • Out-of-core processing, chunking, parquet
  • Database as one solution for large data (ties to L02)
  • Adapt from: lectures_25/02 (was L02 last year—order swapped)

L04: NLP Foundations (student request)

  • Text preprocessing, tokenization, embeddings
  • Sentiment, NER, text classification
  • Clinical/health text applications
  • Sets up context for transformer/LLM lectures
  • New content + foundations from lectures_25/07

ML/AI Progression (L05–L09)

L05: Classification

  • Train/test splits, evaluation metrics, cross-validation
  • Logistic regression → Random Forest → XGBoost
  • Handling imbalanced data, feature selection
  • Adapt from: lectures_25/05

L06: Neural Networks

  • MLP, CNN, RNN/LSTM fundamentals (PyTorch)
  • Training loop, backprop, regularization
  • Adapt from: lectures_25/06

L07: Transformers & Deep Learning

  • Attention mechanism, transformer architecture
  • Hugging Face basics, tokenization
  • Adapt from: lectures_25/07

L08: LLMs – DIY & Understanding

  • Building intuition: nanoGPT walkthrough
  • Embeddings, context windows, fine-tuning concepts
  • Adapt from: lectures_25/07 demos

L09: LLMs – API, Agentic & Workflows

  • OpenAI/Anthropic/local APIs
  • Prompt engineering, structured outputs
  • Tool use, agents, practical workflows
  • Adapt from: lectures_25/07 + new content

Applied / Student Choice (L10–L11)

L10: Time Series & Forecasting

  • Time-based splits, lag features, rolling windows
  • ARIMA basics, ML regressors for forecasting
  • Adapt from: lectures_25/04

L11: TBD – Student Vote

Candidates: - Computer Vision – CNNs, transfer learning, medical imaging (from lectures_25/08) - Visualization & Dashboards – Altair, Streamlit, MkDocs reports (from lectures_25/09) - Experimentation & A/B Testing – causal inference, power analysis (from lectures_25/10) - Distributed Computing – threads/processes, HPC intro - End-to-End Project – CRISP-DM, capstone guidance


Notes

  • Each lecture: 90 min + additional demo time at 1/3, 2/3, end
  • Demos in Jupyter (draft as markdown, convert via jupytext)
  • Assignments: pass/fail, test understanding of core lecture content
  • Reference last year's content in lectures_25/ for examples, datasets, demos