Demo 0: Getting Started with Jupyter Notebooks 📓
Goal
Learn to use Jupyter notebooks for interactive data exploration and analysis.
Setup
1. Select or Create a Python Environment
In VS Code
- Open the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS)
- Choose "Select Kernel" > "Python Environments"
- Choose "Create Environment" to create a new virtual environment, or ".venv" (if available) to select an existing one
- Select "Venv" as the environment type
- Choose Python version (3.8+ recommended)
- VS Code will create and activate the environment automatically
(Alternative) Launch Jupyter from the Command Line
- Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
- Install Jupyter:
pip install jupyter
- Launch Jupyter:
jupyter notebook
- Create a new notebook
- In the top-right corner, click on the kernel selector
- Choose "Select kernel" > "Python Environments"
- Select your environment or create a new one
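Once a kernel is selected, it can help to confirm which interpreter the notebook is actually running. The following is a minimal sketch using only the standard library; the helper name `describe_kernel` is illustrative and not part of Jupyter or VS Code.

```python
import sys


def describe_kernel():
    """Return basic facts about the interpreter running this cell."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # In a venv, sys.prefix points at the environment, while
        # sys.base_prefix points at the Python it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


info = describe_kernel()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `in_virtualenv` is False, or the executable path is not inside `.venv`, the notebook is probably attached to a different kernel than intended.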
2. Install Required Packages
You can install packages directly in the notebook using magic commands (see Task 1 below).
Tasks
1. Using Magic Commands
- Install packages with %pip:
- Measure execution time with %timeit:
# Create a list of numbers
numbers = list(range(1000000))
# Measure time to calculate sum
%timeit sum(numbers)
- Display plots inline with %matplotlib:
%matplotlib inline
import matplotlib.pyplot as plt
# Create a simple plot
plt.figure(figsize=(10, 6))
plt.plot([0, 1, 2, 3, 4], [0, 3, 1, 5, 2])
plt.title("Sample Patient Data")
plt.xlabel("Time (hours)")
plt.ylabel("Pain Level")
plt.grid(True)
plt.show()
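`%timeit` only works inside IPython or Jupyter. As a rough sketch of the same measurement in plain Python, the standard-library `timeit` module can be used. The helper name `best_time_per_call` and the repeat and loop counts below are arbitrary choices for illustration, not the values `%timeit` would pick.

```python
import timeit

numbers = list(range(1_000_000))


def best_time_per_call(func, repeat=3, number=5):
    """Run func `number` times per trial, over `repeat` trials, and
    return the fastest average time per call in seconds."""
    trials = timeit.repeat(func, repeat=repeat, number=number)
    return min(trials) / number


seconds = best_time_per_call(lambda: sum(numbers))
print(f"sum(numbers): {seconds * 1000:.2f} ms per call (best of 3 trials)")
```

Taking the minimum across trials reduces noise from other processes; the absolute numbers will differ from machine to machine.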
2. Notebook Operations
- Create cells and run code:
- Create a new code cell
- Enter a simple Python statement:
print("Hello, nerds!")
- Run the cell with Shift+Enter or the Run button
- In the next cell, try:
display("Hello, beautiful nerds!")
- Use markdown for documentation:
- Create a markdown cell (change cell type to "Markdown")
- Add a title, description, and bullet points:
# Patient Data Analysis
This notebook explores patient vital signs data.
Key metrics:
- Heart rate
- Blood pressure
- Oxygen saturation
- Use markdown for output:
- Display formatted text as output:
- Or show tables in a nicer format:
%pip install pandas --quiet
import pandas as pd
df = pd.DataFrame({
    'Column 1': ['Hello'],
    'Column 2': ['gorgeous'],
    'Column 3': ['nerds']
})
print(df)
display(df)
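For the "display formatted text as output" step, one possible approach is `IPython.display.Markdown`, which renders a Markdown string as cell output. This is only a sketch under assumptions: the helper `vitals_markdown` and the sample values are made up for illustration.

```python
def vitals_markdown(patient_id, heart_rate, o2_saturation):
    """Build a small Markdown summary (hypothetical helper)."""
    return (
        f"### Patient {patient_id}\n\n"
        f"- **Heart rate:** {heart_rate} bpm\n"
        f"- **Oxygen saturation:** {o2_saturation}%\n"
    )


text = vitals_markdown(1, 72, 98)

try:
    # Available in IPython-based Jupyter kernels; renders the string
    # as formatted text instead of raw characters.
    from IPython.display import Markdown, display
    display(Markdown(text))
except ImportError:
    # Fallback for plain Python, where IPython may not be installed.
    print(text)
```

Compared with `print(text)`, the Markdown output shows a real heading and bold labels, which mirrors the difference between `print(df)` and `display(df)` above.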
3. Using Shell Commands
- List files in the current directory:
- Check Python version:
- Create a directory and check it exists:
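In a notebook, prefixing a line with `!` hands it to the system shell, for example `!ls`, `!python --version`, or `!mkdir -p data`. The exact commands intended for the three steps above are not shown, so the following is only a sketch of portable standard-library equivalents. The function names are illustrative, and the directory name `data` is assumed from the save step later in this demo.

```python
import sys
import tempfile
from pathlib import Path


def list_files(directory="."):
    """Rough equivalent of `!ls`: sorted names of entries in a directory."""
    return sorted(p.name for p in Path(directory).iterdir())


def python_version():
    """Rough equivalent of `!python --version` for the running interpreter."""
    return "Python " + sys.version.split()[0]


def ensure_directory(path):
    """Rough equivalent of `!mkdir -p path`; True if the directory now exists."""
    target = Path(path)
    target.mkdir(parents=True, exist_ok=True)
    return target.is_dir()


print(python_version())

# A temporary folder keeps this example free of side effects; in the
# notebook you would call ensure_directory("data") directly.
with tempfile.TemporaryDirectory() as workdir:
    print(ensure_directory(Path(workdir) / "data"))
    print(list_files(workdir))
```

The Python versions behave the same on Windows, macOS, and Linux, whereas `!ls` depends on the shell the kernel runs under.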
4. Working with Data
- Create sample patient data:
import pandas as pd
import numpy as np
# Create sample patient data
np.random.seed(42) # For reproducibility
# Generate 100 patient records
n_patients = 100
data = {
    'patient_id': range(1, n_patients + 1),
    'age': np.random.randint(18, 90, size=n_patients),
    'heart_rate': np.random.normal(75, 15, size=n_patients).round().astype(int),
    'systolic_bp': np.random.normal(120, 20, size=n_patients).round().astype(int),
    'diastolic_bp': np.random.normal(80, 10, size=n_patients).round().astype(int),
    'temperature': np.random.normal(98.6, 1, size=n_patients).round(1),
    # Cap at 100, since oxygen saturation cannot exceed 100%
    'o2_saturation': np.random.normal(97, 2, size=n_patients).round().clip(max=100).astype(int)
}
# Create DataFrame
patients_df = pd.DataFrame(data)
# Display first few rows
patients_df.head()
- Explore the data:
# Basic statistics
patients_df.describe()
# Check for missing values
patients_df.isna().sum()
# Count patients by age group
patients_df['age_group'] = pd.cut(patients_df['age'],
                                  bins=[0, 30, 50, 70, 100],
                                  right=False,  # left-closed bins: [0, 30), [30, 50), ...
                                  labels=['<30', '30-49', '50-69', '70+'])
patients_df['age_group'].value_counts()
- Visualize the data:
# Create a histogram of heart rates
plt.figure(figsize=(10, 6))
plt.hist(patients_df['heart_rate'], bins=15, alpha=0.7)
plt.title('Distribution of Patient Heart Rates')
plt.xlabel('Heart Rate (bpm)')
plt.ylabel('Number of Patients')
plt.grid(True, alpha=0.3)
plt.show()
# Create a scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(patients_df['age'], patients_df['systolic_bp'], alpha=0.7)
plt.title('Age vs. Systolic Blood Pressure')
plt.xlabel('Age (years)')
plt.ylabel('Systolic BP (mmHg)')
plt.grid(True, alpha=0.3)
plt.show()
- Save the data:
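# Save to CSV (create the data/ directory first if it does not exist)
import os
os.makedirs('data', exist_ok=True)
patients_df.to_csv('data/patient_vitals.csv', index=False)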
# Verify the file was created
!ls -la data/
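Listing the directory confirms that the file exists, but not that its contents are intact. As a minimal sketch, the CSV can be read back and inspected. The helper `save_and_reload`, the tiny DataFrame, and the temporary directory below are stand-ins so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def save_and_reload(df, path):
    """Write df to CSV (creating parent folders if needed) and read it back."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    df.to_csv(path, index=False)
    return pd.read_csv(path)


# Small stand-in for patients_df
sample = pd.DataFrame({"patient_id": [1, 2, 3], "heart_rate": [72, 88, 65]})

with tempfile.TemporaryDirectory() as workdir:
    csv_path = os.path.join(workdir, "data", "patient_vitals.csv")
    reloaded = save_and_reload(sample, csv_path)

print(reloaded.shape)
print(list(reloaded.columns))
```

In the notebook itself, `save_and_reload(patients_df, 'data/patient_vitals.csv')` would perform the same check on the full dataset.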
Expected Outcomes
- Students can create and run Jupyter notebooks
- Students can use markdown for documentation
- Students can use magic commands and shell commands
- Students can create, explore, and visualize simple datasets
- Students understand the interactive nature of notebooks