Lab 1: Python & Environment Setup
🐍 Your journey into AI starts here. Master the tools of the trade.
Estimated Time: 3 hours
Part 1: Your AI Workbench - Jupyter & Colab
Before writing code, we need to set up our workspace. We will use an interactive environment to
experiment, visualize, and share our work. This is where Jupyter notebooks shine.
What is a Jupyter Notebook?
A Jupyter Notebook is an interactive document that lets you mix live code, equations, visualizations,
and narrative text in blocks called "cells". It's the standard for data science and AI because it's
perfect for exploratory work.
Google Colaboratory (Colab): Your Best Starting Point
Google Colab is a free Jupyter notebook environment that runs entirely in your browser.
No complex installation is required!
- Zero Configuration: All major AI libraries (NumPy, Pandas, TensorFlow, PyTorch) are
pre-installed.
- Free Hardware: You get access to powerful GPUs and TPUs for free, which can speed up model training by 10-50x. (A quick way to check this appears after the setup steps below.)
- Easy Collaboration: Share your notebooks like a Google Doc.
Getting Started with Colab
- Go to colab.research.google.com and sign in with a Google
account.
- Click "New notebook".
- You'll see your interactive workspace. At the top left, click "+ Code" to add a
cell for Python code and "+ Text" to add a cell for notes (like this text you're
reading).
- Type code in a code cell and run it by clicking the play button on the left or pressing
Shift + Enter.
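To confirm the free GPU mentioned above is actually available, switch your runtime to a GPU ("Runtime" > "Change runtime type") and run the quick check below. It uses PyTorch, which is pre-installed in Colab; this is just a sanity check, not one of the lab tasks.
import torch
print(torch.cuda.is_available())  # prints True when a GPU runtime is active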
🚀 Exercise: Your First Colab Notebook
1. Open a new Google Colab notebook and name it "Lab 1".
2. Create a text cell and write "My First AI Lab".
3. Below it, create a code cell and type `a = 10; b = 20; print(f"The sum is {a+b}")`.
4. Run the code cell. The output should appear directly below it. This confirms your environment is
working!
Part 2: Python Fundamentals for AI
Now that your environment is ready, let's learn the language of AI. Python's simple syntax lets you focus on your logic rather than on the language itself. Below, we'll cover the essentials you'll use every day. For each
example, read the explanation, study the code, and then try the "Your Turn" task to solidify your
understanding.
2.1 Basic Data Types
In AI, you're always dealing with different types of data. Numbers are used for pixel values or model
weights, text is used for natural language processing, and booleans are used to control program flow.
pixel_intensity = 255
learning_rate = 0.001
model_name = "ImageClassifier_v1"
is_training = True
print(f"Model: {model_name}, LR:
{learning_rate}")
Model: ImageClassifier_v1, LR: 0.001
💡 Your Turn
Declare a variable for `batch_size` (an integer) and another for `model_accuracy` (a float). Print
them out in a formatted string.
2.2 Data Structures: Lists & Dictionaries
Lists are ordered collections, perfect for storing a sequence of features or data points. Dictionaries are collections of key-value pairs (insertion-ordered in modern Python), ideal for storing model configurations or labeled data.
house_features = [1500, 3, 10]  # e.g., [square feet, bedrooms, age in years]
print(f"Number of bedrooms: {house_features[1]}")
hyperparameters = {
    "learning_rate": 0.01,
    "epochs": 50,
    "optimizer": "Adam"
}
print(f"Optimizer: {hyperparameters['optimizer']}")
Number of bedrooms: 3
Optimizer: Adam
💡 Your Turn
Create a dictionary for a `student` with keys 'name', 'major', and a list of 'courses'. Print the
student's name and their second course.
2.3 Control Flow: Loops & Conditionals
Training a model involves iterating through your dataset thousands of times (`for` loops) and making
decisions based on performance (`if/else` statements).
loss_values = [0.8, 0.5, 0.2, 0.1]
for loss in loss_values:
    if loss < 0.5:
        print(f"Loss {loss:.2f} is good. Continuing training.")
    else:
        print(f"Loss {loss:.2f} is high. Check model.")
Loss 0.80 is high. Check model.
Loss 0.50 is high. Check model.
Loss 0.20 is good. Continuing training.
Loss 0.10 is good. Continuing training.
💡 Your Turn
Create a list of accuracies (e.g., `[0.91, 0.85, 0.99]`). Loop through them and print "Excellent" if
accuracy is > 0.9, "Good" if > 0.8, and "Needs Improvement" otherwise.
2.4 Functions
Functions are crucial for writing clean, reusable code. In AI, you'll write functions for data
preprocessing, model building, and training steps.
def preprocess_image(image_data, target_size):
print(f"Resizing image to
{target_size}x{target_size}...")
resized_image = "some_processed_data"
return resized_image
my_image = "raw_image_data"
processed = preprocess_image(my_image, target_size=224)
print(f"Function returned: {processed}")
Resizing image to 224x224...
Function returned: some_processed_data
💡 Your Turn
Write a function `calculate_average_loss` that takes a list of loss values and returns their average.
Part 3: The AI Power Tools
Let's explore the three most important libraries for any AI practitioner. These are pre-installed in
Colab.
3.1 NumPy: The Bedrock of AI Math
NumPy is critical because it provides the `ndarray` (N-dimensional array), the object that modern AI frameworks (like TensorFlow and PyTorch) model their tensors on. Operations on NumPy arrays run in optimized C code, making them orders of magnitude faster than equivalent pure-Python loops. Your image data, model weights, and feature vectors will all be NumPy arrays.
import numpy as np
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2], [3, 4]])
print(f"Vector shape: {vector.shape}")
print(f"Matrix shape: {matrix.shape}")
scaled_vector = vector * 5
print(f"Scaled vector: {scaled_vector}")
Vector shape: (3,)
Matrix shape: (2, 2)
Scaled vector: [ 5 10 15]
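As a taste of why this matters for AI: a single artificial neuron is just a dot product plus a bias. The sketch below shows this with made-up feature and weight values (illustrative only, not part of the lab tasks):
import numpy as np
features = np.array([0.5, 0.8, 0.2])  # hypothetical input features
weights = np.array([0.4, 0.3, 0.9])   # hypothetical learned weights
bias = 0.1
output = np.dot(features, weights) + bias
print(f"Neuron output: {output:.2f}")
Neuron output: 0.72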
💡 Your Turn
Create a 3x3 NumPy matrix of numbers from 1 to 9. Then, find the mean (average) of the entire matrix
using `matrix.mean()`.
3.2 Pandas: For Structuring and Cleaning Data
Pandas is your tool for data manipulation and analysis. Before you can train a model,
you need to load, explore, and clean your data. Pandas provides the DataFrame, a
powerful table-like structure, to make this easy.
import pandas as pd
data = {
    'Model Name': ['ResNet50', 'MobileNetV2', 'EfficientNetB0'],
    'Top-1 Accuracy': [0.76, 0.72, 0.77],
    'Size (MB)': [102, 14, 21]
}
df = pd.DataFrame(data)
print("--- Full DataFrame ---")
print(df)
print("\n--- Accuracy Column ---")
print(df['Top-1 Accuracy'])
--- Full DataFrame ---
       Model Name  Top-1 Accuracy  Size (MB)
0        ResNet50            0.76        102
1     MobileNetV2            0.72         14
2  EfficientNetB0            0.77         21
--- Accuracy Column ---
0 0.76
1 0.72
2 0.77
Name: Top-1 Accuracy, dtype: float64
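DataFrames also make everyday analysis one-liners. Two examples you can try on the same `df` (boolean filtering with `df[...]` and `.sort_values()` are both standard Pandas operations):
print(df[df['Size (MB)'] < 50])  # keep only the smaller models
print(df.sort_values('Top-1 Accuracy', ascending=False))  # most accurate first
Run them in a cell to see the filtered and sorted tables.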
💡 Your Turn
Add a new column to the DataFrame called `Performance`, calculated as `df['Top-1 Accuracy'] / df['Size (MB)']`. Then, print the updated DataFrame.
3.3 Matplotlib: Visualizing Your Results
Matplotlib is the classic library for creating plots and charts. Visualizing your
model's training progress (like loss over time) or your data's distribution is essential for
understanding and debugging.
import matplotlib.pyplot as plt
epochs = [1, 2, 3, 4, 5, 6]
training_loss = [0.8, 0.6, 0.45, 0.3, 0.25, 0.22]
validation_loss = [0.85, 0.68, 0.55, 0.48, 0.46, 0.45]
plt.figure(figsize=(8,5))
plt.plot(epochs, training_loss, marker='o', linestyle='--', label='Training Loss')
plt.plot(epochs, validation_loss, marker='s', linestyle='-', label='Validation Loss')
plt.title('Model Loss Over Time', fontsize=16)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.legend()
plt.show()
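If you want to keep a figure for a report, you can save it to a file with Matplotlib's standard `savefig` call. Add a line like this just before `plt.show()` (the filename here is just an example):
plt.savefig('loss_curve.png', dpi=150)  # writes the image to Colab's file system
The saved file will appear in the "Files" sidebar, where you can download it.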
💡 Your Turn
Using the Pandas DataFrame `df` from the previous section, create a bar chart comparing the 'Top-1
Accuracy' of the different models. Use `plt.bar(df['Model Name'], df['Top-1 Accuracy'])` as a
starting point.
Part 4: Lab Assignment
It's time to put everything together. This assignment requires you to perform a mini data analysis
project, combining all the skills you've learned above.
Assignment: Analyzing a Toy Dataset
You are given a small dataset of used car information. Your task is to load it, perform
some basic analysis, and visualize the results.
Task 1: Setup and Data Loading
- In a new Colab cell, import NumPy, Pandas, and Matplotlib with their standard aliases.
- Create a Pandas DataFrame using the following code snippet:
car_data = {
    'Make': ['Honda', 'Toyota', 'Ford', 'Honda', 'Toyota', 'Ford'],
    'Engine_Size_L': [1.5, 2.5, 5.0, 1.8, 2.4, 4.6],
    'Price_USD': [22000, 28000, 35000, 24000, 29000, 32000]
}
car_df = pd.DataFrame(car_data)
- Print the entire DataFrame to verify it loaded correctly.
Task 2: Data Analysis with NumPy & Pandas
- Calculate and print the average (mean) price of all cars.
- Calculate and print the average engine size.
- Find the car with the highest price using Pandas functions (hint: look up `.idxmax()`). Print
the details of this car.
Task 3: Visualization with Matplotlib
- Create a scatter plot to visualize the relationship between 'Engine_Size_L'
(x-axis) and 'Price_USD' (y-axis).
- Give your plot a clear title ("Car Price vs. Engine Size") and label the x and y axes.
- Create a bar chart that shows the average price for each car 'Make'. You will need to use the Pandas `.groupby()` function first. (Hint: `avg_price_by_make = car_df.groupby('Make')['Price_USD'].mean()`).
Part 5: Bonus - Your First Kaggle Project
Ready to apply your skills to a world-famous dataset? This optional bonus section will guide you through
the first steps of a real data science project on Kaggle.
Kaggle & The Titanic Dataset
Kaggle is a platform where data scientists compete by building the best models for a
given problem. The "Titanic: Machine Learning from Disaster"
competition is the "Hello, World!" of data science. Your goal is to predict which passengers
survived the shipwreck.
Task 1: Get the Data
- Go to the Titanic data page on Kaggle. You will need to create
a free account.
- Download the `train.csv` and `test.csv` files to your computer.
- In your Colab notebook, click the "Files" icon on the left sidebar and upload `train.csv`.
Task 2: Load and Explore
Use Pandas to load the data and take your first look.
titanic_df = pd.read_csv('train.csv')
print("--- First 5 Rows ---")
print(titanic_df.head())
print("\n--- Data Info ---")
titanic_df.info()
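One more standard exploration step, and a useful lead-in to the missing-data question in Task 3: count the missing values in each column. `.isnull().sum()` is a common Pandas idiom for this:
print(titanic_df.isnull().sum())  # number of missing entries per column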
Task 3: Your Challenge - Exploratory Data Analysis
Now, it's your turn to be the data detective. Answer the following questions using Pandas and Matplotlib.
There is no single correct answer for the code; your goal is to find the answer and visualize it.
- Survival Rate: What was the overall survival rate? Create a bar chart showing the
raw counts of passengers who survived (1) vs. those who did not (0). (Hint: use `.value_counts()` on
the 'Survived' column).
- Survival by Gender: Did gender play a role in survival? Create a bar chart showing
the survival rate for males vs. females. (Hint: use `.groupby('Sex')['Survived'].mean()`).
- Survival by Class: What about passenger class ('Pclass')? Create a bar chart
showing the survival rate for each of the three classes.
- Missing Data: Which important column has a lot of missing data? How might you
handle this in a real project? (Just answer in a text cell).
Part 6: Submission Guidelines
To complete this lab, please follow these instructions carefully.
- Complete all "Your Turn" tasks and the main "Lab Assignment" in a single Google Colab notebook. The
Kaggle project is a bonus, but we encourage you to try it!
- Use Text Cells in your notebook to label each section (e.g., "Part 2.4 Your Turn",
"Assignment Task 1", "Bonus Kaggle Project", etc.) to keep your work organized.
- Ensure all your code cells have been run so that their outputs are visible below them (in Colab: "Runtime" > "Run all"). An unrun notebook is an incomplete notebook!
- When you are finished, generate a shareable link. In Colab, click the "Share"
button in the top right.
- In the popup, under "General access", change "Restricted" to "Anyone with the link"
and ensure the role is set to "Viewer".
- Click "Copy link" and submit this link as your assignment.