Teaching a computer not just *what* an object is, but *where* it is.
Libraries: YOLO (Ultralytics), OpenCV, TensorFlow • Estimated Time: 3 hours
So far, we've built models that can look at an image and say "this is a cat." That's Image Classification. But what if there's a cat, a dog, and a duck in the same image? How do we find each one?
This is Object Detection. The goal is to produce two things for every object in an image: a **bounding box** (rectangle coordinates telling us *where* the object is) and a **class label** with a confidence score (telling us *what* the object is).
Older object detection systems were slow: they examined many candidate regions of an image one by one. YOLO (You Only Look Once) revolutionized this by being incredibly fast while remaining accurate. It looks at the entire image just once and predicts all the bounding boxes and class probabilities simultaneously.
Think of it like this: to find your keys in a room, you don't scan every square inch. You glance around the whole room and your brain instantly picks out key-like shapes. YOLO works in a similar, clever way.
YOLO divides the input image into a grid. For each grid cell, it predicts several bounding boxes and the probability that an object's center falls within that cell. It then combines this with class predictions to produce the final detections.
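To make the grid idea concrete, here is a toy sketch of the classic YOLO grid assignment in plain Python. This is an illustration of the concept, not the real model internals (the grid size `7` here is just an example value):

```python
# Toy illustration of YOLO's grid assignment (not the real implementation).
# An object's center (cx, cy) falls into exactly one cell of an SxS grid;
# that cell is "responsible" for predicting the object's bounding box.

def responsible_cell(cx, cy, img_size=640, grid_size=7):
    cell = img_size / grid_size   # width/height of one grid cell in pixels
    col = int(cx // cell)         # grid column containing the center
    row = int(cy // cell)         # grid row containing the center
    return row, col

# A dog centered at (320, 100) in a 640x640 image with a 7x7 grid:
print(responsible_cell(320, 100))  # -> (1, 3)
```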
We'll be using the `ultralytics` library, which provides a super easy-to-use implementation of the latest YOLO models (like YOLOv8). We also need `OpenCV` for handling images and videos.
The `ultralytics` library offers different model sizes, each with a trade-off between speed and accuracy. Try loading a medium-sized model like `'yolov8m.pt'` or the large (and very accurate) `'yolov8l.pt'`. The first time you use a model, the library will download it automatically.
Let's start by detecting objects in a single image. First, we need an image. You can upload your own to Colab or download one from the internet.
You should see the original image with bounding boxes drawn around the people and the bus, along with their class labels and confidence scores!
Instead of just displaying the image, you can save the annotated version to a file. The `predict` method has a `save=True` argument that does this automatically, placing the result in a `runs/detect/predict` folder. Try it!
The `results[0]` object is powerful. You can access the raw data directly.
By default, YOLO shows detections with a confidence of 0.25 or higher. You can be more strict. Run the prediction again, but this time set the `conf` argument. How do the results change if you set it to `0.7`?
IoU (Intersection over Union) controls how overlapping boxes are handled during non-maximum suppression (NMS): boxes that overlap an already-accepted box by more than the `iou` threshold are discarded as duplicates, so a lower threshold (e.g., `0.3`) will cause the model to discard more redundant boxes. Run the prediction again, this time adjusting the `iou` parameter. Do you see any difference in the bus image?
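It helps to see what IoU actually measures before tweaking the parameter. A minimal implementation of the metric itself:

```python
# Intersection over Union for two boxes in (x1, y1, x2, y2) format.
def iou(a, b):
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes shifted by half a width: 50 px overlap / 150 px union.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # -> 0.333...
```

During NMS, two boxes whose IoU exceeds the threshold are treated as the same object, and the lower-confidence one is dropped.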
This is where YOLO truly shines. We can run it on a live video stream from your webcam (if you're running this on your local machine) or on a video file in Colab.
First, let's download a sample video.
Now, we can call the model on the video path. We'll set `stream=True` to process it frame-by-frame efficiently.
After running this, check your file browser in Colab. You'll find a new `runs` directory containing the processed video with bounding boxes drawn on it. You can download and play it!
What if you only want to detect people and ignore everything else? You can use the `classes` argument. The class ID for 'person' is 0. Run the prediction on the video again, but only look for people. Does the output video look cleaner?
Processing at a higher resolution gives more accuracy but is slower. Use the `imgsz` argument to set the image size. Try running detection on the video at `imgsz=320` and then at `imgsz=1280`. It won't save a video, but it will print the processing speed for each frame. Compare the average speed (in ms) for each resolution.
Simply counting objects in each frame is flawed. A car that stays in view for 100 frames gets counted 100 times. A better approach is to count objects as they cross a virtual line. This is a real-world technique used in traffic analysis.
Hint: You'll need a dictionary to store the positions of objects from the previous frame. This requires a simple tracking mechanism. A good starting point is to assume an object in a similar position in the next frame is the same object.
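A simplified, pure-Python sketch of the line-crossing idea. It assumes an upstream tracker already gives each object a stable ID and a center point per frame (here that input is hard-coded, and the counting line is a hypothetical horizontal line at `y = 100`):

```python
# Toy line-crossing counter. Assumes a tracker supplies per-object centers;
# real code would build `tracks` from YOLO detections frame by frame.
LINE_Y = 100  # hypothetical horizontal counting line

def count_crossings(tracks):
    """tracks maps object ID -> list of (x, y) centers, one per frame."""
    crossings = 0
    for _, centers in tracks.items():
        for (_, y_prev), (_, y_curr) in zip(centers, centers[1:]):
            # Count when the center moves from above the line to below it.
            if y_prev < LINE_Y <= y_curr:
                crossings += 1
    return crossings

# Object 1 crosses downward; object 2 stays above the line.
tracks = {1: [(50, 80), (52, 95), (55, 110)], 2: [(200, 40), (201, 60)]}
print(count_crossings(tracks))  # -> 1
```

Because each object is counted once, at the moment it crosses, a car that lingers in view for 100 frames no longer inflates the total.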
The true power of YOLO is training it on your own custom dataset. This is the biggest challenge in object detection and a highly valuable skill.
The `COCO8` dataset is a small example dataset provided by Ultralytics, perfect for learning how to train. It has 8 images and the corresponding labels in YOLO format.
Your goal is to "fine-tune" the pre-trained YOLOv8 model on this tiny dataset. The `ultralytics` library makes this surprisingly easy.
Running this code will download the dataset, start the training process, and save your new, fine-tuned model weights in the `runs/detect/train/` directory. You can then load this new model and use it just like you used the pre-trained one!