Lab 10: πŸ’­ Sentiment Analysis Application

From a trained model in a notebook to a live web app anyone can use.

Libraries: TensorFlow, Keras, Pandas, Streamlit β€’ Estimated Time: 3 hours

Part 1: The Final Step - Deployment

So far, all our amazing models have lived inside a Colab notebook. They can't do anything unless we are there to run the code. The final, crucial step in any real-world AI project is deploymentβ€”packaging your model into a user-friendly application.

Today, we'll build a simple web app for our sentiment analysis model. This means anyone, even someone with no coding knowledge, can visit a webpage, type in a sentence, and get a prediction.

Introducing Streamlit: Your AI Swiss Army Knife

Building a web app from scratch usually involves learning HTML, CSS, and JavaScript frameworks. It's complicated! Streamlit is a magical Python library that lets you build beautiful, interactive web apps for machine learning with just a few lines of Python. No web development experience needed!

Part 2: Train and Save a Sentiment Classifier

Before we can build an app, we need a model. We'll train an LSTM on a real-world dataset of tweets to classify them as positive or negative. The most important new step is saving our trained model and tokenizer so our web app can use them later.

Step 1: Get the Data

We'll use the "Sentiment140" dataset, which contains 1.6 million tweets with sentiment labels (0 = negative, 4 = positive).

import pandas as pd
import tensorflow as tf
import numpy as np
import pickle
import matplotlib.pyplot as plt

# Download the dataset
!wget --no-check-certificate \
  http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip \
  -O /tmp/trainingandtestdata.zip

# Unzip it
import zipfile
with zipfile.ZipFile("/tmp/trainingandtestdata.zip", 'r') as zip_ref:
  zip_ref.extractall("/tmp/")
# Load into a pandas DataFrame
cols = ['sentiment', 'id', 'date', 'query', 'user', 'text']
df = pd.read_csv('/tmp/training.1600000.processed.noemoticon.csv', header=None, names=cols, encoding='ISO-8859-1')

# We only need the sentiment and text
df = df[['sentiment', 'text']]
df['sentiment'] = df['sentiment'].replace(4, 1) # Map label 4 to 1 so labels are 0 (negative) and 1 (positive)

# For this lab, we'll use a smaller subset to speed up training
df = df.sample(20000, random_state=42)

πŸ’‘ Your Turn: Explore the Dataset

Before diving in, it's always good practice to inspect your data. Use `df.head()` to see the first few rows and `df['sentiment'].value_counts()` to see if the number of positive and negative tweets is balanced.
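One way to do this, still in the notebook:

print(df.head())                       # peek at the first five rows
print(df['sentiment'].value_counts())  # count the 0s (negative) and 1s (positive)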

Step 2: Preprocess and Tokenize

This is the same process as Lab 9: convert text to numbers.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000
embedding_dim = 16
max_length = 100
trunc_type = 'post'
padding_type = 'post'
oov_tok = "<OOV>"

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(df['text'])

sequences = tokenizer.texts_to_sequences(df['text'])
padded = pad_sequences(sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)

labels = df['sentiment'].values

Step 3: Build, Train, and SAVE

model = tf.keras.Sequential([
  tf.keras.layers.Embedding(vocab_size, embedding_dim),  # maps each word index to a 16-dim vector
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
  tf.keras.layers.Dense(24, activation='relu'),
  tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(padded, labels, epochs=10, validation_split=0.2, verbose=2)

πŸ’‘ Your Turn: Plot Training History

A key skill is diagnosing your model's training. The `history` object contains the accuracy and loss for each epoch. Use Matplotlib to plot the training accuracy vs. the validation accuracy. Is the model overfitting? (Hint: `history.history['accuracy']` and `history.history['val_accuracy']`)

plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

CRUCIAL STEP: Save Your Tools!

To use this model in another script (our web app), we must save two things: the trained model itself (architecture plus weights) and the tokenizer. The tokenizer is just as important as the model, because our app needs to preprocess user input in exactly the same way.

# Save the trained model
model.save('sentiment_model.h5')

# Save the tokenizer object
with open('tokenizer.pickle', 'wb') as handle:
  pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

After running this, use the file browser in Colab to download `sentiment_model.h5` and `tokenizer.pickle` to your computer. You'll need them for the next part.
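Before you leave Colab, it's worth a quick sanity check that both files reload correctly. A minimal check, run in the same notebook (so the earlier imports and `max_length` are still available):

# Reload the saved artifacts and run one prediction,
# mirroring exactly what the web app will do later.
reloaded_model = tf.keras.models.load_model('sentiment_model.h5')
with open('tokenizer.pickle', 'rb') as handle:
  reloaded_tokenizer = pickle.load(handle)

seq = reloaded_tokenizer.texts_to_sequences(["I love this lab!"])
pad = pad_sequences(seq, maxlen=max_length, padding='post', truncating='post')
print(reloaded_model.predict(pad))  # a score near 1 means positive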

Part 3: Building The App Locally

Now we leave Colab and move to our own computer. Create a new folder on your desktop, and place the two files you just downloaded (`sentiment_model.h5` and `tokenizer.pickle`) inside it.

Step 1: Install Streamlit and TensorFlow

Open a terminal or command prompt on your computer and run:

pip install streamlit tensorflow

Step 2: Create your app file

Inside your new folder, create a new Python file named `app.py`. Open it in a text editor (like VS Code, Sublime Text, or even Notepad) and paste the following code:

import streamlit as st
import tensorflow as tf
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

# --- Load Model and Tokenizer ---
st.set_page_config(page_title="Sentiment Analyzer", page_icon="πŸ’­")

# Cache the model and tokenizer so they load only once, not on every interaction
@st.cache_resource
def load_model_and_tokenizer():
  model = tf.keras.models.load_model('sentiment_model.h5')
  with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
  return model, tokenizer

model, tokenizer = load_model_and_tokenizer()

# --- App Layout ---
st.title("Twitter Sentiment Analyzer")
st.write("Type a sentence below and I'll tell you if it's positive or negative!")

user_input = st.text_area("Enter your text here:")

if st.button("Analyze"):
  if user_input:
    # 1. Preprocess the input
    sequence = tokenizer.texts_to_sequences([user_input])
    padded = pad_sequences(sequence, maxlen=100, padding='post', truncating='post')  # must match the training settings

    # 2. Make prediction
    prediction = model.predict(padded)
    sentiment_score = float(prediction[0][0])  # convert to a plain Python float

    # 3. Display result
    if sentiment_score > 0.5:
      st.success(f"Positive Sentiment! πŸ‘ (Score: {sentiment_score:.2f})")
    else:
      st.error(f"Negative Sentiment! πŸ‘Ž (Score: {sentiment_score:.2f})")
  else:
    st.warning("Please enter some text to analyze.")

Step 3: Run the App!

Go back to your terminal, make sure you are in the folder where `app.py` is located, and run this command:

streamlit run app.py

Your web browser should automatically open with your new application running! Try it out!

πŸ’‘ Your Turn: Add a Spinner

Model prediction isn't instant. To improve the user experience, wrap the prediction logic in a `with st.spinner('Analyzing...'):` block. This will show a nice loading animation while the model is working.
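One way to do it, replacing step 2 inside the "Analyze" branch:

    # 2. Make prediction (with a loading animation)
    with st.spinner('Analyzing...'):
      prediction = model.predict(padded)
      sentiment_score = float(prediction[0][0])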

πŸ’‘ Your Turn: Visualize the Score

Instead of just printing the score, use a visual element. After the `st.success` or `st.error`, add `st.progress(sentiment_score)` to show a progress bar. Or, try `st.bar_chart({'sentiment': [sentiment_score, 1-sentiment_score]})` to show both positive and negative probabilities.
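For example, right after the `st.success`/`st.error` line (the `'positive'`/`'negative'` column names are just one way to label the chart):

    st.progress(sentiment_score)  # bar filled to the positive score (0.0 to 1.0)
    st.bar_chart({'positive': [sentiment_score],
                  'negative': [1.0 - sentiment_score]})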

Part 4: Your Mission - Upgrade the App

Assignment: Add More Features

Your base application works, but we can make it better. Your mission is to add new features to `app.py`.

Your Tasks:

  1. Add a Header and Sidebar: Use `st.header()` and `st.sidebar.header()` to organize your app. Put an "About" section in the sidebar explaining what the app does.
  2. Show More Details: After a prediction, display the raw sentiment score with more precision. Also, explain what the score means (e.g., "Scores closer to 1 are more positive.").
  3. Display an Image: After a prediction, use `st.image()` to display a happy emoji picture for a positive result and a sad one for a negative result. Find these images online.
  4. Add a "Clear" Button: Add a second button that clears the text area and the previous result.

Part 5: Bonus - A More Complex Classifier

The model we built was simple to keep training fast. A more powerful and modern approach is to use pre-trained word embeddings like GloVe.
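As a starting point, here is a minimal sketch of the standard pattern for loading pre-trained vectors into a Keras Embedding layer. It assumes you have separately downloaded `glove.6B.100d.txt` (the file name and the 100-dimension choice are assumptions, not part of this lab), and it reuses `vocab_size` and `tokenizer` from Part 2:

# Build a word -> vector lookup from the GloVe text file
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
  for line in f:
    values = line.split()
    embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Fill an embedding matrix for the words in our tokenizer's vocabulary
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
  if i < vocab_size:
    vector = embeddings_index.get(word)
    if vector is not None:
      embedding_matrix[i] = vector

# Use the matrix as fixed weights in the Embedding layer
embedding_layer = tf.keras.layers.Embedding(
  vocab_size, 100,
  embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
  trainable=False)  # freeze the pre-trained vectors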

Kaggle: IMDB Movie Reviews

This is a classic binary sentiment classification dataset, but the text is much longer and more complex than tweets.

Your Challenge:

Your goal is to build a highly accurate sentiment classifier for this dataset and deploy it with Streamlit.

Part 6: Submission Guidelines

Since this project involves multiple local files, submitting a Colab link isn't enough.

  1. Create a new public repository on GitHub. This is an essential skill for all developers.
  2. Your repository should contain at least four files:
    • The final, enhanced `app.py`.
    • The saved `sentiment_model.h5`.
    • The saved `tokenizer.pickle`.
    • A `requirements.txt` listing your dependencies (a sample is shown after this list).
  3. Create a `README.md` file in your repository. In this file, write a brief description of your project and include instructions on how to run it (e.g., "1. Clone the repo. 2. Run `pip install -r requirements.txt`. 3. Run `streamlit run app.py`").
  4. Submit the URL to your public GitHub repository as your assignment.
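For reference, a minimal `requirements.txt` for this app only needs the two libraries it imports (pin exact versions if you want reproducible installs):

streamlit
tensorflow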