Lab 10: πŸ’­ Sentiment Analysis Application

From a trained model in a notebook to a live web app anyone can use.

Libraries: TensorFlow, Keras, Pandas, Streamlit β€’ Estimated Time: 3 hours

Part 1: The Final Step - Deployment

So far, all our amazing models have lived inside a Colab notebook. They can't do anything unless we are there to run the code. The final, crucial step in any real-world AI project is deploymentβ€”packaging your model into a user-friendly application.

Today, we'll build a simple web app for our sentiment analysis model. This means anyone, even someone with no coding knowledge, can visit a webpage, type in a sentence, and get a prediction.

Introducing Streamlit: Your AI Swiss Army Knife

Building a web app from scratch usually involves learning HTML, CSS, and JavaScript frameworks. It's complicated! Streamlit is a magical Python library that lets you build beautiful, interactive web apps for machine learning with just a few lines of Python. No web development experience needed!

Part 2: Train and Save a Sentiment Classifier

Before we can build an app, we need a model. We'll train an LSTM on a real-world dataset of tweets to classify them as positive or negative. The most important new step is saving our trained model and tokenizer so our web app can use them later.

Step 1: Get the Data

We'll use the "Sentiment140" dataset, which contains 1.6 million tweets with sentiment labels (0 = negative, 4 = positive).

import pandas as pd
import tensorflow as tf
import numpy as np
import pickle
import matplotlib.pyplot as plt

# Download the dataset
!wget --no-check-certificate \
  http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip \
  -O /tmp/trainingandtestdata.zip

# Unzip it
import zipfile
with zipfile.ZipFile("/tmp/trainingandtestdata.zip", 'r') as zip_ref:
  zip_ref.extractall("/tmp/")
# Load into a pandas DataFrame
cols = ['sentiment', 'id', 'date', 'query', 'user', 'text']
df = pd.read_csv('/tmp/training.1600000.processed.noemoticon.csv', header=None, names=cols, encoding='ISO-8859-1')

# We only need the sentiment and text
df = df[['sentiment', 'text']]
df['sentiment'] = df['sentiment'].replace(4, 1) # Map label 4 to 1 so labels are 0 (negative) and 1 (positive)

# For this lab, we'll use a smaller subset to speed up training
df = df.sample(20000, random_state=42)

πŸ’‘ Your Turn: Explore the Dataset

Before diving in, it's always good practice to inspect your data. Use `df.head()` to see the first few rows and `df['sentiment'].value_counts()` to see if the number of positive and negative tweets is balanced.
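One way to do this, still in the notebook:

print(df.head())                       # peek at the first five rows
print(df['sentiment'].value_counts())  # count the 0s (negative) and 1s (positive)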

Step 2: Preprocess and Tokenize

This is the same process as Lab 9: convert text to numbers.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

vocab_size = 10000
embedding_dim = 16
max_length = 100
trunc_type = 'post'
padding_type = 'post'
oov_tok = "<OOV>"

tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(df['text'])

sequences = tokenizer.texts_to_sequences(df['text'])
padded = pad_sequences(sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)

labels = df['sentiment'].values

Step 3: Build, Train, and SAVE

model = tf.keras.Sequential([
  tf.keras.layers.Embedding(vocab_size, embedding_dim),  # maps each word index to a 16-dim vector
  tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
  tf.keras.layers.Dense(24, activation='relu'),
  tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history = model.fit(padded, labels, epochs=10, validation_split=0.2, verbose=2)

πŸ’‘ Your Turn: Plot Training History

A key skill is diagnosing your model's training. The `history` object contains the accuracy and loss for each epoch. Use Matplotlib to plot the training accuracy vs. the validation accuracy. Is the model overfitting? (Hint: `history.history['accuracy']` and `history.history['val_accuracy']`)

plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

CRUCIAL STEP: Save Your Tools!

To use this model in another script (our web app), we must save two things: the trained model itself (architecture plus weights) and the tokenizer. The tokenizer is just as important as the model, because our app needs to preprocess user input in exactly the same way.

# Save the trained model
model.save('sentiment_model.h5')

# Save the tokenizer object
with open('tokenizer.pickle', 'wb') as handle:
  pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

After running this, use the file browser in Colab to download `sentiment_model.h5` and `tokenizer.pickle` to your computer. You'll need them for the next part.
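Before you leave Colab, it's worth a quick sanity check that both files reload correctly. A minimal check, run in the same notebook (so the earlier imports and `max_length` are still available):

# Reload the saved artifacts and run one prediction,
# mirroring exactly what the web app will do later.
reloaded_model = tf.keras.models.load_model('sentiment_model.h5')
with open('tokenizer.pickle', 'rb') as handle:
  reloaded_tokenizer = pickle.load(handle)

seq = reloaded_tokenizer.texts_to_sequences(["I love this lab!"])
pad = pad_sequences(seq, maxlen=max_length, padding='post', truncating='post')
print(reloaded_model.predict(pad))  # a score near 1 means positive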

Part 3: Building The App Locally

Now we leave Colab and move to our own computer. Create a new folder on your desktop, and place the two files you just downloaded (`sentiment_model.h5` and `tokenizer.pickle`) inside it.

Step 1: Install Streamlit and TensorFlow

Open a terminal or command prompt on your computer and run:

pip install streamlit tensorflow

Step 2: Create your app file

Inside your new folder, create a new Python file named `app.py`. Open it in a text editor (like VS Code, Sublime Text, or even Notepad) and paste the following code:

import streamlit as st
import tensorflow as tf
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

# --- Load Model and Tokenizer ---
st.set_page_config(page_title="Sentiment Analyzer", page_icon="πŸ’­")

# Cache the model and tokenizer so they load only once, not on every interaction
@st.cache_resource
def load_model_and_tokenizer():
  model = tf.keras.models.load_model('sentiment_model.h5')
  with open('tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)
  return model, tokenizer

model, tokenizer = load_model_and_tokenizer()

# --- App Layout ---
st.title("Twitter Sentiment Analyzer")
st.write("Type a sentence below and I'll tell you if it's positive or negative!")

user_input = st.text_area("Enter your text here:")

if st.button("Analyze"):
  if user_input:
    # 1. Preprocess the input
    sequence = tokenizer.texts_to_sequences([user_input])
    padded = pad_sequences(sequence, maxlen=100, padding='post', truncating='post')  # must match the training settings

    # 2. Make prediction
    prediction = model.predict(padded)
    sentiment_score = float(prediction[0][0])  # convert to a plain Python float

    # 3. Display result
    if sentiment_score > 0.5:
      st.success(f"Positive Sentiment! πŸ‘ (Score: {sentiment_score:.2f})")
    else:
      st.error(f"Negative Sentiment! πŸ‘Ž (Score: {sentiment_score:.2f})")
  else:
    st.warning("Please enter some text to analyze.")

Step 3: Run the App!

Go back to your terminal, make sure you are in the folder where `app.py` is located, and run this command:

streamlit run app.py

Your web browser should automatically open with your new application running! Try it out!

πŸ’‘ Your Turn: Add a Spinner

Model prediction isn't instant. To improve the user experience, wrap the prediction logic in a `with st.spinner('Analyzing...'):` block. This will show a nice loading animation while the model is working.
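One way to do it, replacing step 2 inside the "Analyze" branch:

    # 2. Make prediction (with a loading animation)
    with st.spinner('Analyzing...'):
      prediction = model.predict(padded)
      sentiment_score = float(prediction[0][0])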

πŸ’‘ Your Turn: Visualize the Score

Instead of just printing the score, use a visual element. After the `st.success` or `st.error`, add `st.progress(sentiment_score)` to show a progress bar. Or, try `st.bar_chart({'sentiment': [sentiment_score, 1-sentiment_score]})` to show both positive and negative probabilities.
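For example, right after the `st.success`/`st.error` line (the `'positive'`/`'negative'` column names are just one way to label the chart):

    st.progress(sentiment_score)  # bar filled to the positive score (0.0 to 1.0)
    st.bar_chart({'positive': [sentiment_score],
                  'negative': [1.0 - sentiment_score]})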

Part 4: Your Mission - Upgrade the App

Assignment: Add More Features

Your base application works, but we can make it better. Your mission is to add new features to `app.py`.

Your Tasks:

  1. Add a Header and Sidebar: Use `st.header()` and `st.sidebar.header()` to organize your app. Put an "About" section in the sidebar explaining what the app does.
  2. Show More Details: After a prediction, display the raw sentiment score with more precision. Also, explain what the score means (e.g., "Scores closer to 1 are more positive.").
  3. Display an Image: After a prediction, use `st.image()` to display a happy emoji picture for a positive result and a sad one for a negative result. Find these images online.
  4. Add a "Clear" Button: Add a second button that clears the text area and the previous result.

Part 5: Bonus - A More Complex Classifier

The model we built was simple to keep training fast. A more powerful and modern approach is to use pre-trained word embeddings like GloVe.
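As a starting point, here is a minimal sketch of the standard pattern for loading pre-trained vectors into a Keras Embedding layer. It assumes you have separately downloaded `glove.6B.100d.txt` (the file name and the 100-dimension choice are assumptions, not part of this lab), and it reuses `vocab_size` and `tokenizer` from Part 2:

# Build a word -> vector lookup from the GloVe text file
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
  for line in f:
    values = line.split()
    embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Fill an embedding matrix for the words in our tokenizer's vocabulary
embedding_matrix = np.zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
  if i < vocab_size:
    vector = embeddings_index.get(word)
    if vector is not None:
      embedding_matrix[i] = vector

# Use the matrix as fixed weights in the Embedding layer
embedding_layer = tf.keras.layers.Embedding(
  vocab_size, 100,
  embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
  trainable=False)  # freeze the pre-trained vectors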

Kaggle: IMDB Movie Reviews

This is a classic binary sentiment classification dataset, but the text is much longer and more complex than tweets.

Your Challenge:

Your goal is to build a highly accurate sentiment classifier for this dataset and deploy it with Streamlit.

Part 6: Submission Guidelines

Since this project involves multiple local files, submitting a Colab link isn't enough.

  1. Create a new public repository on GitHub. This is an essential skill for all developers.
  2. Your repository should contain at least four files:
    • The final, enhanced `app.py`.
    • The saved `sentiment_model.h5`.
    • The saved `tokenizer.pickle`.
    • A `requirements.txt` listing your dependencies (a sample is shown after this list).
  3. Create a `README.md` file in your repository. In this file, write a brief description of your project and include instructions on how to run it (e.g., "1. Clone the repo. 2. Run `pip install -r requirements.txt`. 3. Run `streamlit run app.py`").
  4. Submit the URL to your public GitHub repository as your assignment.
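For reference, a minimal `requirements.txt` for this app only needs the two libraries it imports (pin exact versions if you want reproducible installs):

streamlit
tensorflow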