Introduction
In the Fall of 2024, I served as a Teaching Assistant (TA) for Carnegie Mellon University’s 14-757 course, AI Applications in Information Security. This course explores applications of deep learning in cybersecurity, combining rigorous theory with hands-on implementation. In this blog post, I’ll share how I guided students through the challenges of the coding assignments during office hours, along with the teaching strategies and solutions I employed.
Helping Students Build a Naive Bayes SMS Classifier
One assignment required students to implement a Naive Bayes classifier to distinguish spam SMS messages from legitimate ones (commonly called “ham”). While the task appeared straightforward, many students ran into trouble during implementation. In one office hour, a student approached me with a perplexing problem: their model performed poorly on the test set, with accuracy far below expectations.
Upon reviewing their code, I noticed they had skipped important preprocessing steps, such as removing stop words and non-alphabetic characters. I explained why these steps matter and demonstrated how to use Python’s re module to clean text effectively. Together, we restructured their feature extraction process, ensuring that training and testing data passed through identical preprocessing. By the end of the session, their model achieved much higher accuracy, and the student gained a deeper understanding of how data preprocessing impacts machine learning models.
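A minimal sketch of the kind of pipeline we arrived at, using only the standard library. The stop-word list, function names, and Laplace smoothing below are illustrative choices of mine, not the student’s actual code:

```python
import math
import re
from collections import Counter

# Illustrative subset; real pipelines use a fuller list (e.g. from NLTK).
STOP_WORDS = {"the", "a", "to", "is", "you", "and", "in", "for"}

def clean(text):
    """Lowercase, keep only alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def train_nb(messages, labels):
    """Fit log priors and Laplace-smoothed log likelihoods per class."""
    counts = {"spam": Counter(), "ham": Counter()}
    class_totals = Counter(labels)
    for msg, label in zip(messages, labels):
        counts[label].update(clean(msg))
    vocab = set(counts["spam"]) | set(counts["ham"])
    n = len(labels)
    model = {}
    for c in counts:
        total = sum(counts[c].values())
        model[c] = {
            "prior": math.log(class_totals[c] / n),
            "likelihood": {w: math.log((counts[c][w] + 1) / (total + len(vocab)))
                           for w in vocab},
            "unseen": math.log(1 / (total + len(vocab))),  # smoothing fallback
        }
    return model

def predict(model, text):
    """Pick the class with the highest posterior log score."""
    scores = {}
    for c, params in model.items():
        scores[c] = params["prior"] + sum(
            params["likelihood"].get(w, params["unseen"]) for w in clean(text))
    return max(scores, key=scores.get)
```

The key point we fixed that day is that predict calls the same clean function as train_nb, so the features seen at test time match those seen at training time.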
Guiding the Implementation of Stochastic Gradient Descent for SVMs
Another challenging assignment required students to implement a support vector machine (SVM) from scratch and optimize it using stochastic gradient descent (SGD). Many students struggled with implementing the gradient update rules. During one office hour, a group of students expressed frustration with their model’s slow convergence and inability to train effectively.
After reviewing their implementation, I identified the issue: they were using a fixed learning rate, so the updates kept overshooting and oscillating around the optimum instead of settling into it. I explained the necessity of learning rate decay and guided them through implementing a dynamic learning rate schedule. I also clarified the role of the regularization term in balancing model complexity and generalization. With these adjustments, their model not only trained faster but also achieved significantly improved accuracy on the test set.
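Roughly the shape of the fix, as a from-scratch sketch in plain Python. The eta0/decay schedule, the toy data, and the function names are my illustrative choices; the assignment’s actual interface may differ:

```python
import random

def svm_sgd(X, y, lam=0.01, eta0=0.1, decay=0.001, epochs=200, seed=0):
    """Linear SVM trained on the hinge loss with SGD.

    The learning rate decays as eta0 / (1 + decay * t): early steps move
    fast, later steps settle. lam is the L2 regularization strength that
    trades off margin width against fitting the training points.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = eta0 / (1.0 + decay * t)   # decaying learning rate
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            if margin < 1:                   # hinge loss active: push point out
                w = [wj - eta * (lam * wj - y[i] * xj)
                     for wj, xj in zip(w, X[i])]
                b += eta * y[i]
            else:                            # only the L2 penalty shrinks w
                w = [wj * (1.0 - eta * lam) for wj in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

With a fixed eta the same code can bounce around the solution indefinitely; the 1 / (1 + decay * t) factor is what lets the iterates converge.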
Debugging and Optimizing a Convolutional Neural Network
The assignment to extend a convolutional neural network (CNN) posed one of the toughest challenges in the course. One student added extra convolutional layers but found that the training time increased significantly, with only marginal improvements in accuracy. During office hours, I examined their network design and discovered that the additional layers had too many parameters, and the input image resolution was unnecessarily high.
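The diagnosis is easy to quantify with a back-of-the-envelope parameter count. The layer widths below are made-up numbers for illustration, not the student’s actual architecture:

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights plus biases for one k x k convolutional layer."""
    return out_ch * (in_ch * k * k + 1)

# A wide stack vs. a slimmer one (3x3 kernels, RGB input).
wide = [(3, 128), (128, 256), (256, 512)]
slim = [(3, 32), (32, 64), (64, 128)]

wide_total = sum(conv2d_params(i, o, 3) for i, o in wide)  # 1,478,912
slim_total = sum(conv2d_params(i, o, 3) for i, o in slim)  # 93,248
# The wide stack has roughly 15x the parameters of the slim one.
```

Input resolution compounds the problem on a different axis: parameters don’t depend on it, but activation memory and per-layer compute scale with height times width, so halving the resolution cuts both by about a factor of four.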
I recommended reducing the number of filters in each layer and resizing the input images to match the network’s capacity. Furthermore, I introduced them to using dropout layers to mitigate overfitting and discussed the benefits of data augmentation for improving model robustness. After implementing these suggestions, the student observed substantial improvements in both training efficiency and model performance.
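Dropout itself is only a few lines. Here is an inverted-dropout sketch in plain Python to show the mechanics; in practice frameworks such as PyTorch provide this as a built-in layer, and the seed and helper name here are my own choices:

```python
import random

def dropout(values, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1 / (1 - p), so the expected
    activation matches what the layer outputs at inference time."""
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]
```

The scaling is the part students most often forget: without it, the network sees systematically smaller activations at train time than at test time.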
Reflections on Teaching
Through weekly office hours, I not only helped students complete their assignments but also enhanced my own technical and teaching skills. Witnessing students transition from confusion to clarity was immensely rewarding. This TA experience has deepened my understanding of how to deconstruct complex technical problems and communicate them effectively, equipping me for future endeavors in education and research.