Journey to Spam Detection

Spiralogics is hosting a first-of-its-kind AI conference in Nepal this February: a platform for AI veterans and students to come together and network. This year, Spiralogics is focused on exploring and explaining how AI can have a positive impact on different aspects of our lives. The conference aims to be a driving force in encouraging the implementation of AI, especially in the Nepalese environment. Its purpose is to bring professionals and students together under one roof to rediscover the possibilities that AI can create in our day-to-day lives.

Spirathon

How was I involved?

I was asked to teach a 5-hour session introducing the students to a concept in AI. I decided to conduct a workshop-based session on using a Naive Bayes classifier to detect spam.

Topics covered

  • Probability Primer: We look into a few probability problems and Monte Carlo simulations to get a basic primer on probability and Python.
  • Conditional Probability: We look into a few examples of conditional probability.
  • Bayes Theorem: We connect Conditional Probability with Bayes Theorem. We move from what we know to what we can infer and connect the dots.
  • Sensitivity, Specificity and Confusion matrix: We look into concepts of Sensitivity, Specificity, TP, TN, FP and FN. We also look into Type I and Type II errors.
  • Accuracy, Precision and Recall: We look into different ways to evaluate our models and see when one metric matters more than another. We also study the F1 score, which combines Precision and Recall.
  • Naive Bayes: We move from Bayes Theorem to Naive Bayes and define what Naive means.
  • NB Spam detection classifier: We build a binary classifier using scikit-learn and publicly available SMS datasets. We also evaluate our model’s performance.
  • NB vs Bagging, Random Forest and AdaBoost: We do not spend much time here. This notebook simply compares Naive Bayes with other algorithms on the same dataset.
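
To give a feel for how these topics fit together, here is a minimal sketch of a Naive Bayes spam classifier written from scratch in plain Python, along with precision, recall and F1. It is not the workshop code: the notebooks use scikit-learn and a public SMS dataset, whereas the four messages in TRAIN and the helper names train, predict and precision_recall_f1 are made up purely for illustration. The sketch assumes whitespace tokenization, Laplace (add-one) smoothing, and that words never seen in training are simply ignored.

```python
import math
from collections import Counter

# Toy, made-up messages for illustration only (NOT the workshop's SMS dataset).
TRAIN = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("are we meeting for lunch", "ham"),
    ("call me when you are free", "ham"),
]


def train(examples):
    """Count words per class and messages per class."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the class with the highest log posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Log prior: P(class)
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            if word not in vocab:
                continue  # assumption: skip words never seen in training
            # "Naive" step: treat words as independent given the class,
            # so log likelihoods add up. The +1 is Laplace smoothing.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


model = train(TRAIN)
print(predict("claim your free prize", model))
print(predict("are you free for lunch", model))
```

With such a tiny training set the predictions are only a toy demonstration. On real data, scikit-learn's MultinomialNB together with a proper train/test split is the more sensible route.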

Link to the GitHub repo.