Email Spam Detection with LSTM and TF-IDF

LSTM Classification Model • Natural Language Processing • Real-time Spam Filtering

Email Spam/Ham Classification using Deep Learning

This project classifies emails as spam or ham using a hybrid approach: TF-IDF (Term Frequency-Inverse Document Frequency) for feature extraction and an LSTM (Long Short-Term Memory) network for classification. The goal is to combine the statistical term weighting of TF-IDF with the LSTM's ability to retain context across a sequence.

The life cycle includes:

Data Preprocessing: Removing HTML tags, stopwords, punctuation, tokenizing, stemming/lemmatization, and transforming with TF-IDF.
Model Building: Constructing an LSTM-based deep learning classifier using TensorFlow/Keras for sequence modeling.
Evaluation: Using Accuracy, Precision, Recall, F1-Score, and ROC-AUC metrics to evaluate model performance.
Model Saving: Exporting the trained model in `.h5` format for integration and deployment.
Deployment: Embedding the model into a lightweight Flask-based web application for real-time email classification.
Monitoring: Logging predictions, monitoring spam trends, and updating the model with new data for continual improvement.
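
The preprocessing step above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the `preprocess` and `tfidf` helpers, the tiny stopword list, and the sample emails are all assumptions made for the example, and stemming/lemmatization is left out for brevity. It uses the textbook weighting tf × ln(N / df); library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed variant by default, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "and", "is", "you", "your", "for", "of", "at"}


def preprocess(text):
    """Strip HTML tags, lowercase, drop punctuation and stopwords, then tokenize."""
    text = re.sub(r"<[^>]+>", " ", text)               # remove HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # remove punctuation
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up sample emails, for illustration only.
emails = [
    "<p>Win a FREE prize now!!!</p>",
    "Meeting at noon, bring the report.",
    "Claim your free prize today",
]
tokens = [preprocess(e) for e in emails]
weights = tfidf(tokens)

for toks, vec in zip(tokens, weights):
    print(toks, {term: round(w, 3) for term, w in vec.items()})
```

In this toy corpus, "win" occurs in only one email and so receives a higher weight than "free", which occurs in two. That is the behavior TF-IDF is meant to provide: terms that are distinctive for a document count for more than terms that are common across the corpus.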

Challenges: Handling noisy email data, maintaining balance between false positives and false negatives, avoiding overfitting, and ensuring fast prediction times in real-world use.
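
The trade-off between false positives and false negatives can be made concrete by sweeping the decision threshold applied to the model's spam score. The sketch below is illustrative only: the labels and scores are made up, and `confusion_counts` and `precision_recall_f1` are hypothetical helper names, not functions from this project.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, FP, FN, TN) when scores >= threshold are labelled spam (1)."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1   # ham flagged as spam
        elif predicted == 0 and label == 1:
            fn += 1   # spam that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, scores, threshold):
    """Compute precision, recall, and F1 at the given threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up ground truth (1 = spam, 0 = ham) and model scores, for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.70, 0.40, 0.60, 0.20, 0.05]

for threshold in (0.3, 0.5, 0.8):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.8 removes the one false positive (precision rises from 0.75 to 1.00) but lets two spam emails through (recall falls from 1.00 to 0.33). Which end of that range is acceptable depends on how costly a misfiled legitimate email is compared with a missed spam message.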

📺 Watch the Demo