Email Spam/Ham Classification using Deep Learning
This project uses a hybrid approach combining TF-IDF (Term Frequency-Inverse Document Frequency) for feature extraction and an LSTM (Long Short-Term Memory) network to classify emails as spam or ham.
The goal is spam detection that draws on two signals: statistical term weights from TF-IDF and the LSTM's memory of context across a sequence.
The life cycle includes:
- Data Preprocessing: Removing HTML tags, stopwords, and punctuation; tokenizing; applying stemming or lemmatization; and transforming the text with TF-IDF.
- Model Building: Constructing an LSTM-based deep learning classifier using TensorFlow/Keras for sequence modeling.
- Evaluation: Using Accuracy, Precision, Recall, F1-Score, and ROC-AUC metrics to evaluate model performance.
- Model Saving: Exporting the trained model in `.h5` format for integration and deployment.
- Deployment: Embedding the model into a lightweight Flask-based web application for real-time email classification.
- Monitoring: Logging predictions, monitoring spam trends, and updating the model with new data for continual improvement.
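
The preprocessing step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline: the `STOPWORDS` set, the helper names `clean_email` and `tfidf`, and the two sample emails are assumptions made for the example, and stemming/lemmatization is left out. The weighting used here is the plain `tf * log(N / df)` form; library implementations such as scikit-learn's `TfidfVectorizer` use smoothed variants, so exact values will differ.

```python
import html
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "for", "you", "your", "at", "in"}


def clean_email(text):
    """Strip HTML tags, lowercase, drop punctuation and stopwords, and tokenize."""
    text = re.sub(r"<[^>]+>", " ", text)     # remove HTML tags
    text = html.unescape(text)               # decode entities such as &amp;
    text = text.lower()
    tokens = re.findall(r"[a-z0-9]+", text)  # keep alphanumeric runs, discarding punctuation
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


# Made-up sample emails, for illustration only.
emails = [
    "<p>WIN a FREE prize now!!!</p>",
    "Meeting moved to 3pm, see the agenda.",
]
tokens = [clean_email(e) for e in emails]
vectors = tfidf(tokens)
print(tokens[0])
print(vectors[0])
```

Terms that appear in every document get a weight of zero under this formula, which is the intended effect: words common to spam and ham alike carry no signal.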
Challenges: Handling noisy email data, balancing false positives against false negatives, avoiding overfitting, and keeping prediction latency low in real-world use.
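
The trade-off between false positives and false negatives is usually controlled by the decision threshold applied to the model's spam probability. The sketch below shows the effect on precision and recall using plain Python. The helper names `classify` and `precision_recall_f1`, the labels, and the scores are all hypothetical values chosen for illustration; they are not results from this project.

```python
def classify(scores, threshold):
    """Label an email as spam (1) when its score meets the threshold, else ham (0)."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the spam class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth (1 = spam, 0 = ham) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.70, 0.40, 0.60, 0.20, 0.05]

for threshold in (0.3, 0.5, 0.8):
    p, r, f1 = precision_recall_f1(y_true, classify(scores, threshold))
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold removes the false positive (a ham email flagged as spam) at the cost of missing more spam, so the right setting depends on which error is more costly for the deployment.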