Movie Recommender System: Content-Based Filtering on IMDB Dataset (Natural Language Processing (NLP) Project)

MissNeerajSharma/Movie-Recommender-System

Sample projection using Streamlit: [image]

Dataset (Kaggle): https://www.kaggle.com/tmdb/tmdb-movie-metadata?select=tmdb_5000_movies.csv

This repository contains code and resources for building an end-to-end movie recommender system using content-based filtering techniques. The system utilizes the TMDB dataset to recommend movies to users based on their preferences and the characteristics of movies they have enjoyed in the past.

Overview

The recommendation engine employs natural language processing (NLP) techniques to analyze movie attributes from the TMDB dataset, such as genres, actors, directors, and plot keywords. It builds a profile for each movie from these attributes and recommends the movies whose profiles are most similar to those a user has liked.

Key Features

  • TMDB Dataset: The project utilizes the TMDB dataset, which contains comprehensive information about movies, including genres, cast, crew, and plot keywords.
  • Content-Based Filtering: The recommendation system implements content-based filtering algorithms to suggest movies based on their similarity to movies the user has liked.
  • Natural Language Processing (NLP): NLP techniques are employed to preprocess textual data, tokenize movie attributes, remove stopwords, and perform stemming or lemmatization.
  • Streamlit Deployment: The recommender system is deployed as a web application using Streamlit, allowing users to interact with the system through a browser-based interface.

Libraries Used

  • pandas: For data manipulation and analysis.
  • scikit-learn: For machine learning algorithms and tools.
  • NLTK (Natural Language Toolkit): For NLP tasks such as tokenization, stopword removal, stemming, and lemmatization.
  • Gensim: For word embedding and topic modeling.
  • Streamlit: For building and deploying interactive web applications.

Workflow

Data Collection: Obtain the TMDB dataset from Kaggle, which contains information about movies, including titles, genres, cast, crew, and plot keywords.

Data Preprocessing: Preprocess the dataset by cleaning the data, handling missing values, and selecting relevant features for analysis.
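
As a rough illustration of this step, the sketch below cleans a few hand-written rows shaped like the Kaggle TMDB files, where attributes such as genres are stored as JSON strings. The row contents, the helper names (`parse_names`, `clean_rows`), and the choice to keep only three cast members are assumptions for illustration, not code from this repository. With pandas, the same steps would typically go through `read_csv`, `dropna`, and `apply`.

```python
import json

def parse_names(raw, limit=None):
    """Extract the 'name' fields from a JSON-encoded list of objects."""
    if not raw:
        return []
    names = [item["name"] for item in json.loads(raw)]
    return names if limit is None else names[:limit]

def clean_rows(rows):
    """Drop rows without a title or overview and decode the JSON columns."""
    cleaned = []
    for row in rows:
        if not row.get("title") or not row.get("overview"):
            continue  # handle missing values by skipping incomplete rows
        cleaned.append({
            "title": row["title"],
            "overview": row["overview"],
            "genres": parse_names(row.get("genres")),
            "keywords": parse_names(row.get("keywords")),
            "cast": parse_names(row.get("cast"), limit=3),  # assumption: first three cast entries
        })
    return cleaned

# Hypothetical rows standing in for the merged movies/credits CSV files
sample_rows = [
    {"title": "Example Movie A", "overview": "A pilot crosses the galaxy.",
     "genres": '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]',
     "keywords": '[{"id": 1, "name": "space travel"}]',
     "cast": '[{"name": "Actor One"}, {"name": "Actor Two"}, '
             '{"name": "Actor Three"}, {"name": "Actor Four"}]'},
    {"title": "Example Movie B", "overview": None,
     "genres": "[]", "keywords": "[]", "cast": "[]"},
]

movies = clean_rows(sample_rows)
print(movies[0]["genres"])
print(len(movies))
```

The second sample row is dropped because its overview is missing, which is one simple way to handle missing values. Imputing an empty string instead would be an equally reasonable choice.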

Feature Engineering: Extract features from the textual data (e.g., genres, cast, crew, plot keywords) and preprocess them using NLP techniques.
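
One way this step might look is sketched below: the overview is tokenized and filtered against a stopword list, and multi-word attribute names are collapsed into single tokens. The stopword set is a tiny hand-written stand-in and the function names are hypothetical. Stemming is left out to keep the sketch dependency-free, but a stemmer such as NLTK's `PorterStemmer` could be applied to each token afterwards.

```python
import re

# Tiny illustrative stopword list (assumption); NLTK's English stopword corpus is the fuller choice
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "his", "her"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def collapse(names):
    """Join multi-word names into single lowercase tokens."""
    return [re.sub(r"\s+", "", name).lower() for name in names]

def build_tags(movie):
    """Combine overview words and attribute names into one list of tokens."""
    words = [word for word in tokenize(movie["overview"]) if word not in STOPWORDS]
    return (words + collapse(movie["genres"])
            + collapse(movie["keywords"]) + collapse(movie["cast"]))

# Hypothetical cleaned movie record
movie = {
    "overview": "A pilot crosses the galaxy.",
    "genres": ["Action", "Science Fiction"],
    "keywords": ["space travel"],
    "cast": ["Actor One"],
}
tags = build_tags(movie)
print(tags)
```

Collapsing names keeps two different people who share a first name from contributing the same token, so "Actor One" and "Actor Two" do not look similar merely because both contain "actor".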

Content-Based Filtering: Implement content-based filtering algorithms to recommend movies based on their similarity to movies the user has liked.
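
A minimal sketch of one common approach, cosine similarity over bag-of-words counts, is shown below. It is not confirmed to be the exact algorithm used in this repository, and the token lists and function names are made up for illustration. With scikit-learn, `CountVectorizer` and `sklearn.metrics.pairwise.cosine_similarity` would cover the same ground.

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(value * value for value in counts_a.values()))
    norm_b = math.sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

def similar_movies(liked_title, tag_lists, top_n=5):
    """Rank every other movie by cosine similarity to the liked movie."""
    vectors = {title: Counter(tags) for title, tags in tag_lists.items()}
    liked = vectors[liked_title]
    scores = [(cosine_similarity(liked, vector), title)
              for title, vector in vectors.items() if title != liked_title]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scores[:top_n]]

# Hypothetical token lists, as a feature-engineering step might produce them
tag_lists = {
    "Example Movie A": ["pilot", "galaxy", "action", "sciencefiction"],
    "Example Movie B": ["pilot", "war", "action", "drama"],
    "Example Movie C": ["wedding", "comedy", "romance"],
}
print(similar_movies("Example Movie A", tag_lists, top_n=2))
```

Here "Example Movie B" shares two of four tokens with "Example Movie A" (score 0.5), while "Example Movie C" shares none (score 0.0), so B is ranked first.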

Streamlit App Development: Develop a web application using Streamlit to provide an interactive interface for users to input their preferences and receive movie recommendations.
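
The interface described here could be sketched roughly as follows. The similarity table, the titles, and the function names are hypothetical placeholders rather than the repository's actual app. The only Streamlit calls used are `st.title`, `st.selectbox`, `st.button`, and `st.write`.

```python
def recommend(selected_title, similarity, top_n=5):
    """Return the titles most similar to the selected one, best match first."""
    scores = similarity[selected_title]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]

# Hypothetical precomputed similarity scores (title -> {other title: score})
SIMILARITY = {
    "Example Movie A": {"Example Movie B": 0.5, "Example Movie C": 0.0},
    "Example Movie B": {"Example Movie A": 0.5, "Example Movie C": 0.1},
    "Example Movie C": {"Example Movie B": 0.1, "Example Movie A": 0.0},
}

def run_app():
    """Render the page: pick a movie, press the button, see recommendations."""
    import streamlit as st  # imported lazily so recommend() works without Streamlit

    st.title("Movie Recommender System")
    selected = st.selectbox("Choose a movie you like", sorted(SIMILARITY))
    if st.button("Recommend"):
        for title in recommend(selected, SIMILARITY, top_n=2):
            st.write(title)

# To serve the page, save this as app.py, add a final line calling run_app(),
# and start it with:  streamlit run app.py
```

Keeping `recommend` separate from the UI code means the ranking logic can be checked on its own, without starting a Streamlit server.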

Deployment: Deploy the Streamlit app to a web server or platform like Heroku, making it accessible to users through a web browser.

Requirements

  • Python 3.x
  • Jupyter Notebook (for interactive usage)
  • Required Python libraries (pandas, scikit-learn, Streamlit, etc.)
  • TMDB dataset

Example

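```python
# Sample code snippet for content-based filtering
# (the two helpers and the sample profiles are illustrative stand-ins, not the repository's code)

def calculate_similarity(user_preferences, movie_profiles):
    # Jaccard overlap between the user's preferred attributes and each movie's attributes
    wanted = {v for values in user_preferences.values() for v in values}
    scores = {}
    for title, profile in movie_profiles.items():
        tags = {v for values in profile.values() for v in values}
        scores[title] = len(wanted & tags) / len(wanted | tags) if (wanted | tags) else 0.0
    return scores

def sort_movies_by_similarity(similarity_scores):
    # Titles ordered from most to least similar
    return sorted(similarity_scores, key=similarity_scores.get, reverse=True)

def recommend_movies(user_preferences, movie_profiles, num_recommendations=10):
    # Calculate similarity between user preferences and movie profiles
    similarity_scores = calculate_similarity(user_preferences, movie_profiles)
    # Sort movies based on similarity scores
    recommended_movies = sort_movies_by_similarity(similarity_scores)
    # Return top N recommended movies
    return recommended_movies[:num_recommendations]

# Example usage with hypothetical movie profiles
movie_profiles = {
    'Example Movie A': {'genres': ['Action', 'Adventure'], 'actors': ['Tom Cruise'], 'directors': ['Steven Spielberg']},
    'Example Movie B': {'genres': ['Comedy'], 'actors': ['Example Actor'], 'directors': ['Example Director']},
}
user_preferences = {'genres': ['Action', 'Adventure'], 'actors': ['Tom Cruise'], 'directors': ['Steven Spielberg']}
recommended_movies = recommend_movies(user_preferences, movie_profiles)
print(recommended_movies)
```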

Deployment
The recommender system is deployed using Streamlit, a Python library for building interactive web applications. The deployment process involves the following steps:

App Development: Develop the Streamlit app, including the user interface and functionality for receiving user preferences and generating movie recommendations.

Set Up Streamlit: Install Streamlit and the other libraries the app depends on.

Deploy to Server: Deploy the Streamlit app to a web server or platform like Heroku, following the deployment guidelines provided by the platform.

Access the App: Once deployed, users can access the recommender system through a web browser by visiting the app's URL.

Conclusion

This project demonstrates the development and deployment of a movie recommender system using content-based filtering techniques and the TMDB dataset. By combining NLP techniques with a Streamlit front end, the system recommends movies similar to those a user has already enjoyed.
