Skip to content

Cameronwood611/datamining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Oscars Clustering and Prediction

Background

Many of us have watched the Oscars but have you ever wondered if there was a way to predict which movies would win ahead of time? That's our goal and we have a few different ways to try and do this.

In order to make this happen, we must start with a fairly large dataset of all the Oscar nominees and winners so far. Not sure if you realize this, but it's a lot. Thankfully, we found a dataset on kaggle that has some interesting attributes about each entry but not enough! Therefore, we had to manually scrape a bunch of data off imdb.

Once we collected our data, we had to combined the two documents and clean them. Merging the documents was fairly straight forward but cleaning was tricky. There were many subtleties (having commas in large numbers, abbreviating 1000 as 1K, special characters not translating etc.) but after some time, we cleaned it up nicely.

With our clean dataset, we were able to move on to some really cool clustering and predictive algorithms.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •