Shell scripts for AWS EMR clusters
Updated Jan 25, 2018 - Shell
Analysis of the Airline On-Time Performance dataset
Product review analyses on an Amazon dataset using Apache Spark and MongoDB
Detect tight communities in a social network
Lambda function that starts EMR and runs a MapReduce job
Analysis of data from the Steam platform using Apache Spark and cloud services such as Amazon Web Services.
TU Berlin Cloud Computing - correctly implemented assignment 4
Load data from the Million Song Dataset into a final dimensional model stored in S3.
Data Engineering Projects including Data Modeling, Data Warehouse, Data Lake Development
Example for provisioning AWS EMR service with Terraform
Built a data model, data warehouse, and pipeline for extracting, transforming, and loading data into a star-schema data model in a Redshift database
Run a Spark job within Amazon EMR
AWS EMR backed Spark cluster for analyzing Yelp Data
MLP for sentiment analysis on movie reviews.
Udacity project: implementing an ETL pipeline that processes data with Apache Spark and stores it in AWS S3
ETL pipeline that extracts JSON files from an AWS S3 bucket, transforms them on an AWS EMR Spark cluster, and stores the results in an AWS S3 bucket in Parquet format.
Data Pipeline Analytics Platform is an end-to-end generic big data pipeline. It uses the following tech stack: AWS S3, AWS Redshift, AWS EMR cluster, Apache Spark, Apache Airflow.
EMR + Hadoop to Redshift ELT workflow using the Spark steps API, orchestrated by Apache Airflow. It ingests disparate datasets centered on 7 GB of I94 arrivals information to produce a simple star schema in Redshift
Credit defaults cause large profit losses for banks and other credit lenders. The success of the banking industry depends on the ability to understand risk. This project uses big data technologies such as MapReduce and HDFS, along with PySpark and AWS, to analyze credit history and predict defaults