Container for testing AWS Glue pyspark scripts

chriswessells/awsgluepyspark


Supported Tags

Quick Reference

What is the awsgluepyspark container

Docker image that bundles Spark, PySpark, Hadoop, and the awsglue modules to speed up the development of AWS Glue ETL scripts. The images are built on the amazonlinux2 base image.

AWSGluePySpark is a Docker container in which you can run AWS Glue PySpark scripts. The container is one piece of a larger process: applying Test Driven Development (TDD) to AWS Glue scripts. TDD can increase development velocity.

You can retrieve the Docker image from Docker Hub.

Python 3 libs

  • Python 3.7.5
  • pip3
  • Glue 1.0
  • pytest
  • boto3
  • scipy
  • numpy
  • pandas
  • PyGreSQL
  • scikit-learn

Python 2 libs

  • Python 2.7.5
  • pip
  • Glue 1.0
  • pytest
  • boto3
  • scipy
  • numpy
  • pandas
  • PyGreSQL
  • scikit-learn

Adding libraries

The container is intended to help automate analytics workloads that use AWS Glue. If you need libraries beyond those installed by default in the Glue endpoints, AWS Glue supports including additional packages to extend the built-in functionality.
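
As a rough illustration of one way to include extra packages, the sketch below builds the arguments for a Glue 1.0 job that loads additional Python files through the `--extra-py-files` job parameter. The job name, role ARN, and S3 paths are hypothetical placeholders, and the helper `build_job_args` is not part of this repo.

```python
def build_job_args(name, role_arn, script_s3_path, extra_py_files):
    """Build keyword arguments for a Glue job that ships extra Python files.

    Every value passed in below is a placeholder chosen for illustration.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "1.0",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Glue expects a comma-separated list of S3 paths here.
            "--extra-py-files": ",".join(extra_py_files),
        },
    }


job_args = build_job_args(
    name="example-etl-job",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
    script_s3_path="s3://example-bucket/scripts/etl.py",  # hypothetical
    extra_py_files=["s3://example-bucket/libs/helpers.zip"],  # hypothetical
)
```

With AWS credentials configured, the resulting dictionary could be passed to `boto3.client("glue").create_job(**job_args)`.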

Testing code with the container

Download the Docker image for your version of Python. A full how-to for testing AWS Glue scripts is outside the scope of this repo. I have included enough detail for you to fill in the gaps and understand how the container works.

AWS Glue testing commands

The container's PATH includes the following commands for testing Glue scripts:

  • gluepytest
  • gluepyspark
  • gluesparksubmit

Strategies to test scripts

Instructions for setting up environments are outside the scope of this repo.
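
One strategy that fits the TDD process described earlier, sketched here as an assumption rather than the author's documented method, is to keep transformation logic in plain Python functions and cover them with pytest-style tests, leaving the Glue and Spark wiring to a thin script. The function `normalize_record` and its fields are hypothetical, and it is only assumed that `gluepytest` collects tests the way pytest does.

```python
def normalize_record(record):
    """Trim and lowercase the email, and cast the amount to float."""
    return {
        "email": record["email"].strip().lower(),
        "amount": float(record["amount"]),
    }


def test_normalize_record():
    raw = {"email": "  Alice@Example.COM ", "amount": "12.50"}
    expected = {"email": "alice@example.com", "amount": 12.5}
    assert normalize_record(raw) == expected


def test_normalize_record_rejects_bad_amount():
    raw = {"email": "bob@example.com", "amount": "not-a-number"}
    try:
        normalize_record(raw)
    except ValueError:
        return  # float() raised as expected
    raise AssertionError("expected ValueError for a non-numeric amount")
```

Because these functions do not need a Spark context, the tests run quickly, and the same function could later be applied to each row inside the Glue script.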

Contact

If you run into a problem using the container, feel free to open an issue.
