This position is located in NYC.
What you will do:
As the ideal candidate, you will have experience working on backend data pipelines and be comfortable navigating remote VMs and using containerization technology. You will be expected to build scalable data ingress and egress pipelines across data storage products, deploy new ETL pipelines, and diagnose, troubleshoot, and improve existing data architecture.
Responsibilities
- Create and deploy ETL pipelines on AWS cloud architecture, using tools including but not limited to AWS Glue, Redshift, S3, EMR, and Lambda functions.
- Troubleshoot and fix bugs, and diagnose data issues.
- Work independently to solve issues while collaborating with your direct Data Engineering team as well as the larger Product & Technology organization.
- Write unit and integration tests, aiming for full critical-path code coverage, with a focus on robust logging and reporting.
- Provide training to team members in your areas of expertise.
Experience
- 3-5+ years’ professional experience
- Fluency in one or more programming languages such as Python, including PySpark
- Experience writing ETL pipelines on cloud infrastructure, especially in Python
- Solid understanding of SQL and relational databases (RDBMS)
- Understanding of distributed data processing (Hadoop, MapReduce, etc.)
- Knowledge of Docker and Kubernetes
- AWS experience is key, as is experience accessing VMs, working with serverless technology, and distributed data processing