Our customer is one of the biggest multi-brand online retailers in the UK. They are a long-standing customer of ours, and they now aim to make better use of their existing data by improving its storage and accessibility. The goal of the project is to build a system from scratch, based on a Data Lake architecture with a Query Layer that lets multiple data consumer systems query data from the lake. An additional challenge in the Query Layer design is making it auto-scalable, so that it handles peaks in demand while maintaining the high uptime expected of a highly available (HA) system.

EXPECTED STACK OF TECHNOLOGIES:
- AWS services: EMR, S3, Lambda, EC2, RDS, CloudFormation, etc.
- Hortonworks Data Platform (HDP)
- Apache Knox
- Experience with AWS IaaS components
- 2+ years of experience supporting and configuring Linux/Unix servers;
- Configuration management system (Chef/Puppet/Ansible)
- Understanding of CI/CD and project life-cycle principles.
- Knowledge of scripting: Python/Bash;
- Experience with Hadoop, Hive, and HDFS (mandatory) will be a big plus
- Basic Scala - will be a big plus
- Good spoken and written English.
- Monitoring and support for servers and clients.