SITE RELIABILITY ENGINEER
YOU HAVE:
3+ years of SRE/DevOps experience working on highly scalable distributed systems;
A solid understanding of cloud-based architectures and concepts, with hands-on experience using public clouds and Kubernetes;
At minimum, working knowledge of Linux system administration and databases;
Familiarity with logging and monitoring technologies such as Nagios, Grafana, Prometheus, Datadog, Wavefront, ELK, Rollbar, and Sentry;
Experience with AWS, Azure, and Google Cloud Platform services;
Experience working with Terraform, Ansible, and Helm;
Knowledge of relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and CI/CD principles using Git;
Strong analytical and problem-solving skills;
Strong communication skills, including the ability to interact and collaborate with other product and development teams;
Ability to work in a team environment while being self-directed, proactive, and action-oriented.
YOU MIGHT ALSO HAVE:
Experience developing, deploying, and maintaining the network, storage, and server infrastructure for cloud environments that require 24/7 accessibility;
Experience developing and maintaining scalable, maintainable software solutions;
Experience ensuring that services meet stability, performance, and availability requirements;
Experience with data analysis and data mappings;
Experience actively participating in design, code, and operational reviews;
Experience building robust, self-healing features and automation that reduce operational cost;
Experience with languages and runtimes such as Node.js, Go, Python, and Groovy.
TECH STACK:
Docker, Jenkins, PowerShell, Terraform, Kibana, Prometheus, Grafana, AWS, Linux, PostgreSQL, MongoDB
RESPONSIBILITIES:
Infrastructure maintenance;
Maintenance and containerization of the environment;
Creation of CI/CD pipelines with Git, GitLab, or Jenkins.