SITE RELIABILITY ENGINEER

YOU HAVE:

  • 3+ years of SRE/DevOps experience working on highly scalable distributed systems;

  • A solid understanding of cloud-based architectures and concepts, with hands-on experience using public clouds and Kubernetes;

  • At minimum, knowledge of Linux systems administration and databases;

  • Familiarity with logging and monitoring technologies such as Nagios, Grafana, Prometheus, DataDog, Wavefront, ELK, Rollbar, and Sentry;

  • Experience with AWS, Azure, and Google Cloud Platform services;

  • Experience with Terraform, Ansible, and Helm;

  • Knowledge of relational and non-relational databases, networking, Linux internals, filesystems, web architecture, and CI/CD principles using Git;

  • Strong analytical and problem-solving skills;

  • Strong communication skills and the ability to interact and collaborate with other product and development teams;

  • The ability to work in a team environment while being self-directed, proactive, and action-oriented.

YOU MIGHT ALSO HAVE:

  • Experience developing, deploying, and maintaining the network, storage, and server infrastructure for cloud environments that require 24/7 availability;

  • Experience developing and maintaining scalable, maintainable software solutions;

  • Experience ensuring that services meet stability, performance, and availability requirements;

  • Experience with data analysis and data mappings;

  • Experience actively participating in design, code, and operational reviews;

  • Experience building robust, self-healing features and automation that reduce operational costs;

  • Experience with programming languages and runtimes such as Go, Python, Groovy, and Node.js.

TECH STACK:

  • Docker, Jenkins, PowerShell, Terraform, Kibana, Prometheus, Grafana, AWS, Linux, PostgreSQL, MongoDB

RESPONSIBILITIES:

  • Infrastructure maintenance;

  • Maintenance and containerization of the environment;

  • Creation of CI/CD pipelines using Git, GitLab, or Jenkins.
