Location: Mumbai / Bangalore
Experience: 5-10 years
Employment Type: Full-Time
Roles & Responsibilities:
Design, deploy, and manage cloud infrastructure (AWS / Azure / GCP).
Build and maintain CI/CD pipelines to automate build, test, and deployment workflows.
Containerize applications using Docker and manage deployments on Kubernetes clusters.
Implement Infrastructure as Code using Terraform / CloudFormation.
Monitor system performance and reliability using tools such as Prometheus, Grafana, ELK, and Datadog.
Ensure system security, scalability, and high availability across environments.
Optimize resource usage and cost efficiency in cloud architectures.
Troubleshoot production issues and lead root cause analysis and post-mortems.
Collaborate closely with development teams to streamline release processes and improve deployment velocity.
Maintain documentation for architecture, workflows, and operational procedures.
Requirements:
5+ years of hands-on experience in DevOps / Cloud / Site Reliability Engineering roles.
Strong experience with Docker and Kubernetes in production environments.
Proven ability to build and maintain CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins, ArgoCD).
Solid understanding of Linux systems, networking, DNS, load balancing, and security best practices.
Working knowledge of at least one major cloud provider (AWS, Azure, or GCP).
Experience using Infrastructure as Code (Terraform / CloudFormation).
Familiarity with logging, monitoring, and alerting stacks (e.g., Prometheus, Grafana, ELK).
Strong problem-solving and debugging skills, especially in distributed systems.
Ability to collaborate across engineering, product, and support teams.
Clear communication and documentation skills.