Senior Multi-Cloud Infrastructure Engineer
OpsWerks

About the job:
As a Senior Multi-Cloud Infrastructure Engineer, you will play a critical role in managing and optimizing multi-cloud environments, with a primary focus on AWS and/or Alibaba Cloud. You will be responsible for ensuring high availability, performance, security, and operational efficiency in mission-critical systems. You will also serve as a key escalation point for troubleshooting complex cloud infrastructure issues and implementing automation-driven solutions to enhance reliability.
This role requires deep hands-on expertise in AWS and Alibaba Cloud; experience with GCP and Azure is a plus. You will work in a dynamic, fast-paced environment, collaborating with DevOps, SRE, and engineering teams to improve cloud operations.
Multi-Cloud Support & Incident Management:
- Serve as a senior escalation point for complex infrastructure issues across AWS and Alibaba Cloud, ensuring timely resolution and minimal downtime.
- Lead incident response, troubleshooting, and root cause analysis (RCA) for cloud service failures, performance degradation, and security incidents.
- Develop and implement incident management playbooks to standardize troubleshooting and reduce resolution times.
- Participate in a 24×7 on-call rotation, ensuring continuous monitoring and rapid response to critical incidents.
Cloud Infrastructure Operations & Optimization:
- Monitor, maintain, and optimize cloud resources across AWS and Alibaba Cloud, ensuring scalability, cost efficiency, and compliance with best practices.
- Manage multi-cloud networking including VPCs, Transit Gateways, VPNs, Load Balancers (AWS ALB/NLB, Alibaba SLB), and Firewalls.
- Perform patch management, system updates, and security hardening to maintain a robust and secure cloud environment.
- Optimize resource utilization and cost management across cloud platforms, leveraging FinOps best practices.
Automation & Infrastructure as Code (IaC):
- Automate routine cloud operations using Terraform, AWS CloudFormation, Alibaba ROS, Pulumi, and Ansible.
- Develop and maintain automation scripts in Bash, Python, or Go to improve cloud reliability and efficiency.
- Implement self-healing mechanisms, auto-scaling strategies, and proactive alerting systems to improve infrastructure resilience.
Security, Compliance & Best Practices:
- Ensure cloud security best practices by managing IAM/RAM policies, encryption, and security group configurations.
- Collaborate with security teams to conduct cloud security audits, risk assessments, and vulnerability management.
- Implement and maintain compliance with industry standards.
Collaboration & Technical Leadership:
- Work closely with DevOps, SRE, and development teams to resolve cloud infrastructure challenges and enhance performance.
- Provide mentorship and technical leadership to internal teams on multi-cloud best practices, automation, and cost optimization.
- Document operational procedures, troubleshooting guides, and automation frameworks for knowledge sharing.
About you:
Education:
- Bachelor’s degree or higher in Computer Science, Information Technology, or a related field.
Experience:
Multi-Cloud Infrastructure:
- Primary focus on AWS and Alibaba Cloud: deploying, managing, and optimizing workloads across both platforms, with at least 5 years of solid working experience in the AWS ecosystem.
- Experience with multi-cloud architectures, including hybrid solutions with AWS + Alibaba Cloud, or AWS mixed with GCP/Azure.
Cloud Services & Technologies:
- AWS: EC2, VPC, IAM, S3, RDS, Lambda, Route 53, CloudFormation, CloudFront, CloudWatch, Auto Scaling, ALB/NLB, and Transit Gateway.
- Alibaba Cloud: ECS, VPC, RAM, OSS, RDS, SLB, Function Compute, DNS, CloudMonitor, Auto Scaling, and Resource Orchestration Service (ROS).
- Strong understanding of cloud security best practices, compliance, and cost optimization strategies.
Linux Systems Administration:
- Expertise in Red Hat Enterprise Linux (RHEL), Ubuntu, or other major Linux distributions.
- Experience in hardening, troubleshooting, and performance tuning.
Networking:
- Proficiency in networking across cloud environments (AWS VPC Peering, Transit Gateway, Alibaba Cloud CEN, VPN, Direct Connect).
- In-depth knowledge of TCP/IP, Firewalls (Security Groups, IPTables, ACLs), Load Balancing (AWS ELB/ALB, Alibaba SLB, Citrix NetScaler, NGINX, Envoy Proxy), and routing.
Build & Release Management:
- Hands-on experience with Git (GitHub/GitLab), Artifactory, and CI/CD tools like Jenkins, AWS CodePipeline/CodeDeploy, GitHub Actions, Spinnaker.
- Experience with Java build tools (Maven, Gradle) and automated deployment strategies.
Containerization & Orchestration:
- Kubernetes (EKS, ACK), Docker, HashiCorp Nomad, AWS ECS.
- Experience in container security, networking, and scaling strategies.
Logging & Monitoring:
- AWS CloudWatch, Alibaba CloudMonitor, Prometheus, Thanos, Grafana, Splunk, New Relic.
- Strong observability and alerting strategies for multi-cloud environments.
Scripting & Automation:
- Proficiency in Bash, Python, or Go for automation, cloud resource management, and infrastructure deployment.
- Expertise in Infrastructure as Code (IaC): Terraform, Pulumi, AWS CloudFormation, Alibaba ROS, Ansible, Puppet, Chef.
SRE & DevOps Expertise:
- Strong background in Site Reliability Engineering (SRE), DevOps culture, and automation-driven operations.
- Experience in implementing highly available, scalable, and resilient cloud-native architectures.
Communication & Collaboration:
- Excellent verbal and written communication skills, with experience in cross-functional collaboration.
Certifications (Preferred):
- AWS Certified Solutions Architect / DevOps Engineer / Security
- Alibaba Cloud Professional / Expert Certifications
- Relevant Kubernetes (CKA/CKS), Terraform, or DevOps certifications are a plus.