About Us
Platinumlist.net, a pioneering leader in the online event guide and ticketing solution industry, has been revolutionizing the event landscape in the Gulf region since 2009. As the largest ticketing provider in the GCC, we proudly serve an extensive array of events across the United Arab Emirates, Saudi Arabia, Oman, Bahrain, Qatar, and Kuwait from our Dubai‑based headquarters.
About the Role
We’re looking for a Senior DevOps / SRE Engineer to own and evolve our AWS infrastructure with a strong focus on reliability, scalability, performance under peak load, and safe delivery of new AWS capabilities. You’ll partner with engineering teams to ensure our platform stays fast and resilient during traffic spikes while continuously improving automation, observability, security, and cost efficiency.
Key Responsibilities
Own production reliability on AWS: availability, latency, throughput, capacity, and incident response.
Architect and operate scalable infrastructure (multi‑AZ as a baseline; DR strategy and regular testing).
Build and maintain Infrastructure as Code (Terraform / CloudFormation / CDK) and Git‑based workflows.
Improve CI/CD pipelines and deployment strategies (blue/green, canary, progressive delivery).
Implement strong observability: metrics, logs, traces, alerting, dashboards; define SLOs/SLIs and reduce alert noise.
Own database operations on AWS (Aurora/RDS MySQL): backups/restores (including restore drills), read replicas, performance troubleshooting, and capacity planning.
Improve caching and traffic handling (CDN, Redis/ElastiCache, queues) to sustain peak demand.
Harden security posture: IAM least privilege, secrets management, patching, WAF, audit trails.
Drive adoption of relevant AWS managed services where they increase reliability and reduce operational burden.
Drive cloud cost efficiency (FinOps): cost visibility, tagging, budgets/alerts, rightsizing, and smart usage of AWS pricing models without compromising reliability.
Lead post‑incident reviews (RCA, corrective actions, prevention), and ensure improvements are implemented and verified.
Qualifications
10+ years of experience in a similar role.
Strong hands‑on AWS experience in production (typical stack: VPC, IAM, EC2, ALB/NLB, Auto Scaling, S3, CloudFront, Route 53, CloudWatch/CloudTrail, WAF; plus Aurora/RDS).
Proven experience designing/operating high‑load web systems with strict uptime requirements.
Production MySQL on AWS (Aurora/RDS): backups & restores (including restore drills), read replicas, monitoring, and performance troubleshooting.
Ability to troubleshoot production web stacks (Nginx + PHP‑FPM) and identify bottlenecks across app ↔ DB ↔ infrastructure.
Containers and deployment automation (ECS/EKS, Docker; understanding of scaling and rollout patterns).
Solid Linux + networking fundamentals (DNS, TLS, routing, LB, troubleshooting).
Observability practices and incident management experience.
Must be reachable for critical production incidents; occasional after‑hours support may be required (critical‑only).
Nice‑to‑Have
PHP ecosystem familiarity (PHP‑FPM/Nginx, Composer; Laravel/Symfony is a plus).
MySQL internals/performance tuning and advanced replication/proxying (e.g., ProxySQL).
Serverless & event‑driven AWS (Lambda, SQS/SNS, EventBridge, Step Functions).
Security & compliance frameworks; chaos testing/load testing.
Benefits
Competitive salary.
Remote‑friendly work setup.
A chance to make a real impact in a fast‑growing market.
Space to grow, experiment, and push boundaries.