Site Reliability Engineer (SRE), Paya Lebar
Paya Lebar
Posted: yesterday
Description
We are partnering with a fast-growing regional technology platform to hire an experienced Site Reliability Engineer (SRE) to support large-scale, high-availability internet systems across Southeast Asia. This role will focus on system reliability, incident response, infrastructure optimization, and operational excellence in a high-traffic production environment.
Responsibilities
Ensure the stability, reliability, and performance of business-critical systems and applications
Manage application deployment, configuration changes, monitoring, capacity planning, and operational maintenance
Perform root cause analysis and troubleshooting for production incidents and critical system failures
Drive system reliability improvements including high availability, fault tolerance, disaster recovery, rate limiting, and service degradation mechanisms
Optimize system performance and critical service paths through performance analysis and architecture improvements
Develop and maintain operational SOPs, incident response procedures, and disaster recovery plans
Establish and track SLO metrics and follow up on reliability improvement initiatives
Build and improve operational tooling, automation, and internal platforms to enhance operational efficiency and security
Collaborate closely with engineering and business teams to ensure smooth delivery and stable operations
Provide IT and infrastructure support for Singapore office network issues, including LAN, Wi-Fi, and connectivity troubleshooting with cloud vendors
Requirements
Minimum 5 years of experience in Site Reliability Engineering / DevOps / Infrastructure Operations within internet or technology companies
Strong troubleshooting and incident management experience in large-scale production environments
Familiarity with JVM memory management and garbage collection (GC) mechanisms, with the ability to troubleshoot Java process issues
Hands-on experience with middleware and distributed systems such as Nginx, ZooKeeper, Kafka, RocketMQ, Redis, Memcached, and Twemproxy
Familiarity with monitoring and observability tools such as Grafana, Prometheus, and Zabbix
Experience supporting high-concurrency, high-availability, and microservices architecture environments
Proficiency in at least one scripting or programming language, such as Python, Shell, Go, or Java
Experience in capacity planning, service governance, and end-to-end system reliability management is highly preferred
Familiarity with SRE operational frameworks and best practices is advantageous
Basic networking knowledge and the ability to troubleshoot office and cloud networking issues
Strong analytical thinking, communication skills, and ability to work effectively under pressure
Ability to communicate effectively in Mandarin to support coordination with Mandarin-speaking stakeholders and regional technical teams

TRULYYY PTE. LTD.
Senior Consultant: Yang Suyu
EA License No: 20S0118
EA Registration Number: R2199541
Highlights
Company name: TRULYYY PTE. LTD.
Job position: Site Reliability Engineer (SRE)
More info about this ad
Site Reliability Engineer (SRE) has been posted in the Geylang Engineering category on Locanto.