Principal Application Reliability/Site Reliability Engineer

Requisition Number:  50663
Location: Trevose, PA, US, 19053

International SOS is the world’s leading medical and security services company, with over 12,000 employees working in 1,000 locations in 90 countries. We were founded on the principle of putting our clients’ employees first, and this is still true today. Led by 5,200 medical professionals and 200 security specialists, our teams work night and day to find solutions to protect our clients and their employees in whatever situation they may be facing. We assess, advise and assist from a medical, security and logistical perspective on a global scale to protect and save lives, and thereby enable our clients to achieve their business goals. As we’ve delivered on this mission over the last 35 years, we have become the market leader in global telehealth services and digital health solutions for an extensive client base of Fortune 500 companies, NGOs and governments around the world.

About the role:

 

We are seeking a Principal ARE/SRE to be responsible for keeping all user-facing services and production systems running smoothly. Application/Site Reliability Engineers are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to environments and the codebase.

Key responsibilities:

 

  • Be on rotation for availability incidents and provide support for customer service engineers.
  • Proactively develop scripts and tools to prevent incidents from ever happening.
  • Run infrastructure and applications with modern tools and automation like Puppet, Terraform, Kubernetes, etc.
  • Develop comprehensive monitoring and alerting on symptoms and potential issues to prevent outages.
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to continually improve.
  • Provide primary operational support and engineering for multiple large distributed software applications.
  • Document every action so findings turn into repeatable actions, and then into automation.

Key responsibilities (continued):

 

  • Improve the deployment process to make it as smooth and effortless as possible.
  • Design, build and maintain core infrastructure pieces that allow scaling to support enterprise-level numbers of concurrent users.
  • Debug production issues across services and levels of the stack.
  • Plan infrastructure growth and capacity.
  • Provide technical leadership of the SRE team (internal or through Managed Services).
  • Proactively work with development leads, client service leads, solution architects, and infrastructure leads to enhance system reliability, scalability, and robustness.

About you:

 

  • Thorough, detailed, and careful planning, development, and execution
  • Proactively looking for areas to improve
  • Clear communication with all involved parties
  • Calm under pressure
  • Clear sense of ownership and accountability
  • 15+ years of hands-on experience with Windows and Linux operating systems and databases (SQL and NoSQL)
  • In-depth knowledge and proven record of building and operating highly available, scalable, large-scale enterprise applications on AWS, Azure, or OpenStack clouds
  • Experience with distributed storage technologies like NFS, HDFS, Ceph, S3 as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
  • 10+ years of SRE or closely related experience for large-scale cloud SaaS
  • 10+ years of hands-on technical experience in DevOps, Release Management Engineering, or similar areas
  • Strong experience with monitoring tools: Datadog, Prometheus, Grafana, CloudWatch, ELK, etc.
  • Extensive knowledge of config management systems
  • Strong programming skills in .NET, Java, Python, JavaScript, etc.
  • B.S. in Computer Science or Software Engineering. M.S. in similar fields preferred.
  • Minimal, occasional domestic travel within the US
  • Participation in the on-call rotation

Benefits:

  • Competitive salary and incentive program
  • Warm, supportive, and open company culture
  • An opportunity to thrive in a global environment 
  • Hybrid working
  • Comprehensive benefits and vacation package

International SOS is an equal opportunity employer and does not discriminate against employees or job applicants on the basis of race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state and local laws.


Nearest Major Market: Philadelphia