
Senior DevOps Engineer

  • Remote (US or India)
  • Full Time
  • Competitive
We're seeking a Senior DevOps Engineer to build and lead our infrastructure monitoring, observability, and reliability practices from the ground up. We are ready to modernize our infrastructure monitoring and need someone to turn it into a proactive, modern DevOps operation. You'll select and implement monitoring tools, establish alerting strategies, create runbooks, and build a culture of reliability across our development teams.

This role requires someone who can look beyond traditional network monitoring and implement full-stack observability, from application performance to user experience metrics. You'll work with our development teams to build monitoring into everything we ship and to establish SLIs/SLOs that actually matter for our healthcare platform. The role suits DevOps professionals who enjoy building observability practices from scratch and can change how an organization thinks about reliability.

Responsibilities:

  • Design and implement a comprehensive monitoring strategy for applications and infrastructure
  • Select, install, and configure monitoring tools (APM, logging, metrics, tracing)
  • Set up application performance monitoring to track response times, errors, and throughput
  • Implement infrastructure monitoring for servers, databases, and cloud resources
  • Create intelligent alerting rules that minimize noise and catch real issues
  • Establish SLIs (Service Level Indicators) and SLOs (Service Level Objectives)
  • Build dashboards for different stakeholders (developers, management, support)
  • Set up centralized logging and log aggregation systems
  • Implement distributed tracing for debugging complex issues
  • Create automated incident response and escalation procedures
  • Develop runbooks and automation for common issues
  • Train development teams on observability best practices
  • Establish on-call rotations and incident management processes
  • Conduct post-mortems and drive continuous improvement
  • Implement synthetic monitoring and automated testing
  • Set up cost monitoring and optimization for cloud resources

Requirements:

  • Bachelor’s degree in Computer Science or related field
  • 5+ years of experience in DevOps, SRE, or infrastructure roles
  • Strong experience with monitoring tools and platforms
  • Expertise in cloud platforms (Azure preferred, AWS acceptable)
  • Proficiency in scripting (Python, PowerShell, Bash)
  • Experience with Infrastructure as Code (Terraform, ARM templates)
  • Understanding of application architecture and performance patterns
  • Knowledge of networking, security, and system administration
  • Experience with CI/CD pipelines and automation
  • Strong analytical and troubleshooting skills
  • Excellent communication skills to work with diverse teams
  • Experience building monitoring practices from scratch

Nice to have:

  • Specific Tool Experience:
    • APM Tools: New Relic, Datadog, AppDynamics, Dynatrace, Application Insights
    • Metrics/Monitoring: Prometheus, Grafana, Zabbix, Nagios
    • Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Sumo Logic
    • Tracing: Jaeger, Zipkin, AWS X-Ray
    • Incident Management: PagerDuty, Opsgenie, VictorOps
    • Cloud Monitoring: Azure Monitor, CloudWatch, Google Cloud Operations
  • Healthcare industry experience and HIPAA compliance knowledge
  • Experience with .NET application monitoring
  • Knowledge of database performance monitoring (SQL Server)
  • Experience with container monitoring (Docker, Kubernetes)
  • Chaos engineering and reliability testing experience
  • FinOps and cloud cost optimization experience
  • ITIL or incident management certifications
  • Experience transforming traditional IT teams to DevOps

We offer:

  • Competitive senior-level compensation package
  • Opportunity to build DevOps practices from the ground up
  • Full remote work with flexible schedule
  • Budget for tools and platform implementation
  • Generous PTO and flexible time off
  • Performance-based bonuses
  • Authority to select and implement tools
  • Direct impact on platform reliability and performance
  • Collaboration with development teams globally
  • Low on-call burden once systems are properly built