Senior DevOps Engineer

Location: Remote (US or India)
Employment type: Full-time
Compensation: Competitive

We're seeking a Senior DevOps Engineer to build and lead our infrastructure monitoring, observability, and reliability practices from the ground up. We're ready to modernize how we monitor our infrastructure, and we need someone to turn it into a proactive, modern DevOps operation. You'll select and implement monitoring tools, establish alerting strategies, create runbooks, and build a culture of reliability across our development teams.

This role calls for someone who can see beyond traditional network monitoring and implement full-stack observability, from application performance to user experience metrics. You'll work with our development teams to build monitoring into everything we ship and to establish SLIs and SLOs that reflect what matters for our healthcare platform. It's a strong fit for DevOps professionals who enjoy building observability practices from scratch and can change how an organization thinks about reliability.

Responsibilities:

Design and implement a comprehensive monitoring strategy for applications and infrastructure
Select, install, and configure monitoring tools (APM, logging, metrics, tracing)
Set up application performance monitoring to track response times, errors, and throughput
Implement infrastructure monitoring for servers, databases, and cloud resources
Create intelligent alerting rules that minimize noise and catch real issues
Establish SLIs (Service Level Indicators) and SLOs (Service Level Objectives)
Build dashboards for different stakeholders (developers, management, support)
Set up centralized logging and log aggregation systems
Implement distributed tracing for debugging complex issues
Create automated incident response and escalation procedures
Develop runbooks and automation for common issues
Train development teams on observability best practices
Establish on-call rotations and incident management processes
Conduct post-mortems and drive continuous improvement
Implement synthetic monitoring and automated testing
Set up cost monitoring and optimization for cloud resources

Requirements:

Bachelor’s degree in Computer Science or a related field
5+ years of experience in DevOps, SRE, or infrastructure roles
Strong experience with monitoring tools and platforms
Expertise in cloud platforms (Azure preferred, AWS acceptable)
Proficiency in scripting (Python, PowerShell, Bash)
Experience with Infrastructure as Code (Terraform, ARM templates)
Understanding of application architecture and performance patterns
Knowledge of networking, security, and system administration
Experience with CI/CD pipelines and automation
Strong analytical and troubleshooting skills
Excellent communication skills for working with diverse teams
Experience building monitoring practices from scratch

Nice to have:

Specific Tool Experience:
- APM Tools: New Relic, Datadog, AppDynamics, Dynatrace, Application Insights
- Metrics/Monitoring: Prometheus, Grafana, Zabbix, Nagios
- Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Sumo Logic
- Tracing: Jaeger, Zipkin, AWS X-Ray
- Incident Management: PagerDuty, Opsgenie, VictorOps (now Splunk On-Call)
- Cloud Monitoring: Azure Monitor, Amazon CloudWatch, Google Cloud Operations
Healthcare industry experience and HIPAA compliance knowledge
Experience with .NET application monitoring
Knowledge of database performance monitoring (SQL Server)
Experience with container monitoring (Docker, Kubernetes)
Chaos engineering and reliability testing experience
FinOps and cloud cost optimization experience
ITIL or incident management certifications
Experience transitioning traditional IT teams to DevOps practices

We offer:

Competitive senior-level compensation package
Opportunity to build DevOps practices from the ground up
Fully remote work with a flexible schedule
Budget for tools and platform implementation
Generous PTO and flexible time off
Performance-based bonuses
Authority to select and implement tools
Direct impact on platform reliability and performance
Collaboration with development teams globally
Low on-call burden once systems are properly built