Overview:
If you've seen the good, the bad, and the ugly and want your turn to build it right, come join us.
We're a startup (with excellent benefits) that provides a Platform-as-a-Service for industrial customers. Key to success is automating everything, quickly learning from problems, and continually innovating better ways to work.
We are seeking a talented individual with experience architecting, developing, deploying, monitoring and managing cloud and on-premises workloads at scale. We need someone who knows what to do, has the skills to build it, and feels the ownership to support it.
You might be a good fit if:
- You are hands-on, deep in Kubernetes, curious, and talented.
- You thrive in the excitement, camaraderie, empowerment, responsibility, and flexibility of a startup environment.
- You consistently focus on what matters most, shifting effort away from lower-value tasks.
- You are both opinionated and open to new ideas.
- You appreciate that the customer experience is paramount.
- You value and contribute to a culture of psychological safety.
As part of this job, you will:
- Design, build, and maintain infrastructure-as-code, app deployment, and system update solutions, in the cloud and on-prem, for customers around the world.
- Develop and maintain automation tools and processes for deployment, monitoring, and configuration management with tools such as k8s, Azure, and Pulumi.
- Define our Site Reliability Engineering (SRE) strategy and determine appropriate SLOs and SLIs.
- Develop and implement best practices for system reliability, operability, and security.
- Reduce toil and boring work by scripting routine tasks and automating self-repair.
- Collaborate with the team to determine functional and non-functional requirements and reliability strategies, and to influence the product roadmap.
- Solve problems relating to production issues and create solutions to prevent problem recurrence.
- Apply troubleshooting skills, use debugging tools, and examine logs, telemetry, and other data sources to verify assumptions and assess customer impact. Proactively address findings.
- Stay current with industry trends, emerging technologies, and best practices in site reliability engineering and cloud/edge computing.
Required Skills and Qualifications:
- Certified Kubernetes Application Developer (CKAD), or experience demonstrating k8s expertise
- 5+ years of experience with infrastructure-as-code tools
- 3+ years of DevSecOps experience
- 3+ years of Azure experience
- Experience managing high-availability, stateful workloads
Bonus Points for:
- Pulumi experience
- Azure Arc experience
- Azure certifications
- Experience with CI/CD and GitOps (Flux, Argo CD, or Rancher Fleet)
- Experience with Inductive Automation's Ignition Platform
- Experience managing workloads on-prem
- Knowledge of configuration management best practices and tools (e.g., Azure App Configuration, Ansible, Chef, Puppet)
- Experience with OT and industrial/manufacturing customers
What We Offer:
- A dynamic and fast-paced work environment with opportunities for rapid growth and development.
- A competitive salary and benefits package (medical, dental, vision, and 401(k)).
- An opportunity to work with a talented team of engineers and developers on a cutting-edge product.
- The chance to shape the future of our company and its platform.
- Flexible work arrangements, including remote work options.
- A culture that values innovation, collaboration, openness, and continuous learning.