Nexthink is the global leader in digital employee experience management. Our products allow enterprises to create highly productive digital workplaces for their employees by delivering optimal end-user experiences. Through a unique combination of real-time analytics, automation and employee feedback, Nexthink gives IT teams the insight they need to empower and even delight people at work.
Headquartered in Switzerland with US headquarters in Boston, Nexthink also has offices in France, the UK, Germany, Spain and the UAE. Our growing team of Nexthinkers is proud to be making the digital work lives of nearly ten million employees across 1,000 customers more productive.
At Nexthink, we believe actions are stronger than words when it comes to diversity, inclusivity, and equity in the workplace. Nexthinkers are multinational and multilingual, and come from all walks of life. We are committed to hiring a genuinely representative workforce that can create solutions and foster innovation for the modern digital employee experience. Join us today!
Nexthink is looking for passionate and innovative professionals who are keen to join a newly formed and fast-growing Cloud Operations team in Boston. The team is being built to ensure our Cloud platform is operated using best-in-class methodologies and tools, allowing us to delight our clients with the best cloud experience.
The team is responsible for maintaining our Cloud solutions at top performance, availability and service levels, while also ensuring that they run cost-efficiently. The Cloud Operations Engineer will also use her/his Software Engineering skills to prototype and deliver tools and products that help reach those goals, and will participate in the operational requirements process.
Finally, you will be part of a fast-growing, international company with an opportunity to join the Cloud team, a strategic initiative that will help accelerate this growth.
Responsibilities
Monitoring: Use and own the specifications of our tooling for monitoring, telemetry, reliability and automation of the end-to-end service.
Incident management and response: Detect, diagnose and fix incidents, finding solutions (rollback, restoring backups, etc.) that achieve the required service levels. Own the post-mortem process for such incidents, writing technical content for both customers and internal stakeholders.
Operations: Define or build automation mechanisms for cloud operations: build, deploy, update, patch, backup, restore, scale, extend, protect, etc. Use past experience to solve the most relevant issues proactively, either by writing product or platform specifications or by building the automation required to prevent the issues from resurfacing.
Change Control: Own the product update process for live client instances.
Reliability: Manage the availability of the production instances of our cloud services. Understand and be able to communicate the scale, capacity, security, redundancy and performance attributes and requirements of the cloud services.
Subject Matter Expert: Be the ultimate escalation point for major platform-related incidents.
Security monitoring: Monitor the application of, and compliance with, security administration procedures, and review information systems for actual or potential breaches in security.
Security incidents: Ensure that all identified breaches in security are promptly and thoroughly investigated and that any system changes required to maintain security are implemented.
Security records: Ensure that security records are accurate and complete and that requests for support are dealt with according to set standards and procedures.
Security policy: Contribute to the creation and maintenance of policy, standards, procedures and documentation for security.
Experience with monitoring solutions such as AWS CloudWatch, Azure Monitor, New Relic, Datadog, Nagios or Zabbix
Solid understanding of cloud architectures (compute, storage, databases) and networks (TCP/IP, VPN, HTTP, SSL, routing, etc.)
Experience administering and deploying on Public Cloud platforms (Azure, AWS)
Practical knowledge of container technology (Docker) and orchestration (Kubernetes, Docker Swarm)
3 years of experience in Software Development Lifecycle Management, with knowledge of:
Infrastructure and automation (e.g. Terraform, ARM, CloudFormation)
CI/CD, testing and deployment tools (e.g. Jenkins, Git, GitHub)
Configuration management tools (e.g. Ansible, Rundeck)
At ease operating and managing production systems, and solving issues while striking the right balance between urgency and methodology
Strong problem solving and analytical skills
Experience coordinating teams and individuals to maintain an SLA
Excellent written and verbal communication skills in English
This is an exceptional opportunity to join a fast-growing, successful and innovative company. Nexthink allows you to thrive in a unique work environment where the emphasis is on excellence, innovation, openness and collaboration.