Principal Cloud DevOps Engineer, Research Computing

We have an exciting opportunity for a Principal Cloud DevOps Engineer to join the R&D Research Computing Engineering and DevOps team.


The team is responsible for the design and implementation of a cloud research computing environment as a service, enabling R&D machine learning and data science with security, agility, and scalability. You will directly influence and shape the environment used to develop the latest AI technology for Nuance products, such as Dragon Ambient Experience and Agent AI, among many others. 


Principal duties and responsibilities: 

  • Be part of a team leading a transformation from an on-premises to a cloud/hybrid environment supporting evolving cutting-edge AI model development, data management, model training, and inference.
  • Design and implement new tools and processes to automate configuration, deployment, monitoring, and operational activities using cloud infrastructure DevOps best practices and security-first principles.
  • Actively participate in team architecture and code reviews for design and implementation of scalable software and infrastructure-as-code; continuously seek opportunities for reuse and efficiency, automation and testing capabilities.
  • Troubleshoot issues on deployed systems until root causes are understood; identify how monitoring and testing can be improved to diagnose and prevent similar issues in the future.
  • Drive continual improvements for availability, performance, observability, quality and cost effectiveness.
  • Collaborate with R&D users to understand their needs and expectations while enforcing best practices.



Bachelor's or Master's degree in a technology discipline (Computer Science/Engineering)

Years of work experience: 5+ years of cloud infrastructure and software delivery experience


Required skills:

  • Strong background in DevOps concepts and tools; well-disciplined in continuous integration and delivery, change management.
  • Proficiency with orchestration and configuration automation frameworks, such as Terraform, Chef, Ansible, or Puppet, necessary for deploying and managing a full-stack cloud environment.
  • Proficiency with Linux system administration and with the development of IaaS and automation tools.
  • Ability to create and utilize templates to automate build processes, including image generation.
  • A team player capable of high performance and flexibility in a dynamic working environment using Agile methodologies.
  • Excellent people and negotiation skills, able to listen to a diversity of technical opinions and build consensus, as well as propose new ideas and gain support for them.
  • Skill and ability to train others on technical and procedural topics.
  • Experience with:
      - Azure, AWS, or GCP
      - Slurm or other scheduling/workflow managers
      - Kubernetes, Kubeflow, Docker, Singularity
      - NVIDIA GPUs, TensorFlow, PyTorch, and other widely used deep learning frameworks


Preferred skills:

  • Experience designing and building research-focused or data-centric/machine-learning environments on major cloud provider infrastructure, including services for big data analytics, scale-out HPC applications, and GPU-centric ML and deep learning.
  • Professional certification with one or more major cloud providers.
  • Previous experience in the healthcare industry and/or with HPC research environments.
  • Experience with securing environments on one or many of the major cloud providers to an industry certification standard, such as PCI or HITRUST.

Additional Info

Job Type : Full-Time

Education Level : Bachelor's Degree

Experience Level : Mid to Senior Level

Job Function : Engineering

Apply at:
