HPC Software Engineer
Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to our customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the Emerging Technologies & Computing (ETAC) Group in the Research Computing Support Division (RCSD) of the Information Technology Services Directorate (ITSD) at Oak Ridge National Laboratory (ORNL) to recruit a qualified HPC Software Engineer to support the integration of computing hardware and software tools for accomplishing research tasks across a variety of scientific research areas.
ORNL delivers scientific discoveries and technical breakthroughs needed to realize solutions in energy and national security and provides economic benefit to the nation. This premier research institution located near Knoxville in Oak Ridge, TN, addresses national needs through impactful research and world-leading research centers.
This is a full-time, permanent position that follows a hybrid working model.
Why Cadre5?
- Working with highly talented team members
- 3 weeks’ vacation
- Excellent medical insurance, up to 100% paid by employer
What will you be doing?
ETAC focuses on supporting ORNL researchers' needs in HPC, data engineering and management, Infrastructure as a Service, and new technologies. You'll work directly with researchers, supporting their science, understanding scientific problems, and applying advanced research computing tools to help achieve research outcomes. HPC scientific software engineers play a crucial role in optimizing computational methods and facilitating groundbreaking research across multiple scientific areas. As an HPC team member, you will recommend computational and/or visualization tools, techniques, and methodologies for the scientific computing aspect of research investigations.
Job Responsibilities:
- Scientific Software and Application Management:
- Understand scientific software users’ requirements: work closely with researchers to understand their computational needs and translate them into efficient HPC applications. Analyze application performance to identify bottlenecks and develop strategies to improve scalability and efficiency on HPC systems. This may involve profiling code, analyzing communication patterns, and tuning system parameters.
- Install and manage scientific software: deploy and maintain a wide range of scientific applications, libraries, and development tools on HPC systems to support research activities.
- Develop custom tools and scripts: develop tools to automate common tasks, improve systems management, and facilitate sophisticated computational workflows. Develop, maintain, and install software for HPC and data-intensive architectures, including Graphics Processing Units (GPUs), parallel systems, and other computing environments.
- User support and collaboration:
- Provide software technical support: collaborate with HPC support and scientists on technical issues related to scientific software problems. Following industry standards, implement HPC software with novel programming and optimization techniques. Provide solutions and technical recommendations for code optimization, resource utilization, and system tuning.
- Collaborate on research projects: work closely with researchers to understand their computational requirements and assist in developing efficient computational strategies, code optimization, and parallelization. This includes working with a highly diverse and multidisciplinary team (such as mathematicians, physicists, computer scientists, and engineers) in the research, development, integration, testing, and deployment of research software, data platforms, and machine learning systems for large-scale data analysis.
- Research information dissemination: support research staff in disseminating results in peer-reviewed journals, technical reports, relevant conferences, and open-source software project repos.
- Research and development:
- Stay informed about the latest research in HPC and AI.
- Develop and recommend ideas for new programs, products, and features by staying abreast of new technology developments and trends.
- Partnerships and collaboration:
- As applicable and possible, establish and maintain partnerships and collaborations with industry, other groups at ORNL, and HPC networks to share knowledge and best practices.
Basic Qualifications:
- A BS in computer science, computer engineering, information systems, or a related field of study and five (5) to seven (7) years of proven and aligned experience are required. An overall combination of equivalent experience may be considered.
- Three (3) or more years of demonstrated abilities in the following areas:
- High Performance Computing (HPC) environments and HPC scheduling software.
- Software development, including version control using Git, with open-source tools and software.
- Python and data analysis modules such as Pandas, NumPy, and Dask.
- Developing software in C/C++, Fortran, or other programming languages.
- The ability to obtain and maintain a Department of Energy "Q" clearance is required. This requires US Citizenship.
Preferred Qualifications:
- In-depth understanding of HPC architectures and their optimization techniques.
- Experience in the following areas:
- Optimizing and parallelizing software products for HPC using MPI or other open-source tools.
- HPC debugging tools such as DDT, GDB or Valgrind.
- AI toolkits such as PyTorch, RAPIDSAI, TensorFlow, or Keras.
- Statistical analysis software such as Python or R.
- Building and running containerized applications in an HPC environment.
- Cluster deployment tools such as Warewulf, PXEboot, and/or Bright.
- Managing systems.
- Working in a government, scientific, or other highly technical environment.
- Knowledge of multiple operating systems including Linux.
- Exposure to microservices concepts and understanding of container environments including Podman, Docker, and Kubernetes.
- Proven ability to balance sophisticated research and security requirements.
Benefits
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage coupled with a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.