Position Overview
We are looking for a skilled NVIDIA DGX Infrastructure Engineer to join our team. In this role, you will manage our NVIDIA DGX-based infrastructure and play a central part in keeping it performant, reliable, and scalable.
Key Responsibilities
- Managing NVIDIA DGX systems and related infrastructure.
- Configuring and optimizing DGX clusters for performance, reliability, and scalability.
- Collaborating with data scientists, AI engineers, and IT teams to integrate DGX systems into AI and deep learning workflows.
- Monitoring system performance and implementing proactive measures to maintain optimal operation.
- Troubleshooting and resolving issues related to DGX systems, including hardware, software, and network components.
- Implementing security measures and best practices to protect the integrity and confidentiality of data and workflows on DGX systems.
- Documenting infrastructure configurations, processes, and procedures.
- Providing technical guidance and training to team members on DGX-related technologies and best practices.
- Staying current with NVIDIA DGX hardware and software advancements and recommending upgrades or enhancements as needed.
Requirements
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- Proven experience managing NVIDIA DGX systems in production environments.
- Understanding of AI and deep learning frameworks and their integration with NVIDIA DGX systems.
- Proficiency in scripting languages such as Python for automation and configuration management.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes) on DGX systems.
- Knowledge of storage solutions (e.g., NFS, Ceph) and their integration with DGX clusters.
- Familiarity with networking principles, protocols, and configurations related to DGX infrastructure.
- Excellent troubleshooting and problem-solving skills.
- Ability to work both independently and collaboratively as part of a team.
- Strong verbal and written communication skills.