As a Sr. Systems Engineer on the HPC Grid Systems infrastructure team, you will design and maintain the next-generation HPC Compute Grid for Samsung’s low-power mobile chip design EDA engineering team. Primary responsibilities include defining and refining deployment and provisioning processes; designing and implementing performance monitoring, instrumentation, and metrics-gathering solutions; end-to-end system-level testing; evaluating new technologies; and providing stability and refinement throughout the entire compute stack.
Responsibilities:
• Automating and improving existing processes
• Improving and refining system and service monitoring
• Executing large-scale system changes using the latest technologies
• Hands-on server/hardware debugging and problem remediation
• Implementing solutions to problems of diverse scope in specific areas, including data analysis to identify root causes and contributing factors
• Improving and refining server OS deployments and provisioning processes
• Core refresh, expansion planning and implementation projects
• Documentation, service management, service improvement, service delivery
• Reviewing, improving, and generating documentation for multiple functional areas
Qualifications:
• Associate’s degree required (Bachelor’s preferred)
• 5+ years of experience
• Extensive knowledge of building, configuring, and administering production Linux computer systems
• Experience programming in Bash, Perl, Python, or similar scripting languages
• Hands-on experience with CFEngine, Cobbler, PXE, Kickstart, Chef, Puppet, Ansible, Salt, or similar configuration and automation tools and practices
• Advanced knowledge of networking concepts and practices
• Expert skills with most Linux operating system commands and utilities
• Linux storage administration experience including file systems, LVM and RAID adapters
• Experience with virtualization environments and tools such as VMware, vCenter, and vCenter Orchestrator
• Solid understanding of automated deployments
• Strong understanding of networking technologies (routing, switching, firewalls, iptables, etc.)
• Experience with source code management tools (git)
• Experience with high speed interconnects (InfiniBand, 10GigE, 40GigE)
• Experience with scale-out/scale-up NFS file systems, specifically Isilon S, X, NL, and HD
• Hands-on experience with performance analysis tools, benchmarks and applications
• Proficiency in light programming and scripting, operating system administration, and hardware administration
• Deep expertise with Linux operating systems, including operations in a large-scale production environment
• Deep experience with system and application monitoring, software distribution, patching and maintenance in a Linux environment