Job Title: Senior Systems Administrator - Windows & Linux - Scientific Computing & Data
Job Location: NYC, NY (Hybrid)
Duration: Full-time
About Edify Technologies: Transforming Businesses with Innovative Digital Solutions!
Headquartered in Naperville, IL, we are a dynamic team with over two decades of industry expertise, dedicated to delivering robust business solutions, staff augmentation, and a comprehensive range of application and web services. As a former recipient of INC. Magazine's prestigious '5000 Fastest Growing Private Companies' award, we take immense pride in our proven track record of success.
At Edify Technologies, we partner with customers globally, empowering them to enhance their technology footprint, reduce unnecessary costs, develop sustainable IT solutions, and gain a competitive edge in today's digital world. We believe in creating an impact through innovation, driving tangible results that propel businesses forward.
Join Our Team: Are you a Senior Systems Administrator - Windows & Linux - Scientific Computing & Data looking for a dynamic opportunity to accelerate your career?
Description:
- The Senior Systems Administrator/Engineer, as a member of the Scientific Computing and Data group, is responsible for a computational and data science ecosystem for researchers.
- The Administrator is the principal technology expert for Windows and Linux systems and helps support the high-performance computing (HPC) environment in the Scientific Computing group.
- The incumbent utilizes a thorough understanding of available technology, tools and best practices to design, manage, maintain, upgrade and monitor Scientific Computing’s systems.
- The incumbent will develop and implement solutions responsive to researcher needs, in conjunction with other technology professionals and consistent with IT policies and Compliance.
- The systems will support a wide array of applications, including VMware, REDCap, Jira, Confluence, Postgres, MySQL, SQL Server, Tivoli Storage Manager (TSM), and other custom Sinai-developed software.
- In total, there are more than 100 servers, including physical servers and VMs, along with an archival storage system containing over 20 petabytes of data.
- The TSM system is integrated with the 25,000-core, 30-petabyte HPC system. This position reports to the Director for Computational & Data Ecosystem in Scientific Computing. Specific responsibilities are listed below.
Responsibilities:
- Design, develop, and implement all system administration tasks, including hardware and software configuration and maintenance, configuration management, system monitoring, upgrades, usage monitoring and reporting, system performance, security, networking, and metrics. The infrastructure includes both Windows and Linux systems with file servers in multiple physical locations, and an HPC system with 25,000 cores and 30 petabytes of storage.
- Design and develop scripts for system administration and monitoring, working with Ansible for configuration management, Grafana/Nagios/Zabbix for system monitoring, Splunk, and other tools.
- Research, deploy and manage security infrastructure, including implementation of policies and procedures from IT Security and Compliance.
- Plan, implement, troubleshoot, and maintain software, including databases (SQL Server, MySQL, PostgreSQL, and others), REDCap, Jira, Confluence, TSM, VMware, and other software.
- Troubleshoot system and application issues across multiple environments and operating platforms.
- Research, suggest and implement new uses of information technologies, policies and procedures for continued improvement.
- Develop processes and policies for a 20-petabyte TSM tape archival storage system with thousands of users. Perform system administration support for TSM, including management of the 300-terabyte TSM disk cache, 12 LTO9 tape drives, and 12 LTO5 tape drives. Assist researchers with placing and retrieving files. Develop and implement backup policies.
- Assist in the management and maintenance of the HPC cluster and data center work, including troubleshooting and resolving system problems, coordinating with users and vendors, monitoring, auditing, and logging.
- Answer and resolve user tickets.
- Develop and create effective system documentation for all systems.
- Provide off-hours support for critical and other production issues.
- Perform other duties as assigned or requested.
Qualifications:
- Bachelor's degree in a technical discipline; Master's degree preferred
- Experience working in a research environment preferred
- 10 years of experience installing, configuring, managing, provisioning, automating tasks and monitoring hardware and software. Experience with data and security best practices.
- At least 6 years of experience in designing, administering, and troubleshooting Linux and Windows systems, storage systems, networks, and VMs.
- The ability to communicate effectively and manage multiple conflicting priorities and projects simultaneously.
- Excellent analytical ability, strong judgment and management skills, and the ability to work effectively and independently with clients, vendors, IT management and staff.
- Experience with Jira and Confluence administration, databases (MS SQL, MySQL, MySQL Galera, Oracle, PostgreSQL, etc.), containers, and VMware preferred.
- Ability to lead projects to successful completion with little or no guidance
- Experience supporting HPC environments, including configuration management (such as xCAT, Puppet, or Ansible), node installation and provisioning, networking, storage, and job schedulers, is preferred.
We Believe in Diversity & Inclusion:
As a minority-owned company, we deeply value and prioritize inclusion and diversity within our organization. We believe that a diverse and inclusive workforce fosters innovation, creativity, and empathy, leading to a richer and more rewarding work environment. We are committed to cultivating a workplace where every team member feels valued, respected, and empowered to contribute their unique perspectives and talents. Join us and be a part of a team that celebrates diversity, cherishes different perspectives, and fosters a collaborative and supportive community.
#InclusionAndDiversity #Empowerment #EdifyTechnologies #JoinOurTeam #Hiring