Overview:
The Senior Site Reliability Engineer ensures the reliability, scalability, and performance of our systems and applications, and maintains the infrastructure and operations that support the organization's technology goals.
Key Responsibilities:
- Design, build, and deploy infrastructure in our three areas of focus: 1) building and running network nodes, 2) building and running validators, and 3) building and running our next-generation wallet infrastructure
- Develop tools and automation that integrate these systems in a secure way
- Improve the capabilities of the existing infrastructure, with a focus on our next-generation wallet infrastructure and an infrastructure-as-code mindset
- Improve availability and reliability while maintaining acceptable security, especially in monitoring and automation
- Integrate cloud-based security mechanisms, such as identity and access management and key management, into the build infrastructure
- Participate in disaster recovery (DR) scenarios to validate operability of physical and digital material
Required Qualifications:
- 5+ years implementing cloud software while building “infrastructure as code”
- Experience in at least one of the following areas: software development, operating systems or device driver development, hardware, secure protocols, encryption, authentication, key management, or applied cryptography
- Hands-on experience with one or more cloud platforms (e.g., AWS, GCP, or Azure)
- Hands-on expertise with one or more of the following: Ansible, Puppet, Docker, KMS, IAM, Jenkins
- Proficiency in a common scripting language such as Python or Ruby
- Able to troubleshoot and debug issues, and demonstrate a methodical approach to root cause analysis
- Strong written and verbal communication skills; attentive to details
Preferred Qualifications:
- 6+ years implementing software
- Ability to read and write code in one or more of Go, Java, Scala, or C/C++
- 3+ years implementing software in AWS
- 1+ years using monitoring, alerting, and automation tooling
- Previous experience in one of our three focus areas: blockchain node operations, validators as a service, or wallet infrastructure
- Experience in a code-first environment, developing automated solutions to solve support and operational issues
- Experience working with engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions
- Demonstrated ability to translate theoretical security concepts into production systems
- Solid understanding of product management, product ownership, and Agile practices and methodologies