Description: We are seeking a highly skilled and motivated Sr. LLM Engineer to help advance our language model infrastructure. As a key member of our AI/ML team, you will be responsible for training, hosting, and optimizing Large Language Model (LLM) instances within our compute environment. The ideal candidate has a strong passion for pushing the boundaries of language technology, a deep understanding of LLM architectures, and the grit to tackle complex challenges head-on. This role calls for a self-reliant individual driven to identify and fix inefficiencies, constantly striving to improve the codebase and optimize model performance. If you thrive in a fast-paced environment and have an unwavering commitment to delivering cutting-edge language solutions, this position is for you.
Responsibilities:
·Design, develop, and maintain the infrastructure for training, hosting, and serving LLM instances.
·Optimize model training pipelines to achieve high performance and resource efficiency.
·Implement and integrate state-of-the-art LLM architectures and techniques.
·Collaborate with cross-functional teams to understand business requirements and deliver impactful language solutions.
·Monitor and analyze model performance metrics, identifying areas for improvement and implementing optimizations.
·Develop and maintain documentation, best practices, and coding standards for LLM development and deployment.
·Stay up to date with the latest advancements in LLM research and industry trends, and incorporate them into our projects.
·Mentor and guide junior engineers, fostering a culture of continuous learning and knowledge sharing.
Skills Requirements:
·12+ years of experience in software engineering, with a focus on machine learning or natural language processing.
·Bachelor's degree in Computer Science, Artificial Intelligence, or a related field (or equivalent experience, as described in the YOE requirement below).
·Strong expertise in deep learning frameworks such as TensorFlow, PyTorch, or MXNet.
·Proficiency in programming languages such as Python, C++, or Java.
·Solid understanding of LLM architectures, training techniques, and evaluation methodologies.
·Familiarity with cloud platforms (e.g., AWS, GCP) and their machine learning services.
·Knowledge of software engineering best practices, including version control, testing, and continuous integration/deployment.
·Excellent problem-solving and debugging skills.
·Strong communication and collaboration abilities to work effectively with cross-functional teams.
Nice to Haves:
·Advanced degree (Master's or Ph.D.) in Computer Science, Artificial Intelligence, or a related field.
·Proven track record of implementing and deploying large-scale LLM systems in production environments.
·Experience with distributed computing frameworks like Apache Spark or Hadoop.
·Experience with natural language understanding, generation, and dialogue systems.
·Familiarity with techniques such as transfer learning, few-shot learning, and reinforcement learning.
·Contributions to open-source projects or research publications in the field of LLMs.
·Experience serving models through APIs and building scalable inference pipelines.
·Knowledge of DevOps practices and tools like Docker, Kubernetes, and Jenkins.
YOE Requirement: 12 years of experience plus a B.S. in a technical discipline; 4 additional years of experience (16 in total) may substitute for the B.S.