
LLM Operations Engineer

The LLM Operations Engineer serves as the DevOps specialist within our AI team, focusing on managing the operational aspects of our AI platform, particularly the Large Language Models (LLMs) that power our client's One Intelligence. You will build and maintain robust LLM workflows, implement monitoring systems, integrate feedback loops, and optimize the performance of our AI solutions. You'll work closely with AI Engineers and Product Owners to ensure our AI systems are reliable, secure, observable, and continuously improving.

Advantages: Preferred Qualifications
Experience with prompt engineering and testing tools like Promptfoo
Familiarity with vector databases and retrieval-augmented generation (RAG) systems
Knowledge of serverless architectures and event-driven systems
Experience with AWS Guardrails for LLM security
Background in data engineering or machine learning operations
Understanding of financial systems and data security requirements in the finance industry
Familiarity with implementing technical solutions to meet compliance requirements outlined in SOC2, ISAE 3402, and ISO 27001

Responsibilities: What You Will Do
Design, implement, and maintain LLM operations workflows using tools like Langfuse to monitor performance, track usage, and create feedback loops for continuous improvement
Develop and maintain infrastructure-as-code for AI deployments using Terraform and AWS services (Lambda, SQS, API Gateway, OpenSearch, CloudWatch)
Build and enhance monitoring, logging, and alerting systems to ensure optimal performance and reliability of our LLM infrastructure
Collaborate with AI engineers to design and implement evaluation frameworks (including LLM-as-judge systems) to measure and improve model performance
Manage prompt versioning, testing, and deployment pipelines through Concourse CI/CD and custom tooling
Implement and maintain security guardrails for LLM interactions, ensuring compliance with best practices
Create comprehensive documentation for LLM operations, including runbooks for production incidents
Participate in on-call rotations to support mission-critical AI systems
Drive innovation in LLM operations by researching and implementing best practices and emerging tools in the rapidly evolving GenAI space

Qualifications: Required Qualifications
3+ years of experience in DevOps, SRE, or similar roles, with at least 1 year specifically working with LLMs or AI systems in production
Strong hands-on experience with AWS cloud services, particularly Bedrock, Lambda, SQS, API Gateway, OpenSearch, and CloudWatch
Experience with infrastructure-as-code using Terraform, CloudFormation, or similar tools
Proficiency in Python and experience building automation tooling and pipelines
Familiarity with LangOps platforms such as Langfuse for LLM observability and evaluation
Experience with CI/CD pipelines using Concourse or similar tools
Knowledge of logging, monitoring, and alerting systems
Understanding of security best practices for AI systems, including prompt injection mitigation techniques
Excellent troubleshooting and problem-solving skills
Strong communication skills and ability to work effectively with cross-functional teams
Must be legally entitled to work in the country where the role is located

Summary
To succeed in this role, you will need a combination of experience, technology skills, personal qualities, and education.

Randstad Canada is committed to fostering a workforce reflective of all peoples of Canada. As a result, we are committed to developing and implementing strategies to increase equity, diversity and inclusion within the workplace by examining our internal policies, practices, and systems throughout the entire lifecycle of our workforce, including recruitment, retention and advancement for all employees.

In addition to our deep commitment to respecting human rights, we are dedicated to positive actions to effect change and ensure everyone has full participation in the workforce free from any barriers, systemic or otherwise, especially equity-seeking groups who are usually underrepresented in Canada's workforce, including those who identify as women or non-binary/gender non-conforming; Indigenous or Aboriginal Peoples; persons with disabilities (visible or invisible); and members of visible minorities, racialized groups and the LGBTQ2+ community.

Randstad Canada is committed to creating and maintaining an inclusive and accessible workplace for all its candidates and employees by supporting their accessibility and accommodation needs throughout the employment lifecycle. We ask that all job applicants identify any accommodation requirements by sending an email to accessibility@randstad.ca to ensure their ability to fully participate in the interview process.

Randstad Canada
Toronto, ON
Full time

Published on 03/20/2025
