| Job Description: |
Must Have Skills:
⦁ 5+ years in ML engineering or applied ML, with at least 2 years hands-on experience with LLMs (GPT-4, Claude, Llama, Mistral, or similar); strong Python proficiency and experience with ML frameworks (PyTorch, HuggingFace Transformers).
⦁ Production experience building RAG systems, including vector databases (Pinecone, Weaviate, pgvector, FAISS), embedding models, chunking strategies, and retrieval optimization; experience with prompt engineering and chain-of-thought patterns.
⦁ Experience with ML evaluation and experimentation: building evaluation harnesses, A/B testing, regression testing for LLM outputs, and defining quality metrics for non-deterministic AI systems.
Nice to Have Skills: ⦁ Experience with model fine-tuning (LoRA, QLoRA), model serving (vLLM, TGI, Triton), multi-agent orchestration frameworks, reinforcement learning from human feedback (RLHF), MLOps/LLMOps platforms, knowledge graph construction, cost optimization for LLM inference, airline or travel domain experience.
Detailed Job Description:
⦁ As a Machine Learning Engineer on the Agentic System Layer (ASL) team, you will build the ML-powered components that make American Airlines’ agentic AI systems intelligent and reliable.
⦁ Day-to-day responsibilities include:
  ⦁ developing and optimizing LLM-powered agent pipelines, including prompt engineering, chain-of-thought reasoning, and tool-use patterns;
  ⦁ building RAG (Retrieval Augmented Generation) systems with vector search, embedding models, and knowledge retrieval pipelines;
  ⦁ implementing agent evaluation, benchmarking, and regression testing frameworks;
  ⦁ fine-tuning models and optimizing inference for latency and cost (quantization, caching, batching, model routing);
  ⦁ developing guardrails, content filtering, and safety mechanisms for production agent deployments;
  ⦁ collaborating with software engineers on model serving infrastructure and with architects on system design;
  ⦁ staying current with rapid advances in agentic AI, LLM capabilities, and evaluation methodologies.
⦁ A great candidate lives at the intersection of ML research and production engineering. They can read the latest papers on agentic AI, distinguish what is practically useful from academic noise, and turn those ideas into production-ready code. They obsess over evaluation: they know that if you cannot measure it, you cannot ship it. They are comfortable with the inherent non-determinism of LLM systems and have practical strategies for building reliability on top of probabilistic foundations.
⦁ The candidate will work a hybrid schedule: 3 days a week onsite and 2 days remote.
⦁ Interviews will have 2 rounds.
Minimum Years of Experience: ⦁ 5+ years
Certifications Needed: ⦁ None
Top 3 responsibilities you would expect the Subcon to shoulder and execute:
Interview Process (Is face to face required?): ⦁ Interviews will have 2 rounds |