Job Description:
Detailed JD (Roles and Responsibilities)
Key Responsibilities
Architect, build, and own end-to-end Agentic AI solutions on AWS, from design through production deployment.
Lead and mentor a cross-functional Scrum team of engineers, ML practitioners, and data specialists.
Define technical standards, code-review practices, and engineering best practices for the team.
Design and implement RAG pipelines, LLM-powered agents, and multi-agent orchestration frameworks.
Drive cloud architecture decisions: serverless, microservices, containers (ECS/EKS), and data services on AWS.
Collaborate with Product Owners to translate business requirements into robust technical solutions.
Actively participate in sprint ceremonies (planning, stand-ups, retrospectives) as the technical authority.
Establish CI/CD pipelines, infrastructure-as-code (IaC), and automated testing strategies.
Evaluate and integrate emerging AI tools, models, and frameworks into the product roadmap.
Ensure the security, scalability, observability, and cost-efficiency of all cloud-based AI workloads.
Required Expertise
AWS Full-Stack Engineering
Hands-on experience with core AWS compute, storage, networking, and AI/ML services: Lambda, ECS/EKS, EC2, API Gateway, S3, RDS, DynamoDB, Bedrock, SageMaker, Step Functions, IAM, VPC, CloudWatch, and CloudFormation / CDK / Terraform.
Proven ability to design resilient, highly available, and cost-optimised architectures (Well-Architected Framework).
CI/CD implementation using CodePipeline, GitHub Actions, or equivalent tooling.
Artificial Intelligence, RAG & Agents
Practical experience with LLM integration (OpenAI, Anthropic Claude, AWS Bedrock, Mistral, or similar).
Design and deployment of Retrieval-Augmented Generation (RAG) pipelines at scale.
Embedding models, vector stores (Pinecone, pgvector, OpenSearch, Weaviate), and chunking strategies.
Building autonomous and multi-agent systems using frameworks such as LangChain, LangGraph, AutoGen, CrewAI, or Amazon Bedrock Agents.
Prompt engineering, chain-of-thought reasoning, tool/function calling, and agent memory management.
Evaluation, monitoring, and observability of AI systems (hallucination detection, latency, cost tracking).
Python Development
Expert-level Python for backend services, data processing, and AI/ML workflows.
Proficiency with key libraries: FastAPI / Flask, Pydantic, asyncio, boto3, LangChain / LlamaIndex.
Strong software engineering fundamentals: clean code, SOLID principles, unit and integration testing.
Experience supporting the project-to-operations transition in onsite, client-facing environments.
Based in Houston, TX, with the ability to work out of the Woodside client site.
Must Have Capabilities
AWS Architecture, Python Expert, LLM Integration, RAG Pipelines, Agent Frameworks, CI/CD & IaC, Scrum / Agile Leadership, API Design, Cloud Security, Production AI Systems
Demonstrable hands-on delivery (not just oversight) of AI-powered, cloud-native applications.
Strong execution discipline, with experience tracking deliverables, managing competing priorities, and ensuring quality outcomes.
Strong communication skills, able to translate technical depth for business stakeholders.
Track record of delivering production systems within Agile sprints.
Nice to Have
Experience with AWS Bedrock Agents, Knowledge Bases, or Guardrails.
Knowledge of fine-tuning or RLHF for domain-specific LLM adaptation.
Familiarity with graph databases (Neptune) or knowledge graphs for agent reasoning.
Frontend experience (React, Next.js) for full-stack ownership of AI-powered interfaces.
Data engineering background: Glue, Athena, Redshift, or Spark.
Exposure to MLOps practices and tooling (MLflow, W&B, SageMaker Pipelines).
AWS certifications: Solutions Architect Professional, Machine Learning Specialty.
Contributions to open-source AI/ML projects.
Experience with multi-modal AI (vision, speech, embeddings beyond text).
Ideal Profile
The successful candidate is a builder at heart — someone who moves fluidly between whiteboard architecture and writing production code. They are naturally curious about the AI landscape, stay ahead of rapidly evolving agent frameworks, and bring a pragmatic engineering mindset that turns experimentation into reliable, scalable products.
10+ years of software engineering experience, with at least 3–5 years in cloud-native AWS environments.
2+ years of hands-on work with LLMs, RAG, or autonomous AI agents in a production context.
Prior experience leading technical delivery within a Scrum or scaled-Agile (SAFe) team.
A portfolio or demonstrable examples of AI-powered products shipped to production.
Total Experience: 10+ years in software engineering
Relevant Experience: 5+ years in AWS and 2+ years in AI
Mandatory Skills (Skill Area: Minimum Requirement)
AWS (Compute & Networking): 3+ years hands-on with Lambda, ECS/EKS, API Gateway, VPC, IAM
AWS AI/ML Services: Practical use of Bedrock, SageMaker, or equivalent
Python: 5+ years; async, OOP, testing, packaging
LLM Integration: Production integration with at least one major LLM provider
RAG Pipeline Development: End-to-end design, including vector stores and retrieval tuning
Agentic Frameworks: At least one of LangChain / LangGraph / Bedrock Agents / AutoGen
CI/CD & IaC: GitHub Actions / CodePipeline plus CloudFormation / CDK / Terraform
Agile / Scrum: Technical lead or senior developer role in an active Scrum team
Desired Skills (Skill Area: Value Added)
AWS Bedrock Agents & Guardrails: Accelerates safe, managed agent deployment
LLM Fine-tuning / RLHF: Enables domain-specific model customisation
MLOps (MLflow, W&B): Improves model lifecycle management and experiment tracking
Graph Databases (Neptune): Supports complex knowledge-graph reasoning
Frontend (React / Next.js): Enables full end-to-end product ownership
Multi-modal AI: Expands the solution space beyond text-only agents
AWS Certifications (Pro / ML): Validates cloud architecture depth
Open-source Contributions: Signals community engagement and initiative
Domain (Industry): Oil and Gas
Work Location: Houston, TX (working from the client office is mandatory; no remote work allowed)