Requirement ID: 91249
Job Title: AI/ML Ops Application Support Engineer
Job Type: Contract
Duration: 6 - 9 months
Location: Atlanta, GA
Job Description:

Job Summary:
We are seeking an experienced AI/ML Ops Application Support Engineer to support, monitor, and maintain AI/ML platforms and applications in a production environment. The role involves ensuring the stability, performance, and reliability of machine learning pipelines, model deployments, and related cloud infrastructure, with a strong focus on operational excellence and incident management.

Key Responsibilities:
Provide L2/L3 production support for AI/ML applications, pipelines, and model deployments
Monitor model performance, drift, and data quality issues in production environments
Troubleshoot and resolve incidents, alerts, and system failures across ML workflows
Support CI/CD pipelines for model deployment and versioning
Collaborate with Data Scientists, ML Engineers, and DevOps teams for issue resolution and enhancements
Manage model retraining schedules, batch/real-time pipelines, and inference jobs
Perform root cause analysis (RCA) and implement preventive measures
Ensure adherence to SLA/SLO requirements and maintain operational dashboards/reporting
Handle application release support, patching, and environment maintenance
Maintain documentation for runbooks, troubleshooting guides, and standard operating procedures

Required Skills:
Strong experience in production support / application support roles (AI/ML systems preferred)
Hands-on experience with Python, SQL, and scripting for troubleshooting
Knowledge of ML lifecycle (training, validation, deployment, monitoring)
Experience with cloud platforms (Azure/AWS/GCP)
Familiarity with ML Ops tools (e.g., MLflow, Kubeflow, SageMaker, Azure ML)
Experience with containerization (Docker) and orchestration (Kubernetes)
Exposure to CI/CD tools (Jenkins, GitHub Actions, Azure DevOps)
Understanding of monitoring tools (Grafana, Prometheus, ELK, Azure Monitor)
Strong debugging and incident management skills

Preferred Skills:
Experience in healthcare/payer domain (claims, enrollment, analytics platforms)
Knowledge of data engineering tools (Spark, Airflow, Databricks)
Familiarity with model explainability and governance frameworks
ITIL process knowledge (incident, problem, change management)

Education & Experience:
Bachelor’s/Master’s degree in Computer Science, Data Science, or related field
Typically 5+ years of experience in application/production support, with exposure to AI/ML systems

Role Description: AI/ML Ops application support
Essential Skills: AI/ML Ops application support
Skills: Digital: Python; Agile Specialisation; AI Agents; AI & Gen AI - Products & Tools; Application Server Deployment & Administration
Experience Required: 8-10 years
 
