Requirement ID: 91304
Job Title: Databricks Data Scientist
Job Type: Contract
Duration: 6 - 9 months
Location: Indianapolis, IN
Job Description:

Role Overview

We are seeking a Databricks Data Scientist with strong experience in Databricks Lakehouse, advanced analytics, and Genie (AI/BI) to design, build, and deploy scalable data science and AI solutions. This role will focus on transforming enterprise data into actionable insights using machine learning, natural language analytics, and self-service BI powered by Databricks Genie.


You will work closely with medical, commercial, and R&D teams across the pharma and life sciences industry to build intelligent solutions that drive scientific and business impact — from drug discovery to commercial analytics to patient outcomes.


Python / PySpark, Databricks, MLflow, Spark ML, Delta Lake

Genie (AI/BI), Unity Catalog, SQL, NLP / GenAI, HIPAA / GxP


Key Responsibilities

Data Science & Machine Learning

• Design, develop, and deploy machine learning models using Databricks (MLflow, Spark ML, Python) for pharma and life sciences use cases

• Implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, deployment, and monitoring

• Build predictive models for patient identification, HCP segmentation, market access analytics, pharmacovigilance, and safety signal detection

• Apply NLP and generative AI techniques (LLMs, RAG pipelines) to extract insights from medical literature, clinical notes, and regulatory documents

• Conduct A/B testing, model validation, and statistical analysis to evaluate model performance and business impact

• Collaborate with data engineers to ensure reliable, high-quality, production-ready datasets in the Lakehouse


Databricks & Lakehouse Architecture

• Leverage Databricks Lakehouse (Delta Lake, Unity Catalog) for scalable, governed, and high-performance analytics

• Design and optimize Spark jobs for performance and cost efficiency across large-scale pharma datasets

• Apply best practices for data governance, data lineage, and security within Unity Catalog

• Build and maintain Bronze / Silver / Gold Medallion architecture for clinical, claims, and commercial data

• Implement Delta Live Tables (DLT) pipelines with data quality checks for real-time and batch processing

• Configure and manage Databricks Workflows, Repos, and cluster policies for production ML workloads


Genie (AI/BI & Natural Language Analytics)

• Configure and enable Databricks Genie for self-service analytics across business and scientific teams

• Design semantic layers and curated Gold datasets optimized for natural language queries via Genie

• Define certified questions, trusted assets, and business glossary terms to improve Genie response quality

• Partner with business stakeholders to translate complex pharma questions into Genie-enabled insights

• Monitor and iterate on Genie Spaces based on user feedback, query accuracy, and adoption metrics

• Enable non-technical users across Medical Affairs, Commercial, and R&D to self-serve data insights


Real-World & Clinical Data Analysis

• Analyze real-world data (RWD), electronic health records (EHR), claims data, and clinical trial datasets to generate actionable insights

• Build scalable data pipelines for pharma-specific sources including IQVIA, Symphony Health, Komodo, and specialty pharmacy data

• Apply survival analysis, mixed models, and Bayesian methods for epidemiology and health economics (HEOR) studies

• Ensure all models and data processes comply with HIPAA, GxP, and 21 CFR Part 11 regulations


Business Enablement & Stakeholder Collaboration

• Work closely with product owners, analysts, and business leaders to identify and prioritize high-value data science use cases

• Communicate complex analytical results and model outputs in a clear, business-friendly manner to non-technical audiences

• Produce analytical documentation: model cards, design specs, performance reports, and executive summaries

• Lead sprint ceremonies as analytics owner: architecture reviews, estimation sessions, and release planning


Required Qualifications

• Experience: 4+ years of professional experience in data science or advanced analytics, preferably in pharma, biotech, or life sciences

• Education: Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Engineering, or a related field

• Databricks: Hands-on experience with Databricks and Apache Spark for large-scale data processing and ML workloads

• Python: Strong programming skills in Python — PySpark, Pandas, NumPy, Scikit-learn — for data science and ML development

• MLflow: Experience building and deploying ML models in production using MLflow for experiment tracking and model lifecycle management

• SQL: Solid understanding of SQL and data modeling for analytical and reporting workloads on large datasets

• Delta Lake: Experience with Delta Lake, Unity Catalog, and Medallion architecture (Bronze / Silver / Gold) for Lakehouse analytics

• Genie / AI-BI: Familiarity with Databricks Genie or AI/BI tools for natural language querying and self-service analytics

• Healthcare Data: Experience working with clinical, claims, or real-world healthcare data (EHR, RWD, specialty pharmacy)

• Compliance: Familiarity with HIPAA compliance and handling of sensitive patient data in regulated environments

• Communication: Strong communication skills — ability to translate complex models and analysis into clear, actionable business insights


Preferred Qualifications

• Experience with Databricks Genie Spaces configuration, semantic layer design, and certified question management

• Hands-on experience with Delta Live Tables (DLT) for streaming and batch data quality pipelines

• Familiarity with LLMs, RAG pipelines, or generative AI for medical and scientific use cases

• Knowledge of GxP validation and 21 CFR Part 11 compliance for production ML models

• Experience with IQVIA, Symphony Health, Komodo Health, or similar pharma data vendors

• Familiarity with clinical trial data standards: CDISC, SDTM, ADaM

• Experience with pharmacovigilance, drug safety signal detection, or regulatory analytics

• Knowledge of AWS or Azure cloud services for ML deployment: SageMaker, Azure ML, Lambda, or equivalent

• Databricks certifications: Databricks Certified Machine Learning Professional or Data Engineer Associate

• PhD in a quantitative or life sciences field is a plus

• Prior experience in large-scale IT consulting or services delivery (TCS, Infosys, Accenture, Wipro, or similar)

Skills: BI Testing
Experience Required: 4-6 years

 
