Job Description
Role Overview
We are seeking a Databricks Data Scientist with strong experience in Databricks Lakehouse, advanced analytics, and Genie (AI/BI) to design, build, and deploy scalable data science and AI solutions. This role will focus on transforming enterprise data into actionable insights using machine learning, natural language analytics, and self-service BI powered by Databricks Genie.
You will work closely with medical, commercial, and R&D teams across the pharma and life sciences industry to build intelligent solutions that drive scientific and business impact — from drug discovery to commercial analytics to patient outcomes.
Python / PySpark, Databricks, MLflow, Spark ML, Delta Lake, Genie (AI/BI), Unity Catalog, SQL, NLP / GenAI, HIPAA / GxP
Key Responsibilities
Data Science & Machine Learning
• Design, develop, and deploy machine learning models using Databricks (MLflow, Spark ML, Python) for pharma and life sciences use cases
• Implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, deployment, and monitoring
• Build predictive models for patient identification, HCP segmentation, market access analytics, pharmacovigilance, and safety signal detection
• Apply NLP and generative AI techniques (LLMs, RAG pipelines) to extract insights from medical literature, clinical notes, and regulatory documents
• Conduct A/B testing, model validation, and statistical analysis to evaluate model performance and business impact
• Collaborate with data engineers to ensure reliable, high-quality, production-ready datasets in the Lakehouse
Databricks & Lakehouse Architecture
• Leverage Databricks Lakehouse (Delta Lake, Unity Catalog) for scalable, governed, and high-performance analytics
• Design and optimize Spark jobs for performance and cost efficiency across large-scale pharma datasets
• Apply best practices for data governance, data lineage, and security within Unity Catalog
• Build and maintain Bronze / Silver / Gold Medallion architecture for clinical, claims, and commercial data
• Implement Delta Live Tables (DLT) pipelines with data quality checks for real-time and batch processing
• Configure and manage Databricks Workflows, Repos, and cluster policies for production ML workloads
Genie (AI/BI & Natural Language Analytics)
• Configure and enable Databricks Genie for self-service analytics across business and scientific teams
• Design semantic layers and curated Gold datasets optimized for natural language queries via Genie
• Define certified questions, trusted assets, and business glossary terms to improve Genie response quality
• Partner with business stakeholders to translate complex pharma questions into Genie-enabled insights
• Monitor and iterate on Genie Spaces based on user feedback, query accuracy, and adoption metrics
• Enable non-technical users across Medical Affairs, Commercial, and R&D to self-serve data insights
Real-World & Clinical Data Analysis
• Analyze real-world data (RWD), electronic health records (EHR), claims data, and clinical trial datasets to generate actionable insights
• Build scalable data pipelines for pharma-specific sources including IQVIA, Symphony Health, Komodo, and specialty pharmacy data
• Apply survival analysis, mixed models, and Bayesian methods for epidemiology and health economics (HEOR) studies
• Ensure all models and data processes comply with HIPAA, GxP, and 21 CFR Part 11 regulations
Business Enablement & Stakeholder Collaboration
• Work closely with product owners, analysts, and business leaders to identify and prioritize high-value data science use cases
• Communicate complex analytical results and model outputs in a clear, business-friendly manner to non-technical audiences
• Produce analytical documentation: model cards, design specs, performance reports, and executive summaries
• Lead sprint ceremonies as analytics owner: architecture reviews, estimation sessions, and release planning
Required Qualifications
• Experience: 4+ years of professional experience in data science or advanced analytics, preferably in pharma, biotech, or life sciences
• Education: Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Engineering, or a related field
• Databricks: Hands-on experience with Databricks and Apache Spark for large-scale data processing and ML workloads
• Python: Strong programming skills in Python — PySpark, Pandas, NumPy, Scikit-learn — for data science and ML development
• MLflow: Experience building and deploying ML models in production using MLflow for experiment tracking and model lifecycle management
• SQL: Solid understanding of SQL and data modeling for analytical and reporting workloads on large datasets
• Delta Lake: Experience with Delta Lake, Unity Catalog, and Medallion architecture (Bronze / Silver / Gold) for Lakehouse analytics
• Genie / AI-BI: Familiarity with Databricks Genie or AI/BI tools for natural language querying and self-service analytics
• Healthcare Data: Experience working with clinical, claims, or real-world healthcare data (EHR, RWD, specialty pharmacy)
• Compliance: Familiarity with HIPAA compliance and handling of sensitive patient data in regulated environments
• Communication: Strong communication skills — ability to translate complex models and analysis into clear, actionable business insights
Preferred Qualifications
• Experience with Databricks Genie Spaces configuration, semantic layer design, and certified question management
• Hands-on experience with Delta Live Tables (DLT) for streaming and batch data quality pipelines
• Familiarity with LLMs, RAG pipelines, or generative AI for medical and scientific use cases
• Knowledge of GxP validation and 21 CFR Part 11 compliance for production ML models
• Experience with IQVIA, Symphony Health, Komodo Health, or similar pharma data vendors
• Familiarity with clinical trial data standards: CDISC, SDTM, ADaM
• Experience with pharmacovigilance, drug safety signal detection, or regulatory analytics
• Knowledge of AWS or Azure cloud services for ML deployment: SageMaker, Azure ML, Lambda, or equivalent
• Databricks certifications: Databricks Certified Machine Learning Professional or Data Engineer Associate
• PhD in a quantitative or life sciences field is a plus
• Prior experience in large-scale IT consulting or services delivery (TCS, Infosys, Accenture, Wipro, or similar)
Skills: BI Testing
Experience Required: 4-6 years