Job Opportunities | USA | Careers | APEX-2000 Inc - A leading Global Consulting and IT services company

Requirement ID:	91304
Job Title:	Databricks Data Scientist
Job Type:	Contract
Duration:	6 - 9 months
Location:	Indianapolis, IN
Job Description:

Role Overview

We are seeking a Databricks Data Scientist with strong experience in Databricks Lakehouse, advanced analytics, and Genie (AI/BI) to design, build, and deploy scalable data science and AI solutions. This role will focus on transforming enterprise data into actionable insights using machine learning, natural language analytics, and self-service BI powered by Databricks Genie. You will work closely with medical, commercial, and R&D teams across the pharma and life sciences industry to build intelligent solutions that drive scientific and business impact, from drug discovery to commercial analytics to patient outcomes.

Python / PySpark, Databricks, MLflow, Spark ML, Delta Lake, Genie (AI/BI), Unity Catalog, SQL, NLP / GenAI, HIPAA / GxP

Key Responsibilities

Data Science & Machine Learning
• Design, develop, and deploy machine learning models using Databricks (MLflow, Spark ML, Python) for pharma and life sciences use cases
• Implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, deployment, and monitoring
• Build predictive models for patient identification, HCP segmentation, market access analytics, pharmacovigilance, and safety signal detection
• Apply NLP and generative AI techniques (LLMs, RAG pipelines) to extract insights from medical literature, clinical notes, and regulatory documents
• Conduct A/B testing, model validation, and statistical analysis to evaluate model performance and business impact
• Collaborate with data engineers to ensure reliable, high-quality, production-ready datasets in the Lakehouse

Databricks & Lakehouse Architecture
• Leverage Databricks Lakehouse (Delta Lake, Unity Catalog) for scalable, governed, and high-performance analytics
• Design and optimize Spark jobs for performance and cost efficiency across large-scale pharma datasets
• Apply best practices for data governance, data lineage, and security within Unity Catalog
• Build and maintain Bronze / Silver / Gold Medallion architecture for clinical, claims, and commercial data
• Implement Delta Live Tables (DLT) pipelines with data quality checks for real-time and batch processing
• Configure and manage Databricks Workflows, Repos, and cluster policies for production ML workloads

Genie (AI/BI & Natural Language Analytics)
• Configure and enable Databricks Genie for self-service analytics across business and scientific teams
• Design semantic layers and curated Gold datasets optimized for natural language queries via Genie
• Define certified questions, trusted assets, and business glossary terms to improve Genie response quality
• Partner with business stakeholders to translate complex pharma questions into Genie-enabled insights
• Monitor and iterate on Genie Spaces based on user feedback, query accuracy, and adoption metrics
• Enable non-technical users across Medical Affairs, Commercial, and R&D to self-serve data insights

Real-World & Clinical Data Analysis
• Analyze real-world data (RWD), electronic health records (EHR), claims data, and clinical trial datasets to generate actionable insights
• Build scalable data pipelines for pharma-specific sources including IQVIA, Symphony Health, Komodo, and specialty pharmacy data
• Apply survival analysis, mixed models, and Bayesian methods for epidemiology and health economics (HEOR) studies
• Ensure all models and data processes comply with HIPAA, GxP, and 21 CFR Part 11 regulations

Business Enablement & Stakeholder Collaboration
• Work closely with product owners, analysts, and business leaders to identify and prioritize high-value data science use cases
• Communicate complex analytical results and model outputs in a clear, business-friendly manner to non-technical audiences
• Produce analytical documentation: model cards, design specs, performance reports, and executive summaries
• Lead sprint ceremonies as analytics owner: architecture reviews, estimation sessions, and release planning

Required Qualifications
• Experience: 4+ years of professional experience in data science or advanced analytics, preferably in pharma, biotech, or life sciences
• Education: Bachelor's or Master's degree in Data Science, Computer Science, Statistics, Engineering, or a related field
• Databricks: Hands-on experience with Databricks and Apache Spark for large-scale data processing and ML workloads
• Python: Strong programming skills in Python (PySpark, Pandas, NumPy, Scikit-learn) for data science and ML development
• MLflow: Experience building and deploying ML models in production using MLflow for experiment tracking and model lifecycle management
• SQL: Solid understanding of SQL and data modeling for analytical and reporting workloads on large datasets
• Delta Lake: Experience with Delta Lake, Unity Catalog, and Medallion architecture (Bronze / Silver / Gold) for Lakehouse analytics
• Genie / AI-BI: Familiarity with Databricks Genie or AI/BI tools for natural language querying and self-service analytics
• Healthcare Data: Experience working with clinical, claims, or real-world healthcare data (EHR, RWD, specialty pharmacy)
• Compliance: Familiarity with HIPAA compliance and handling of sensitive patient data in regulated environments
• Communication: Strong communication skills, including the ability to translate complex models and analysis into clear, actionable business insights

Preferred Qualifications
• Experience with Databricks Genie Spaces configuration, semantic layer design, and certified question management
• Hands-on experience with Delta Live Tables (DLT) for streaming and batch data quality pipelines
• Familiarity with LLMs, RAG pipelines, or generative AI for medical and scientific use cases
• Knowledge of GxP validation and 21 CFR Part 11 compliance for production ML models
• Experience with IQVIA, Symphony Health, Komodo Health, or similar pharma data vendors
• Familiarity with clinical trial data standards: CDISC, SDTM, ADaM
• Experience with pharmacovigilance, drug safety signal detection, or regulatory analytics
• Knowledge of AWS or Azure cloud services for ML deployment: SageMaker, Azure ML, Lambda, or equivalent
• Databricks certifications: Databricks Certified Machine Learning Professional or Data Engineer Associate
• PhD in a quantitative or life sciences field is a plus
• Prior experience in large-scale IT consulting or services delivery (TCS, Infosys, Accenture, Wipro, or similar)

Desirable Skills:
Keyword:
Skills:	BI Testing
Experience Required:	4-6