Detailed JD (Roles and Responsibilities): Data Engineer
- 7+ years of experience in data engineering or related fields.
- Strong hands‑on experience with:
- AWS services: Glue, S3, Redshift, EMR, Lambda, Kinesis, Athena.
- Big Data tech: Spark/PySpark, Hadoop, Hive.
- Programming: Python, SQL, Scala (optional).
- Databases: SQL Server, PostgreSQL, MySQL, NoSQL (DynamoDB, MongoDB).
- Experience with CI/CD, DevOps, and Infrastructure as Code (IaC) tools.
- Strong understanding of data modeling, warehousing, and distributed computing.
- Data Pipeline & ETL Development
- Design, build, and maintain scalable ETL/ELT pipelines using AWS services (Glue, Lambda, EMR, Step Functions).
- Develop batch and real‑time data ingestion processes from diverse sources (APIs, RDBMS, streaming platforms).
- Optimize data workflows for performance, scalability, and cost-efficiency.
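The ETL/ELT work above can be sketched as a minimal batch pipeline in plain Python. The field names and the `extract`/`transform`/`load` helpers are hypothetical illustrations only; in practice this shape would be implemented with AWS Glue or PySpark reading from S3 and writing to Redshift.

```python
import csv
import io


def extract(raw_csv: str) -> list:
    """Parse raw CSV text into rows (stand-in for reading from S3 or an API)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list) -> list:
    """Normalize types and drop rows that fail basic validation."""
    out = []
    for row in rows:
        try:
            out.append({"order_id": row["order_id"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine bad records instead
    return out


def load(rows: list) -> dict:
    """Aggregate and 'load' (here: return a summary instead of writing to a warehouse)."""
    return {"row_count": len(rows), "total_amount": sum(r["amount"] for r in rows)}


raw = "order_id,amount\n1,19.50\n2,not-a-number\n3,5.25\n"
summary = load(transform(extract(raw)))
# summary == {"row_count": 2, "total_amount": 24.75}  (the bad row is dropped)
```

The same extract/transform/load separation carries over directly to a Glue job or Step Functions state machine, with each stage scaled independently.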
- Data Platform Engineering
- Architect and implement data lakes and data warehouses using S3, Redshift, Lake Formation, Athena.
- Manage data modeling (star/snowflake schemas) and design optimized storage layers.
- Implement data cataloging, metadata management, and data lifecycle policies.
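The star schema mentioned above can be illustrated with a toy fact table joined to a dimension table. The table and column names are invented for illustration, and SQLite stands in for Redshift:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES (10, 1, 12.0), (11, 1, 8.0), (12, 2, 30.0);
""")

# The typical star-schema query: join the fact table to a dimension,
# then aggregate by a dimension attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d USING (product_id)
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
# rows == [('books', 20.0), ('games', 30.0)]
```

In a snowflake variant the dimension itself would be normalized into further lookup tables; the fact-to-dimension join pattern stays the same.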
- Big Data & Analytics
- Work with big data tools such as Spark, Hadoop, Hive, and PySpark.
- Support analytics and machine learning teams by providing high‑quality, curated datasets.
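"Curated datasets" usually means pre-aggregating raw events for downstream analytics or ML teams. A Spark `groupBy`/`count` would be the usual tool; the shape of the work can be shown with plain Python (the event fields are made up for illustration):

```python
from collections import defaultdict

# Raw event stream as it might land from Kinesis (hypothetical fields).
raw_events = [
    {"user": "a", "event": "click"},
    {"user": "a", "event": "view"},
    {"user": "b", "event": "click"},
    {"user": "a", "event": "click"},
]

# Curate: count events per (user, event) pair, analogous to
# df.groupBy("user", "event").count() in PySpark.
counts = defaultdict(int)
for e in raw_events:
    counts[(e["user"], e["event"])] += 1

curated = sorted((user, event, n) for (user, event), n in counts.items())
# curated == [('a', 'click', 2), ('a', 'view', 1), ('b', 'click', 1)]
```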
- Cloud Infrastructure & DevOps
- Build CI/CD pipelines for data engineering (CodePipeline, CodeBuild, GitHub Actions).
- Write IaC using Terraform or AWS CloudFormation.
- Monitor, troubleshoot, and optimize workloads using CloudWatch and distributed logging.
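As a sketch of the IaC responsibility, the skeleton of a CloudFormation template (here, an encrypted, versioned S3 bucket for a data lake) can be generated and serialized from Python. The logical ID `DataLakeBucket` and the description are hypothetical; a real setup would more likely keep the template in source control or use Terraform/CDK:

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal data-lake bucket (illustrative only)",
    "Resources": {
        "DataLakeBucket": {  # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {
                # Versioning and default KMS encryption, per the governance
                # requirements noted below.
                "VersioningConfiguration": {"Status": "Enabled"},
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
                    ]
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

The rendered JSON is a deployable template (e.g. via `aws cloudformation deploy --template-file ...`).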
- Data Quality & Governance
- Implement data validation frameworks and automated quality checks.
- Ensure compliance with security, privacy, and governance standards (IAM, KMS, encryption).
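Automated quality checks like those described above can be as simple as rule functions applied per record. The rules and field names here are invented for illustration; in practice a framework such as Great Expectations or AWS Glue Data Quality would manage the rule catalog and reporting:

```python
def validate(row: dict) -> list:
    """Return a list of rule violations for one record (empty list == passes)."""
    errors = []
    if not row.get("id"):  # hypothetical rule: primary key must be present
        errors.append("id missing")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors


batch = [{"id": "r1", "amount": 10.5}, {"id": "", "amount": -3}]
report = {row["id"] or "<no id>": validate(row) for row in batch}
# report["r1"] == []
# report["<no id>"] == ["id missing", "amount must be a non-negative number"]
```

In a pipeline, failing records would be routed to a quarantine location and the violation counts emitted as CloudWatch metrics for alerting.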