The Senior ETL Developer will be responsible for designing, implementing, and optimizing distributed data processing jobs that handle large-scale data in the Hadoop Distributed File System (HDFS) using Apache Spark and Python. This role requires a deep understanding of data engineering principles, proficiency in Python, and hands-on experience with the Spark and Hadoop ecosystems. The developer will collaborate with data engineers, analysts, and business stakeholders to process and transform data and to drive insights and data-driven decisions.
**Key Responsibilities:**
Data Processing and Transformation:
+ Design and implement Spark applications to process and transform large datasets in HDFS.
+ Develop ETL pipelines in Spark using Python for data ingestion, cleaning, aggregation, and transformation.
Performance Optimization:
+ Optimize Spark jobs for efficiency, reducing run time and resource usage.
+ Fine-tune memory management, caching, and partitioning strateg...