Jobvertise
Java Spark Databricks AWS Data Engineer
Location: US-TX-Houston
Urgent W2 job opportunity. No C2C. H1B & CPT candidates, please excuse.

Job Role: Java Spark Databricks AWS Data Engineer
Required Skills: Java/Spark/Databricks/Kubernetes (+ AWS Cloud experience)
Experience Required: 8 - 12 years
Location: Houston, TX & Columbus, OH (Day 1 Onsite)

Key Responsibilities:
A Java Spark Databricks AWS data engineer designs, develops, and maintains data solutions on the AWS cloud platform. Key responsibilities and tasks include:

- Data Ingestion: Develop and implement processes to extract data from sources such as databases, APIs, and files, and load it into a data lake or data warehouse using Java, Spark, and AWS tools.
- Data Transformation: Perform data cleansing, validation, and transformation using Spark and Java, ensuring data quality and consistency. Apply business rules and data processing techniques to prepare data for analysis and consumption.
- Data Pipeline Development: Design and build scalable data pipelines using AWS services such as AWS Glue, AWS Data Pipeline, or Apache Airflow. Develop ETL (Extract, Transform, Load) processes to move and transform data between systems and data stores.
- Data Modeling: Create and maintain data models and schemas, including dimensional and relational models, to support data storage and retrieval requirements. Optimize data structures for performance and efficiency.
- Performance Optimization: Fine-tune Spark applications and data processing workflows to improve performance and reduce processing time. Optimize resource utilization, data partitioning, and data caching strategies.
- Data Security and Governance: Implement data security and access controls to ensure data privacy and compliance with regulatory requirements. Apply data governance practices to manage metadata, data lineage, and data cataloging.
- Monitoring and Troubleshooting: Monitor data pipelines and Spark jobs for performance, errors, and issues. Troubleshoot and resolve data-related problems, such as data quality issues or performance bottlenecks.
- Collaboration and Documentation: Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver data solutions. Document data pipelines, processes, and system configurations.
- Cloud Infrastructure Management: Configure and manage AWS services such as Amazon EMR (Elastic MapReduce), Amazon S3 (Simple Storage Service), and AWS Glue for data processing, storage, and management. Monitor and optimize cloud resources for cost efficiency.
- Continuous Improvement: Stay current with emerging technologies, industry trends, and best practices in data engineering and cloud computing. Continuously enhance skills and knowledge to improve data engineering processes and solutions.

Best regards,

Asad Khan
Sr. Account Manager (IT / GCP & Leadership)
Direct: 469.813.9116
Anveta Inc.
Email: asad@anveta.com
URL: (link removed)
Address: 1333 Corporate Drive, Suite #108, Irving, TX 75038, USA