We are seeking a talented Data Engineer to join our client's esteemed Global Identity and Fraud Analytics team. This role offers a dynamic environment, working on diverse and challenging projects across various industries including Financial Services, Telecommunications, eCommerce, Healthcare, Insurance, and Government.
Key Responsibilities: - Work closely with the data science team to migrate analytical data and projects to the GCP environment, ensuring a seamless transition.
- Prepare and build data and analytical automation pipelines for self-serving machine learning projects, including data gathering, integration, consolidation, cleansing, and structuring.
- Design and code analysis scripts to run on GCP using BigQuery/Python/Scala, leveraging multiple Core data sources.
Qualifications: - 3-5 years of professional data engineering or data wrangling experience in:
- Working with Hadoop-based or Cloud-based big data management environments.
- Bash scripting or similar experience for data movement and ETL.
- Big data queries in Hive/Impala/Pig/BigQuery (proficiency with the BigQuery API client libraries for data-prep automation is a plus).
- Advanced Python programming with strong coding experience (Scala is a plus). Working experience with Data Studio, Bigtable, and GitHub (Cloud Composer and Dataflow are pluses).
- Basic GCP certification is a plus.
- Knowledge of Kubernetes, or other GCP-native container-orchestration tools, is a plus.
- Basic knowledge of machine learning (ensemble models, unsupervised models), with experience using TensorFlow and PyTorch, is a plus.
- Basic knowledge of graph mining and graph data models is a plus.
- Understanding of best practices for data management, maintenance, and reporting.
Additional Qualifications: - 3+ years of professional experience as a data engineer.
- 3+ years working with Python and SQL.
- Experience with state-of-the-art machine learning algorithms (deep neural networks, support vector machines, boosting algorithms, random forest, etc.) preferred.
- Experience conducting advanced feature engineering and data dimension reduction in a Big Data environment is preferred.
- Strong SQL skills in a Big Data environment (Hive/Impala, etc.) are a plus.
Recommended Skills
- API
- Algorithms
- Apache Hadoop
- Apache Hive
- Artificial Neural Networks
- Automation