Job Description
We are looking for a skilled Data Engineer to join our team. As a Data Engineer, you will be
responsible for understanding business and technological challenges, developing data pipelines
that tackle those challenges, and ensuring their smooth deployment. You will also be
responsible for applying standard industry and company-wide good practices,
and for the application and evolution of our various patterns. This is a remote position.
MAIN RESPONSIBILITIES
• Project Understanding and Communication
o Understand problems from a user perspective and communicate until the issue is
clearly understood.
o Ensure you clearly understand the architecture provided by the Data
Architect.
o Communicate with the Data Architect and your peers about the technical
solution you’re developing, and with the Project Manager in charge of the
project you’re working on.
• Development
o Write and communicate on new or updated interface contracts.
o Develop data pipelines based on the defined architecture (see the sketch after
this list).
o Ensure standard good practices are applied.
o Deploy the requested infrastructure, particularly using Terraform.
o Perform peer reviews, and ask your peers to review your code when merging a
new version of the codebase.
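For illustration, here is a minimal sketch of the kind of data pipeline development described above, written as an Airflow 2.x DAG in Python; the DAG name, schedule, and task body are hypothetical placeholders rather than one of our actual pipelines.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_and_load(**context):
        # Placeholder task body: pull data from a source system and
        # load it into the warehouse (source and target are hypothetical).
        ...

    with DAG(
        dag_id="example_daily_ingest",    # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
        )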
• Testing
o Define tests with your Project Manager, based on the functional and technical
requirements of the pipeline you’re developing.
o Perform those tests and communicate regularly on the results.
o Regularly summarize the results of your tests in a dedicated document.
• Deployments
o Present the development that was performed to the Data Architect in charge of
the architecture and to the Lead DataOps, through our Deployment
Reviews.
o Track and communicate any errors throughout the active-monitoring period
that follows a deployment.
o Ensure diligent application of the deployment process, logging, and monitoring
strategy.
REQUIRED HARD SKILLS
• Google Cloud Platform: General knowledge of the platform and its various services,
and at least one year of experience with GCP.
• Microsoft Azure: General knowledge of the platform is a plus.
• Apache Airflow: At least two years of experience with the Airflow orchestrator;
experience with Cloud Composer, Google’s managed Airflow service, is a plus.
• Google BigQuery: Extensive experience (at least two years) with GBQ: you know how
to optimize tables and queries, and you can design database architectures. The
candidate should go beyond routine table creation and be aware of the trade-offs of
developing and deploying one kind of infrastructure versus another (see the
partitioning sketch after this list).
• Terraform: At least one year of experience with Terraform.
• Apache Spark: This is optional expertise we would highly value. Some of our
pipelines are gradually being rewritten in PySpark, and the Data Engineer should be
able to maintain them and help them evolve (see the PySpark sketch after this list).
• Additional knowledge and experience that are a plus:
o Pub/Sub
o Kafka
o Google Cloud Storage
o Dataflow (or Apache Beam)
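To illustrate the BigQuery trade-offs mentioned above, here is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and column names are hypothetical. Partitioning and clustering reduce the bytes scanned by time-bounded queries, at the price of a table layout that is harder to change afterwards.

    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")  # hypothetical project

    schema = [
        bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ]

    table = bigquery.Table("example-project.analytics.events", schema=schema)
    # Partition by day on the event timestamp, so time-bounded queries
    # only scan the partitions they touch...
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_ts",
    )
    # ...and cluster on the most commonly filtered column.
    table.clustering_fields = ["user_id"]

    client.create_table(table)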
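Similarly, a minimal sketch of the kind of PySpark maintenance work mentioned above, assuming Spark 3.x; the bucket paths and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example_pipeline").getOrCreate()

    # Read raw events from Cloud Storage (hypothetical path).
    events = spark.read.parquet("gs://example-bucket/raw/events/")

    # Aggregate distinct users per day.
    daily = (
        events
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date")
        .agg(F.countDistinct("user_id").alias("daily_users"))
    )

    daily.write.mode("overwrite").parquet("gs://example-bucket/curated/daily_users/")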