Senior Data Engineer

🇮🇳 India - Remote
📊 Data · 🟣 Senior

Job description

WPP is the creative transformation company. We use the power of creativity to build better futures for our people, planet, clients, and communities.

Working at WPP means being part of a global network of more than 100,000 talented people dedicated to doing extraordinary work for our clients. We operate in over 100 countries, with corporate headquarters in New York, London and Singapore.

WPP is a world leader in marketing services, with deep AI, data and technology capabilities, global presence and unrivalled creative talent. Our clients include many of the biggest companies and advertisers in the world, including approximately 300 of the Fortune Global 500.

Our people are the key to our success. We’re committed to fostering a culture of creativity, belonging and continuous learning, attracting and developing the brightest talent, and providing exciting career opportunities that help our people grow.

Why we’re hiring:

We are seeking a skilled and motivated Data Engineer to join our data team, with a strong focus on building and managing the data ingestion layer of our Databricks Lakehouse Platform. You will be responsible for creating reliable, scalable, and automated pipelines that pull data from a wide variety of sources, including third-party APIs, streaming platforms, relational databases, file-based systems, and analytics platforms such as Google Analytics 4 (GA4), ensuring it lands accurately and efficiently in our Bronze layer.

This role requires hands-on expertise in Python (PySpark), SQL, and modern ingestion tools like Databricks Auto Loader and Structured Streaming. You will be the expert on connecting to new data sources, ensuring our data lakehouse has the raw data it needs to power analytics and business insights across the organization.

What you’ll be doing:

  • Design, build, and maintain robust data ingestion pipelines to collect data from diverse sources such as APIs, streaming sources (e.g., Kafka, Event Hubs), relational databases (via JDBC), and cloud storage.
  • Heavily utilize Databricks Auto Loader and COPY INTO for the efficient, incremental, and scalable ingestion of files into Delta Lake.
  • Develop and manage Databricks Structured Streaming jobs to process near-real-time data feeds.
  • Ensure the reliability, integrity, and freshness of the Bronze layer in our Medallion Architecture, which serves as the single source of truth for all raw data.
  • Perform initial data cleansing, validation, and structuring to prepare data for further transformation in the Silver layer.
  • Monitor, troubleshoot, and optimize ingestion pipelines for performance, cost, and stability.
  • Develop Python scripts and applications to automate data extraction and integration processes.
  • Work closely with platform architects and other data engineers to implement best practices for data ingestion and management.
  • Document data sources, ingestion patterns, and pipeline configurations.
  • Conform to agile development practices, including version control (Git), CI/CD, and automated testing.
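To make the ingestion pattern above concrete, here is a minimal sketch of the kind of pipeline this role builds: Databricks Auto Loader incrementally picking up JSON files and landing them in a Bronze Delta table. It assumes a Databricks notebook context (the `spark` session is provided by the runtime); the paths and the `bronze.ga4_events` table name are illustrative placeholders, not details from this posting.

```python
# Minimal Auto Loader sketch: incrementally ingest JSON files into a
# Bronze Delta table. Paths and table names are hypothetical examples.
# Assumes a Databricks notebook, where `spark` is provided by the runtime.

from pyspark.sql import functions as F

raw_path = "/Volumes/landing/ga4/events/"          # hypothetical source path
schema_path = "/Volumes/landing/_schemas/ga4/"     # Auto Loader schema tracking
checkpoint = "/Volumes/landing/_checkpoints/ga4/"  # streaming checkpoint

(
    spark.readStream
        .format("cloudFiles")                      # Databricks Auto Loader
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .load(raw_path)
        # Bronze-layer convention: keep raw columns, add ingestion metadata
        .withColumn("_ingested_at", F.current_timestamp())
        .withColumn("_source_file", F.col("_metadata.file_path"))
    .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)                # process new files, then stop
        .toTable("bronze.ga4_events")
)
```

The `availableNow` trigger runs the stream as an incremental batch, which pairs naturally with Databricks Workflows for scheduled Bronze-layer loads.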

What you’ll need:

  • Education: Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related technical field preferred.
  • Experience: 4-6+ years of relevant experience in data engineering, with a strong focus on data ingestion and integration.
  • Engineer’s Core Skills:
    • Databricks Platform Expertise:
      • Data Ingestion Mastery: Deep, practical experience with Databricks Auto Loader, COPY INTO, and Structured Streaming.
      • Apache Spark: Strong hands-on experience with Spark architecture, writing and optimizing PySpark and Spark SQL jobs for ingestion and basic transformation.
      • Delta Lake: Solid understanding of Delta Lake for creating reliable landing zones for raw data. Proficient in writing data to Delta tables and understanding core concepts like ACID transactions and schema enforcement.
    • Core Engineering & Cloud Skills:
      • Programming: 4+ years of strong, hands-on experience in Python, with an emphasis on PySpark and libraries for API interaction (e.g., requests).
      • SQL: 4+ years of strong SQL experience for data validation and querying.
      • Cloud Platforms: 3+ years working with a major cloud provider (Azure, AWS, or GCP), with specific knowledge of cloud storage (ADLS Gen2, S3), security, and messaging/streaming services.
      • Diverse Data Sources: Proven experience ingesting data from a variety of sources (e.g., REST APIs, SFTP, relational databases, message queues).
      • CI/CD & DevOps: Experience with version control (Git) and CI/CD pipelines (e.g., GitHub Actions, Azure DevOps) for automating deployments.
      • Data Modeling: Familiarity with data modeling concepts (e.g., star schema) to understand the downstream use of the data you are ingesting.
  • Tools & Technologies:
    • Primary Data Platform: Databricks
    • Cloud Platforms: Azure (Preferred), GCP, AWS
    • Data Warehouses (Integration): Snowflake, Google BigQuery
    • Orchestration: Databricks Workflows
    • Version Control: Git/GitHub or similar repositories
    • Infrastructure as Code (Bonus): Terraform
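The "libraries for API interaction" requirement above typically means extraction loops like the following sketch: walking a paginated REST API until it reports no more pages. The response shape (a `next_page` token in each page) is a common API convention, not one specified in this posting, and `fetch_page` is stubbed with in-memory data so the sketch is self-contained; in practice it would wrap `requests.get(...)` against the real endpoint.

```python
# Hedged sketch of paginated REST API extraction in plain Python.
# `fetch_page` is a stub standing in for an HTTP call; the page-token
# scheme is a hypothetical but common API convention.

def fetch_page(page_token=None):
    """Stub for an HTTP call; returns one page of records plus a token."""
    pages = {
        None: {"records": [{"id": 1}, {"id": 2}], "next_page": "p2"},
        "p2": {"records": [{"id": 3}], "next_page": None},
    }
    return pages[page_token]

def extract_all():
    """Follow next_page tokens until the API signals the final page."""
    records, token = [], None
    while True:
        page = fetch_page(token)
        records.extend(page["records"])
        token = page["next_page"]
        if token is None:
            return records

print(extract_all())  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

In a real pipeline this loop would also handle retries, rate limits, and incremental watermarks before writing the payloads to cloud storage for Auto Loader to pick up.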

Who you are:

You’re open: We are inclusive and collaborative; we encourage the free exchange of ideas; we respect and celebrate diverse views. We are open-minded: to new ideas, new partnerships, new ways of working.

You’re optimistic: We believe in the power of creativity, technology and talent to create brighter futures for our people, our clients and our communities. We approach all that we do with conviction: to try the new and to seek the unexpected.

You’re extraordinary: We are stronger together: through collaboration we achieve the amazing. We are creative leaders and pioneers of our industry; we deliver extraordinary work every day.

What we’ll give you:

Passionate, inspired people – We aim to create a culture in which people can do extraordinary work.

Scale and opportunity – We offer the opportunity to create, influence and complete projects at a scale that is unparalleled in the industry.

Challenging and stimulating work – Unique work and the opportunity to join a group of creative problem solvers. Are you up for the challenge?

#LI-Onsite

We believe the best work happens when we’re together, fostering creativity, collaboration, and connection. That’s why we’ve adopted a hybrid approach, with teams in the office around four days a week. If you require accommodations or flexibility, please discuss this with the hiring team during the interview process.

WPP is an equal opportunity employer and considers applicants for all positions without discrimination or regard to particular characteristics. We are committed to fostering a culture of respect in which everyone feels they belong and has the same opportunities to progress in their careers.

Please read our Privacy Notice (https://www.wpp.com/en/careers/wpp-privacy-policy-for-recruitment) for more information on how we process the information you provide.
