A bit about us

PulsePoint is a leading healthcare ad technology company that uses real-world data in real time to optimize campaign performance and revolutionize health decision-making. Leveraging proprietary datasets and methodology, PulsePoint targets healthcare professionals and patients with an unprecedented level of accuracy, delivering unparalleled results to the clients we serve. The company is now a part of Internet Brands, a KKR portfolio company and owner of WebMD Health Corp.

Sr. Data Engineer

The PulsePoint Data Engineering team plays a key role in a technology company that's experiencing exponential growth. Our data pipeline processes over 80 billion impressions a day (> 20 TB of data, 200 TB uncompressed). This data is used to generate reports, update budgets, and drive our optimization engines. We do all this while running against tight SLAs, and we provide stats and reports as close to real time as possible.

What you'll be doing

- Design, build, and maintain reliable and scalable enterprise-level distributed transactional data processing systems for scaling the existing business and supporting new business initiatives
- Optimize jobs to use Kafka, Hadoop, Presto, Spark, and Kubernetes resources as efficiently as possible
- Monitor and provide transparency into data quality across systems (accuracy, consistency, completeness, etc.)
- Increase the accessibility and effectiveness of data (work with analysts, data scientists, and developers to build and deploy tools and datasets that fit their use cases)
- Collaborate within a small team with diverse technology backgrounds
- Provide mentorship and guidance to junior team members

Team Responsibilities

- Ingest, validate, and process internal and third-party data
- Create, maintain, and monitor data flows in Python, Spark, Hive, SQL, and Presto for consistency, accuracy, and lag time
- Maintain and enhance the framework for jobs (primarily aggregate jobs in Spark and Hive)
- Create different consumers for data in Kafka using Spark Streaming for near-real-time aggregation
- Tools evaluation
- Backups/Retention/High Availability/Capacity Planning
- Review/Approval: DDL for databases, Hive framework jobs, and Spark Streaming jobs, to make sure they meet our standards

Technologies We Use

- Python: primary repo language
- Airflow/Luigi: job scheduling
- Docker: packaged container image with all dependencies
- Graphite: monitoring data flows
- Hive: SQL data warehouse layer for data in HDFS
- Kafka: distributed commit log storage
- Kubernetes: distributed cluster resource manager
- Presto/Trino: fast parallel data warehouse and data federation layer
- Spark Streaming: near-real-time aggregation
- SQL Server: reliable OLTP RDBMS
- Apache Iceberg
- GCP: BigQuery for performance, Looker for dashboards

Requirements

- 8+ years of data engineering experience
- Strong skills in and current experience with SQL and Python
- Strong recent Spark experience (3+ years)
- Experience working in on-prem environments
- Hadoop and Hive experience
- Experience in Scala/Java is a plus (polyglot programmer preferred!)
- Proficiency in Linux
- Strong understanding of RDBMS and query optimization
- Passion for engineering and computer science around data
- East Coast U.S. hours (9am-6pm EST); you can work fully remotely
- Notice period of 2 months or less
- Knowledge of and exposure to distributed production systems, e.g. Hadoop
- Knowledge of and exposure to cloud migration (AWS/GCP/Azure) is a plus

Location

- We can hire as FTE in the U.S., UK, and Netherlands
- We can hire as a long-term contractor (independent or B2B) in most other countries

Selection Process:

1) CodeSignal Online Assessment
2) Initial Screen (30 mins)
3) Hiring Manager Interview (45 mins)
4) Tech Challenge
5) Interview with Sr. Data Engineer (60 mins)
6) Team Interviews (90 mins + 3 x 45 mins) + SVP of Engineering (30 mins)
7) WebMD Sr. Director, DBA (30 mins)

Note that leetcode-style live coding challenges will be involved in the process.

WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.

About PulsePoint: www.pulsepoint.com/
