Data Engineer
Role summary
We are looking for a Data Engineer to ensure our AI and data scientists have reliable, well-structured, and high-quality data across our healthcare and financial services products. The role focuses primarily on healthcare data from UK hospital trusts, with additional exposure to financial services data.
Mission
Your mission is to design, maintain, and improve data pipelines and models that deliver accurate, timely, and trusted data to our AI, data science, and frontend teams. You will help turn complex healthcare and financial data into robust datasets and features that power production-grade AI applications and user-facing experiences.
Academic Qualifications
- A degree in software engineering, data science, or a similar field is desirable.
Key responsibilities
- Build, maintain, and optimize batch and near-real-time data pipelines on Azure using Azure Data Factory, SQL, Spark, Postgres, Apache Airflow, data lakes, and related services.
- Design and evolve data models and schemas that support AI/ML workflows, analytics, and product needs, with a strong focus on healthcare data (e.g., FHIR/HL7-based structures).
- Implement data quality checks, testing, and monitoring to ensure accuracy, completeness, and freshness of data powering AI models and frontend features.
- Set up the necessary archiving and restore processes to manage data aggregation across environments.
- Implement API-based data extraction, and data scraping where access is not granted via APIs.
- Expose identified data through APIs by building the necessary infrastructure and endpoints.
- Collaborate closely with data analysts, AI/ML engineers, and frontend teams to understand their data needs and translate them into robust engineering solutions.
- Contribute to and enforce data governance and security practices, including lineage, documentation, and standards for regulated healthcare and financial data.
- Work within a regulated environment, ensuring adherence to HIPAA, GDPR, and relevant healthcare and financial compliance requirements (including ISO 27001).
- Support the integration of data from existing hospital and financial partners and internal data science services into our platform’s existing data architecture (without owning the initial partner-side ingestion).
- Help improve performance, cost-efficiency, and reliability of the data platform as the company scales to more hospitals and financial services clients.
Required experience and skills
- 3+ years of experience as a Data Engineer (or similar role) working with production data pipelines and data platforms.
- Strong proficiency with SQL and relational databases (Postgres or similar), including query optimization and schema design.
- Hands-on experience with Azure Data Factory and data lake architectures.
- Experience with orchestration tools (e.g., Airflow, Prefect, Dagster) and modern data transformation frameworks (e.g., dbt, Spark).
- Solid programming skills in Python, with a focus on building and maintaining ETL/ELT pipelines.
- Experience supporting data for AI/ML teams or data-heavy products, including feature creation and model input preparation.
- Familiarity with healthcare data, ideally including standards such as FHIR and HL7, and practical experience handling sensitive health data.
- Understanding of data quality frameworks, testing strategies, and monitoring/alerting for data pipelines.
- Comfortable working in a regulated environment with HIPAA, GDPR, and PCI DSS or similar requirements.
- Strong collaboration and communication skills, able to work closely with data scientists, AI/ML engineers, and frontend engineers.
Nice to have
- Exposure to financial services data (e.g., payments, billing, transactional or claims data) and related compliance expectations.
- Experience with Microsoft Power BI and other frontend data reporting tools.
- Experience working with Git or other version control systems.
- Experience with AI/LLM-oriented data infrastructure such as feature stores, vector databases, or RAG-style data flows.
- Experience working with UK hospital trusts or health systems and their data ecosystems.
- Experience with event-driven or streaming architectures for near-real-time data.
- Familiarity with Azure DevOps and pipelines.
Reporting and ways of working
- Reports to: Data Architect.
- Environment: Joining an existing data and AI setup, with significant scope to improve and extend current pipelines and models.
- Location: Flexible (remote/hybrid), with occasional travel for onsite sessions as needed.
To apply for this job, email your details to support@ebo.ai