Healthcare Data Lake using Amazon HealthLake

TL;DR

Amazon HealthLake enables healthcare organizations to securely store, transform, and analyze vast volumes of health data on AWS at scale.
It converts unstructured medical information, like clinical notes, prescriptions, and imaging, into structured, FHIR-compliant data for easy access and interoperability.
Built-in machine learning and NLP models extract key medical insights, enabling advanced analytics and predictive modeling.
HealthLake integrates seamlessly with tools like Amazon QuickSight and SageMaker for data visualization and AI-driven predictions.
By utilizing HealthLake, healthcare providers can unify patient data, improve decision-making, and accelerate innovation across care delivery systems.

The healthcare and life sciences industry has gone through a massive digital transformation over the last decade, leading to vast data collection. To extract value from this data and adopt machine learning models, organizations must address challenges such as data normalization, availability, integrity, and governance. Medical information is highly distributed and contextual, and it consists primarily of unstructured data such as intake forms, clinical notes, X-rays, CT scans, handwritten prescriptions, and insurance claims.

Amazon launched a new service at re:Invent 2020 to address the growing pain of managing health-related data in the cloud. The service, named Amazon HealthLake, is a fully managed, HIPAA-eligible service that enables healthcare and life sciences companies to store, transform, query, and analyze health data on the AWS cloud at petabyte scale. Companies can aggregate all of their disparate health information, across various styles and formats, into a centralized data lake provisioned and managed by Amazon HealthLake. It makes it easy to import both structured and unstructured data from on-premises systems to the AWS cloud. Companies can leverage the pre-built machine learning models to normalize and index the information by tagging key dates, medical descriptors, and events such as medications, procedures, and diagnoses. This lets users search and analyze all the health information quickly.

Amazon HealthLake does the heavy lifting: it configures multiple data sources, ingests the data, indexes all the information so it can be searched later, and stores it in open standard formats such as FHIR. It processes the unstructured text data using NLP and images using ML models like binary classification, multiclass classification, or regression. Once the data has been converted into structured, centralized information, you get a complete view of an individual patient’s history at a level of granularity where you can apply advanced analytics or machine learning models for prediction.

Ingesting data using HealthLake

Amazon HealthLake makes it easier to ingest data from on-premises data sources into AWS. Organizations can copy their on-premises files to an Amazon S3 bucket and then use the Bulk Import feature to load them into a HealthLake Data Store.
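As a rough illustration of the bulk import step, the sketch below assembles the parameters for a HealthLake import job and, in a separate function, submits them with boto3. It assumes the FHIR files are already in S3. The bucket URIs, KMS key, IAM role ARN, and Data Store ID are hypothetical placeholders you would supply, and the helper names are ours, not part of any AWS SDK.

```python
import uuid


def build_import_job_request(datastore_id, input_s3_uri, output_s3_uri,
                             kms_key_id, role_arn, job_name="bulk-import"):
    """Assemble keyword arguments for the HealthLake StartFHIRImportJob call.

    All argument values are supplied by the caller; nothing here is a default
    provided by AWS.
    """
    return {
        "JobName": job_name,
        "DatastoreId": datastore_id,
        # S3 prefix that holds the FHIR files to import
        "InputDataConfig": {"S3Uri": input_s3_uri},
        # S3 prefix where HealthLake writes the job's output logs
        "JobOutputDataConfig": {
            "S3Configuration": {"S3Uri": output_s3_uri, "KmsKeyId": kms_key_id}
        },
        # IAM role that HealthLake assumes to read the input bucket
        "DataAccessRoleArn": role_arn,
        # Idempotency token so a retried request does not start a second job
        "ClientToken": str(uuid.uuid4()),
    }


def start_import(request, region="us-east-1"):
    """Submit the import job. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("healthlake", region_name=region)
    response = client.start_fhir_import_job(**request)
    return response["JobId"]
```

Keeping the request builder separate from the network call makes it possible to review or unit-test the parameters before anything is sent to AWS.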

Storing data in the open standard format

To enable fast search queries, the Amazon HealthLake Data Store creates a complete view of each patient’s medical history in chronological order. The Data Store facilitates information exchange using the open standard FHIR R4 specification and is always running to keep the index up to date. To meet regulatory compliance requirements, it enforces rigorous security and access control.
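A minimal sketch of provisioning such a Data Store with boto3 might look like the following. The Data Store name and the optional KMS key are hypothetical values chosen for illustration, and the helper functions are ours. The encryption settings are one reasonable choice, not a recommendation from the service.

```python
def build_datastore_request(name, kms_key_id=None):
    """Assemble keyword arguments for the HealthLake CreateFHIRDatastore call."""
    if kms_key_id:
        # Encrypt with a key the customer manages in AWS KMS
        kms_config = {"CmkType": "CUSTOMER_MANAGED_KMS_KEY", "KmsKeyId": kms_key_id}
    else:
        # Fall back to a key owned and managed by AWS
        kms_config = {"CmkType": "AWS_OWNED_KMS_KEY"}
    return {
        "DatastoreName": name,
        "DatastoreTypeVersion": "R4",  # the FHIR version the Data Store speaks
        "SseConfiguration": {"KmsEncryptionConfig": kms_config},
    }


def create_datastore(request, region="us-east-1"):
    """Create the Data Store. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("healthlake", region_name=region)
    response = client.create_fhir_datastore(**request)
    # The endpoint is the base URL for later FHIR REST calls
    return response["DatastoreId"], response["DatastoreEndpoint"]
```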

Transforming data using Machine Learning models

Amazon HealthLake integrates medical natural language processing (NLP) to transform raw medical data from the Data Store. It uses specialized pre-built Machine Learning models that have been trained to understand and extract meaningful information from unstructured healthcare data. The original resource stays unchanged, and the extracted medical information is automatically appended to the resource.
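To make the input side of this concrete, here is a small sketch of how a free-text clinical note could be wrapped in a FHIR R4 DocumentReference before it is written to the Data Store. It assumes the note is carried as a base64-encoded text attachment, which is how FHIR represents inline attachment data. The patient ID and note text are invented for illustration, and the function name is ours.

```python
import base64


def build_document_reference(patient_id, note_text):
    """Wrap a plain-text clinical note in a minimal FHIR R4 DocumentReference."""
    # FHIR attachments carry inline data as base64-encoded bytes
    encoded = base64.b64encode(note_text.encode("utf-8")).decode("ascii")
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [
            {"attachment": {"contentType": "text/plain", "data": encoded}}
        ],
    }


def read_note(document_reference):
    """Decode the original note text from the first attachment."""
    data = document_reference["content"][0]["attachment"]["data"]
    return base64.b64decode(data).decode("utf-8")
```

Because the extracted information is appended to the resource rather than replacing it, decoding the attachment as in `read_note` should still return the original note after processing.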

Querying and searching data

Users can search all the information on a patient using predefined filters, or use the FHIR CRUD (Create/Read/Update/Delete) and FHIR Search operations supported by Amazon HealthLake.
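A FHIR Search request is an HTTP GET against the Data Store endpoint with search parameters in the query string. The sketch below only builds such a URL. The endpoint pattern shown is an assumption based on the usual HealthLake layout, so in practice take the base URL from the Data Store's reported endpoint. The Data Store ID and search values are hypothetical, and real requests must also be signed with AWS Signature Version 4, which is omitted here.

```python
from urllib.parse import urlencode


def fhir_search_url(region, datastore_id, resource_type, params):
    """Build a FHIR Search URL for a HealthLake Data Store.

    `params` is a list of (name, value) pairs so that the same search
    parameter can appear more than once, as FHIR Search allows.
    """
    # Assumed endpoint layout; prefer the endpoint returned by the service
    base = f"https://healthlake.{region}.amazonaws.com/datastore/{datastore_id}/r4"
    query = urlencode(params)
    url = f"{base}/{resource_type}"
    return f"{url}?{query}" if query else url


# Example: patients named Smith born on or after 1970-01-01
url = fhir_search_url(
    "us-east-1",
    "abc123",  # hypothetical Data Store ID
    "Patient",
    [("family", "Smith"), ("birthdate", "ge1970-01-01")],
)
```

The `ge` prefix on the date is the standard FHIR comparator for "greater than or equal".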

Visualization of data and making predictions

Developers can leverage the integration with Amazon QuickSight to quickly create dashboards on the normalized data and explore trends and patterns among their patients. Developers can also use Amazon SageMaker to build, train, and deploy their machine learning models on the data to make predictions.
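Dashboards and models generally want tabular input, so one common preparatory step is flattening FHIR resources into rows. The sketch below assumes the data has been exported as newline-delimited JSON (one FHIR resource per line) and keeps only a few fields from Condition resources. The chosen columns are an illustrative assumption, not a schema that QuickSight or SageMaker requires.

```python
import json


def conditions_to_rows(ndjson_text):
    """Flatten FHIR Condition resources from NDJSON text into flat dicts."""
    rows = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        resource = json.loads(line)
        if resource.get("resourceType") != "Condition":
            continue  # ignore other resource types in a mixed export
        codings = resource.get("code", {}).get("coding", [])
        first = codings[0] if codings else {}
        rows.append({
            "patient": resource.get("subject", {}).get("reference"),
            "code": first.get("code"),
            "display": first.get("display"),
            "onset": resource.get("onsetDateTime"),
        })
    return rows
```

The resulting list of dictionaries can be written out as CSV or loaded into a data frame for feature engineering.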

As of this writing, Amazon HealthLake is available only in the US East (N. Virginia) Region, but we expect it to become available in other Regions soon. Contact us if you are having challenges ingesting your data in different formats or finding value in that data. Our team of cloud experts can help set up your data pipeline or build a data lake using Amazon HealthLake.

Frequently Asked Questions

1. What is Amazon HealthLake, and how does it help healthcare organizations?

Amazon HealthLake is a fully managed, HIPAA-eligible data lake service that allows healthcare and life sciences organizations to store, transform, and analyze large volumes of health data at scale. It converts unstructured medical data into FHIR-compliant structured data, making it easier to search, query, and apply machine learning for insights.

2. How does Amazon HealthLake handle unstructured healthcare data?

Amazon HealthLake uses machine learning and medical NLP to extract key medical information from unstructured data like clinical notes, prescriptions, and imaging reports. It then tags and indexes this data with details such as diagnoses, medications, and procedures, enabling healthcare providers to build a comprehensive patient record.

3. What are the benefits of using FHIR format in Amazon HealthLake?

The FHIR (Fast Healthcare Interoperability Resources) format ensures that data stored in Amazon HealthLake is interoperable and standardized, making it easier to exchange information across healthcare systems, comply with regulatory requirements, and perform advanced analytics without the need for complex data transformations.

4. Can Amazon HealthLake integrate with analytics and ML tools?

Yes. Once data is ingested and normalized, it can be integrated with Amazon QuickSight for creating dashboards and visualizations, and Amazon SageMaker to build, train, and deploy predictive machine learning models, such as those for predicting readmission rates, disease progression, or treatment outcomes.

5. Is Amazon HealthLake available in all AWS regions?

As of now, Amazon HealthLake is available only in the US East (N. Virginia) Region, but AWS is expected to expand it to more Regions over time. Organizations outside this Region can still use HealthLake, but should consider latency, compliance, and data residency requirements prior to deployment.