HBase Schema Design Example

This post discusses an HBase schema modelling exercise I went through sometime in 2015 but did not get to write about until today. It is based on healthcare data sets.

Here is a brief description of the domain data and the problem.

A Patient has their own demographic attributes: FirstName, LastName, Address, Phone, DOB, etc.

When a Patient visits the doctor, the visit is called an Encounter (or Visit). An Encounter can result in the Patient having Procedures, Medications, Diagnoses, PatientNotes and so on.

  • 1 Patient can have multiple Encounters, each with a unique EncounterID
  • 1 Encounter can have multiple Procedures, Medications, Diagnoses and PatientNotes (each of these has a unique ID: ProcedureID, MedicationID, DiagnosisID…)
  • 1 Encounter can have multiple Claims, each with a unique ID
  • 1 Claim has multiple Claim Details (each has a Claim_DetailID)
  • 1 Claim has multiple Claim Charges (each has a Claim_ChargeID)

* Each Diagnosis, Medication or Procedure has a unique ID code associated with it.
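To make the cardinalities above concrete, here is a minimal sketch of the domain as Python dataclasses. All class and field names (ClinicalItem, detail_ids, and so on) are hypothetical and chosen for illustration only; they are not taken from the original application, and only a few demographic fields are shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClinicalItem:
    kind: str     # "Diagnosis", "Medication" or "Procedure"
    item_id: str  # DiagnosisID / MedicationID / ProcedureID
    code: str     # the code associated with this item


@dataclass
class Claim:
    claim_id: str
    detail_ids: List[str] = field(default_factory=list)  # Claim_DetailIDs
    charge_ids: List[str] = field(default_factory=list)  # Claim_ChargeIDs


@dataclass
class Encounter:
    encounter_id: str
    items: List[ClinicalItem] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)  # PatientNotes
    claims: List[Claim] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    first_name: str
    last_name: str
    dob: str
    encounters: List[Encounter] = field(default_factory=list)


# Usage: one patient, one encounter with a diagnosis and a claim (all values made up)
p = Patient("P1", "Jane", "Doe", "1970-01-01")
enc = Encounter("E1")
enc.items.append(ClinicalItem("Diagnosis", "D1", "DX-100"))
enc.claims.append(Claim("C1", detail_ids=["CD1"], charge_ids=["CC1", "CC2"]))
p.encounters.append(enc)
```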

While working on this problem, I tried 3 different approaches to modelling it in HBase and, after much deliberation and testing, selected the last one. For each model, I also show how the 3 most common query types can be answered.

Sample Queries

  • All Encounters for a given Patient
  • All Diagnoses/Medications/Procedures for a given Patient
  • All Patients with a given Diagnosis/Medication/Procedure

* In each of the approaches below, I have used long, descriptive Column Qualifier names for clarity. In the real application, I kept Column Qualifiers to 1-2 characters to reduce disk usage, because HBase stores the full coordinates (row key, column family, column qualifier and timestamp) with every cell value. The application developer then needs to maintain metadata that maps these short Column Qualifier names to their more explicit, descriptive names.
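A minimal sketch of that metadata mapping in Python, assuming a simple two-way dictionary. The 2-character names below are hypothetical, not the ones used in the real application. The helper estimates only the raw qualifier bytes saved and ignores HBase compression and block encoding, which can shrink repeated keys considerably on disk.

```python
# Hypothetical metadata mapping short Column Qualifiers to descriptive names.
SHORT_TO_LONG = {
    "fn": "FirstName",
    "ln": "LastName",
    "ad": "Address",
    "ph": "Phone",
    "db": "DOB",
}
LONG_TO_SHORT = {lng: short for short, lng in SHORT_TO_LONG.items()}


def to_short(long_name: str) -> str:
    """Qualifier actually written to HBase."""
    return LONG_TO_SHORT[long_name]


def to_long(short_name: str) -> str:
    """Descriptive name used by application code and reports."""
    return SHORT_TO_LONG[short_name]


def qualifier_bytes_saved(cell_counts: dict) -> int:
    """Raw qualifier bytes saved (before compression or block encoding),
    given the number of cells stored under each descriptive qualifier."""
    return sum(
        (len(name.encode("utf-8")) - len(to_short(name).encode("utf-8"))) * count
        for name, count in cell_counts.items()
    )


# 1,000,000 FirstName cells: 9 bytes -> 2 bytes saves 7 bytes per cell
print(qualifier_bytes_saved({"FirstName": 1_000_000}))  # 7000000
```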

Approach 1 – Wide Format

Multiple types of Row IDs, depending on the access pattern; 2 Column Families (CF): Person and Encounter
(the CFs have short names: P and E)

See the 1st tab in the attached XLS file, which shows the HBase schema design model for this approach.

HBaseDataModelDiagrams
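One plausible reading of the wide approach, sketched in Python against an in-memory dictionary rather than a real HBase cluster. The row-key prefixes (PT|, Diagnosis|, ...), the qualifier layout EncounterID:Kind:Code and all function names are assumptions for illustration; the actual layout is the one in the XLS file. IDs and codes are assumed not to contain "|" or ":".

```python
from collections import defaultdict

# In-memory stand-in for an HBase table (assumption, not a real client):
# table[row_key][column_family][qualifier] = value
table = defaultdict(lambda: defaultdict(dict))


def cells(row_key: str, cf: str) -> dict:
    """Read one column family of a row without creating it (like a Get)."""
    row = table.get(row_key)
    return dict(row.get(cf, {})) if row else {}


def put_patient(pid: str, first: str, last: str) -> None:
    # Row type 1: one wide row per patient; demographics in CF "P"
    table[f"PT|{pid}"]["P"].update({"FirstName": first, "LastName": last})


def put_item(pid: str, enc_id: str, kind: str, code: str) -> None:
    # Encounter facts go into CF "E" of the same patient row,
    # one column per fact: qualifier = EncounterID:Kind:Code
    table[f"PT|{pid}"]["E"][f"{enc_id}:{kind}:{code}"] = ""
    # Row type 2: one wide row per code, serving the reverse lookup
    table[f"{kind}|{code}"]["P"][f"{pid}:{enc_id}"] = ""


def encounters_for(pid: str) -> list:
    return sorted({q.split(":")[0] for q in cells(f"PT|{pid}", "E")})


def items_for(pid: str, kind: str) -> list:
    quals = (q.split(":") for q in cells(f"PT|{pid}", "E"))
    return sorted({code for _, k, code in quals if k == kind})


def patients_with(kind: str, code: str) -> list:
    return sorted({q.split(":")[0] for q in cells(f"{kind}|{code}", "P")})


put_patient("1", "Jane", "Doe")
put_item("1", "E1", "Diagnosis", "DX-100")
put_item("1", "E2", "Medication", "RX-7")
put_item("2", "E3", "Diagnosis", "DX-100")
print(encounters_for("1"))                   # ['E1', 'E2']
print(items_for("1", "Diagnosis"))           # ['DX-100']
print(patients_with("Diagnosis", "DX-100"))  # ['1', '2']
```

In this sketch, the first two queries are a single read of the patient row and the third is a single read of the code row. The trade-off is that a patient row keeps growing with every encounter, and the application must write both row types together.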

 

Approach 2 – Long Format

See the 2nd tab in the attached XLS file, which shows the HBase schema design model for this approach.

HBaseDataModelDiagrams
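A minimal sketch of a long (tall) layout in Python, assuming one narrow row per fact with a composite row key, plus a second code-first row for the reverse lookup. The key layouts PT|PatientID|EncounterID|Kind|ItemID and IX|Kind|Code|PatientID|EncounterID are hypothetical, not the author's. Because HBase keeps rows sorted by row key, a sorted key list plus bisect imitates a Scan whose start and stop rows bracket a prefix.

```python
import bisect

# In-memory stand-in for an HBase table (assumption): row_key -> {qualifier: value}
rows = {}


def put(row_key: str, **cols) -> None:
    rows[row_key] = cols


def prefix_scan(prefix: str) -> list:
    """Return all row keys starting with prefix, in sorted order."""
    keys = sorted(rows)
    i = bisect.bisect_left(keys, prefix)
    hits = []
    while i < len(keys) and keys[i].startswith(prefix):
        hits.append(keys[i])
        i += 1
    return hits


def put_item(pid, enc_id, kind, item_id, code):
    # One narrow row per fact: PT|PatientID|EncounterID|Kind|ItemID
    put(f"PT|{pid}|{enc_id}|{kind}|{item_id}", code=code)
    # A second row per fact, code first, for "patients with code X"
    put(f"IX|{kind}|{code}|{pid}|{enc_id}")


def encounters_for(pid):
    # The trailing "|" keeps "PT|1|" from matching "PT|10|"
    return sorted({k.split("|")[2] for k in prefix_scan(f"PT|{pid}|")})


def items_for(pid, kind):
    return sorted(rows[k]["code"] for k in prefix_scan(f"PT|{pid}|")
                  if k.split("|")[3] == kind)


def patients_with(kind, code):
    return sorted({k.split("|")[3] for k in prefix_scan(f"IX|{kind}|{code}|")})


put_item("1", "E1", "Diagnosis", "D1", "DX-100")
put_item("1", "E2", "Medication", "M1", "RX-7")
put_item("2", "E3", "Diagnosis", "D2", "DX-100")
print(encounters_for("1"))                   # ['E1', 'E2']
print(patients_with("Diagnosis", "DX-100"))  # ['1', '2']
```

Every row stays small, but each query becomes a range scan over many rows instead of a single row read.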

 

Approach 3 – Wide + Long Format

See the 3rd tab in the attached XLS file, which shows the HBase schema design model for this approach.

HBaseDataModelDiagrams
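A minimal sketch of one way to combine the two formats, assuming one row per (patient, encounter) that is long across encounters and wide within an encounter, plus a wide index row per code. This is an illustration only; the key layouts PT|PatientID|EncounterID and IX|Kind|Code, the Kind:ItemID qualifiers and the function names are hypothetical, and the model actually chosen is the one in the 3rd tab of the XLS file.

```python
import bisect

# In-memory stand-in for an HBase table (assumption): row_key -> {qualifier: value}
table = {}


def prefix_scan(prefix):
    """Row keys starting with prefix, in sorted order (imitates a bounded Scan)."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


def put_item(pid, enc_id, kind, item_id, code):
    # Long across encounters: one row per (patient, encounter)
    # Wide within an encounter: one column per item, Kind:ItemID -> code
    table.setdefault(f"PT|{pid}|{enc_id}", {})[f"{kind}:{item_id}"] = code
    # Wide index row per code: one column per PatientID:EncounterID
    table.setdefault(f"IX|{kind}|{code}", {})[f"{pid}:{enc_id}"] = ""


def encounters_for(pid):
    return [k.split("|")[2] for k in prefix_scan(f"PT|{pid}|")]


def items_for(pid, kind):
    return sorted({code for k in prefix_scan(f"PT|{pid}|")
                   for q, code in table[k].items() if q.split(":")[0] == kind})


def patients_with(kind, code):
    return sorted({q.split(":")[0] for q in table.get(f"IX|{kind}|{code}", {})})


put_item("1", "E1", "Diagnosis", "D1", "DX-100")
put_item("1", "E2", "Medication", "M1", "RX-7")
put_item("2", "E3", "Diagnosis", "D2", "DX-100")
print(encounters_for("1"))                   # ['E1', 'E2']
print(patients_with("Diagnosis", "DX-100"))  # ['1', '2']
```

In this sketch, each encounter row stays bounded in size, while a patient's full history is still one contiguous prefix scan and the reverse lookup is a single row read.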
