Saving Lives with Big Data

The potential benefits of Big Data applications in the healthcare sector

By: Miriam Veenstra

Between 2000 and 2016, healthcare expenditures in the Netherlands more than doubled, from over 46 billion euros to 96 billion euros (Centraal Bureau voor de Statistiek, CBS). Moreover, healthcare expenditures keep growing. The CPB (Centraal Planbureau, the Dutch Bureau for Economic Policy Analysis) predicts an increase in healthcare expenditures of 3.5 percent in 2018.

The most obvious reason for the rising healthcare expenditures is the aging population in the Netherlands. However, aging is not the only cause. Rising healthcare costs are also a consequence of prosperity and increased knowledge. People have higher demands and expectations, resulting in more doctor visits, more medication and more luxurious forms of care (e.g. private rooms instead of large shared wards in nursing homes). In addition, more is invested in the development of medical technology, which also increases expenditures. This article discusses the potential of Big Data analytics for decreasing costs in the healthcare sector.

Big Data analytics can benefit the healthcare sector in a number of different areas: reducing patient readmissions, disease prevention, personalized healthcare, public health and fraud detection. This article presents examples within each area. The remainder of the article is organized as follows. The first section presents a general definition of Big Data. Thereafter, different data sources are introduced. The subsequent section provides various examples of Big Data applications in the healthcare sector, and the last section discusses some challenges and concludes the article.

General definition: the five V’s of Big Data

The term “Big Data” has become very popular in recent years, but what exactly is Big Data? When do we define data to be Big Data? Big Data is generally defined as data sets whose size, diversity and complexity are beyond the ability of traditional data processing software to capture, store, manage and analyze. In 2001, Gartner analyst Doug Laney defined Big Data as being three-dimensional and introduced the so-called three V’s of Big Data: volume, velocity and variety [1]. Over the past few years, two further characteristics have been added: veracity and, at the heart of the Big Data challenge, value.


Volume is the most obvious characteristic of Big Data. It refers to the sheer size of the data and information that is generated. Massive amounts of data are generated every day. By 2016, the amount of data generated worldwide had reached 16.1 zettabytes. For reference, 1 zettabyte equals 10^21 bytes, that is, 1 trillion gigabytes. The amount of data being generated keeps increasing. In fact, International Data Corporation (IDC) forecasts that this amount will grow to 163 zettabytes by 2025*. In the healthcare sector, data is no longer restricted to EMR (Electronic Medical Records) databases, but now also includes, for example, claims data, gene sequences, medical device data and social media data.

*Reinsel, D., Gantz, J. & Rygning, J. (2017). Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data; Focus on Data That’s Big. IDC, Sponsored by Seagate.
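
The unit conversion and the IDC forecast quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses decimal (SI) prefixes; the smooth year-on-year growth it derives is an assumption made for illustration, not a figure from the IDC report.

```python
# Unit arithmetic for the volume figures quoted above (decimal SI prefixes).
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE  # 10**12, i.e. 1 trillion

# IDC figures as cited in the text.
start_zb, end_zb = 16.1, 163.0
start_year, end_year = 2016, 2025

growth_factor = end_zb / start_zb
# Implied compound annual growth rate, ASSUMING smooth growth between the two years.
annual_growth = growth_factor ** (1 / (end_year - start_year)) - 1

print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
print(f"Growth factor {start_year}-{end_year}: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_growth:.1%}")
```

Under that assumption, the forecast corresponds to roughly a tenfold increase in nine years, or a little under 30 percent growth per year.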


Variety refers to the type and nature of the data. Data comes in many different forms. To name a few: text, images, audio, video files, geospatial data, social media updates, click data, machine and sensor data, etc. In addition, a distinction is made between structured and unstructured data. Structured data is organized in a formatted repository, that is, it is pre-defined or standardized. Healthcare claims with a fixed structure are an example of structured data in healthcare. Most types of data today, however, are unstructured, meaning that the data is not organized in a pre-defined manner. Medical images from cardiology are an example of unstructured data in healthcare.



Velocity is the measure of speed at which the data is generated and processed. Every day the amount of data increases at enormous speed. For example, in just one minute over 15 million text messages are sent, almost 4 million Google searches are made and over 45 thousand Uber rides are taken worldwide.* In addition, data (collection) is growing faster than ever before. In fact, 90% of all data today was created in the last two years. It is crucial that the speed of transmission, as well as the access to the data, is instantaneous to achieve real-time processing. Information needs to flow quickly to maximize benefits and competitive advantage. For example, a patient with serious health conditions requires immediate interventions when complications occur or vital signs drop. Hence real-time alerts and instantaneous decision support tools are essential.

*(2017) Data Never Sleeps 5.0. Domo.
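
The real-time alerting mentioned above can be illustrated with a simple threshold check on incoming vital-sign readings. This is only a sketch: the `VitalSigns` record, the `check_vitals` function and the limits in `LIMITS` are hypothetical names and values chosen for illustration, not clinical guidance and not taken from any system described in this article.

```python
from dataclasses import dataclass


@dataclass
class VitalSigns:
    patient_id: str
    heart_rate: int  # beats per minute
    spo2: int        # oxygen saturation, percent


# Hypothetical limits for illustration only, NOT clinical guidance.
LIMITS = {"heart_rate": (40, 130), "spo2": (90, 100)}


def check_vitals(reading):
    """Return one alert message for every value outside its allowed range."""
    alerts = []
    for name, (low, high) in LIMITS.items():
        value = getattr(reading, name)
        if not low <= value <= high:
            alerts.append(f"{reading.patient_id}: {name}={value} outside {low}-{high}")
    return alerts


# A tiny simulated stream: the second reading has a low oxygen saturation.
stream = [VitalSigns("p1", 72, 98), VitalSigns("p1", 70, 86)]
for reading in stream:
    for alert in check_vitals(reading):
        print(alert)
```

In practice such a check would run on every reading as it arrives, which is why low latency in both transmission and access matters.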


Veracity refers to the quality and trustworthiness of the data. The data can be noisy, full of biases and abnormalities, uncertain and imprecise. Important factors to consider are the source of the data, its lineage over time and the intended usage. The quality of healthcare data is highly variable. Take handwritten prescriptions, for example: if the handwriting is poor, this can lead to inaccurate transcriptions and hence low trustworthiness.


Value is the final characteristic: processing Big Data must yield value from the insights gained. Turning all the other dimensions into truly useful business value is at the heart of the Big Data challenge. Examples of Big Data bringing value to the healthcare sector are presented in a subsequent section (see: Big Data Applications in Healthcare).

Big Data Sources

With the main characteristics of Big Data in mind, let’s now look at where all this data comes from. With respect to healthcare Big Data, a distinction between five data sources can be made:

  • Web and social media data: Clickstream and interaction data from health plan websites, smartphone apps and social media such as Facebook, Twitter, LinkedIn, and blogs

  • Machine to machine data: Reading from sensors, meters, and other devices

  • Transaction data: Healthcare claims and other billing records

  • Biometric data: Genetics, handwriting, fingerprints, retinal scans, X-rays and other medical images, blood pressure, pulse and pulse-oximetry reading, and so forth

  • Human-generated data: Electronic medical records (EMRs), physicians’ notes, emails, voice recordings, and paper documents

As mentioned in the discussion of variety above, most types of data are unstructured. That is, the data is not organized in a pre-defined manner. Handwritten reports and medical images are examples of unstructured data. Structured data, on the other hand, is data that resides in a fixed field within a record or file. Healthcare claims and electronic medical records are examples of structured data. The next section describes several examples in which the above sources of Big Data are used to benefit the healthcare sector. These examples illustrate the large potential of healthcare Big Data analytics.

Big Data Applications in Healthcare

Reducing healthcare costs is a big challenge in the Netherlands. Reducing costs often leads to lower-quality care and, conversely, improving care often leads to higher costs. This is where Big Data can provide a solution: its main promise in healthcare is to improve care while also reducing costs.

Big Data analytics can benefit the healthcare sector in several different application areas. These application areas include reducing patient readmissions, disease prevention, personalized healthcare, public health and fraud detection. The following subsections present both implemented and conceptual examples within each application area to show the potential of Big Data analytics in healthcare.

Reducing patient readmissions

An example that illustrates this potential is Aurora Health Care, a not-for-profit organization in the United States that serves communities throughout eastern Wisconsin and northern Illinois. The Aurora Health Care network consists of 15 hospitals, more than 150 clinics, 70 pharmacies and 33,000 caregivers (including 1,800 employed physicians) [2]. Together with its more than 1.2 million patients, the organization creates a massive amount of data. It decided to put this data to use to improve the quality of healthcare while reducing costs.

To realize this, by October 2013 Aurora had implemented a record system, called Smart Chart, that brought all the data collected over the previous ten years together in a single data warehouse. The data comprises invoicing data, lab data, pharmacy data and procedure data. This data, combined with near real-time analytics, enables Aurora to predict and improve patient outcomes and the associated treatments. So far, Aurora has saved $6 million by reducing patient readmissions by 10%, has reduced query time and has saved 42% on treatment costs. Hence, it is safe to say that Big Data analytics can help improve healthcare and reduce costs.

Disease prevention

Another application area where the potential of Big Data is apparent is disease prevention. An example is Propeller Health, an American company that helps patients manage their asthma and chronic obstructive pulmonary disease (COPD). Propeller Health provides patients with a sensor that can be attached to their inhaler. In this way Propeller continuously monitors the patient’s condition: where, when and how often does the patient use the inhaler? Propeller combines the sensor data with, among other things, weather data, traffic data and air quality data. Performing analyses on the combined data enables Propeller to create risk profiles and map risk areas. These tools help asthma and COPD patients manage their disease. From aggregated data across commercial programs, Propeller Health found that use of the Propeller sensor resulted in up to 79% fewer asthma attacks, up to 50% more doses taken on time and up to 50% more symptom-free days [3].

Figure 1: Propeller Health Sensor
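
Combining sensor data with environmental data, as described above, essentially comes down to joining two data sets on a shared key such as area and date. The sketch below illustrates that idea with made-up records; the data, the `uses_by_air_quality` function and the threshold of 100 are assumptions for illustration and do not reflect Propeller Health’s actual method.

```python
from collections import defaultdict

# Hypothetical inhaler events: (patient, area, date).
inhaler_events = [
    ("p1", "north", "2017-06-01"),
    ("p1", "north", "2017-06-01"),
    ("p2", "north", "2017-06-01"),
    ("p2", "south", "2017-06-01"),
]

# Hypothetical air quality index per (area, date); higher means worse air.
air_quality = {("north", "2017-06-01"): 140, ("south", "2017-06-01"): 35}


def uses_by_air_quality(events, aqi_lookup, threshold=100):
    """Count inhaler uses on poor-air versus good-air area-days."""
    counts = defaultdict(int)
    for _patient, area, date in events:
        aqi = aqi_lookup.get((area, date))
        if aqi is None:
            continue  # no environmental reading to join against
        counts["poor" if aqi >= threshold else "good"] += 1
    return dict(counts)


print(uses_by_air_quality(inhaler_events, air_quality))
```

With real data, counts like these could feed a risk map: areas and days in which inhaler use clusters together with poor air quality would stand out.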

Personalized healthcare

In addition to improving healthcare, Big Data also provides opportunities for personalizing healthcare. Personalized healthcare takes into account a patient’s unique (genetic) characteristics in diagnoses and treatments. Information about the treatment outcomes of patients with similar characteristics can help determine the most effective treatment.

This is what the Center for Personalized Cancer Treatment (CPCT) in the Netherlands is trying to achieve. Even though it is known that each type of cancer has different characteristics and properties within each patient, in general personalized cancer treatments are rarely utilized. In fact, cancer treatment is generally based on ‘the average cancer patient’. Consequently, the treatment of cancer is very successful for some patients and less successful for other patients.

CPCT is therefore working towards personalized cancer treatment. At the beginning of the treatment process, the center analyzes the genetic material of the cancer, its DNA, and creates a profile. Combining this data with data on treatment results and the effects of different medications enables the center to predict the outcomes of specific treatments for a patient with specific characteristics. That is, the data is used to develop a customized treatment plan for each patient, with the best possible expected outcome [4].

Figure 2: Predicting outbreaks

Public health

Public health institutions can also benefit greatly from Big Data analytics. Organizations could, for example, improve public health by monitoring the spread of infectious diseases and analyzing disease patterns. Researchers at IBM, Johns Hopkins University and the University of California, San Francisco use Big Data to predict outbreaks of dengue fever and malaria. In recent years, dengue fever has spread to more than 100 countries, and malaria still causes up to 1 million deaths each year [5]. The researchers use data on, among other things, rainfall, temperature, and airport and highway traffic, in addition to disease data, to predict the spread of the diseases and where the next outbreaks will occur. Accordingly, public health resources can be deployed more effectively.

Fraud detection

Finally, Big Data also has the potential to improve fraud detection. Using Big Data analytics, financial irregularities and fraud can be detected faster and more efficiently. The Centers for Medicare and Medicaid Services (CMS) has saved approximately $1.5 billion with the Fraud Prevention System [6]. The system uses Big Data tools and predictive analytics to identify fraudulent claims and prevent inappropriate payments.

Analysis of large volumes of historical data enables CMS to detect anomalies and patterns of suspicious behavior, such as the submission of duplicate claims, healthcare providers that perform an unusually high rate of tests, or treatments that are not medically necessary. CMS uses predictive modeling tools to compare claims against a fraud profile in real time to flag suspicious billing. This makes it possible to prevent future overpayments.
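
Two of the patterns mentioned above, duplicate claims and providers with an unusually high test rate, can be illustrated with very simple rules. The sketch below uses made-up claims, and the functions `duplicate_claims` and `unusual_providers` as well as the z-score cutoff are hypothetical; the actual CMS Fraud Prevention System relies on predictive models whose details are not described in this article.

```python
from statistics import mean, pstdev

# Hypothetical claims: (claim_id, provider, patient, procedure, date).
claims = [
    ("c1", "A", "p1", "blood_test", "2016-01-05"),
    ("c2", "A", "p1", "blood_test", "2016-01-05"),  # same service billed twice
    ("c3", "B", "p2", "x_ray", "2016-01-06"),
]


def duplicate_claims(rows):
    """Return ids of claims that repeat an earlier (provider, patient, procedure, date)."""
    seen, duplicates = set(), []
    for claim_id, *fields in rows:
        key = tuple(fields)
        if key in seen:
            duplicates.append(claim_id)
        seen.add(key)
    return duplicates


def unusual_providers(tests_per_patient, z_cutoff=2.0):
    """Flag providers whose test rate lies far above the average of their peers."""
    rates = list(tests_per_patient.values())
    mu, sigma = mean(rates), pstdev(rates)
    if sigma == 0:
        return []  # all providers behave identically, nothing to flag
    return [p for p, r in tests_per_patient.items() if (r - mu) / sigma > z_cutoff]


# Hypothetical average number of tests per patient for eight providers.
test_rates = {"A": 1.0, "B": 1.1, "C": 0.9, "D": 1.0,
              "E": 1.2, "F": 0.8, "G": 1.0, "H": 6.0}

print(duplicate_claims(claims))
print(unusual_providers(test_rates))
```

A flagged claim or provider is only a candidate for review: a high test rate can have legitimate medical reasons, so rules like these narrow down where investigators should look rather than prove fraud.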

Potential & Challenges

These inspiring examples illustrate the great potential of Big Data in healthcare. The amount of data keeps increasing every day, and with it the possibilities of using this data in the healthcare sector. However, some challenges still stand in the way of using Big Data in the healthcare industry.

First of all, healthcare data is sensitive, and there are laws and regulations on the protection and privacy of healthcare data. Secondly, the fragmentation of the different data sources, e.g. hospitals, patients, policy organizations and research institutions, is a huge obstacle, as owners typically do not want to share their data. Yet Big Data derives its value precisely from the integration of different data sources. A possible solution to these two challenges is to anonymize the data and consider it only at an aggregate level, for example by aggregating all hospital data together. This way the data cannot be traced back to a specific patient or a specific hospital.
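
The idea of working only with anonymized, aggregated data can be sketched as follows. The records and the `aggregate_counts` function are hypothetical, and the minimum group size is an added assumption that goes beyond the text: very small groups are withheld because they could still point to an individual.

```python
from collections import Counter

# Hypothetical patient-level records: (hospital, patient_id, diagnosis).
# Each patient is assumed to appear only once.
records = [
    ("hospital_a", "p1", "asthma"),
    ("hospital_a", "p2", "asthma"),
    ("hospital_b", "p3", "asthma"),
    ("hospital_b", "p4", "copd"),
]


def aggregate_counts(rows, min_count=3):
    """Count patients per diagnosis across all hospitals.

    Hospital and patient identifiers are dropped, and groups smaller than
    min_count are withheld (an assumed safeguard, not from the article).
    """
    counts = Counter(diagnosis for _hospital, _patient, diagnosis in rows)
    return {d: n for d, n in counts.items() if n >= min_count}


print(aggregate_counts(records))
```

Note that aggregation alone is not a guarantee of anonymity; which safeguards are sufficient depends on the applicable laws and regulations.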

A third challenge relates to the technology necessary for the collection, storage and processing of Big Data. Healthcare providers are generally not acquainted with this kind of technology. Therefore healthcare providers must be convinced of the potential of Big Data analytics in the healthcare sector, before they are willing to invest in the technology that is necessary.

To conclude, the potential of Big Data analytics in the healthcare sector is exciting. Reducing patient readmissions, disease prevention, personalizing healthcare, improving public health and fraud prevention are all application areas in which Big Data provides opportunities to improve healthcare and reduce costs. Quantics believes that in the coming years, we will succeed in overcoming the challenges that are in the way of fully benefitting from Big Data analytics in healthcare.


1. Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety. META Group Inc.
Note: the icons were made in Word and are based on the image used in the following blog: Minor, K. (2013). How Big Data and Cognitive Computing are Transforming Insurance: Part 2. IBM.
2. Aurora Health Care, retrieved 11/14/17.
3. Propeller Health, retrieved 11/14/17.
4. Center for Personalized Cancer Treatment (CPCT), retrieved 11/14/17.
5. (2013) Made in IBM Labs: Scientists Turn Data into Disease Detective to Predict Dengue Fever and Malaria Outbreaks. IBM.
6. Belliveau, J. (2016). Big Data Tool Saves CMS $1.5B by Preventing Medicare Fraud. RevCycle Intelligence.

28 November 2017