
big data analytics in healthcare research paper pdf


IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1); CiteScore: 17.6, Top 3% (Q1); Google Scholar h5-index: 77, Top 5

Big Data Analytics in Healthcare — A Systematic Literature Review and Roadmap for Practical Implementation

DOI: 10.1109/jas.2020.1003384

  • Sohail Imran
  • Tariq Mahmood
  • Ahsan Morshed
  • Timos Sellis

Sohail Imran is an Assistant Professor and a doctoral candidate at the PAF-Karachi Institute of Economics and Technology, Pakistan. He has more than 15 years of teaching experience in databases, data science, and big data analytics, and more than 10 years of training experience in databases (SQL and NoSQL), big data infrastructure, and data science for different institutes, universities, and the corporate sector. His research work is focused on mapping OLAP data warehousing schema into the distributed Hadoop environment. Specifically, he has developed a framework which creates dimension and fact tables over HBase and Hive in a NoSQL schema-less manner and computes aggregates through SQL-over-Hadoop technologies (Presto, Drill, Spark SQL). This functionality is made scalable through containerization and more efficient through the use of Apache Spark.

Tariq Mahmood is an Associate Professor at the Faculty of Computer Science, Institute of Business Administration (IBA), Pakistan. He received the Ph.D. degree in machine learning from University of Trento, Italy, and the M.S. degree in statistical machine learning from Universite Pierre et Marie Curie (Paris 6), France. He has published around 20 international journal and 35 conference publications with a total of 691 citations and an h-index of 12 (Google Scholar). His research interests include BDA, deep learning and machine learning/data science. He heads the Big Data Analytics Laboratory at IBA, with a focus on imparting data science and big data certifications to students and industry professionals, implementing BDA-related industrial projects, and researching the BDA technology stack, particularly to develop BDA architectures for different types of streaming and non-streaming data. He also consults for various local industries on business intelligence, data governance, BDA, and machine learning.

Ahsan Morshed is a Lecturer in ICT at CQ University, Australia. Previously, he was a Research Fellow in Data Analytics at Swinburne University of Technology and a Senior Project Officer at RMIT University. He was also a Postdoctoral Fellow at CSIRO (Australia) on sensor data integration and machine learning, and an Information Management Specialist in the OEKC division at the Food and Agriculture Organization (FAO) of the UN in Rome, Italy. During his time at FAO, he acquired extensive skills in metadata standards, knowledge organization systems, ontologies, Linked Open Data management and information management tools. His research interests include big data, data science, semantic web, linked open data and semantic machine learning. He holds the Ph.D. degree from the University of Trento, Italy. Dr. Morshed has 50 peer-reviewed publications (book, book chapter, journals, conference and workshop papers), with 229 citations and an h-index of 6 (Google Scholar).

Timos Sellis (F’09) is a Professor at Swinburne University of Technology, Australia. He holds the diploma from National Technical University of Athens (NTUA), Greece, the M.Sc. degree from Harvard University, USA, and the Ph.D. degree from the University of California at Berkeley, USA. Timos has a significant international research reputation in big data, data analytics, data integration and spatiotemporal database systems. He is a Fellow of the Association for Computing Machinery (ACM) for his contributions to database query optimisation, spatial data management and data warehousing, and also a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for his contributions to database query optimisation and spatial data management. In 2018 he was awarded the IEEE TCDE Impact Award, in recognition of his impact in the field and for contributions to database systems research and broadening the reach of data engineering research. Before joining Swinburne, Timos was the Director of the Institute for Management of Information Systems and Professor at the National Technical University of Athens. He has also held the role of Director, Big Data Lab at RMIT University.

  • Corresponding author: T. Mahmood is with the Faculty of Computer Science, Institute of Business Administration, Karachi 75270, Pakistan (e-mail: [email protected] )
  • 1 https://neo4j.com
  • 2 http://www.hl7.org/implement/standards/fhir/
  • 3 A group of graduate students participated in this activity over a period of 3 months. For the sake of brevity, the details are outside the scope of this paper.
  • 4 To the best of our knowledge, this list is complete as of June 2020.
  • 5 A detailed discussion of the nine compared papers is outside the scope of this work; we invite the reader to go through these papers for more required information.
  • Revised Date: 2020-07-21
  • Accepted Date: 2020-07-22
  • Big data analytics (BDA)
  • big data architecture
  • healthcare
  • NoSQL data stores
  • patient care
  • roadmap
  • systematic literature review



Figures( 13 )  /  Tables( 5 )

  • The most thorough systematic literature review on big data analytics applications to healthcare
  • Focus on healthcare applications for NoSQL databases and Apache Hadoop ecosystem
  • Proposes the first-ever Zeta architecture called Med-BDA for big healthcare data analytics
  • Med-BDA has the potential to solve ALL current limitations for big healthcare data analytics
  • We present business strategies to successfully implement Med-BDA in any clinical organization


  • Figure 1. Year-wise distribution of selected 99 articles
  • Figure 2. Digital source distribution for six basic search queries
  • Figure 3. Digital source distribution for six basic search queries + healthcare (HC)
  • Figure 4. Digital source distribution for six basic search queries + healthcare analytics (HA)
  • Figure 5. Hadoop components and ecosystem
  • Figure 6. Data generators for an HIMS
  • Figure 7. The 4 V’s of big data identified in healthcare research literature
  • Figure 8. The challenges in application of big data analytics to healthcare
  • Figure 9. A snapshot of key-value store from healthcare domain
  • Figure 10. A snapshot of columnar store from healthcare domain
  • Figure 11. A snapshot of a document store from healthcare domain
  • Figure 12. A snapshot of a graph store from healthcare domain
  • Figure 13. Med-BDA: A state-of-the-art BDA architecture for healthcare
  • Survey paper
  • Open access
  • Published: 19 June 2019

Big data in healthcare: management, analysis and future prospects

  • Sabyasachi Dash,
  • Sushil Kumar Shakyawar,
  • Mohit Sharma &
  • Sandeep Kaushik

Journal of Big Data volume 6, Article number: 54 (2019)


‘Big data’ refers to massive amounts of information that hold great potential, which has made it a topic of special interest for the past two decades. Various public and private sector industries generate, store, and analyze big data with the aim of improving the services they provide. In the healthcare industry, sources of big data include hospital records, medical records of patients, results of medical examinations, and devices that are part of the internet of things. Biomedical research also generates a significant portion of big data relevant to public healthcare. This data requires proper management and analysis in order to derive meaningful information. Otherwise, seeking a solution by analyzing big data quickly becomes comparable to finding a needle in a haystack. There are various challenges associated with each step of handling big data, which can only be overcome by using high-end computing solutions for big data analysis. That is why, to provide relevant solutions for improving public health, healthcare providers need to be fully equipped with appropriate infrastructure to systematically generate and analyze big data. Efficient management, analysis, and interpretation of big data can change the game by opening new avenues for modern healthcare. That is exactly why various industries, including the healthcare industry, are taking vigorous steps to convert this potential into better services and financial advantages. With a strong integration of biomedical and healthcare data, modern healthcare organizations can possibly revolutionize medical therapies and personalized medicine.

Introduction

Information has been the key to better organization and new developments. The more information we have, the more optimally we can organize ourselves to deliver the best outcomes. That is why data collection is an important part of every organization. We can also use this data to predict current trends of certain parameters and future events. As we become more and more aware of this, we have started producing and collecting more data about almost everything by introducing technological developments in this direction. Today, we are flooded with data from every aspect of our lives, such as social activities, science, work, and health. In a way, we can compare the present situation to a data deluge. Technological advances have helped us generate more and more data, even to a level where it has become unmanageable with currently available technologies. This has led to the creation of the term ‘big data’ to describe data that is large and unmanageable. In order to meet our present and future social needs, we need to develop new strategies to organize this data and derive meaningful information. One such special social need is healthcare. Like every other industry, healthcare organizations are producing data at a tremendous rate, which presents many advantages and challenges at the same time. In this review, we discuss the basics of big data, including its management, analysis and future prospects, especially in the healthcare sector.

The data overload

Every day, people working with various organizations around the world generate a massive amount of data. The term “digital universe” quantitatively defines such massive amounts of data created, replicated, and consumed in a single year. International Data Corporation (IDC) estimated the approximate size of the digital universe in 2005 to be 130 exabytes (EB). The digital universe in 2017 expanded to about 16,000 EB or 16 zettabytes (ZB). IDC predicted that the digital universe would expand to 40,000 EB by the year 2020. To imagine this size, we would have to assign about 5200 gigabytes (GB) of data to every individual. This exemplifies the phenomenal speed at which the digital universe is expanding. The internet giants, like Google and Facebook, have been collecting and storing massive amounts of data. For instance, depending on our preferences, Google may store a variety of information including user location, advertisement preferences, list of applications used, internet browsing history, contacts, bookmarks, emails, and other necessary information associated with the user. Similarly, Facebook stores and analyzes more than 30 petabytes (PB) of user-generated data. Such large amounts of data constitute ‘big data’. Over the past decade, big data has been successfully used by the IT industry to produce critical information that can generate significant revenue.
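The per-person figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units, which the text does not specify, and the function names are hypothetical.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18


def implied_population(total_eb: float, per_person_gb: float) -> float:
    """People needed so that each holds `per_person_gb` GB of a `total_eb` EB universe."""
    return (total_eb * EB) / (per_person_gb * GB)


def growth_factor(start_eb: float, end_eb: float) -> float:
    """How many times larger the digital universe became between two estimates."""
    return end_eb / start_eb


# Figures quoted in the text: 40,000 EB predicted for 2020, about 5200 GB per person.
people = implied_population(40_000, 5_200)
print(f"{people / 1e9:.2f} billion people")  # prints "7.69 billion people"

# Growth from the 2005 estimate (130 EB) to the 2017 estimate (16,000 EB).
print(f"{growth_factor(130, 16_000):.0f}x growth")  # prints "123x growth"
```

Under these assumptions, 5200 GB per person corresponds to a population of roughly 7.7 billion, which is consistent with the text's "every individual".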

These observations have become so conspicuous that they have eventually led to the birth of a new field of science termed ‘Data Science’. Data science deals with various aspects, including data management and analysis, to extract deeper insights for improving the functionality or services of a system (for example, healthcare and transport systems). Additionally, with the availability of some of the most creative and meaningful ways to visualize big data post-analysis, it has become easier to understand the functioning of any complex system. As a large section of society is becoming aware of, and involved in, generating big data, it has become necessary to define what big data is. Therefore, in this review, we attempt to provide details on the impact of big data on the transformation of the global healthcare sector and on our daily lives.

Defining big data

As the name suggests, ‘big data’ represents large amounts of data that are unmanageable using traditional software or internet-based platforms. It exceeds the storage, processing and analytical power traditionally used. Even though a number of definitions for big data exist, the most popular and well-accepted definition was given by Douglas Laney. Laney observed that (big) data was growing in three different dimensions, namely volume, velocity and variety (known as the 3 Vs) [1]. The ‘big’ part of big data is indicative of its large volume. In addition to volume, the big data description also includes velocity and variety. Velocity indicates the speed or rate at which data is collected and made accessible for further analysis, while variety refers to the different types of organized and unorganized data that any firm or system can collect, such as transaction-level data, video, audio, text or log files. These three Vs have become the standard definition of big data. Although other people have added several other Vs to this definition [2], the most accepted 4th V remains ‘veracity’.

The term “big data” has become extremely popular across the globe in recent years. Almost every sector of research, whether in industry or academia, is generating and analyzing big data for various purposes. The most challenging task regarding this huge heap of data, which can be organized or unorganized, is its management. Given that big data is unmanageable using traditional software, we need technically advanced applications and software that can utilize fast and cost-efficient high-end computational power for such tasks. Implementation of artificial intelligence (AI) algorithms and novel fusion algorithms would be necessary to make sense of this large amount of data. Indeed, it would be a great feat to achieve automated decision-making by implementing machine learning (ML) methods like neural networks and other AI techniques. However, in the absence of appropriate software and hardware support, big data can be quite hazy. We need to develop better techniques to handle this ‘endless sea’ of data and smart web applications for efficient analysis to gain workable insights. With proper storage and analytical tools in hand, the information and insights derived from big data can make critical social infrastructure components and services (like healthcare, safety or transportation) more aware, interactive and efficient [3]. In addition, visualization of big data in a user-friendly manner will be a critical factor for societal development.

Healthcare as a big-data repository

Healthcare is a multi-dimensional system established with the sole aim of preventing, diagnosing, and treating health-related issues or impairments in human beings. The major components of a healthcare system are the health professionals (physicians or nurses), health facilities (clinics, hospitals for delivering medicines and other diagnosis or treatment technologies), and a financing institution supporting the former two. The health professionals belong to various health sectors like dentistry, medicine, midwifery, nursing, psychology, physiotherapy, and many others. Healthcare is required at several levels depending on the urgency of the situation. Professionals provide it as the first point of consultation (primary care), acute care requiring skilled professionals (secondary care), advanced medical investigation and treatment (tertiary care), and highly uncommon diagnostic or surgical procedures (quaternary care). At all these levels, the health professionals are responsible for different kinds of information, such as a patient’s medical history (diagnosis and prescription related data), medical and clinical data (like data from imaging and laboratory examinations), and other private or personal medical data. Previously, the common practice was to store such medical records for a patient in the form of either handwritten notes or typed reports [4]. Even the results from a medical examination were stored in a paper file system. In fact, this practice is really old, with the oldest case reports existing on a papyrus text from Egypt that dates back to 1600 BC [5]. In Stanley Reiser’s words, “the clinical case records freeze the episode of illness as a story in which patient, family and the doctor are a part of the plot” [6].

With the advent of computer systems and their potential, the digitization of all clinical exams and medical records in healthcare systems has become a standard and widely adopted practice. In 2003, a division of the National Academies of Sciences, Engineering, and Medicine known as the Institute of Medicine chose the term “electronic health records” to represent records maintained for improving the health care sector towards the benefit of patients and clinicians. Electronic health records (EHR), as defined by Murphy, Hanken and Waters, are computerized medical records for patients: “any information relating to the past, present or future physical/mental health or condition of an individual which resides in electronic system(s) used to capture, transmit, receive, store, retrieve, link and manipulate multimedia data for the primary purpose of providing healthcare and health-related services” [7].

Electronic health records

It is important to note that the National Institutes of Health (NIH) recently announced the “All of Us” initiative ( https://allofus.nih.gov/ ) that aims to collect data from one million or more patients, such as EHRs, including medical imaging, socio-behavioral, and environmental data, over the next few years. EHRs have introduced many advantages for handling modern healthcare related data. Below, we describe some of the characteristic advantages of using EHRs. The first advantage of EHRs is that healthcare professionals have improved access to the entire medical history of a patient. The information includes medical diagnoses, prescriptions, data related to known allergies, demographics, clinical narratives, and the results obtained from various laboratory tests. The recognition and treatment of medical conditions is thus more time efficient because previous test results are available with less delay. With time we have observed a significant decrease in redundant and additional examinations, lost orders and ambiguities caused by illegible handwriting, and improved care coordination between multiple healthcare providers. Overcoming such logistical errors has led to a reduction in the number of drug allergies by reducing errors in medication dose and frequency. Healthcare professionals have also found that access over web-based and electronic platforms significantly improves their medical practice through automatic reminders and prompts regarding vaccinations, abnormal laboratory results, cancer screening, and other periodic checkups. Facilitating communication among multiple healthcare providers and patients allows greater continuity of care and timely interventions. EHRs can also be linked to electronic authorization and immediate insurance approvals because they involve less paperwork.
EHRs enable faster data retrieval and facilitate reporting of key healthcare quality indicators to organizations, and also improve public health surveillance through immediate reporting of disease outbreaks. EHRs also provide relevant data regarding the quality of care for the beneficiaries of employee health insurance programs and can help control the increasing costs of health insurance benefits. Finally, EHRs can reduce or entirely eliminate delays and confusion in the billing and claims management area. Together, EHRs and the internet help provide access to millions of pieces of health-related medical information critical to patients’ lives.
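To make the idea of automatic prompts concrete, the following is a minimal sketch of how a structured patient record could drive reminders about abnormal laboratory results and allergy conflicts. It is not how any real EHR product works: the class names, fields, and reference ranges are hypothetical placeholders chosen for illustration, not clinical values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LabResult:
    test_name: str
    value: float
    low: float   # lower bound of the reference range (placeholder, not clinical)
    high: float  # upper bound of the reference range (placeholder, not clinical)

    def is_abnormal(self) -> bool:
        """A result is abnormal when it falls outside [low, high]."""
        return not (self.low <= self.value <= self.high)


@dataclass
class PatientRecord:
    patient_id: str
    allergies: List[str] = field(default_factory=list)
    prescriptions: List[str] = field(default_factory=list)
    lab_results: List[LabResult] = field(default_factory=list)


def build_prompts(record: PatientRecord) -> List[str]:
    """Collect prompts for abnormal lab results and prescription/allergy conflicts."""
    prompts = []
    for lab in record.lab_results:
        if lab.is_abnormal():
            prompts.append(f"Abnormal {lab.test_name}: {lab.value}")
    for drug in record.prescriptions:
        if drug in record.allergies:
            prompts.append(f"Allergy conflict: {drug}")
    return prompts


# Hypothetical record with placeholder test and drug names.
record = PatientRecord(
    patient_id="patient-001",
    allergies=["drug_a"],
    prescriptions=["drug_a", "drug_b"],
    lab_results=[LabResult("test_x", 3.0, 1.0, 2.0), LabResult("test_y", 1.5, 1.0, 2.0)],
)
for prompt in build_prompts(record):
    print(prompt)
```

Because the whole history sits in one structured record, a check like this can run every time the record is updated, which is the kind of prompt that paper files cannot provide.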

Digitization of healthcare and big data

Similar to an EHR, an electronic medical record (EMR) stores the standard medical and clinical data gathered from patients. EHRs, EMRs, personal health records (PHR), medical practice management software (MPM), and many other healthcare data components collectively have the potential to improve the quality, service efficiency, and costs of healthcare along with reducing medical errors. Big data in healthcare includes the healthcare payer-provider data (such as EMRs, pharmacy prescriptions, and insurance records) along with data from genomics-driven experiments (such as genotyping and gene expression data) and other data acquired from the smart web of the internet of things (IoT) (Fig. 1). The adoption of EHRs was slow at the beginning of the 21st century; however, it has grown substantially since 2009 [7, 8]. The management and usage of such healthcare data has become increasingly dependent on information technology. The development and usage of wellness monitoring devices and related software that can generate alerts and share a patient’s health related data with the respective health care providers has gained momentum, especially in establishing real-time biomedical and health monitoring systems. These devices are generating a huge amount of data that can be analyzed to provide real-time clinical or medical care [9]. The use of big data from healthcare shows promise for improving health outcomes and controlling costs.

Fig. 1. Workflow of big data analytics. Data warehouses store massive amounts of data generated from various sources. This data is processed using analytic pipelines to obtain smarter and affordable healthcare options

Big data in biomedical research

A biological system, such as a human cell, exhibits molecular and physical events of complex interplay. In order to understand the interdependencies of the various components and events of such a complex system, a biomedical or biological experiment usually gathers data on a smaller and/or simpler component. Consequently, it requires multiple simplified experiments to generate a wide map of a given biological phenomenon of interest. This indicates that the more data we have, the better we understand the biological processes. With this idea, modern techniques have evolved at a great pace. For instance, one can imagine the amount of data generated since the integration of efficient technologies like next-generation sequencing (NGS) and genome-wide association studies (GWAS) to decode human genetics. NGS-based data provides information at depths that were previously inaccessible and takes the experimental scenario to a completely new dimension. It has increased the resolution at which we observe or record biological events associated with specific diseases in real time. The idea that large amounts of data can provide information that often remains unidentified or hidden in smaller experimental methods has ushered in the ‘-omics’ era. The ‘omics’ discipline has witnessed significant progress: instead of studying a single ‘gene’, scientists can now study the whole ‘genome’ of an organism in ‘genomics’ studies within a given amount of time. Similarly, instead of studying the expression or ‘transcription’ of a single gene, we can now study the expression of all the genes or the entire ‘transcriptome’ of an organism under ‘transcriptomics’ studies. Each of these individual experiments generates a large amount of data with more depth of information than ever before. Yet, this depth and resolution might be insufficient to provide all the details required to explain a particular mechanism or event.
Therefore, one usually finds oneself analyzing a large amount of data obtained from multiple experiments to gain novel insights. This fact is supported by a continuous rise in the number of publications regarding big data in healthcare (Fig. 2). Analysis of such big data from medical and healthcare systems can be of immense help in providing novel strategies for healthcare. The latest technological developments in data generation, collection and analysis have raised expectations of a revolution in the field of personalized medicine in the near future.

Fig. 2. Publications associated with big data in healthcare. The numbers of publications in PubMed are plotted by year

Big data from omics studies

NGS has greatly simplified sequencing and decreased the costs of generating whole genome sequence data. The cost of complete genome sequencing has fallen from millions to a couple of thousand dollars [10]. NGS technology has resulted in an increased volume of biomedical data that comes from genomic and transcriptomic studies. According to one estimate, the number of human genomes sequenced by 2025 could be between 100 million and 2 billion [11]. Combining the genomic and transcriptomic data with proteomic and metabolomic data can greatly enhance our knowledge about the individual profile of a patient, an approach often described as “individual, personalized or precision health care”. Systematic and integrative analysis of omics data in conjunction with healthcare analytics can help design better treatment strategies towards precision and personalized medicine (Fig. 3). Genomics-driven experiments, e.g., genotyping, gene expression, and NGS-based studies, are the major source of big data in biomedical healthcare along with EMRs, pharmacy prescription information, and insurance records. Healthcare requires a strong integration of such biomedical data from various sources to provide better treatments and patient care. These prospects are so exciting that, even though genomic data from patients has many variables to account for, commercial organizations are already using human genome data to help providers make personalized medical decisions. This might turn out to be a game-changer in future medicine and health.

Fig. 3. A framework for integrating omics data and health care analytics to promote personalized treatment

Internet of Things (IoT)

The healthcare industry has not been as quick to adapt to the big data movement as other industries. Therefore, big data usage in the healthcare sector is still in its infancy. For example, healthcare and biomedical big data have not yet converged to enhance healthcare data with molecular pathology. Such convergence can help unravel various mechanisms of action or other aspects of predictive biology. Therefore, to assess an individual’s health status, biomolecular and clinical datasets need to be combined. One such source of clinical data in healthcare is the ‘internet of things’ (IoT).

In fact, IoT is another big player that has been implemented in a number of industries, including healthcare. Until recently, objects of common use such as cars, watches, refrigerators and health-monitoring devices did not usually produce or handle data and lacked internet connectivity. However, furnishing such objects with computer chips and sensors that enable data collection and transmission over the internet has opened new avenues. Device technologies such as Radio Frequency IDentification (RFID) tags and readers, and Near Field Communication (NFC) devices, which can not only gather information but also interact physically, are increasingly being used as information and communication systems [3]. This enables objects with RFID or NFC to communicate and function as a web of smart things. The analysis of data collected from these chips or sensors may reveal critical information that might be beneficial in improving lifestyle, energy conservation, transportation, and healthcare. In fact, IoT has become a rising movement in the field of healthcare. IoT devices create a continuous stream of data while monitoring the health of people (or patients), which makes these devices a major contributor to big data in healthcare. Such resources can interconnect various devices to provide a reliable, effective and smart healthcare service to the elderly and patients with a chronic illness [12].

Advantages of IoT in healthcare

Using the web of IoT devices, a doctor can measure and monitor various parameters from patients in their respective locations, for example, home or office. Therefore, through early intervention and treatment, a patient might not need hospitalization or even a visit to the doctor, resulting in a significant reduction in healthcare expenses. Some examples of IoT devices used in healthcare include fitness or health-tracking wearable devices, biosensors, clinical devices for monitoring vital signs, and other types of devices or clinical instruments. Such IoT devices generate a large amount of health related data. If we can integrate this data with other existing healthcare data like EMRs or PHRs, we can predict a patient’s health status and its progression from a subclinical to a pathological state [9]. In fact, big data generated from IoT has been quite advantageous in several areas in offering better investigation and predictions. On a larger scale, the data from such devices can help in personal health monitoring, modelling the spread of a disease and finding ways to contain a particular disease outbreak.

Because of its specific nature, the analysis of data from IoT would require updated operating software along with advanced hardware and software applications. We would need to manage the data inflow from IoT instruments in real time and analyze it by the minute. Stakeholders in the healthcare system are trying to reduce costs and improve the quality of care by applying advanced analytics to both internally and externally generated data.
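One simple way to analyze an incoming stream "by the minute" is a sliding time window. The sketch below is an illustration under stated assumptions, not a description of any particular IoT platform: the class name is hypothetical, timestamps are assumed to arrive in non-decreasing order, and the readings are made-up sample values.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class MinuteWindow:
    """Keeps readings from the last `window_s` seconds and reports their mean."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self._readings: Deque[Tuple[float, float]] = deque()  # (timestamp_s, value)

    def add(self, timestamp_s: float, value: float) -> None:
        """Record a reading, then drop readings that have fallen out of the window."""
        self._readings.append((timestamp_s, value))
        cutoff = timestamp_s - self.window_s
        while self._readings and self._readings[0][0] <= cutoff:
            self._readings.popleft()

    def mean(self) -> Optional[float]:
        """Mean of the readings currently in the window, or None if it is empty."""
        if not self._readings:
            return None
        return sum(value for _, value in self._readings) / len(self._readings)


# Made-up sample readings: (seconds since start, sensor value).
window = MinuteWindow(window_s=60.0)
for timestamp_s, value in [(0, 70.0), (30, 80.0), (61, 90.0)]:
    window.add(timestamp_s, value)
    print(timestamp_s, window.mean())
```

Each new reading costs only a small amount of work, since old readings are discarded from the front of the deque rather than rescanned, which is what makes this pattern suitable for continuous streams.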

Mobile computing and mobile health (mHealth)

In today’s digital world, almost everyone seems to be obsessed with tracking their fitness and health statistics using the in-built pedometers of portable and wearable devices such as smartphones, smartwatches, fitness dashboards or tablets. With an increasingly mobile society in almost all aspects of life, the healthcare infrastructure needs remodeling to accommodate mobile devices [13]. The practice of medicine and public health using mobile devices, known as mHealth or mobile health, pervades different degrees of health care, especially for chronic diseases such as diabetes and cancer [14]. Healthcare organizations are increasingly using mobile health and wellness services to implement novel and innovative ways to provide care and coordinate health as well as wellness. Mobile platforms can improve healthcare by accelerating interactive communication between patients and healthcare providers. In fact, Apple and Google have developed dedicated platforms like Apple’s ResearchKit and Google Fit for developing research applications for fitness and health statistics [15]. These applications support seamless interaction with various consumer devices and embedded sensors for data integration. These apps give doctors direct access to a user’s overall health data, so both users and their doctors get to know the real-time status of the user’s body. These apps and smart devices also help by improving wellness planning and encouraging healthy lifestyles. Users or patients can become advocates for their own health.

Nature of the big data in healthcare

EHRs can enable advanced analytics and help clinical decision-making by providing enormous amounts of data. However, a large proportion of this data is currently unstructured. Unstructured data is information that does not adhere to a pre-defined model or organizational framework. One reason for recording data this way may simply be that it can be captured in a myriad of formats. Another reason for opting for an unstructured format is that structured input options (drop-down menus, radio buttons, and check boxes) often fall short when capturing data of a complex nature. For example, non-standard data regarding a patient’s clinical suspicions, socioeconomic data, patient preferences, key lifestyle factors, and other related information can only be recorded in an unstructured format. It is difficult to group such varied, yet critical, sources of information into an intuitive or unified data format for further algorithmic analysis to understand and improve patient care. Nonetheless, the healthcare industry is required to utilize the full potential of these rich streams of information to enhance the patient experience. In the healthcare sector, this could materialize as better management, better care and lower-cost treatments. We are miles away from realizing the benefits of big data in a meaningful way and harnessing the insights that come from it. In order to achieve these goals, we need to manage and analyze big data in a systematic manner.

Management and analysis of big data

Big data refers to huge amounts of varied data generated at a rapid rate. The data gathered from various sources is mostly required for optimizing consumer services rather than consumer consumption. This is also true for big data from biomedical research and healthcare. The major challenge with big data is how to handle this large volume of information. To make it available to the scientific community, the data must be stored in a file format that is easily accessible and readable for efficient analysis. In the context of healthcare data, another major challenge is the implementation of high-end computing tools, protocols and hardware in the clinical setting. Experts from diverse backgrounds including biology, information technology, statistics, and mathematics are required to work together to achieve this goal. The data collected using sensors can be made available on a storage cloud with pre-installed software tools developed by analytic tool developers. These tools would have data mining and ML functions developed by AI experts to convert the information stored as data into knowledge. Upon implementation, this would enhance the efficiency of acquiring, storing, analyzing, and visualizing big data from healthcare. The main task is to annotate, integrate, and present this complex data in an appropriate manner for better understanding. In the absence of such relevant information, the (healthcare) data remains opaque and may not lead biomedical researchers any further. Finally, visualization tools developed by computer graphics designers can efficiently display this newly gained knowledge.

Heterogeneity of data is another challenge in big data analysis. The huge size and highly heterogeneous nature of big data in healthcare make it relatively uninformative when handled with conventional technologies. The most common platforms for operating the software frameworks that assist big data analysis are high-power computing clusters accessed via grid computing infrastructures. Cloud computing is one such system; it has virtualized storage technologies and provides reliable services. It offers high reliability, scalability and autonomy along with ubiquitous access, dynamic resource discovery and composability. Such platforms can act as a receiver of data from ubiquitous sensors and as a computer to analyze and interpret the data, and can provide the user with easy-to-understand web-based visualization. In IoT, big data processing and analytics can be performed closer to the data source using the services of mobile edge computing cloudlets and fog computing. Advanced algorithms are required to implement ML and AI approaches for big data analysis on computing clusters. A programming language suitable for working on big data (e.g. Python, R or other languages) could be used to write such algorithms or software. Therefore, a good knowledge of both biology and IT is required to handle the big data from biomedical research. This combination of skills is typical of bioinformaticians. The most common platforms used for working with big data include Hadoop and Apache Spark. We briefly introduce these platforms below.

Hadoop

Loading large amounts of (big) data into the memory of even the most powerful computing clusters is not an efficient way to work with big data. Therefore, the most logical approach for analyzing huge volumes of complex big data is to distribute it and process it in parallel on multiple nodes. However, the data is usually so large that thousands of computing machines are required to distribute it and finish processing in a reasonable amount of time. When working with hundreds or thousands of nodes, one has to handle issues like how to parallelize the computation, distribute the data, and handle failures. One of the most popular open-source distributed applications for this purpose is Hadoop [ 16 ]. Hadoop implements the MapReduce algorithm for processing and generating large datasets. MapReduce uses map and reduce primitives: the map operation turns each logical record in the input into a set of intermediate key/value pairs, and the reduce operation combines all the values that share the same key [ 17 ]. It efficiently parallelizes the computation, handles failures, and schedules inter-machine communication across large-scale clusters of machines. The Hadoop Distributed File System (HDFS) is the file system component that provides scalable, efficient, and replica-based storage of data at the various nodes that form a cluster [ 16 ]. Hadoop has other tools that enhance the storage and processing components; therefore, many large companies like Yahoo, Facebook, and others have rapidly adopted it. Hadoop has enabled researchers to use data sets that were otherwise impossible to handle. Many large projects, such as determining the correlation between air quality data and asthma admissions or developing drugs using genomic and proteomic data, are implementing Hadoop. Therefore, with the implementation of a Hadoop system, healthcare analytics will not be held back.
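
The map and reduce primitives described above can be illustrated with a small single-machine sketch in Python. This is not Hadoop's API; the function names and the regional asthma-admission counts are hypothetical, and the intermediate "shuffle" step that a real cluster performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one logical record into intermediate (key, value) pairs."""
    region, admissions = record
    yield region, admissions


def shuffle(pairs):
    """Group intermediate values by key (done across nodes on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def run_mapreduce(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical daily asthma admissions per region
records = [("north", 12), ("south", 7), ("north", 5), ("south", 3), ("east", 4)]
totals = run_mapreduce(records)
```

Because each call to `map_phase` and each call to `reduce_phase` is independent of the others, a framework is free to run them on different nodes, which is the property that makes the model easy to parallelize.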

Apache Spark

Apache Spark is another open-source alternative to Hadoop. It is a unified engine for distributed data processing that includes higher-level libraries for supporting SQL queries ( Spark SQL ), streaming data ( Spark Streaming ), machine learning ( MLlib ) and graph processing ( GraphX ) [ 18 ]. These libraries help increase developer productivity because the programming interface requires less coding effort and the libraries can be seamlessly combined to create more complex types of computation. By implementing Resilient Distributed Datasets (RDDs), Spark supports in-memory processing of data, which can make it about 100× faster than Hadoop in multi-pass analytics (on smaller datasets) [ 19 , 20 ]. This holds especially when the data size is smaller than the available memory [ 21 ]. It also indicates that processing really big data with Apache Spark would require a large amount of memory. Since memory costs more than hard drive storage, MapReduce is expected to be more cost-effective than Apache Spark for large datasets. Similarly, Apache Storm was developed to provide a real-time framework for data stream processing. This platform supports most programming languages. Additionally, it offers good horizontal scalability and built-in fault tolerance for big data analysis.
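
To see why in-memory processing matters for multi-pass analytics, consider the following toy Python sketch. It does not use Spark; the `Dataset` class is a hypothetical stand-in that merely counts how often the data has to be re-read, and the iterative job is a deliberately simple estimate of a mean.

```python
class Dataset:
    """Toy dataset that counts how often it is (re)loaded from 'disk'."""

    def __init__(self, rows):
        self._rows = rows
        self._cache = None
        self.loads = 0

    def read(self, cached):
        if cached and self._cache is not None:
            return self._cache      # served from memory, no reload
        self.loads += 1             # stands in for a slow disk or network read
        data = list(self._rows)
        if cached:
            self._cache = data
        return data


def iterative_mean(dataset, passes, cached):
    """Multi-pass job: refine an estimate of the mean, reading the data each pass."""
    estimate = 0.0
    for _ in range(passes):
        rows = dataset.read(cached)
        gradient = sum(estimate - x for x in rows) / len(rows)
        estimate -= 0.5 * gradient
    return estimate


on_disk = Dataset([1, 2, 3, 4])
in_memory = Dataset([1, 2, 3, 4])
result_disk = iterative_mean(on_disk, passes=10, cached=False)
result_memory = iterative_mean(in_memory, passes=10, cached=True)
```

Both runs compute the same estimate, but the uncached run reloads the data on every pass while the cached run loads it once. The saving only materializes if the whole dataset fits in memory, which is the trade-off noted in the paragraph above.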

Machine learning for information extraction, data analysis and predictions

In healthcare, patient data contains recorded signals, for instance electrocardiograms (ECG), as well as images and videos. Healthcare providers have barely managed to convert such healthcare data into EHRs. Efforts are underway to digitize patient histories from pre-EHR era notes and to supplement the standardization process by turning static images into machine-readable text. For example, optical character recognition (OCR) software can recognize handwriting as well as computer fonts and thereby push digitization. Such unstructured and structured healthcare datasets hold an untapped wealth of information that can be harnessed using advanced AI programs to draw critical actionable insights in the context of patient care. In fact, AI has emerged as the method of choice for big data applications in medicine. It has quickly found its niche in the decision-making process for the diagnosis of diseases. Healthcare professionals analyze such data for targeted abnormalities using appropriate ML approaches. ML can extract structured information from such raw data.

Extracting information from EHR datasets

Emerging ML- or AI-based strategies are helping to refine the healthcare industry’s information processing capabilities. For example, natural language processing (NLP) is a rapidly developing area of machine learning that can identify key syntactic structures in free text, help in speech recognition and extract the meaning behind a narrative. NLP tools can help generate new documents, like a clinical visit summary, or help dictate clinical notes. The unique content and complexity of clinical documentation can be challenging for many NLP developers. Nonetheless, we should be able to extract relevant information from healthcare data using approaches such as NLP.
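
As a very reduced illustration of extracting structured information from free text, the Python sketch below pulls medication names and doses out of a clinical note with a regular expression. This is pattern matching rather than full NLP, and the pattern, the function name `extract_medications` and the sample note are all assumptions made for the example; production systems rely on far richer linguistic models and curated vocabularies.

```python
import re

# Assumed pattern: "<drug name> <number> <unit>", e.g. "metformin 500 mg".
DOSE_PATTERN = re.compile(
    r"\b([A-Za-z]+)\s+(\d+(?:\.\d+)?)\s*(mg|mcg|g|ml)\b", re.IGNORECASE
)


def extract_medications(note):
    """Return one structured entry per drug/dose mention found in the note."""
    return [
        {"drug": drug.lower(), "dose": float(dose), "unit": unit.lower()}
        for drug, dose, unit in DOSE_PATTERN.findall(note)
    ]


# Hypothetical free-text note
note = (
    "Patient reports fatigue. Continue Metformin 500 mg twice daily; "
    "start lisinopril 10 mg."
)
meds = extract_medications(note)
```

The output is a list of dictionaries that could be stored in structured EHR fields, which is the kind of transformation the paragraph above describes.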

AI has also been used to provide predictive capabilities for healthcare big data. For example, ML algorithms can turn the diagnostic reading of medical images into automated decision-making. Although healthcare professionals are unlikely to be replaced by machines in the near future, AI can definitely assist physicians in making better clinical decisions, or even replace human judgment in certain functional areas of healthcare.

Image analytics

Some of the most widely used imaging techniques in healthcare include computed tomography (CT), magnetic resonance imaging (MRI), X-ray, molecular imaging, ultrasound, photo-acoustic imaging, functional MRI (fMRI), positron emission tomography (PET), electroencephalography (EEG), and mammograms. These techniques capture high-definition medical images (patient data) of large sizes. Healthcare professionals like radiologists, doctors and others do an excellent job in analyzing medical data in the form of these files for targeted abnormalities. However, it is also important to acknowledge the lack of specialized professionals for many diseases. In order to compensate for this dearth of professionals, efficient systems like the Picture Archiving and Communication System (PACS) have been developed for storing and conveniently accessing medical images and reports [ 22 ]. PACSs are popular for delivering images to local workstations, which is accomplished by protocols such as Digital Imaging and Communications in Medicine (DICOM). However, data exchange with a PACS relies on using structured data to retrieve medical images. This by nature misses the unstructured information contained in some of the biomedical images. Moreover, it is possible to miss additional information about a patient’s health status that is present in these images or similar data. A professional focused on diagnosing an unrelated condition might not observe it, especially when the condition is still emerging. To help in such situations, image analytics is making an impact on healthcare by actively extracting disease biomarkers from biomedical images. This approach uses ML and pattern recognition techniques to draw insights from massive volumes of clinical image data to transform the diagnosis, treatment and monitoring of patients. It focuses on enhancing the diagnostic capability of medical imaging for clinical decision-making.

A number of software tools have been developed to perform medical image analysis and dig out hidden information, based on functionalities such as generic, registration, segmentation, visualization, reconstruction, simulation and diffusion. For example, the Visualization Toolkit is freely available software that allows powerful processing and analysis of 3D images from medical tests [ 23 ], while SPM can process and analyze five different types of brain images (e.g. MRI, fMRI, PET, CT-Scan and EEG) [ 24 ]. Other software like GIMIAS, Elastix, and MITK support all types of images. Various other widely used tools and their features in this domain are listed in Table 1 . Such bioinformatics-based big data analysis may extract greater insights and value from imaging data to boost and support precision medicine projects, clinical decision support tools, and other modes of healthcare. For example, we can also use it to monitor new targeted treatments for cancer.

Big data from omics

The big data from “omics” studies is a new kind of challenge for bioinformaticians. Robust algorithms are required to analyze such complex data from biological systems. The ultimate goal is to convert this huge volume of data into an informative knowledge base. The application of bioinformatics approaches to transform biomedical and genomics data into predictive and preventive health is known as translational bioinformatics. It is at the forefront of data-driven healthcare. Various kinds of quantitative data in healthcare, for example from laboratory measurements, medication data and genomic profiles, can be combined and used to identify new meta-data that can help precision therapies [ 25 ]. This is why new technologies are required to help analyze this digital wealth. In fact, highly ambitious multimillion-dollar projects like the “Big Data Research and Development Initiative” have been launched that aim to enhance the quality of big data tools and techniques for better organization, efficient access and smart analysis of big data. Many advantages are anticipated from the processing of “omics” data from the large-scale Human Genome Project and other population sequencing projects. In population sequencing projects like 1000 Genomes, researchers will have access to an enormous amount of raw data. Similarly, the Human Genome Project-based Encyclopedia of DNA Elements (ENCODE) project aimed to determine all functional elements in the human genome using bioinformatics approaches. Here, we list some of the widely used bioinformatics-based tools for big data analytics on omics data.

SparkSeq is an efficient and cloud-ready platform based on the Apache Spark framework and the Hadoop library that is used for interactive analysis of genomic data with nucleotide precision.

SAMQA identifies errors and ensures the quality of large-scale genomic data. This tool was originally built for the National Institutes of Health Cancer Genome Atlas project to identify and report errors including sequence alignment/map [SAM] format error and empty reads.

ART can simulate profiles of read errors and read lengths for data obtained using high throughput sequencing platforms including SOLiD and Illumina platforms.

DistMap is another toolkit for distributed short-read mapping on a Hadoop cluster that aims to cover a wider range of sequencing applications. For instance, one of its supported mappers, BWA, can map 500 million read pairs in about 6 h, approximately 13 times faster than a conventional single-node mapper.

SeqWare is a query engine based on Apache HBase database system that enables access for large-scale whole-genome datasets by integrating genome browsers and tools.

CloudBurst is a parallel computing model utilized in genome mapping experiments to improve the scalability of reading large sequencing data.

Hydra uses the Hadoop-distributed computing framework for processing large peptide and spectra databases for proteomics datasets. This specific tool is capable of performing 27 billion peptide scorings in less than 60 min on a Hadoop cluster.

BlueSNP is an R package based on the Hadoop platform that is used for genome-wide association studies (GWAS) analysis, primarily aiming at statistical readouts to obtain significant associations between genotype–phenotype datasets. This tool is estimated to analyze 1000 phenotypes on 10^6 SNPs in 10^4 individuals in half an hour.

Myrna, a cloud-based pipeline, provides information on differences in gene expression levels, including read alignments, data normalization, and statistical modeling.
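
To put the performance figures quoted in this list on a common footing, the short Python sketch below converts them to approximate throughput per second. It reads the BlueSNP figures as 10^6 SNPs and 10^4 individuals, and assumes one association test per phenotype–SNP pair; Hydra's value is a lower bound because the source states "less than 60 min".

```python
def throughput_per_second(items, seconds):
    """Average number of items processed per second."""
    return items / seconds


# BlueSNP: 1000 phenotypes x 10^6 SNPs (assumed one test per pair) in 30 min
bluesnp_tests_per_s = throughput_per_second(1000 * 10**6, 30 * 60)

# DistMap with BWA: 500 million read pairs in about 6 h
distmap_pairs_per_s = throughput_per_second(500 * 10**6, 6 * 3600)

# Hydra: 27 billion peptide scorings in (at most) 60 min
hydra_scorings_per_s = throughput_per_second(27 * 10**9, 60 * 60)
```

Under these assumptions the figures correspond to roughly 5.6 × 10^5 association tests, 2.3 × 10^4 read pairs, and at least 7.5 × 10^6 peptide scorings per second, respectively.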

The past few years have witnessed a tremendous increase in disease-specific datasets from omics platforms. For example, the ArrayExpress Archive of Functional Genomics data repository contains information from approximately 30,000 experiments and more than one million functional assays. The growing amount of data demands better and more efficient bioinformatics-driven packages to analyze and interpret the information obtained. This has also led to the birth of specific tools to analyze such massive amounts of data. Below, we mention some of the most popular commercial platforms for big data analytics.

Commercial platforms for healthcare data analytics

In order to tackle big data challenges and perform smoother analytics, various companies have implemented AI to analyze published results, textual data, and image data to obtain meaningful outcomes. IBM Corporation is one of the biggest and most experienced players providing healthcare analytics services commercially. IBM’s Watson Health is an AI platform to share and analyze health data among hospitals, providers and researchers. Similarly, Flatiron Health provides technology-oriented services in healthcare analytics with a special focus on cancer research. Other big companies such as Oracle Corporation and Google Inc. are also focusing on developing cloud-based storage and distributed computing platforms. Interestingly, in the last few years, several companies and start-ups have also emerged to provide healthcare-based analytics and solutions. Some of the vendors in the healthcare sector are listed in Table 2 . Below we discuss a few of these commercial solutions.

Ayasdi is one such big vendor which focuses on ML based methodologies to primarily provide machine intelligence platform along with an application framework with tried & tested enterprise scalability. It provides various applications for healthcare analytics, for example, to understand and manage clinical variation, and to transform clinical care costs. It is also capable of analyzing and managing how hospitals are organized, conversation between doctors, risk-oriented decisions by doctors for treatment, and the care they deliver to patients. It also provides an application for the assessment and management of population health, a proactive strategy that goes beyond traditional risk analysis methodologies. It uses ML intelligence for predicting future risk trajectories, identifying risk drivers, and providing solutions for best outcomes. A strategic illustration of the company’s methodology for analytics is provided in Fig.  4 .

figure 4

Illustration of application of “Intelligent Application Suite” provided by AYASDI for various analyses such as clinical variation, population health, and risk management in healthcare sector

Linguamatics

Linguamatics is an NLP-based platform that relies on an interactive text mining algorithm (I2E). I2E can extract and analyze a wide array of information. Results are obtained tenfold faster with this technique than with other tools, and no expert knowledge is required for data interpretation. This approach can provide information on genetic relationships and facts from unstructured data. Classical ML requires well-curated data as input to generate clean and filtered results. However, NLP, when integrated into EHRs or clinical records, facilitates the extraction of clean and structured information that often remains hidden in unstructured input data (Fig. 5 ).

figure 5

Schematic representation for the working principle of NLP-based AI system used in massive data retention and analysis in Linguamatics

IBM Watson is one of the unique ideas of the tech giant IBM that targets big data analytics in almost every professional sector. This platform utilizes ML- and AI-based algorithms extensively to extract the maximum information from minimal input. IBM Watson integrates a wide array of healthcare domains to provide meaningful and structured data (Fig. 6 ). In an attempt to uncover novel drug targets, specifically in cancer disease models, IBM Watson and Pfizer have formed a collaboration to accelerate the discovery of novel immuno-oncology combinations. Combining Watson’s deep learning modules with AI technologies allows researchers to interpret complex genomic data sets. IBM Watson has been used to predict specific types of cancer based on gene expression profiles obtained from various large data sets, providing signs of multiple druggable targets. IBM Watson is also used in drug discovery programs by integrating curated literature and forming network maps to provide a detailed overview of the molecular landscape in a specific disease model.

figure 6

IBM Watson in healthcare data analytics. Schematic representation of the various functional modules in IBM Watson’s big-data healthcare package. For instance, the drug discovery domain involves network of highly coordinated data acquisition and analysis within the spectrum of curating database to building meaningful pathways towards elucidating novel druggable targets

In order to analyze diversified medical data, the healthcare domain describes analytics in four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Descriptive analytics describes and comments on current medical situations, whereas diagnostic analytics explains the reasons and factors behind the occurrence of certain events, for example, choosing a treatment option for a patient based on clustering and decision trees. Predictive analytics focuses on predicting future outcomes by determining trends and probabilities. These methods are mainly built on machine learning techniques and are helpful for understanding the complications that a patient can develop. Prescriptive analytics performs analysis to propose an action towards optimal decision-making, for example, deciding to avoid a given treatment for a patient based on observed side effects and predicted complications. Integrating big data into healthcare analytics can be a major factor in improving the performance of current medical systems; however, sophisticated strategies need to be developed. An architecture of best practices for the different kinds of analytics in the healthcare domain is required for integrating big data technologies to improve outcomes. However, there are many challenges associated with the implementation of such strategies.

Challenges associated with healthcare big data

Methods for big data management and analysis are being continuously developed, especially for real-time data streaming, capture, aggregation, analytics (using ML and predictive modeling), and visualization solutions that can help make better use of EMRs in healthcare. For example, the adoption of federally tested and certified EHR programs in the healthcare sector in the U.S.A. is nearly complete [ 7 ]. However, the availability of hundreds of government-certified EHR products, each with different clinical terminologies, technical specifications, and functional capabilities, has led to difficulties in the interoperability and sharing of data. Nonetheless, we can safely say that the healthcare industry has entered a ‘post-EMR’ deployment phase. Now, the main objective is to gain actionable insights from the vast amounts of data collected as EMRs. Here, we discuss some of the associated challenges in brief.

Storing large volume of data is one of the primary challenges, but many organizations are comfortable with data storage on their own premises. It has several advantages like control over security, access, and up-time. However, an on-site server network can be expensive to scale and difficult to maintain. It appears that with decreasing costs and increasing reliability, the cloud-based storage using IT infrastructure is a better option which most of the healthcare organizations have opted for. Organizations must choose cloud-partners that understand the importance of healthcare-specific compliance and security issues. Additionally, cloud storage offers lower up-front costs, nimble disaster recovery, and easier expansion. Organizations can also have a hybrid approach to their data storage programs, which may be the most flexible and workable approach for providers with varying data access and storage needs.

After acquisition, the data needs to be cleansed or scrubbed to ensure its accuracy, correctness, consistency, relevancy, and purity. This cleaning process can be manual or automated using logic rules to ensure high levels of accuracy and integrity. More sophisticated and precise tools use machine-learning techniques to reduce time and expenses and to stop bad data from derailing big data projects.
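
A minimal sketch of automated cleaning with logic rules might look like the following Python function. The field names (`name`, `sex`, `age`), the accepted value mappings and the 0 to 120 age range are assumptions chosen for illustration; real rule sets are defined by each organization's data standards.

```python
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


def clean_record(record):
    """Apply simple logic rules; return (cleaned_record, list_of_problems)."""
    cleaned = dict(record)
    problems = []

    # Rule 1: collapse stray whitespace and normalise capitalisation of names.
    cleaned["name"] = " ".join(str(record.get("name", "")).split()).title()

    # Rule 2: map free-form sex entries onto a fixed vocabulary.
    sex = str(record.get("sex", "")).strip().lower()
    cleaned["sex"] = SEX_CODES.get(sex)
    if cleaned["sex"] is None:
        problems.append("sex: unrecognised value")

    # Rule 3: reject implausible ages instead of silently keeping them.
    age = record.get("age")
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        cleaned["age"] = None
        problems.append("age: outside 0-120")

    return cleaned, problems


good, good_problems = clean_record({"name": "  jane   DOE ", "sex": "F", "age": 34})
bad, bad_problems = clean_record({"name": "john roe", "sex": "x", "age": 340})
```

Returning the list of problems alongside the cleaned record keeps the process auditable, so that rejected values can be reviewed manually rather than discarded.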

Unified format

Patients produce a huge volume of data that is not easy to capture in the traditional EHR format, as it is complex and not easily manageable. Big data is especially difficult for healthcare providers to handle when it arrives without proper data organization. A need arose to codify all clinically relevant information for the purposes of claims, billing, and clinical analytics. Therefore, medical coding systems like Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) code sets were developed to represent the core clinical concepts. However, these code sets have their own limitations.

Some studies have observed that the reporting of patient data into EMRs or EHRs is not yet entirely accurate [ 26 , 27 , 28 , 29 ], probably because of poor EHR usability, complex workflows, and a poor understanding of why it is so important to capture big data well. All these factors can contribute to quality issues for big data throughout its lifecycle. EHRs are intended to improve the quality and communication of data in clinical workflows, though reports indicate discrepancies in these contexts. Documentation quality might improve by using self-report questionnaires from patients about their symptoms.

Image pre-processing

Studies have observed various physical factors that can lead to altered data quality and misinterpretation of existing medical records [ 30 ]. Medical images often suffer from technical barriers that involve multiple types of noise and artifacts. Improper handling of medical images can also distort them; for instance, it might lead to a delineation of anatomical structures such as veins that does not correspond to the real case. Reducing noise, clearing artifacts, adjusting the contrast of acquired images, and adjusting image quality after mishandling are some of the measures that can be implemented to address this.
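
Two of the measures mentioned above, noise reduction and contrast adjustment, can be sketched in plain Python on a tiny grayscale image stored as a list of rows. The 3×3 median filter and linear contrast stretch used here are common textbook choices, not methods prescribed by the cited studies, and real pipelines would use dedicated imaging libraries on full-size images.

```python
from statistics import median


def median_filter(image):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            result[r][c] = median(window)
    return result


def stretch_contrast(image, low=0, high=255):
    """Linearly rescale pixel values so they span the range [low, high]."""
    flat = [pixel for row in image for pixel in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        return [[low for _ in row] for row in image]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (pixel - darkest) * scale) for pixel in row] for row in image]


# Hypothetical 3x3 image: uniform tissue (100) with one "salt" noise pixel (255)
noisy = [[100, 100, 100], [100, 255, 100], [100, 100, 100]]
denoised = median_filter(noisy)
stretched = stretch_contrast([[50, 100], [150, 200]])
```

The median filter removes the isolated bright pixel because a single outlier never becomes the median of its neighbourhood, while the stretch maps the darkest value to 0 and the brightest to 255.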

There have been so many security breaches, hackings, phishing attacks, and ransomware episodes that data security is a priority for healthcare organizations. After noticing an array of vulnerabilities, a list of technical safeguards was developed for protected health information (PHI). These rules, termed the HIPAA Security Rules, help guide organizations with storage, transmission, authentication protocols, and controls over access, integrity, and auditing. Common security measures like using up-to-date anti-virus software, firewalls, encryption of sensitive data, and multi-factor authentication can save a lot of trouble.
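
As one illustration of an integrity control, the Python sketch below attaches a keyed hash (HMAC-SHA256) to a record so that later tampering can be detected. The HIPAA Security Rules do not prescribe this particular mechanism; it is shown only as an example, and the hard-coded key and sample record are hypothetical (a real system would obtain keys from a managed secret store).

```python
import hashlib
import hmac


def sign_record(key: bytes, record: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the record contents."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def verify_record(key: bytes, record: bytes, tag: str) -> bool:
    """Check the tag in constant time to detect any modification."""
    return hmac.compare_digest(sign_record(key, record), tag)


key = b"demo-key-for-illustration-only"   # hypothetical; never hard-code real keys
record = b"patient_id=42;diagnosis=I10"
tag = sign_record(key, record)

untouched_ok = verify_record(key, record, tag)
tampered_ok = verify_record(key, b"patient_id=42;diagnosis=E11.9", tag)
```

Storing the tag alongside each record, together with who accessed it and when, also supports the auditing controls mentioned above.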

To have a successful data governance plan, it would be mandatory to have complete, accurate, and up-to-date metadata regarding all the stored data. The metadata would be composed of information like the time of creation, the purpose of the data and the person responsible for it, and previous usage (by whom, why, how, and when) for researchers and data analysts. This would allow analysts to replicate previous queries and would help later scientific studies and accurate benchmarking. It increases the usefulness of data and prevents the creation of “data dumpsters” of little or no use.

Metadata would make it easier for organizations to query their data and get some answers. However, in absence of proper interoperability between datasets the query tools may not access an entire repository of data. Also, different components of a dataset should be well interconnected or linked and easily accessible otherwise a complete portrait of an individual patient’s health may not be generated. Medical coding systems like ICD-10, SNOMED-CT, or LOINC must be implemented to reduce free-form concepts into a shared ontology. If the accuracy, completeness, and standardization of the data are not in question, then Structured Query Language (SQL) can be used to query large datasets and relational databases.
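
Once diagnoses are recorded with a shared coding system, a standard SQL query is enough to answer simple questions. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout and the rows are hypothetical, while E11.9 (type 2 diabetes mellitus without complications) and I10 (essential hypertension) are genuine ICD-10 codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnosis (patient_id INTEGER, icd10 TEXT, visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?, ?)",
    [
        (1, "E11.9", "2020-01-10"),
        (2, "I10", "2020-01-11"),
        (1, "I10", "2020-02-02"),
        (3, "E11.9", "2020-02-15"),
    ],
)

# Count distinct patients per diagnosis code.
rows = conn.execute(
    "SELECT icd10, COUNT(DISTINCT patient_id) "
    "FROM diagnosis GROUP BY icd10 ORDER BY icd10"
).fetchall()
conn.close()
```

The same query would be unreliable if diagnoses were stored as free text, because "diabetes", "T2DM" and "type II diabetes" would be counted as different conditions.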

Visualization

A clean and engaging visualization of data, with charts, heat maps, and histograms that illustrate contrasting figures and with correct labeling of information to reduce potential confusion, can make it much easier for us to absorb information and use it appropriately. Other examples include bar charts, pie charts, and scatterplots, each with its own way of conveying the data.

Data sharing

Patients may or may not receive their care at multiple locations. In the former case, sharing data with other healthcare organizations would be essential. During such sharing, if the data is not interoperable, data movement between disparate organizations could be severely curtailed. This could be due to technical and organizational barriers, and it may leave clinicians without key information for making decisions regarding follow-ups and treatment strategies for patients. Solutions like Fast Healthcare Interoperability Resources (FHIR) and public APIs, CommonWell (a not-for-profit trade association) and Carequality (a consensus-built, common interoperability framework) are making data interoperability and sharing easy and secure. The biggest roadblock for data sharing is the treatment of data as a commodity that can provide a competitive advantage. Therefore, both providers and vendors sometimes intentionally block the flow of information between different EHR systems [ 31 ].
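
To give a sense of what an interoperable format looks like, the Python sketch below converts a hypothetical local patient row into a minimal FHIR Patient resource and serializes it as JSON. The local field names (`mrn`, `first_name`, `last_name`, `sex`, `dob`) are assumptions about an in-house schema; the output uses only a small subset of the elements the FHIR Patient resource defines.

```python
import json


def to_fhir_patient(local):
    """Map a local patient row onto a minimal FHIR Patient resource."""
    return {
        "resourceType": "Patient",
        "id": str(local["mrn"]),
        "name": [{"family": local["last_name"], "given": [local["first_name"]]}],
        "gender": local["sex"],        # FHIR expects male, female, other or unknown
        "birthDate": local["dob"],     # ISO 8601 date, e.g. 1970-01-31
    }


# Hypothetical row from a local EHR table
local_row = {
    "mrn": 1001,
    "first_name": "Jane",
    "last_name": "Doe",
    "sex": "female",
    "dob": "1970-01-31",
}
patient_json = json.dumps(to_fhir_patient(local_row), indent=2)
```

Because both sender and receiver agree on the resource structure, the receiving system can parse `patient_json` without knowing anything about the sender's internal schema.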

The healthcare providers will need to overcome every challenge on this list and more to develop a big data exchange ecosystem that provides trustworthy, timely, and meaningful information by connecting all members of the care continuum. Time, commitment, funding, and communication would be required before these challenges are overcome.

Big data analytics for cutting costs

To develop a big data-based healthcare system that can exchange data and provide us with trustworthy, timely, and meaningful information, we need to overcome every challenge mentioned above. Overcoming these challenges would require investment in terms of time, funding, and commitment. However, like other technological advances, the success of these ambitious steps would ease the present burdens on healthcare, especially in terms of costs. It is believed that the implementation of big data analytics by healthcare organizations might lead to savings of over 25% in annual costs in the coming years. Better diagnosis and disease prediction through big data analytics can reduce costs by decreasing the hospital readmission rate. Healthcare firms do not yet understand the variables responsible for readmissions well enough. Determining these relationships would make it easier for healthcare organizations to improve their protocols for dealing with patients and to prevent readmission. Big data analytics can also help in optimizing staffing, forecasting operating room demands, streamlining patient care, and improving the pharmaceutical supply chain. All of these factors will ultimately reduce healthcare costs for organizations.

Quantum mechanics and big data analysis

Big data sets can be staggering in size. Therefore, their analysis remains daunting even with the most powerful modern computers. For most analyses, the bottleneck lies in the computer’s ability to access its memory and not in the processor [ 32 , 33 ]. The capacity, bandwidth or latency requirements of the memory hierarchy outweigh the computational requirements so much that supercomputers are increasingly used for big data analysis [ 34 , 35 ]. An additional solution is the application of a quantum approach to big data analysis.

Quantum computing and its advantages

Conventional digital computing encodes data in binary digits (bits), whereas quantum computation uses quantum bits, or qubits [ 36 ]. A qubit is the quantum counterpart of the classical bit: it can represent a zero, a one, or any linear combination (called a superposition) of those two states [ 37 ]. A classical bit is therefore always in exactly one of two states, while a qubit can occupy a continuum of superpositions of them until it is measured. For certain problems, this allows quantum computers to work far faster than regular computers. For example, a conventional analysis of a dataset with n points would require 2^n processing units, whereas a quantum computer would require just n quantum bits. Quantum computers use quantum mechanical phenomena such as superposition and quantum entanglement to perform computations [ 38 , 39 ].
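As a hedged illustration of superposition, the sketch below simulates a small qubit register classically with plain Python lists. A register of n qubits is described by 2**n amplitudes, which is why classical simulation becomes expensive so quickly. The function names are hypothetical, and this is a toy state-vector simulation rather than a program for real quantum hardware.

```python
import math

def apply_hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a state vector (list of amplitudes)."""
    h = 1 / math.sqrt(2)
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:
            a, b = state[i], state[i | stride]
            new_state[i] = h * (a + b)
            new_state[i | stride] = h * (a - b)
    return new_state

def uniform_superposition(n):
    """Start n qubits in |0...0> and put each one into an equal superposition."""
    state = [1.0] + [0.0] * (2 ** n - 1)
    for qubit in range(n):
        state = apply_hadamard(state, qubit)
    return state

def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(a) ** 2 for a in state]

# One qubit: both outcomes are (approximately) equally likely.
print(probabilities(uniform_superposition(1)))

# The classical description doubles in size with every added qubit.
for n in (1, 2, 10):
    print(n, "qubits ->", len(uniform_superposition(n)), "amplitudes")
```

Measuring the register still yields only n classical bits, so the exponential state space is a resource for computation, not a way to store 2**n readable values.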

Quantum algorithms can speed up certain big data analyses exponentially [ 40 ]. Some complex problems, believed to be intractable using conventional computing, can be solved by quantum approaches. For example, widely used public-key encryption schemes such as RSA, which are considered secure against conventional computers, could be broken by sufficiently powerful quantum computers [ 41 ], whereas symmetric ciphers such as the Data Encryption Standard (DES) are affected to a lesser degree. Quantum approaches can dramatically reduce the information required for big data analysis. For example, quantum theory can maximize the distinguishability between the layers of a multilayer network using a minimum number of layers [ 42 ]. In addition, quantum approaches require a relatively small dataset to obtain a maximally sensitive data analysis compared to conventional (machine-learning) techniques. Therefore, quantum approaches can drastically reduce the amount of computational power required to analyze big data. Even though quantum computing is still in its infancy and presents many open challenges, it is already being applied to healthcare data.

Applications in big data analysis

Quantum computing is gaining momentum and appears to be a potential solution for big data analysis. For example, the identification of rare events, such as the production of Higgs bosons at the Large Hadron Collider (LHC), can now be performed using quantum approaches [ 43 ]. At the LHC, huge amounts of collision data (1 PB/s) are generated that need to be filtered and analyzed. One such approach, quantum annealing for ML (QAML), which combines ML and quantum computing on a programmable quantum annealer, helps reduce human intervention and increase the accuracy of assessing particle-collision data. In another example, a quantum support vector machine was implemented for both the training and classification stages to classify new data [ 44 ]. Such quantum approaches could find applications in many areas of science [ 43 ]. Indeed, a recurrent quantum neural network (RQNN) was implemented to increase signal separability in electroencephalogram (EEG) signals [ 45 ]. Similarly, quantum annealing was applied to intensity modulated radiotherapy (IMRT) beamlet intensity optimization [ 46 ]. Further healthcare-related applications of quantum approaches include quantum sensors and quantum microscopes [ 47 ].

Conclusions and future prospects

Nowadays, various biomedical and healthcare tools such as genomics, mobile biometric sensors, and smartphone apps generate large amounts of data. It is therefore essential to understand and assess what can be achieved using this data. For example, the analysis of such data can provide further insights in terms of procedural, technical, medical and other types of improvements in healthcare. After a review of these healthcare procedures, it appears that the shift towards patient-specific, or personalized, medicine is under way. The collective big data analysis of EHRs, EMRs and other medical data is continuously helping build a better prognostic framework. The companies providing services for healthcare analytics and clinical transformation are indeed contributing towards better and more effective outcomes. Common goals of these companies include reducing the cost of analytics, developing effective Clinical Decision Support (CDS) systems, providing platforms for better treatment strategies, and identifying and preventing fraud associated with big data. However, almost all of them face challenges on federal issues such as how private data is handled, shared and kept safe. The combined pool of data from healthcare organizations and biomedical researchers has resulted in a better outlook, determination, and treatment of various diseases. This has also helped in building a better and healthier personalized healthcare framework. The modern healthcare fraternity has realized the potential of big data and has therefore implemented big data analytics in healthcare and clinical practices. Computing platforms ranging from supercomputers to quantum computers are helping to extract meaningful information from big data in dramatically reduced time periods. With high hopes of extracting new and actionable knowledge that can improve the present status of healthcare services, researchers are plunging into biomedical big data despite the infrastructure challenges. Clinical trials, the joint analysis of pharmacy and insurance claims, and the discovery of biomarkers are part of a novel and creative way to analyze healthcare big data.

Big data analytics bridges the gap between structured and unstructured data sources. The shift to an integrated data environment is a well-known hurdle to overcome. Interestingly, the principle of big data relies heavily on the idea that the more information is available, the more insights can be gained from it and the better future events can be predicted. Various reliable consulting firms and healthcare companies rightly project that the big data healthcare market is poised to grow at an exponential rate. Even in a short span, we have witnessed a spectrum of analytics in use that has shown significant impacts on the decision making and performance of the healthcare industry. The exponential growth of medical data from various domains has forced computational experts to design innovative strategies to analyze and interpret such enormous amounts of data within a given timeframe. The integration of computational systems for signal processing by both researchers and practicing medical professionals has also grown. Thus, developing a detailed model of the human body by combining physiological data and “-omics” techniques can be the next big target. This unique idea can enhance our knowledge of disease conditions and possibly help in the development of novel diagnostic tools. The continuous rise in available genomic data, including the hidden errors inherent in experimental and analytical practices, needs further attention. However, there are opportunities in each step of this extensive process to introduce systemic improvements within healthcare research.

The high volume of medical data collected across heterogeneous platforms challenges data scientists to integrate and implement it carefully. It is therefore suggested that a further revolution in healthcare is needed to bring together bioinformatics, health informatics and analytics to promote personalized and more effective treatments. Furthermore, new strategies and technologies should be developed to understand the nature (structured, semi-structured, unstructured), complexity (dimensions and attributes) and volume of the data in order to derive meaningful information. The greatest asset of big data lies in its limitless possibilities. The birth and integration of big data within the past few years has brought substantial advancements in the healthcare sector, ranging from medical data management to drug discovery programs for complex human diseases including cancer and neurodegenerative disorders. To quote a simple example supporting this idea, since the late 2000s the healthcare market has witnessed advancements in the EHR system in the context of data collection, management and usability. We believe that big data will add to and bolster the existing pipeline of healthcare advances instead of replacing skilled manpower, subject knowledge experts and intellectuals, a notion argued by many. One can clearly see the transition of the healthcare market from a wider volume base to a personalized or individual-specific domain. Therefore, it is essential for technologists and professionals to understand this evolving situation. In the coming years, it can be projected that big data analytics will move towards predictive systems. This would mean predicting future outcomes in an individual’s health state based on current or existing data (such as EHR-based and omics-based data). Similarly, it can also be presumed that structured information obtained from a certain geography might lead to the generation of population health information. Taken together, big data will facilitate healthcare by introducing prediction of epidemics (in relation to population health), providing early warnings of disease conditions, and helping in the discovery of novel biomarkers and intelligent therapeutic intervention strategies for an improved quality of life.

Availability of data and materials

Not applicable.

Laney D. 3D data management: controlling data volume, velocity, and variety, Application delivery strategies. Stamford: META Group Inc; 2001.


Mauro AD, Greco M, Grimaldi M. A formal definition of big data based on its essential features. Libr Rev. 2016;65(3):122–35.


Gubbi J, et al. Internet of Things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst. 2013;29(7):1645–60.

Doyle-Lindrud S. The evolution of the electronic health record. Clin J Oncol Nurs. 2015;19(2):153–4.

Gillum RF. From papyrus to the electronic tablet: a brief history of the clinical medical record with lessons for the digital Age. Am J Med. 2013;126(10):853–7.

Reiser SJ. The clinical record in medicine part 1: learning from cases*. Ann Intern Med. 1991;114(10):902–7.

Reisman M. EHRs: the challenge of making electronic data usable and interoperable. Pharm Ther. 2017;42(9):572–5.

Murphy G, Hanken MA, Waters K. Electronic health records: changing the vision. Philadelphia: Saunders W B Co; 1999. p. 627.

Shameer K, et al. Translational bioinformatics in the era of real-time biomedical, health care and wellness data streams. Brief Bioinform. 2017;18(1):105–24.

Service RF. The race for the $1000 genome. Science. 2006;311(5767):1544–6.

Stephens ZD, et al. Big data: astronomical or genomical? PLoS Biol. 2015;13(7):e1002195.

Yin Y, et al. The internet of things in healthcare: an overview. J Ind Inf Integr. 2016;1:3–13.

Moore SK. Unhooking medicine [wireless networking]. IEEE Spectr. 2001;38(1):107–8, 110.

Nasi G, Cucciniello M, Guerrazzi C. The role of mobile technologies in health care processes: the case of cancer supportive care. J Med Internet Res. 2015;17(2):e26.

Apple, ResearchKit/ResearchKit: ResearchKit 1.5.3. 2017.

Shvachko K, et al. The hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). New York: IEEE Computer Society; 2010. p. 1–10.

Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–13.

Zaharia M, et al. Apache Spark: a unified engine for big data processing. Commun ACM. 2016;59(11):56–65.

Gopalani S, Arora R. Comparing Apache Spark and Map Reduce with performance analysis using K-means; 2015.

Ahmed H, et al. Performance comparison of spark clusters configured conventionally and a cloud service. Procedia Comput Sci. 2016;82:99–106.

Saouabi M, Ezzati A. A comparative between hadoop mapreduce and apache Spark on HDFS. In: Proceedings of the 1st international conference on internet of things and machine learning. Liverpool: ACM; 2017. p. 1–4.

Strickland NH. PACS (picture archiving and communication systems): filmless radiology. Arch Dis Child. 2000;83(1):82–6.


Schroeder W, Martin K, Lorensen B. The visualization toolkit. 4th ed. Clifton Park: Kitware; 2006.

Friston K, et al. Statistical parametric mapping. London: Academic Press; 2007. p. vii.

Li L, et al. Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med. 2015;7(311):311ra174.

Valikodath NG, et al. Agreement of ocular symptom reporting between patient-reported outcomes and medical records. JAMA Ophthalmol. 2017;135(3):225–31.

Fromme EK, et al. How accurate is clinician reporting of chemotherapy adverse effects? A comparison with patient-reported symptoms from the Quality-of-Life Questionnaire C30. J Clin Oncol. 2004;22(17):3485–90.

Beckles GL, et al. Agreement between self-reports and medical records was only fair in a cross-sectional study of performance of annual eye examinations among adults with diabetes in managed care. Med Care. 2007;45(9):876–83.

Echaiz JF, et al. Low correlation between self-report and medical record documentation of urinary tract infection symptoms. Am J Infect Control. 2015;43(9):983–6.

Belle A, et al. Big data analytics in healthcare. Biomed Res Int. 2015;2015:370194.

Adler-Milstein J, Pfeifer E. Information blocking: is it occurring and what policy strategies can address it? Milbank Q. 2017;95(1):117–35.

Or-Bach Z. A 1,000x improvement in computer systems by bridging the processor-memory gap. In: 2017 IEEE SOI-3D-subthreshold microelectronics technology unified conference (S3S). 2017.

Mahapatra NR, Venkatrao B. The processor-memory bottleneck: problems and solutions. XRDS. 1999;5(3es):2.

Voronin AA, Panchenko VY, Zheltikov AM. Supercomputations and big-data analysis in strong-field ultrafast optical physics: filamentation of high-peak-power ultrashort laser pulses. Laser Phys Lett. 2016;13(6):065403.

Dollas A. Big data processing with FPGA supercomputers: opportunities and challenges. In: 2014 IEEE computer society annual symposium on VLSI; 2014.

Saffman M. Quantum computing with atomic qubits and Rydberg interactions: progress and challenges. J Phys B: At Mol Opt Phys. 2016;49(20):202001.

Nielsen MA, Chuang IL. Quantum computation and quantum information. 10th anniversary ed. Cambridge: Cambridge University Press; 2011. p. 708.

Raychev N. Quantum computing models for algebraic applications. Int J Scientific Eng Res. 2015;6(8):1281–8.

Harrow A. Why now is the right time to study quantum computing. XRDS. 2012;18(3):32–7.

Lloyd S, Garnerone S, Zanardi P. Quantum algorithms for topological and geometric analysis of data. Nat Commun. 2016;7:10138.

Buchanan W, Woodward A. Will quantum computers be the end of public key encryption? J Cyber Secur Technol. 2017;1(1):1–22.

De Domenico M, et al. Structural reducibility of multilayer networks. Nat Commun. 2015;6:6864.

Mott A, et al. Solving a Higgs optimization problem with quantum annealing for machine learning. Nature. 2017;550:375.

Rebentrost P, Mohseni M, Lloyd S. Quantum support vector machine for big data classification. Phys Rev Lett. 2014;113(13):130503.

Gandhi V, et al. Quantum neural network-based EEG filtering for a brain-computer interface. IEEE Trans Neural Netw Learn Syst. 2014;25(2):278–88.

Nazareth DP, Spaans JD. First application of quantum annealing to IMRT beamlet intensity optimization. Phys Med Biol. 2015;60(10):4137–48.

Reardon S. Quantum microscope offers MRI for molecules. Nature. 2017;543(7644):162.


Acknowledgements

Author information.

Sabyasachi Dash and Sushil Kumar Shakyawar contributed equally to this work

Authors and Affiliations

Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, 10065, NY, USA

Sabyasachi Dash

Center of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal

Sushil Kumar Shakyawar

SilicoLife Lda, Rua do Canastreiro 15, 4715-387, Braga, Portugal

Postgraduate School for Molecular Medicine, Warszawskiego Uniwersytetu Medycznego, Warsaw, Poland

Mohit Sharma

Małopolska Centre for Biotechnology, Jagiellonian University, Kraków, Poland

3B’s Research Group, Headquarters of the European Institute of Excellence on Tissue Engineering and Regenerative Medicine, AvePark - Parque de Ciência e Tecnologia, Zona Industrial da Gandra, Barco, 4805-017, Guimarães, Portugal

Sandeep Kaushik


Contributions

MS wrote the manuscript. SD and SKS further added significant discussion that highly improved the quality of manuscript. SK designed the content sequence, guided SD, SS and MS in writing and revising the manuscript and checked the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sandeep Kaushik.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Dash, S., Shakyawar, S.K., Sharma, M. et al. Big data in healthcare: management, analysis and future prospects. J Big Data 6, 54 (2019). https://doi.org/10.1186/s40537-019-0217-0


Received : 17 January 2019

Accepted : 06 June 2019

Published : 19 June 2019

DOI : https://doi.org/10.1186/s40537-019-0217-0


  • Biomedical research
  • Big data analytics
  • Internet of things
  • Personalized medicine
  • Quantum computing


  • Open access
  • Published: 22 June 2022

How can big data analytics be used for healthcare organization management? Literary framework and future research from a systematic review

  • Nicola Cozzoli 1 ,
  • Fiorella Pia Salvatore   ORCID: orcid.org/0000-0001-6294-3360 1 ,
  • Nicola Faccilongo 1 &
  • Michele Milone 1  

BMC Health Services Research, volume 22, Article number: 809 (2022)


Multiple attempts aimed at highlighting the relationship between big data analytics and benefits for healthcare organizations have been raised in the literature. The big data impact on health organization management is still not clear due to the relationship’s multi-disciplinary nature. This study aims to answer three research questions: a) What is the state of art of big data analytics adopted by healthcare organizations? b) What about the benefits for both health managers and healthcare organizations? c) What about future directions on big data analytics research in healthcare?

Through a systematic literature review the impact of big data analytics on healthcare management has been examined. The study aims to map extant literature and present a framework for future scholars to further build on, and executives to be guided by.

The positive relationship between big data analytics and healthcare organization management has emerged. To find out common elements in the studies reviewed, 16 studies have been selected and clustered into 4 research areas: 1) Potentialities of big data analytics. 2) Resource management. 3) Big data analytics and management of health surveillance systems. 4) Big data analytics and technology for healthcare organization.

Conclusions

In conclusion, big data analytics solutions are considered a milestone for managerial studies applied to healthcare organizations, although scientific research still needs to investigate the standardization and integration of devices, as well as data analysis protocols, to improve the performance of the healthcare organization.


Big data is transforming healthcare organizations and will continue to do so in the near future [ 1 , 2 ]. Scientific literature in the managerial context applied to healthcare organizations considers Big Data Analytics (BDA) a fundamental tool, so much so that it has attracted the attention of the scientific community and stakeholders [ 3 ]. However, a premise should be made: data by themselves explain little. To be useful in healthcare organization management, it is first necessary to validate their quality and then to find the right correlations. In other words, the data should be processed, analyzed, and interpreted with the appropriate tools [ 4 , 5 ].

BDA-related technological applications in healthcare are rapidly increasing [ 6 ] and will increasingly characterize managers’ decision-making processes. For example, IBM’s Watson project [ 7 ] is a "super-computer" that has scoured several million scientific articles from the last twenty years and uses artificial intelligence tools (e.g., Machine Learning) to correlate disease symptoms and predict possible diagnostic scenarios. This case helps to understand how and to what extent BDA could really support healthcare managers in improving their decision processes, while increasing the performance of the healthcare organization.

Nowadays, the amount of data is no longer an issue. Internet traffic reports from Cisco and other network operators have estimated the entire digital universe at 44 zettabytes, and 463 exabytes of information could be generated daily by 2025. A new era has begun in which the production and management of human knowledge will no longer be the exclusive preserve of humans; machines will also play their part as knowledge producers [ 8 ]. From pharmaceutical companies to healthcare organizations, this enormous potential of data products, combined with IoT applications and AI tools [ 9 , 10 , 11 ], will play a significant role in the near future. Today, medical applications based on IoT allow the monitoring of clinical data through the production of data generated by special devices (e.g., wearable devices) [ 12 ], remotely accessible by a physician rather than by caregivers [ 13 ].

The market size is a useful indicator of how much the healthcare organizations are turning their attention to new management models based on the use of big data. By 2025, the big data market in healthcare will touch $70 billion with a record 568% growth in 10 years. The use of such a tool not only represents a complex challenge [ 14 ], but also opens opportunities for all those involved in the healthcare supply chain who manage decision-making processes. Moreover, if on the one hand this technology will influence the definition of new managerial strategies within healthcare organizations, on the other hand, it will have positive repercussions on the effectiveness and efficiency of healthcare processes [ 15 ]. Indeed, the big data technology is used by healthcare managers to get, for example, information related to the list of doctors and nurses, the list of drugs with their expiration date, etc., in order to have tools for facilitating decision-making processes, improving the quality of services provided, and, at the same time, rationalizing the use of resources, by facilitating the management of the healthcare organization as a whole.
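The market figures above can be unpacked with a little arithmetic. The sketch below assumes that "568% growth" means the final market size equals the starting size multiplied by (1 + 5.68) and that growth compounds annually over the ten years; both readings are assumptions, since the text does not state the base year value or how the percentage was derived.

```python
final_size = 70.0   # USD billions by 2025, as stated in the text
growth = 5.68       # 568% total growth over the period (assumed interpretation)
years = 10

# Implied starting market size under the assumed interpretation.
start_size = final_size / (1 + growth)

# Equivalent compound annual growth rate (CAGR).
cagr = (1 + growth) ** (1 / years) - 1

print(f"implied starting size: {start_size:.2f} billion USD")
print(f"implied CAGR: {cagr:.1%}")
```

Under these assumptions the implied starting market is about 10.5 billion USD, and the growth corresponds to roughly 21% per year.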

The BDA satisfies multiple needs that, on the one hand, influence the quality of the healthcare organization’s performance and, on the other hand, are useful in directing management strategies to improve the supply of healthcare services. Some of these strategies aim to:

  • Provide specific services to patients, from diagnostics to preventive medicine passing through therapeutic adherence.
  • Detect the onset and spread of diseases in advance.
  • Observe parameters inherent to hospital quality standards, promoting control and prevention actions.
  • Modify treatment techniques.
  • Facilitate research and development in pharmacology, reducing the time to market of drugs.
  • Facilitate research and development of new and specific medical devices.

The main aim of this research is, therefore, to provide both an integrative framework on the state of the art and perspectives on how the BDA can be useful for the management of the healthcare organization. Based on the results, the study offers food for thought on how this technological and cultural revolution will affect the modus operandi of healthcare organizations.

Through an overview of recent scientific studies, this research aims to raise awareness among both practitioners and managers about BDA tools applied to healthcare management to address more effectively and efficiently the challenges imposed by an increasing demand for healthcare services.

In this regard, the study provides a systematic literature review (SLR) to explore the effect of BDA on the healthcare management by analyzing articles from the Scopus database during a period of 5 years (2016 – 2021).

Furthermore, the results of the content analysis are intended as a starting point for identifying potential barriers and opportunities provided by BDA-based management systems for smarter healthcare organizations. Specifically, the study answers different research questions (RQs), as different levels of analysis have been performed. Analyzing the relationship between BDA-based management systems and the benefits delivered to organizations first requires exploring the state of the art of BDA tools deployed in the field of healthcare. Starting from this background, a discussion of future perspectives on BDA development in healthcare organizations becomes necessary.

Theoretical framework

Why use BDA and how to exploit its potential for healthcare organization management? This is the main question asked by managers and decision makers working in the healthcare sector. In recent years there have been multiple attempts in the literature aimed at highlighting the relationship between implementation of BDA and benefits for healthcare organizations, in terms of both resource efficiency and process management.

In 2017, a study by Wang and Hajli [ 16 ] proposed a model founded on Resource-Based Theory and BDA Capabilities (BDAC) to explain the relationship between BDA, benefits, and value creation for healthcare organizations. As stated by Srinivasan and Swink [ 17 ], BDAC refers to “ organizational facility with tools, techniques, and processes that enable a firm to process, organize, visualize, and analyze data, thereby producing insights that enable data-driven operational planning, decision-making, and execution ”. In the healthcare organization, BDAC represents the ability to collect, store, analyze, and process health data of huge volume, variety, and velocity coming from various sources to improve data-driven decisions [ 18 , 19 ]. Indeed, the study of Wang and Hajli [ 16 ], validated on an empirical basis by 109 cases of BDA tool implementation in 63 healthcare organizations, demonstrated how specific "path-to-value" chains can be identified. By assigning varying degrees of relevance to the identified pathways, it showed that alongside the challenges of implementing certain BDA tools, there are corresponding specific benefits for healthcare organizations. Preliminarily, the study defined the ability to analyze big data through the concept of Information Lifecycle Management (ILM) [ 20 ]. In this perspective, the capabilities of the BDA in healthcare organizations are configured as the abilities to process health data from diverse sources and provide significant information to healthcare managers. Through BDA, managers can detect timely indicators and identify business strategies, which allow them to put in place forward-looking plans, efficient strategies, and programs to increase the performance of organizations.

Researchers have found that BDA capabilities primarily stem from the implementation of various tools and features. Specifically, in order of importance, BDA capabilities are triggered first by processing tools (e.g., OLAP, machine learning, NLP), then by aggregation tools (e.g., data warehouse tools), and finally by data visualization tools and capabilities (e.g., visual dashboards/systems, reporting systems/interfaces).

Among the potentials triggered by the implementation of BDA in the healthcare organization, the analytical one was the main capability, that is the ability to process clinical data characterized by immense volume, variety (from text to graph), and speed (from batch to streaming), using descriptive analysis techniques [ 21 , 22 ]. In this regard, it is important to note that BDA-based management systems are the only ones capable of analyzing semi-structured or unstructured data. This represents a crucial element for revealing correlation patterns that are difficult to determine with traditional management systems [ 23 ]. Furthermore, the launch of these systems in a healthcare organization ensures the ability to effectively manage outputs regarding care process and service in order to constantly improve the performance of the organization. In summary, the characteristics of BDA-based management systems implemented in a healthcare organization, are:

  • predictive analytics capability, i.e., the ability to explore data and identify useful correlations, patterns and trends, and extrapolate them to predict what is likely to occur in the future [ 24 , 25 ];
  • interoperability capability, i.e., the ability to integrate data and processes to support management, collaboration, and sharing across different healthcare departments, managers, and facilities [ 26 ]; and finally,
  • traceability capability, i.e., the ability to integrate and track all patient history data from different IT facilities and different healthcare units.
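The traceability capability listed above can be pictured with a small sketch that merges event records from several departmental systems into one chronological history per patient. The record layout, the two source systems, and the function name `build_timelines` are hypothetical and chosen only for illustration; real systems would also need identity matching and data cleaning, which are omitted here.

```python
from collections import defaultdict

def build_timelines(*sources):
    """Merge event records from several systems into one chronological timeline per patient.

    Each source is a list of dicts with keys: patient_id, date (ISO string), unit, event.
    """
    timelines = defaultdict(list)
    for source in sources:
        for record in source:
            timelines[record["patient_id"]].append(record)
    for events in timelines.values():
        events.sort(key=lambda r: r["date"])  # ISO dates sort correctly as strings
    return dict(timelines)

# Hypothetical extracts from two departmental systems.
lab_system = [
    {"patient_id": "p1", "date": "2021-03-02", "unit": "lab", "event": "HbA1c test"},
    {"patient_id": "p2", "date": "2021-03-05", "unit": "lab", "event": "lipid panel"},
]
ward_system = [
    {"patient_id": "p1", "date": "2021-02-27", "unit": "cardiology", "event": "admission"},
    {"patient_id": "p1", "date": "2021-03-04", "unit": "cardiology", "event": "discharge"},
]

timelines = build_timelines(lab_system, ward_system)
for record in timelines["p1"]:
    print(record["date"], record["unit"], record["event"])
```

Once the history is unified in this way, the same timeline can feed descriptive dashboards or predictive models without each consumer having to query every source system separately.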

In terms of expected benefits from BDA implementation, the study of Wang and Hajli [ 16 ] showed that the most important ones are obtained from improved operational activities, such as improved quality and accuracy of healthcare decisions, rapid processing of issues, and the ability to enable treatments proactively before patients’ conditions worsen. Next in terms of relevance came the benefits related to IT infrastructure, such as standardization, reduced costs for redundant infrastructure, and the ability to quickly transfer data between different IT systems. In essence, the authors delivered a useful business model that healthcare managers can draw on to evaluate the specific levers they need to activate when implementing BDA-based management systems. In addition to highlighting the undoubted benefits, the authors clearly show how specific BDA tools can facilitate the decision-making processes of healthcare managers and make them faster and more effective.

In another study, carried out to identify the benefits and supports of BDA and to drive organizational strategies, Wang, Kung, and Byrd [ 19 ] analyzed 26 case studies of BDA applications in healthcare organizations and identified five "capabilities" of BDA: analytical capability for patterns of care, unstructured data analytical capability, decision support capability, predictive capability, and traceability capability [ 19 ]. The study is of particular interest because, in addition to mapping specific benefits, it recommends specific strategies for implementing BDA in healthcare organizations. These strategies are useful for achieving effective results by leveraging the potential of BDA.

The first successful strategy is to implement governance based on the use of big data, starting with a definition of objectives, procedures, and key performance indicators (KPIs). Once again, a decisive factor for success in implementing such a strategy is the integration of information systems and the standardization of data protocols, since data often come from heterogeneous sources that already exist in healthcare organizations. The second strategy is to develop a culture of data sharing. The third concerns the training of healthcare managers, who need BDA-related knowledge, for example on the use of data mining and business intelligence tools. The fourth strategy concerns the storage of big data, which is often available in heterogeneous formats, and consists of moving from more expensive traditional storage systems (NAS) to more efficient and effective systems such as cloud computing solutions. The last strategic driver involves pathways for implementing predictive BDA models. Healthcare managers, and more generally healthcare organizations oriented toward BDA-driven process management, should master KPIs as well as interactive visualization and data aggregation tools such as dashboards and reports.

More recent studies focus on supply chain management practices in healthcare. Yu et al. [ 27 ], interviewing senior executives in Chinese hospitals, show on both a theoretical and an empirical basis how BDAC positively impacts the three dimensions of hospital supply chain integration (SCI) (inter-functional integration, hospital-patient integration, and hospital-supplier integration) and how SCI, in turn, contributes to improving operational flexibility [ 27 ]. In a healthcare organization, “operational flexibility” means the ability of a ward to adapt its operating procedures to unforeseen circumstances while meeting the needs of patients [ 28 , 29 ].

The authors make an important contribution by demonstrating the relationship between BDAC, SCI, and operational flexibility from multiple perspectives and by providing useful management guidance for healthcare executives and managers involved in the supply chain. By analyzing and processing medical and managerial data with advanced analytical techniques, Chinese healthcare organizations were able to support decision-making with timely and appropriate actions, for example tracking people's movements during the lockdown caused by the coronavirus, understanding ongoing health trends, and managing pharmaceutical supplies [ 30 , 31 ].

This theoretical framework provides a key to interpreting the benefits of good practices that derive from the use of BDA in healthcare organizations.

At the same time, a rigorous scientific method allows empirical experiences to be validated against clear theoretical references. The next section presents projects that illustrate what is stated in the literature.

Practical framework

N(ursing) + Care App is an mHealth application that supports the work of frontline health workers (FHWs) in developing countries [ 32 ]. The system is designed to collect not only patient data but also diagnostic images. It also allows recommended doctors to be added, on the advice of FHWs, when the patient needs a specific hospital visit.

For healthcare managers, predicting the number of emergency department visits is a critical issue that complicates the optimization of human resource management. To this end, Intel and Assistance Publique-Hôpitaux de Paris (AP-HP), the largest university hospital in Europe, worked together to build a cloud-based solution that draws on datasets from multiple sources to predict the number of patient visits to emergency rooms and hospital admissions. This predictive analytics tool will enable healthcare managers at AP-HP hospitals to know the number of emergency room visits and hospital admissions 15 days in advance, in order to reduce wait times, optimize human resource (HR) levels based on anticipated needs, accurately plan patient loads, including by pathology, and overall improve the quality and efficiency of the services provided by the healthcare organization [ 33 ].

Chronic conditions, if not kept under control through rigorous therapeutic adherence, can cause both more serious physical problems for patients and economic burdens for healthcare organizations. Another project that actively introduced BDA tools into healthcare management was carried out in connection with the European Commission's approval of the drug Enerzair Breezhaler. It was the first asthma drug co-packaged and co-prescribed with the Propeller digital platform. The app sends reminders to support therapeutic adherence and keeps a record of the data, which the patient shares with his or her physician. Studies have shown that the Propeller platform increases the degree of asthma control by up to 63% and therapeutic adherence by up to 58% [ 34 ], and reduces asthma emergency department visits and hospital admissions by up to 57% [ 35 ].

The practical framework described, supported by some empirical experience, only partially reveals the potential offered by BDA. The diffusion of BDA-based management systems in healthcare organizations will trigger a virtuous circle, soon making it possible to accumulate increasingly accurate medical data. By exploiting the most advanced AI technologies, BDA will support predictive analysis, allow physicians to follow faster and more accurate diagnostic pathways, and allow managers to use the results. It will help health practitioners in the decision-making process, optimize the use of resources with a consequent reduction in costs and, overall, improve the quality of the services provided by healthcare organizations.

The main aim of this study is to update the state of the art on BDA-based management systems adopted in healthcare organizations, highlighting the management advantages for both organizations and managers. BDA has the potential to reduce the cost of care, prevent disease outbreaks, and improve patients’ quality of life. Through its ability to process and cross-reference massive amounts of both management and clinical information, BDA promises to be an effective support tool for both healthcare managers and patients.

To achieve this aim, a Systematic Literature Review (SLR) was performed. This method identifies, evaluates, and summarizes the evidence emerging from the literature on the BDA tools used to improve both the performance of healthcare organizations and patients’ quality of life. The method is inspired by the protocol used by Khanra et al. [ 36 ], which considers inclusion and exclusion criteria.

The present study aims to contribute to the literature by addressing three research questions (RQs):

What is the state of the art of BDA adopted by healthcare organizations?

What are the benefits for both health managers and healthcare organizations?

What are the future directions of BDA research in healthcare?

To answer the RQs, the widely used electronic database Scopus was selected. To ensure the international validity of the studies, only papers in English were considered. Using the Boolean operator “AND”, the following keywords were searched: “big data analytics” AND “healthcare” AND “management”. As an inclusion criterion, only papers published from 2016 to 2021 were considered. The subject areas “medicine” and “business, management and accounting” were selected. As exclusion criteria, articles in press and the document types “review”, “book”, “conference review”, “letter” and “note” were not taken into account. To keep the study focused, conference proceedings were also excluded. Following this search protocol, 34 results were obtained (Fig. 1 ).

figure 1

Workflow of article selection
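The inclusion and exclusion criteria of the search protocol can be sketched as a single screening predicate. This is a minimal illustration, not the procedure the authors actually ran in Scopus: the `Record` layout and its field names are hypothetical, keyword matching is simplified to a case-insensitive substring test, and a record is assumed to qualify when it belongs to at least one of the two selected subject areas.

```python
from dataclasses import dataclass

# Criteria taken from the search protocol described in the text.
KEYWORDS = ("big data analytics", "healthcare", "management")
YEARS = range(2016, 2022)  # 2016 to 2021 inclusive
SUBJECT_AREAS = {"medicine", "business, management and accounting"}
EXCLUDED_TYPES = {
    "review", "book", "conference review", "letter", "note",
    "conference proceedings",
}


@dataclass
class Record:
    """Hypothetical record layout; not the Scopus export format."""
    text: str            # title, abstract and keywords joined together
    year: int
    language: str
    subject_areas: set
    doc_type: str
    in_press: bool = False


def passes_screening(record: Record) -> bool:
    """Return True if the record meets every criterion of the protocol."""
    text = record.text.lower()
    return (
        all(keyword in text for keyword in KEYWORDS)   # Boolean AND
        and record.language.lower() == "english"
        and record.year in YEARS
        # Assumption: one matching subject area is enough.
        and bool(record.subject_areas & SUBJECT_AREAS)
        and record.doc_type.lower() not in EXCLUDED_TYPES
        and not record.in_press
    )
```

Applying `passes_screening` to every exported record and keeping those for which it returns `True` mirrors the workflow in Fig. 1, although the real Scopus filters may match keywords differently.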

An Excel spreadsheet was used for the extraction procedures, while the statistical analyses were carried out with STATA 16. The list of extracted papers examined in the content analysis can be found in the Appendix.

The work first proceeds through a descriptive analysis. A content analysis is then performed to identify the most relevant characteristics of BDA-based management systems, highlighting their positive impact on healthcare organizations and outlining future trends and research directions.

In line with the SLR protocol, the iterative process shown in Fig. 1 made it possible to remove duplicates and match the results with the RQs.

As shown in Fig. 1, the initial search on the Scopus database returned 227 results. Limiting the search to papers published between 2016 and 2021 removed 11% of the records. At the second stage, selecting the subject areas excluded 131 records, or 57.7% of the initial results. The last step excluded document types such as Review, Book, Conference Review, Letter, and Note: 37 records, representing 16.3% of the sample. At the end of the screening process, 34 articles were selected, representing about 15% of the sample.
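The screening figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the number of records removed by the publication-year filter is not reported as a count, so it is derived here as the remainder, which is an assumption that no other filter removed records.

```python
# Counts stated in the text.
INITIAL = 227               # results of the initial Scopus search
EXCLUDED_BY_SUBJECT = 131   # removed when selecting the subject areas
EXCLUDED_BY_DOC_TYPE = 37   # removed as review, book, letter, note, etc.
FINAL = 34                  # articles kept for the descriptive analysis

# Derived, not stated in the text: records removed by the 2016-2021 filter.
excluded_by_year = INITIAL - EXCLUDED_BY_SUBJECT - EXCLUDED_BY_DOC_TYPE - FINAL


def share(count: int) -> float:
    """Percentage of the initial sample, rounded to one decimal place."""
    return round(100 * count / INITIAL, 1)


funnel = [
    ("publication years 2016-2021", excluded_by_year),
    ("subject areas", EXCLUDED_BY_SUBJECT),
    ("document types", EXCLUDED_BY_DOC_TYPE),
]

remaining = INITIAL
for stage, removed in funnel:
    remaining -= removed
    print(f"{stage}: {removed} removed ({share(removed)}%), {remaining} left")

print(f"selected: {remaining} ({share(remaining)}% of the initial sample)")
```

Under that assumption the year filter accounts for 25 records, which matches the reported 11%, and the three exclusion stages plus the 34 selected articles add up to the 227 initial results.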

The descriptive analysis includes the distribution of the studies over time, from 2016 to 2021. Publications increased from 2017 to 2019, which confirms a growing interest in the research field of BDA applied to healthcare organizations (Fig. 2 ).

figure 2

Trend of research streams

The trend of research streams is based on the sample of 34 scientific contributions that resulted from the screening process described above. Although 6% of the total sample was published in 2016 and 2017, this is only indicative of the growing trend of scientific studies on BDA in the healthcare sector. The share in 2018 was 12%, but the turning point came in 2019, which accounts for 32% of the studies in the sample. This outcome can be read in light of the Covid-19 pandemic, which has been a representative testing ground for BDA tools, helping managers and decision-makers plan healthcare managerial strategies.

In this context, the use of BDA by Chinese healthcare organizations to track people's movements during the lockdown is an important case study that coincides with the peak in research output. The 2020 and 2021 data, which represent 24% and 21% of the total scientific contributions respectively, appear to confirm the rising interest in BDA research as a planning tool for healthcare processes.

The pie chart shows scientific production by country. Note that the Scopus database clusters studies by the home country of each author’s organization, so the same study can be attributed to more than one country and thus belong to more than one cluster.

The geographical locations of the studies shown in Fig. 3 indicate that India, the UK, and the USA account for more than one third of total scientific production. It is well known that IT companies such as Google, Apple, Amazon, and Microsoft are investing considerable resources in BDA tools for healthcare. China and India together contribute 22% of the scientific articles. Big data technology has played a key role in virus tracking during the pandemic crisis. "Internet Plus Healthcare", a big data center in Zhongwei (China), provides cloud services to both healthcare institutions and IT companies. In Yinchuan (China), an industrial park for big data acts as a catalyst for IT companies involved in the healthcare sector. India is confirmed as one of the heaviest adopters of artificial intelligence, big data analytics, and IoT technologies. Although India faces the challenge of providing basic healthcare services in a predominantly rural country, start-ups with BDA skills in healthcare are springing up.

figure 3

Geographical locations of the studies

The performance of the European countries is also worth highlighting. The UK, Greece, Italy, Spain, Germany, and Portugal account for almost 40% of the published studies, suggesting that Europe will be a driving force in BDA research in the near future. The development of a European Health Data Space (EHDS) is an ambitious project of the European Commission. It will lead member states to share an efficient infrastructure for both the exchange and the management of health data, providing citizens with equal treatment, free access to clinical data, and quality healthcare services.

The “Others” category includes all other countries that contribute marginally to the research.

The next step of the study is a content analysis that shows the experiences of applying BDA in healthcare organizations.

Starting from the 34 articles selected for the descriptive analysis, a second screening was performed to identify the core issue of the study in detail. Eighteen articles were excluded because they were only weakly focused on the research objective, which specifically concerns how BDA can be used for the management of healthcare organizations. Thus, after an in-depth reading of abstracts and full papers, 16 papers more closely targeted at this research objective were identified. The 16 studies selected through the content analysis were clustered into 4 research areas (RAs), as shown in Table 1. The clustering procedure identifies 4 relevant topics: Potentialities of BDA (RA1), Resource management (RA2), BDA and management of health surveillance system (RA3), and BDA technology for healthcare organization (RA4). The proposed clustering is intended to provide an easy-to-follow research map and to support healthcare managers.

RA1: potentialities of BDA

Wang and Hajli [ 16 ] define BDA potentialities in the healthcare context as “ the ability to acquire, store, process and analyze large amounts of health data in various forms, and deliver meaningful information to users, which allows them to discover business values and insights in a timely fashion ”. The relationship between BDA and its benefits for healthcare organizations is well expressed by the theory of the “path to value chain” [ 16 ]. This path is an important contribution to the exploration of business value, not only because it draws the generic and well-established connection between big data capabilities [ 19 ] and benefits, but also because it shows empirically how capabilities can be developed and what benefits can be achieved in healthcare organizations. Another study included in this area explores the key role of BDA capabilities in developing healthcare supply chain integration and its impact on hospital flexibility [ 27 ]. Specifically, BDA has a fundamental role in developing both healthcare supply chain integration and operational flexibility. Given the health and economic crises caused by Covid-19, this dimension of BDA has been an especially important lever for managers seeking to improve the operational flexibility of healthcare organizations. The ability to provide predictive models and real-time insights is a powerful prospect of BDA for helping healthcare professionals and managers in the decision-making process. In this regard, the literature presents several applications of big data in healthcare that support the collection, management, and integration of data in healthcare organizations [ 37 ]. Moreover, BDA enables the integration of massive datasets, supporting managers' decisions and the monitoring of the managerial aspects of healthcare organizations.

Building a decision-making process based on BDA means, first of all, identifying the key elements of big data that make it possible to implement ad hoc strategies to improve efficiency along the healthcare value chain. To this end, the research carried out by Sousa et al. [ 37 ] underlines the benefits that BDA can bring to the decision-making process through predictive models and real-time analytics, assisting in the collection, management, and integration of data in healthcare organizations.

Today, thanks to an integrated and interconnected ecosystem, it is becoming possible to provide personalized healthcare services, to collect an enormous quantity of both clinical and biometric data and, thus, to implement BDA instruments. Nevertheless, to take real advantage of these tools and turn them into useful decision support systems (DSS), R&D must focus on data filtering mechanisms in order to obtain reliable, good-quality information [ 38 ]. Healthcare models based on BDA and the implementation of new healthcare programs enable both medical and managerial decision support for the provision of healthcare services. New types of interaction with and among users of the healthcare ecosystem will produce a wide variety of complex data in the near future; the main challenges therefore concern information processing and analytics.

In light of the above, RA1 includes studies for which data quality and the need for high-performance filtering mechanisms are becoming key factors for the success of BDA-based management systems in healthcare organizations. For example, the study by Maglaveras et al. [ 38 ], included in this area, explores new R&D pathways in biomedical information processing and management, as well as in the design of new intelligent decision support systems.

RA2: resource management

Another important research direction that emerged from the literature review concerns the positive impact of BDA on resource management. Inadequate policies for managing medical material waste, energy use, and environmental burden restrict resource conservation. BDA is extremely useful in this respect: in the near future it could make an important contribution to implementing circular economy processes and supporting sustainable development initiatives in healthcare organizations [ 39 ]. To this end, the study by Kazançoğlu et al. [ 39 ] underlines the importance of circularity and sustainability concepts in mitigating the sector’s negative impacts on the environment. Furthermore, the study identifies the barriers to the circular economy in healthcare organizations and proposes solutions to these barriers through the implementation of BDA-based management systems. Lastly, the authors developed a managerial, policy, and theoretical framework to support healthcare managers in launching sustainable initiatives within healthcare organizations.

The impact on performance has also been investigated by studies that link the benefits of BDA and artificial intelligence to the green supply chain integration process [ 40 ]. Digital learning is increasingly becoming a “moderator” of the green supply chain process, with a significant positive impact on the environmental performance of the healthcare organization. BDA-AI technologies will improve environmental process integration and green supply chain collaboration and, consequently, will support the decisions of managers involved in supply processes. This study also provides an important reference framework for logistics and supply chain managers who want to implement BDA-AI technologies to support green supply processes and enhance the environmental performance of the healthcare organization [ 40 ].

Many scholars are now focusing on BDA-driven decision support systems to support healthcare managers [ 41 ]. These BDA-based analytical tools will provide useful quantitative support for the managers of healthcare organizations. The authors report the design and technical details of the system implementations using case studies. They developed a toolkit that serves as a reference framework for resource management, making it possible to create strategic models and obtain analytical results for evidence-based decisions and managerial evaluations.

Two other important topics in this RA are high-quality healthcare services and healthcare costs. Optimizing supply chain activities is imperative to keep healthcare costs low. The data generated by medical equipment and devices can be used successfully in forecasting and decision-making and to make healthcare supply chain management more efficient [ 42 ]. The study by Alotaibi et al. [ 42 ] thus presents a review of the use of big data in healthcare organizations, underlining the opportunities and challenges that arise from applying BDA-based management systems within organizations.

As already noted, a good implementation of BDA in healthcare organizations will play a fundamental role in improving the management of clinical outcomes, giving decision makers and managers helpful insights in order to prevent diseases, reduce healthcare expenses, and improve the performance of the healthcare organization [ 43 ]. However, to achieve these ambitious outcomes, research will face a crucial challenge: how to rationalize heterogeneous data from diverse sources and make them easily usable at affordable cost. The research by Kundella and Gobinath [ 43 ] is an important contribution to exploring the key challenges, techniques, technologies, privacy issues, security algorithms, and future directions of the use of BDA in healthcare organizations.

RA3: BDA and management of health surveillance system

The rise of BDA promises to solve many healthcare challenges in developing countries. BDA applied to healthcare organizations helps managers rationalize resources and helps the health system deliver treatments to patients more effectively [ 44 ]. In this regard, the government of Zambia is considering implementing BDA solutions to provide more effective and efficient healthcare services. A well-managed health surveillance system is an important driver for improving quality of life and reducing medical waste, especially in developing countries, where the lack of resources is severe and limits economic development. For all these reasons, Europe is investing in BDA initiatives in public health and oncology to generate new knowledge, improve clinical care, and make the management of the public health surveillance system more efficient [ 45 ]. The capability of BDA to identify specific population patterns, manage high volumes of data, and turn them into real-time (or near real-time) insights makes it a powerful tool to support managers in decision-making processes. Despite this, implementing BDA-based management systems within healthcare organizations requires investment in human capital, strong collaboration with stakeholders, and data integration with and among healthcare units. To this end, Gunapal et al. [ 46 ] report that Singapore has set up a Regional Health System (RHS) database to facilitate BDA for proactive population health management (PHM) and health services research [ 46 ]. The healthcare database was built by collecting data from four databases coming from three RHSs: National Healthcare Group (NHG), Tan Tock Seng Hospital (TTSH), National University Hospital (NUH) and Alexandra Hospital (AH).

The result is a database useful for healthcare managers, which incorporates data on patient demographics, chronic disease, and healthcare utilization. These characteristics facilitate the identification of specific patient pathways linked by past healthcare utilization and chronic disease information. Converging information into a single database helps to understand the cross-utilization of healthcare services across the three RHSs. Such an approach makes it possible to set up the RHS structure for proactive population health management (PHM) and to improve the performance of healthcare organizations [ 46 ].

RA4: BDA technology for healthcare organization

Wearable devices and different kinds of sensors able to collect clinical data will, in combination with BDA, form the basis of personalized medicine and will be crucial tools for improving the performance of healthcare organizations [ 47 ]. Scientific research faces the important challenge of adapting data acquisition, storage, transmission, and analytics to healthcare demand. Healthcare data should therefore be categorized, homogenized, and implemented into specific models by adapting machine-learning techniques to the nature of the healthcare organization.

A fruitful field for the application of BDA in healthcare organizations is diagnostic imaging. To obtain maximum benefit from it and make it useful for the managers of healthcare organizations, digital platforms and applications need to be implemented [ 48 ]. Indeed, simply producing a large amount of data does not automatically translate into an advantage for healthcare performance. Specific applications are required to support the correct and advantageous management of diagnostic images [ 48 ]. The link between BDA and IoT technologies, as an instrument to combine accessibility, capacity to customize, and practical conveyance of clinical data, emerged as another research direction investigated by the papers included in this RA. These tools allow: (1) healthcare organizations to decrease expenses; (2) people to self-regulate their treatments; (3) practitioners to make decisions remotely as quickly as possible and keep in constant contact with patients [ 49 ].

In light of these results, it is possible to state that IoT, big data, and artificial intelligence in the form of machine-learning algorithms are three of the most significant innovations in healthcare organizations. These organizations are implementing home-centric data collection networks and intelligent BDA systems based on machine-learning technologies. For example, such a system has been implemented in Cartagena, Colombia, for hypertensive patients using an e-Health sensor and Amazon Web Services components [ 50 ]. The authors stress the importance of combining IoT, big data, and artificial intelligence as tools to obtain better health outcomes for communities and improved performance for healthcare organizations. The new generation of machine-learning algorithms can use standardized datasets generated by these sources to improve the effectiveness of public health interventions [ 50 ]. To this end, as pointed out by numerous studies in the field of BDA applied to healthcare organizations, it is crucial for future research to concentrate R&D efforts on fully standardized dataset protocols.

As highlighted by the results, in Europe as in the rest of the world, a significant trend is emerging among healthcare organizations toward adopting BDA-based management systems [ 45 ]. Across the clusters identified, the common element in the studies reviewed is the positive relationship between BDA tools and achievable benefits for healthcare organizations.

As emerged from the RAs, some studies explore business value for healthcare organizations and the concept of the potentialities of BDA (RA1) to explain the evidence of precise path-to-value chains leading to specific benefits [ 16 ]. These perspectives provide useful guidelines for healthcare managers who want to consider implementing BDA tools in their organizations. Some authors focus in particular on the role of BDA capabilities in the development of hospital supply chain integration and operational flexibility, demonstrating a positive relationship between the two dimensions [ 27 ]. During the Covid-19 outbreak, it became clearer how important operational flexibility is to healthcare organizations. The authors also underline how BDA can affect the efficiency of decision-making processes in healthcare organizations, through predictive models and real-time analytics, helping health professionals in the collection, management, and analysis of data [ 37 ].

In general, BDA-based management systems make personalized care programs possible. However, given the enormous amount and heterogeneity of the information available today, there is a need to direct R&D pathways toward data filtering mechanisms and the engineering of new intelligent decision support systems within healthcare organizations [ 38 ].

Circular economy (CE) and sustainability concepts are becoming key drivers in healthcare organizations for reducing negative impacts on the environment (RA2). Some research directions look at BDA as a tool for overcoming barriers to CE and supporting sustainable development initiatives in healthcare organizations [ 39 ]. Empirical studies have demonstrated the benefits of BDA-AI in the supply chain integration process and its impact on environmental performance. By assessing a sample of 168 French hospitals, Benzidia et al. [ 40 ] observed that the use of BDA-AI technologies has a significant impact on environmental process integration and the green supply chain. In particular, this study provides important insights for healthcare managers who wish to implement BDA-AI technologies to sustain green supply processes and improve environmental performance [ 40 ]. BDA and web technologies can help managers redesign healthcare processes to make them more effective and efficient. Since healthcare spending is constantly growing in the world’s major regions, there is an urgent need to redesign processes by optimizing supply chain activities so that high-quality services can be provided at lower cost [ 42 ]. Although BDA-based management systems promise to fulfil this role in healthcare organizations, more in-depth studies are required. Because information sources are heterogeneous, future research should investigate in depth protocol standardization and integration in data analysis, as well as the techniques, technologies, and security algorithms that BDA uses for healthcare and medical data [ 43 ].

In developing countries, as in the rest of the world, the management of health surveillance is a sensitive issue (RA3). Authors have therefore studied the main factors that hinder access to BDA in healthcare organizations [ 44 ]. Technology, staff, data management, and health policies have been identified as some of the decisive variables [ 44 ]. With an ageing population and the related increase in disability, healthcare organizations will soon face hard challenges. Big data can help healthcare managers detect patterns and turn high volumes of data into usable knowledge. In this context, investments are needed in technological infrastructure as well as in human capital [ 45 ]. China is proving, through large-scale investment, to be a pioneer in the adoption of BDA-based management systems in healthcare organizations [ 46 ].

The rise of AI, IoT, machine learning [ 49 , 50 , 51 ], and sensor technology, as well as embedded systems able to communicate with each other, has boosted the adoption of BDA, with valuable benefits for healthcare organizations (RA4). These technologies will play a fundamental role in big data management to improve the performance of healthcare organizations. Some authors have underlined privacy issues related to healthcare data and the need to make sensor data homogeneous and tagged. Furthermore, clinical records need to be incorporated into models, and machine-learning techniques need to be adapted accordingly [ 47 ]. Future R&D in this field should focus on developing digital platforms and specific BDA-based applications, including applications for managing diagnostic images [ 48 ].

By exploring the relationship between BDA-based management systems and the benefits delivered to healthcare organizations, this study answers three RQs: 1) What is the state of the art of BDA adopted by healthcare organizations? 2) What are the benefits for both health managers and healthcare organizations? 3) What are the future directions of BDA research in healthcare?

To answer the RQs, the SLR started from an investigation of the recent literature on BDA in healthcare organizations. A descriptive analysis was performed on a sample of 34 studies from all over the world. The second stage presents a detailed content analysis of the 16 studies that best address the research question about the relationship between BDA solutions and benefits for the healthcare organization.

Analyzing successful BDA strategies in the healthcare context, some authors focus on the potential of BDA applied in healthcare organizations [ 16 , 37 ]. This research highlights how analytical tools, through personal health systems, support public health management systems and how BDA suggests new pathways to support healthcare managers in the decision-making process.

Other scholars highlight the positive impact of BDA on resource management. BDA solutions are analyzed as tools to sustain CE initiatives [ 38 , 39 ] as well as to enable green supply chain process integration and improve hospital performance [ 40 ]. By exploiting KPIs derived from BDA solutions, some researchers present innovative models for planning public health policy [ 41 ]. In this context, studies consider BDA cloud computing solutions and social media data analytics for supporting the performance of healthcare supply chain management [ 42 , 43 ]. Furthermore, researchers from all around the world are showing particular interest in BDA for health surveillance system management [ 44 , 45 , 46 ].

According to the recent literature, BDA is transforming healthcare organizations. The SLR has shown that BDA solutions are now widely considered a milestone for managerial studies applied to healthcare organizations. The Coronavirus pandemic has been a good test run for using BDA to design healthcare policy strategies. Although an extensive literature on BDA to support healthcare management is being produced, the proposed classification into four RAs is an attempt to examine precise key research directions. In this regard, a limitation of the present research is the difficulty of reviewing a field of literature that is constantly evolving. To date, the amount of data is no longer an issue. To be useful in the healthcare context, the data must be validated for quality, and the right correlations must then be found. In other words, the data should be processed, analyzed, and interpreted correctly. For this reason, research needs to be directed towards filtering mechanisms that convert data from big to smart, and towards engineering new decision support systems within healthcare organizations [ 38 ].

The content analysis carried out in this research has shown that studies aim to identify new models for both predictive and personalized medicine by exploiting BDA technologies [ 47 ]. Researchers underline the added value of using BDA both in the medical diagnostic process [ 48 ] and jointly with IT technologies such as IoT and machine learning [ 49 , 51 ].

Thus, considering the results obtained, it is possible to state that BDA can effectively help healthcare managers to detect common patterns and turn high volumes of data into usable knowledge. Investment in human capital becomes a priority in order to exploit the potential of BDA [ 45 ].

To achieve these objectives, future research should provide usable insights and standardized procedures for training healthcare managers and practitioners. AI, machine learning, and management strategies will also play their part as knowledge producers in the healthcare organization. Privacy issues related to healthcare data, and the need to make sensor data homogeneous, are becoming crucial research topics. Finally, because information sources are heterogeneous, future research should investigate the standardization and integration of protocols in data analysis, as well as the techniques that help the managerial sector implement BDA-based management systems more widely in future healthcare organizations [ 43 ].

The current challenge for healthcare organizations is the development of useful BDA-based applications. In line with the circular economy view, future research should consider the relationship between digitalization and the management of resource consumption. Data centralization combined with a BDA approach can effectively support circular economy processes in the healthcare supply chain by reducing waste and resource consumption.

Exploiting BDA’s capabilities will also be a key factor in forecasting and monitoring outbreaks. Future studies will need to focus on developing more efficient models for sharing data in order to improve the performance of healthcare organizations around the world.

Availability of data and materials

The datasets analyzed during the current study are not publicly available due to data relating to scientific journal names and authors but are available from the corresponding author on reasonable request.

Wang L, Alexander CA. Big data in medical applications and health care. Curr Res Med. 2015;6:1–8.

Aceto G, Persico V, Pescape A. Industry 4.0 and health: internet of things, big data, and cloud computing for healthcare 4.0. J Ind Inf Integr. 2020;18:100129.

Galetsi P, Katsaliaki K, Kumar S. Values, challenges and future directions of big data analytics in healthcare: A systematic review. Soc Sci Med. 2019;241:112533.

Obermeyer Z, Emanuel EJ. Predicting the future — big data, machine learning, and clinical medicine. New Engl J Med. 2016;375:1216–9.

Kumar Y, Sood K, Kaul S, Vasuja R, et al. Big data analytics and its benefits in healthcare. In: Kulkarni J, et al., editors. Big data analytics in healthcare, studies in big data 66. Cham: Springer; 2020. p. 3–21.

Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. 2014;2(1):1–10.

Jain DA, Kumar V, Khanduja D, Sharma K, Bateja R. A detailed study of big data in healthcare: case study of Brenda and IBM Watson. Int J Recent Technol Eng. 2019;7:8–12.

Tremolada, L. (2019), “Quanti dati sono generati in un giorno?” Il Sole24Ore , May 26, 2019, available at: https://www.infodata.ilsole24ore.com/2019/05/14/quanti-dati-sono-generati-in-un-giorno/?refresh_ce=1 (Accessed 17 Feb 2022).

Srivastava PK, Rakshit P. Cutting edge IoT Technology for Smart Indian Pharma. In: International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) 2021. Greater Noida: Institute of Electrical and Electronics Engineers Inc.; 2021. p. 360–2.

Rayan RA, Tsagkaris C, Zafar I. IoT for better mobile health applications. In: Kumar P, editor. A fusion of artificial intelligence and internet of things for emerging cyber systems. Cham: Springer; 2022. p. 1–13.

Chung K, Park RC. Chatbot-based healthcare service with a knowledge base for cloud computing. Cluster Comput. 2019;22:1925–37.

Ali F, El-Sappagh S, Islam SMR, Ali A, Attique M, Imran M, Kwak KS. An intelligent healthcare monitoring framework using wearable sensors and social networking data. Fut Generation Comput Syst. 2021;114:23–43.

Yousefi S, Derakhshan F, Karimipour H. Applications of big data analytics and machine learning in the internet of things. In: Choo KK, Dehghantanha A, editors. Handbook of big data privacy. Cham: Springer; 2020. p. 77–108.

Mehta N, Pandit A, Kulkarni M. Elements of healthcare big data analytics. In: Big data analytics in healthcare, studies in big data 66. Cham: Springer; 2018.

Han Y, Lie RK, Guo R. The internet hospital as a telehealth model in China: systematic search and content analysis. J Med Int Res. 2020;22:e17995.

Wang Y, Hajli N. Exploring the path to big data analytics success in healthcare. J Bus Res. 2017;70:287–99.

Srinivasan R, Swink M. An investigation of visibility and flexibility as complements to supply chain analytics: an organizational information processing theory perspective. Prod Oper Manage. 2018;27:1849–67.

Wang Y, Byrd TA. Business analytics-enabled decision-making effectiveness through knowledge absorptive capacity in health care. J Knowl Manage. 2017;21:517–39.

Wang Y, Kung LA, Byrd TA. Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technol Forecast Soc Change. 2018;126:3–13.

Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, Shahabi C. Big data and its technical challenges. Commun ACM. 2014;57:86–94.

Seddon PB, Constantinidis D, Dod H. How does business analytics contribute to business value? In: Information Systems Journal, Proceeding of Thirty Third International Conference on Information Systems. Orlando: Wiley Publishing Ltd; 2012. p. 237–69.

Cao G, Duan Y, Li G. Linking business analytics to decision making effectiveness: a path model analysis. IEEE Trans Eng Manage. 2015;62:384–95.

Watson HJ. Tutorial: big data analytics: concepts, technologies, and applications. Commun Assoc Inf Syst. 2014;34:1247–68.

Negash S. Business intelligence. Commun Assoc Inf Syst. 2004;13:177–95.

Hurwitz J, Nugent A, Hapler F, Kaufman M. Big data for dummies. Hoboken: Wiley; 2013.

Sadeghi P, Benyoucef M, Kuziemsky CE. A mashup-based framework for multi-level healthcare interoperability. Inf Syst Front. 2012;14:57–72.

Yu W, Zhao G, Liu Q, Song Y. Role of big data analytics capability in developing integrated hospital supply chains and operational flexibility: An organizational information processing theory perspective. Technol Forecast Soc Change. 2021;163:120417.

Butler TW, Leong GK, Everett LN. The operations management role in hospital strategic planning. J Oper Manag. 1996;14:137–56.

Slack N, Brandon-Jones A, Johnston R. Operations management. 8th ed. Harlow: Pearson; 2016.

Liu, J., (2020), “Deployment of health IT in China’s fight against the COVID-19 pandemic”, available at: https://www.itnonline.com/article/deployment-health-it-china%E2%80%99s-fight-against-covid-19-pandemic (Accessed 20 Dec 2021).

Ting DS, Wei LC, Dzau V, Wong TY. Digital technology and COVID-19. Nat Med. 2020;26:459–61.

Rajasekera J, Mishal AV, Mori Y, et al. Innovative mHealth solution for reliable patient data empowering rural healthcare in developing countries. In: Kulkarni A, et al., editors. Big data analytics in healthcare. Studies in big data, vol 66. Cham: Springer; 2020. p. 83–103.

Ambert, K., Beaune, S., Chaibi, A., Briard, L., Bhattacharjee, A., Bharadwaj, V., Sumanth, K., Crowe, K. (2016), “French Hospital Uses Trusted Analytics Platform to Predict Emergency Department Visits and Hospital Admissions”, available at: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/french-hospital-analytics-predict-admissions-paper.pdf , (Accessed 13 Mar 2022).

Van Sickle D, Barrett M, Humblet O, Henderson K, Hogg C. Randomized, controlled study of the impact of a mobile health tool on asthma SABA use, control and adherence. Eur Respir J. 2016;48(Suppl. 60):1018.

Merchant R, Szefler SJ, Bender BG, Tuffli M, Barrett MA, Gondalia R, Kaye L, Van Sickle D, Stempel DA. Impact of a digital health intervention on asthma resource utilization. World Allergy Org J. 2018;411:28.

Khanra S, Dhir A, Islam N, Mäntymäki M. Big data analytics in healthcare: a systematic literature review. Enterprise Inf Syst. 2020;14:878–912.

Sousa MJ, Pesqueira AM, Lemos C, Sousa M, Rocha Á. Decision-making based on big data analytics for people management in healthcare organizations. J Med Syst. 2019;43:290.

Maglaveras N, Kilintzis V, Koutkias V, Chouvarda I. Integrated care and connected health approaches leveraging personalised health through big data analytics. Stud Health Technol Inf. 2016;224:117–22.

Kazançoğlu Y, Sağnak M, Lafcı Ç, Luthra S, Kumar A, Taçoğlu C. Big Data-enabled solutions framework to overcoming the barriers to circular economy initiatives in healthcare sector. Int J Environ Res Public Health. 2021;18:7513.

Benzidia S, Makaoui N, Bentahar O. The impact of big data analytics and artificial intelligence on green supply chain process integration and hospital environmental performance. Technol Forecast Soc Change. 2021;165:120557.

Moutselos K, Maglogiannis I. Evidence-based public health policy models development and evaluation using big data analytics and web technologies. Med Arch (Sarajevo, Bosnia and Herzegovina). 2020;74:47–53.

Alotaibi S, Mehmood R, Katib I, Chlamtac I. The role of big data and twitter data analytics in healthcare supply chain management. In: Mehmood R, See S, Katib I, editors. Smart infrastructure and applications. Cham: EAI/Springer Innovations in Communication and Computing, Springer; 2020. p. 267–79.

Kundella S, Gobinath R. A survey on big data analytics in medical and healthcare using cloud computing. Int J Sci Technol Res. 2019;8:1061–5.

Chellah RC, Kunda D. An assessment of factors that affect the implementation of big data analytics in the Zambian health sector for strategic planning and predictive analysis: a case of Copperbelt province. Int J Electron Healthc. 2020;11:101–22.

Pastorino R, De Vito C, Migliara G, Glocker K, Binenbaum I, Ricciardi W, Boccia S. Benefits and challenges of big data in healthcare: an overview of the European initiatives. Eur J Public Health. 2019;29:23–7.

Gunapal PPG, Kannapiran P, Teow KL, Zhu Z, You AX, Saxena N, Singh V, Tham L, Choo PWJ, Chong P-N, Sim JHJ, Wong JEL. Setting up a regional health system database for seamless population health management in Singapore. Proc Singapore Healthc. 2016;25:27–34.

Clim A, Zota RD, Tinica G. Big data in home healthcare: A new frontier in personalized medicine. Medical emergency services and prediction of hypertension risks. Int J Healthc Manage. 2019;12:241–9.

Aiello M, Cavaliere C, D’Albore A, Salvatore M. The challenges of diagnostic imaging in the era of big data. J Clin Med. 2019;8:316.

Bharathi MJ, Rajavarman VN. A survey on big data management in health care using IOT. Int J Recent Technol Eng. 2019;7:196–8.

Lai A, Rossignoli F, Stacchezzini R. How integrated reporting meets the investors and other stakeholders’ information needs. In: Vrontis D, Weber Y, Tsoukatos E, editors. Global and national business theories and practice: bridging the past with the future. Cyprus: EuroMed Press; 2017.

Martinez FEL, Núñez-Valdez ER, et al. Big data and machine learning: a way to improve outcomes in population health management. In: González García C, et al., editors. Protocols and applications for the industrial internet of things. Hershey: IGI Global; 2018. p. 225–39.

Acknowledgements

Not applicable.

The research was carried out without funding.

Author information

Authors and affiliations.

Department of Economics, University of Foggia, Via Caggese n.1, Foggia, Italy

Nicola Cozzoli, Fiorella Pia Salvatore, Nicola Faccilongo & Michele Milone

Contributions

NC and FPS designed and conducted the empirical study, wrote and revised the manuscript. NC and FPS carried out the analysis and wrote the results, discussion and conclusions. NC, FPS, NF, and MM revised the manuscript. All authors read the manuscript and approved the final version.

Corresponding author

Correspondence to Fiorella Pia Salvatore .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

List of articles.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Cozzoli, N., Salvatore, F.P., Faccilongo, N. et al. How can big data analytics be used for healthcare organization management? Literary framework and future research from a systematic review. BMC Health Serv Res 22 , 809 (2022). https://doi.org/10.1186/s12913-022-08167-z

Received : 02 March 2022

Accepted : 06 June 2022

Published : 22 June 2022

DOI : https://doi.org/10.1186/s12913-022-08167-z

  • Healthcare management
  • Healthcare organization
  • Healthcare governance
  • Big data analytics

BMC Health Services Research

ISSN: 1472-6963

The use of Big Data Analytics in healthcare

Affiliations.

  • 1 Department of Business Informatics, University of Economics in Katowice, Katowice, Poland.
  • 2 Department of Biomedical Processes and Systems, Institute of Health and Nutrition Sciences, Częstochowa University of Technology, Częstochowa, Poland.
  • PMID: 35013701
  • PMCID: PMC8733917
  • DOI: 10.1186/s40537-021-00553-4

The introduction of Big Data Analytics (BDA) in healthcare will make it possible to use new technologies both in the treatment of patients and in health management. The paper aims to analyze the possibilities of using Big Data Analytics in healthcare. The research is based on a critical analysis of the literature, as well as the presentation of selected results of direct research on the use of Big Data Analytics in medical facilities. The direct research was carried out using a research questionnaire on a sample of 217 medical facilities in Poland. Literature studies have shown that the use of Big Data Analytics can bring many benefits to medical facilities, while direct research has shown that medical facilities in Poland are moving towards data-based healthcare because they use structured and unstructured data and reach for analytics in the administrative, business and clinical areas. The research confirmed that medical facilities are working with both structured and unstructured data. The following kinds and sources of data can be distinguished: data from databases, transaction data, unstructured content of emails and documents, and data from devices and sensors. However, the use of data from social media is lower. In their activity, medical facilities reach for analytics not only in the administrative and business areas but also in the clinical area. This clearly shows that the decisions made in medical facilities are highly data-driven. The results of the study confirm what has been analyzed in the literature: medical facilities are moving towards data-based healthcare, together with its benefits.

Keywords: Big Data; Big Data Analytics; Data-driven healthcare.

© The Author(s) 2022.

J Integr Bioinform. 2018 Sep; 15(3).

Big Data Analytics in Medicine and Healthcare

Blagoj Ristevski

“St. Kliment Ohridski” University – Bitola, Faculty of Information and Communication Technologies, ul. Partizanska bb, 7000 Bitola, Republic of Macedonia

Department of Bioinformatics, College of Life Sciences, Zhejiang University Zijingang Campus, Hangzhou, P.R. China

This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics (value, volume, velocity, variety, veracity and variability) are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various -omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health records data. We underline the challenging issues of big data privacy and security. With regard to the big data characteristics, some directions for using a suitable and promising open-source distributed data processing software platform are given.

1. Introduction

To obtain the best services and care for patients, healthcare organizations in many countries have proposed various models of healthcare information systems. These models for personalized, predictive, participatory and preventive medicine are based on the use of electronic health records (EHRs) and huge amounts of complex biomedical data and high-quality -omics data [ 1 ].

Contemporary genomics and postgenomics technologies produce huge amounts of raw data about complex biochemical and regulatory processes in living organisms [ 2 ]. These -omics data are heterogeneous, and very often they are stored in different data formats. Like these -omics data, EHR data are also in heterogeneous formats. EHR data can be structured, semi-structured or unstructured, and discrete or continuous.

Big data in healthcare and medicine refers to these various large and complex data, which are difficult to analyse and manage with traditional software or hardware [ 3 ], [ 4 ]. Big data analytics covers integration of heterogeneous data, data quality control, analysis, modeling, interpretation and validation [ 5 ]. Applying big data analytics enables comprehensive knowledge discovery from the huge amounts of data available.

In particular, big data analytics in medicine and healthcare enables the analysis of large datasets from thousands of patients, the identification of clusters and correlations between datasets, and the development of predictive models using data mining techniques [ 2 ]. Big data analytics in medicine and healthcare integrates analysis from several scientific areas such as bioinformatics, medical imaging, sensor informatics, medical informatics and health informatics. A survey of big data cases in medical and healthcare institutions/organizations is given in [ 6 ].

The new knowledge discovered by big data analytics techniques should provide comprehensive benefits to the patients, clinicians and health policy makers [ 7 ].

The remainder of the paper is organized as follows. Related work is described in the second section. Section 3 describes the characteristics of big data, while big data analytics is described in the subsequent section. The next section explains some challenging issues in big data analytics techniques, while big data privacy and security are described in Section 6. The last section concludes the paper with a discussion and future work.

2. Related Work

With the rapid development of emerging information technologies, experimental technologies and methods, cloud computing, the Internet of Things and social networks, the amount of generated data is growing tremendously in numerous research fields [ 8 ].

In this respect, contemporary genomics and postgenomics technologies produce huge amounts of raw data about complex biochemical and regulatory processes in living organisms [ 2 ]. These high-throughput -omics data provide comprehensive insight into different kinds of molecular profiles, changes and interactions, such as knowledge related to the genome, epigenome, transcriptome, proteome, metabolome, interactome, pharmacogenome, diseasome, etc. [ 9 ]. These -omics data are heterogeneous and very often stored in different data formats. The main aims and characteristics of the different -omics disciplines are listed in Table 1.

Table 1: The main aims of the variety of -omics disciplines.

Like these -omics data, EHR data are also stored in heterogeneous formats. EHR data, which can be structured, semi-structured or unstructured, and discrete or continuous, contain patients’ personal data, clinical notes, diagnoses, administrative data, charts, tables, prescriptions, procedures, lab tests, medical images, and magnetic resonance imaging (MRI), ultrasound and computed tomography (CT) data. Some of these data are acquired from wearable sensors or captured from medical monitoring devices at different collection frequencies [ 5 ], which gives these data complex features and high dimensionality [ 10 ]. Dealing with the noisiness and incompleteness of EHRs is still a challenging task, and these shortcomings should be considered when applying data mining techniques [ 11 ].

These growing amounts of various -omics data need to be collected, cleaned, stored, transformed, transferred, visualized and delivered in a suitable manner so that they can be presented to clinicians [ 12 ]. The processing of these big data in medicine and healthcare can be accelerated by using cloud computing and powerful multicore central processing units (CPUs), graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) with parallel processing methods.

3. Big Data Characteristics

The term big data is described by the following characteristics: value, volume, velocity, variety, veracity and variability, denoted as the 6 “Vs” [ 13 ], [ 14 ] and shown in Figure 1. Beyond these 6 “Vs”, some authors have defined additional properties to describe big data characteristics [ 15 ].

Figure 1: The 6 V’s of big data.

The volume of health and medical data, usually measured in terabytes, petabytes or even yottabytes, is expected to rise sharply in the years ahead [ 14 ], [ 16 ]. Volume refers to the amount of data, while velocity refers to data in motion as well as to the speed and frequency of data creation, processing and analysis. Variety refers to the complexity and heterogeneity of multiple datasets, which can be structured, semi-structured or unstructured. Veracity refers to data quality, relevance, uncertainty, reliability and predictive value [ 14 ], while variability concerns the consistency of the data over time. The value of big data refers to their coherent analysis, which should be valuable to patients and clinicians.

Considering the big data characteristics and the requirements of data searching, storage and analysis, a very appropriate and promising software platform for developing applications that can handle big data in medicine and healthcare is the open-source distributed data processing platform Apache Hadoop MapReduce [ 1 ], [ 17 ], which is based on data-intensive computing and NoSQL data modeling techniques [ 18 ].
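To make the MapReduce programming model mentioned above more concrete, the following is a minimal single-machine sketch in plain Python. It does not use the Hadoop API; the record layout, the diagnosis codes and all function names are hypothetical and chosen only for illustration. On a real cluster the map and reduce phases would run in parallel across many nodes.

```python
from collections import defaultdict

# Hypothetical toy records: (patient_id, diagnosis_code). Not real data.
records = [
    ("p1", "I10"), ("p2", "E11"), ("p3", "I10"), ("p4", "J45"), ("p5", "I10"),
]

def map_phase(record):
    """Emit one (key, 1) pair per record, as a mapper would."""
    _patient_id, code = record
    return (code, 1)

def shuffle(pairs):
    """Group emitted values by key, mimicking the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values for one key, as a reducer would."""
    return key, sum(values)

def count_diagnoses(recs):
    """Run map, shuffle and reduce sequentially on one machine."""
    grouped = shuffle(map(map_phase, recs))
    return dict(reduce_phase(k, v) for k, v in grouped.items())

diagnosis_counts = count_diagnoses(records)
```

The design point the sketch tries to show is that the mapper and reducer are independent per record and per key, which is what allows a framework to distribute them over many machines.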

4. Big Data Analytics

Applications of big data analytics can improve patient-based services, detect spreading diseases earlier, generate new insights into disease mechanisms, monitor the quality of medical and healthcare institutions, and provide better treatment methods [ 19 ], [ 20 ], [ 21 ].

Data mining techniques applied to EHRs, web and social media data make it possible to identify optimal practical guidelines in hospitals, to identify association rules in EHRs [ 22 ], and to reveal disease monitoring and health-based trends. Moreover, the integration and analysis of data of different natures, such as social and scientific data, can lead to new knowledge and intelligence, the exploration of new hypotheses and the identification of hidden patterns [ 14 ].

Nowadays, smartphones are excellent platforms for delivering personal messages to patients in order to involve them in behavioral changes that improve their wellbeing and health conditions. Mobile phone messages can replace other means of delivering medical and motivational advice to patients [ 14 ].
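As an illustration of what identifying association rules in EHRs involves, the sketch below computes the two standard rule metrics, support and confidence, over a handful of made-up visit records. The data and the function names are assumptions introduced for this example; they do not come from the cited studies.

```python
# Hypothetical toy "transactions": conditions recorded per patient visit.
visits = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "obesity"},
    {"hypertension"},
    {"diabetes", "obesity"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0

# Rule under test: hypertension -> diabetes
rule_support = support({"hypertension", "diabetes"}, visits)
rule_confidence = confidence({"hypertension"}, {"diabetes"}, visits)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; the sketch only evaluates a single candidate rule.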

5. Challenges in Big Data Analytics

Collecting large amounts of data raises several challenges. Obtaining high-throughput -omics data is tied to the cost of experimental measurements. Because the data sources are heterogeneous, the noise in experimental -omics data and the variety of experimental techniques, environmental conditions and biological nature of the samples should be taken into account before these data are integrated and before data mining methods are applied. Different data mining techniques can be applied to these heterogeneous biomedical data sets, such as anomaly detection, clustering, classification and association rules, as well as summarization and visualization of the big data sets.

These shortcomings can make some data points unreliable, for example through missing values or outliers. Apart from these drawbacks of the -omics data, EHR data are strongly influenced by the staff who enter the patient’s data, which can lead to missing values and to incorrect data caused by mistakes, misunderstanding or wrong interpretation of the original data [ 5 ]. Integration of data from various databases and standardization of laboratory protocols and values remain challenging issues [ 10 ].

High dimensionality of the -omics data means that they have many more dimensions (features) than samples. Together with the EHR data, which relate to individual patients, this makes applying data mining techniques a more challenging task.

The subsequent stage is pre-processing of the data, which usually involves handling noisy data, outliers and missing values, as well as data transformation and normalization. Pre-processing makes it possible to apply statistical techniques and data mining methods, so that the quality and outcomes of big data analytics can improve and lead to the discovery of novel knowledge. This novel knowledge, obtained by integrating the -omics and EHR data, should improve the healthcare delivered to patients and support better decision making by healthcare policy makers.
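The pre-processing steps named above can be sketched for a single numeric feature as follows. This is a minimal illustration under stated assumptions, not a procedure prescribed by the paper: the choice of median imputation, interquartile-range clipping and min-max scaling, the function names and the sample laboratory values are all hypothetical.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature carries no spread
    return [(v - low) / (high - low) for v in values]


# Hypothetical lab measurements with one missing value and one implausible entry.
raw = [5.1, None, 4.8, 5.3, 50.0, 5.0]
imputed = impute_median(raw)
clipped = clip_outliers(imputed)
normalized = min_max(clipped)
```

The order matters in this sketch: imputing first gives the quartile computation a complete list, and clipping before scaling keeps a single extreme entry from compressing all other values towards zero.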

6. Big Data Privacy and Security

Two important issues for big data in healthcare and medicine are the security and privacy of individuals/patients [ 14 ], [ 23 ]. All medical data are very sensitive, and different countries consider these data to be legally owned by the patients [ 2 ]. To address these security and privacy challenges, big data analytics software solutions should use advanced encryption algorithms and pseudo-anonymization of personal data. These solutions should provide network-level security and authentication for all users involved, guarantee privacy and security, and establish good governance standards and practices.
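One common way to pseudo-anonymize an identifier is to replace it with a keyed hash, so that records of the same patient can still be linked without exposing the original identifier. The sketch below uses HMAC-SHA256 from the Python standard library; the function name, the key and the sample identifiers are hypothetical, and the paper does not prescribe this particular scheme.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for patient_id derived with HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would be kept apart from the data,
# for example in a key management service.
key = b"replace-with-a-randomly-generated-secret"

alias_a = pseudonymize("patient-0001", key)
alias_b = pseudonymize("patient-0002", key)
```

Because the hash is keyed, someone who sees only the pseudonyms cannot recompute them from a list of known identifiers without the key. Whoever holds the key can still re-derive the mapping, which is why this counts as pseudo-anonymization rather than full anonymization.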

7. Discussion and Future Work

Big data analytics in medicine and healthcare is a very promising process of integrating, exploring and analysing large amounts of complex, heterogeneous data of different natures: biomedical data, experimental data, electronic health records data and social media data. Integrating such diverse data means that big data analytics intertwines several fields, such as bioinformatics, medical imaging, sensor informatics, medical informatics, health informatics and computational biomedicine. As further work, the characteristics of big data provide a suitable basis for using promising software platforms to develop applications that can handle big data in medicine and healthcare. One such platform is the open-source distributed data processing platform Apache Hadoop MapReduce, which uses massively parallel processing (MPP) [ 20 ], [ 24 ]. These applications should enable data mining techniques to be applied to these heterogeneous and complex data in order to reveal hidden patterns and novel knowledge.

Recent hardware innovations in processor technology and newer kinds of memory and network architecture will minimize the time spent moving data from storage to the processor in a distributed setting [ 25 ].
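The MapReduce programming model behind the platform named above can be illustrated with a small in-process sketch. It is plain Python and does not use the Hadoop API; the record layout and the function names are hypothetical. The job counts how often each diagnosis occurs across patient records.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (diagnosis, 1) pair for every diagnosis in a record."""
    for diagnosis in record["diagnoses"]:
        yield diagnosis, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical patient records.
patients = [
    {"id": 1, "diagnoses": ["diabetes", "hypertension"]},
    {"id": 2, "diagnoses": ["hypertension"]},
    {"id": 3, "diagnoses": ["diabetes", "asthma"]},
]
counts = run_job(patients)
```

In a distributed deployment the map and reduce functions would run in parallel on many nodes, each over its own partition of the data, while the framework performs the grouping step; the sketch only shows how a computation is split into the two phases.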

Acknowledgements

This paper was supported by the Ministry of Education and Science of the Republic of Macedonia and the Ministry of Science and Technology (MOST) of the Government of the People’s Republic of China.

Conflict of interest statement

Authors state no conflict of interest. All authors have read the journal’s Publication ethics and publication malpractice statement available at the journal’s website and hereby confirm that they comply with all its parts applicable to the present scientific work.


