Linda M. Peelen, Teus H. Kappen and Wilton A. van Klei | University Medical Center Utrecht, Utrecht, The Netherlands
The phrase “big data” is used abundantly, both in science and in everyday life. A PubMed search on “big data” yielded 695 hits for 2015, compared with 3 for 2009, and big data has also become a topic of interest in the perioperative medicine literature.1-3
Do we have big data?
The phrase “big data” means different things to different people, but a common concept of Big Data includes the “four Vs”: volume, velocity, variety, and veracity.4 The field of perioperative medicine certainly has the potential to create Big Data. With more than 230 million surgeries per year worldwide5 and part of the data measured at high frequency, we create a large amount of data (volume). These data may be used for analysis in near real-time (velocity), provided our Anaesthesia Information Management Systems (AIMSs) can supply the computational power and storage required for these analyses. With anaesthesiologists expanding their domain to pre- and postoperative care rather than only the operating room, we increasingly register data from different data sources (variety), such as hospital information systems and pharmacy data. The quality of the data (veracity) is discussed further below.
Do we need big data?
Over the past decades our field has become safer, resulting in low incidences (<1-3%) of major perioperative complications, such as hypoxic brain injury or death. Nonetheless, with more than 230 million surgeries per year, even an event with a 1% rate affects more than 2 million people worldwide each year. Currently these devastating events are hardly studied in randomized clinical trials, as their low incidences would require the inclusion of very large numbers of patients. Big data may help address questions for such rare endpoints and improve clinical practice. For example, the widespread use of perioperative beta blockers in non-cardiac surgery and the routine use of aprotinin for cardiac surgery have both been questioned by research on large clinical datasets.6,7 The results of these landmark hypothesis-generating studies eventually radically changed the perioperative use of these drugs.
Research on treatment policies and patient care trajectories, rather than on a single intervention, requires the use of big datasets that combine data from different centres and sources, including (national) registries.
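The arithmetic behind these two statements can be made concrete with a small sketch. The surgery volume (230 million per year) and the 1% event rate come from the text above. Everything else is an illustrative assumption and not taken from the studies cited here: a hypothetical intervention that lowers the event rate by 25% in relative terms, a two-sided significance level of 5%, a power of 80%, and the standard normal-approximation formula for comparing two proportions. The function names are made up for this example.

```python
from math import ceil
from statistics import NormalDist


def patients_affected(surgeries_per_year: int, event_rate: float) -> float:
    """Expected number of patients per year who experience the event."""
    return surgeries_per_year * event_rate


def sample_size_per_group(p_control: float, p_treated: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients needed per trial arm to detect p_control vs p_treated.

    Uses the normal-approximation formula for two independent proportions:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    The defaults (alpha=0.05, power=0.80) are illustrative assumptions.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Figures from the text: 230 million surgeries per year, 1% event rate.
    print(round(patients_affected(230_000_000, 0.01)))
    # Assumed effect: 25% relative reduction, i.e. 1.0% versus 0.75%.
    print(sample_size_per_group(0.01, 0.0075))
```

Under these assumptions the required trial size is on the order of 20,000 patients per arm, and it grows quickly as the event becomes rarer or the expected effect smaller, which is why such endpoints are rarely the primary outcome of a randomized trial.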
Challenges in obtaining big data
Collecting big data may not be straightforward. First, there is still a substantial gap between what we clinically observe and what we truly record in our AIMSs. Physiological data are monitored at high frequencies, but are typically recorded at 1-minute intervals. Moreover, a lot of important information goes unregistered, e.g., the implicit rationale for making certain treatment decisions. Second, differences in coding practice between centres and countries make sharing data difficult and may compromise data quality. We know from existing initiatives8 that it requires extensive effort to harmonize data collection and representation. Third, reluctance to share data is a barrier to combining data from multiple centres. As scientific journals increasingly require researchers to publicly share their data, there is a need for guidelines on ownership of data and on its scientific and non-scientific use by others. Finally, legislation on protecting the privacy of patients and care providers has recently become more stringent, requiring inventive computational and storage solutions.
Challenges in using big data
The use of big data requires the development of new methodology for storage, computational algorithms, and statistical methods to analyse the data, and entirely new research fields focusing on these topics are emerging. But there are more subtle challenges that need to be taken into account. A major issue is that the same data may be used for many purposes. This may seem an advantage, but one should be aware that if the data were collected for a particular purpose, this has probably influenced the level of detail and the actual values that were registered.9 For example, in data collected for billing purposes, choices of discharge diagnoses may be influenced by financial incentives. Hence, although data are collected prospectively, the analyses are conducted in retrospect and bear the potential for bias. Using modern methods, new associations in big datasets can be discovered without specifying prior knowledge or hypotheses. The resulting findings may contain ‘new knowledge’, but may also represent noise or even bias. It is unclear how to make this distinction when our current clinical knowledge and experience, albeit imperfect, serve as the reference.
In summary, our field needs large datasets to address relevant research questions and to monitor and improve quality of care. Such large datasets can only be obtained by combining data sources from different hospitals and countries, covering the entire spectrum of perioperative medicine. Given the current state of affairs in terms of data harmonization, methods for study design and (statistical) analysis, and legislation, research on ‘big data’ sets is an excellent source for hypothesis generation, but will not yet provide final answers to all of our clinically relevant questions.