Optimizing Data Pipelines for Real-Time Healthcare Analytics in Distributed Systems: Architectural Strategies, Performance Trade-offs, and Emerging Paradigms

Olasehinde Omolayo1, Raphael Ugboko2, Deborah Olamide Oyeyemi3, Oluwafemi Oloruntoba4*, & Samuel O. Fakunle5
1 Mathematics and Statistics Department, Georgia State University, USA
2 Human-Centered Computing, Clemson University, USA
3 Business Analytics and Information Management, University of Delaware, USA
4 Department of Information Technology, Lamar University, USA
5 Information Security & Systems, University of East London, United Kingdom
DOI: http://doi.org/10.37502/IJSMR.2025.8708

Abstract

The growing complexity and volume of healthcare data necessitate highly optimized real-time analytics systems capable of supporting clinical decision-making and operational efficiency. This study investigates architectural strategies for optimizing data pipelines in distributed healthcare analytics environments. It evaluates key performance metrics such as latency, throughput, scalability, reliability, and data consistency across multiple pipeline architectures, including Lambda, Kappa, and Micro-Batch (Spark). Using synthetic healthcare datasets and performance benchmarks, we highlight trade-offs between latency and operational costs, emphasizing the critical balance between system efficiency and clinical utility. Emerging paradigms such as edge computing, AI-driven optimization, and adaptive resource management are explored as pathways to enhance resilience and performance. The findings provide actionable insights for designing adaptive, secure, and cost-effective healthcare data pipelines capable of meeting stringent real-time demands.

Keywords: Real-Time Healthcare Analytics; Distributed Systems; Data Pipelines; Latency Optimization; Scalability; Edge Computing; AI-Driven Optimization
