Posts by Collection

conferences

publications

A mathematical justification for metronomic chemotherapy in oncology

Published in Mathematical Modelling of Natural Phenomena, 2022

Abstract We mathematically justify metronomic chemotherapy as the best strategy to apply many cytotoxic drugs in oncology for both curative and palliative approaches, assuming the classical pharmacokinetic model together with the Emax pharmacodynamic and the Norton-Simon hypothesis. From the mathematical point of view, we will consider two mixed-integer nonlinear optimization problems, where the unknowns are the number of the doses and the quantity of each one, adjusting the administration times a posteriori.

Recommended citation: Fernández, L. A., Pola, C., & Sáinz-Pardo, J. (2022). A mathematical justification for metronomic chemotherapy in oncology. Mathematical Modelling of Natural Phenomena, 17, 12.
Download Paper
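
A minimal sketch of why splitting a dose can matter under a saturating Emax response. It assumes a one-compartment model with first-order elimination and instantaneous doses. Every parameter value and both schedules are hypothetical. Tumor dynamics (the Norton-Simon hypothesis) and the paper's optimization problems are not modeled here.

```python
import math


def concentration(t, doses, elimination_rate):
    """Drug concentration at time t for instantaneous (bolus) doses.

    doses: list of (administration_time, amount) pairs.
    Assumes a one-compartment model with first-order elimination.
    """
    return sum(
        amount * math.exp(-elimination_rate * (t - t_dose))
        for t_dose, amount in doses
        if t_dose <= t
    )


def emax_effect(c, e_max, ec50):
    """Emax pharmacodynamic response: saturates as the concentration c grows."""
    return e_max * c / (ec50 + c)


def cumulative_effect(doses, horizon=8.0, step=0.01,
                      rate=1.0, e_max=1.0, ec50=0.5):
    """Left Riemann sum of the Emax effect over [0, horizon].

    All default parameter values are made up for illustration.
    """
    n_steps = int(horizon / step)
    return sum(
        emax_effect(concentration(i * step, doses, rate), e_max, ec50) * step
        for i in range(n_steps)
    )


# Two hypothetical schedules delivering the same total amount (8 units).
single_large = [(0.0, 8.0)]
metronomic = [(float(day), 1.0) for day in range(8)]
```

With these made-up values, `cumulative_effect(metronomic)` is larger than `cumulative_effect(single_large)`, because the Emax curve saturates at high concentrations. This only illustrates the saturation effect and is not a reproduction of the paper's result.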

A Python library to check the level of anonymity of a dataset

Published in Scientific Data, 2022

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. A Python library to check the level of anonymity of a dataset. Sci Data 9, 785 (2022). https://doi.org/10.1038/s41597-022-01894-2
Download Paper

Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Published in Neurocomputing, 2023

Abstract Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy, loss and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner. The results of considering three or ten clients are exposed and compared between them and against the centralized case. Two different problems related to intermittent clients are discussed, together with two approaches to be followed for each of them. Specifically, this type of problems may occur because in a real scenario some clients may leave the training, and others enter it, and on the other hand because of client technical or connectivity problems. Finally, improvements and future work in the field are proposed.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Study of the performance and scalability of federated learning for medical imaging with intermittent clients. Neurocomputing, 518, 142-154. https://doi.org/10.1016/j.neucom.2022.11.011
Download Paper
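
The aggregation step mentioned in the abstract can be sketched as a sample-size-weighted average of client parameters, in the style of FedAvg. This is an assumption for illustration: the abstract does not specify the paper's own aggregation operator, and the function names and the way dropped clients are skipped are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector per client size")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def aggregate_available(updates):
    """Aggregate only the clients that reported in this round.

    updates maps a client id to (weights, n_samples), or to None when the
    client dropped out. Skipping absent clients is one possible way to
    handle intermittent clients, not necessarily the paper's approach.
    """
    present = [u for u in updates.values() if u is not None]
    if not present:
        raise ValueError("no client reported in this round")
    weights = [w for w, _ in present]
    sizes = [n for _, n in present]
    return federated_average(weights, sizes)
```

For example, `aggregate_available({"a": ([1.0, 2.0], 1), "b": None, "c": ([3.0, 4.0], 3)})` averages only clients `a` and `c`, weighting `c` three times as heavily as `a`.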

Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study

Published in Scientific Reports, 2023

Abstract In this work the applicability of an ensemble of population and machine learning models to predict the evolution of the COVID-19 pandemic in Spain is evaluated, relying solely on public datasets. Firstly, using only incidence data, we trained machine learning models and adjusted classical ODE-based population models, especially suited to capture long term trends. As a novel approach, we then made an ensemble of these two families of models in order to obtain a more robust and accurate prediction. We then proceed to improve machine learning models by adding more input features: vaccination, human mobility and weather conditions. However, these improvements did not translate to the overall ensemble, as the different model families had also different prediction patterns. Additionally, machine learning models degraded when new COVID variants appeared after training. We finally used Shapley Additive Explanation values to discern the relative importance of the different input features for the machine learning models’ predictions. The conclusion of this work is that the ensemble of machine learning models and population models can be a promising alternative to SEIR-like compartmental models, especially given that the former do not need data from recovered patients, which are hard to collect and generally unavailable.

Recommended citation: Heredia Cacha, I., Sáinz-Pardo Díaz, J., Castrillo, M. et al. Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study. Sci Rep 13, 6750 (2023). https://doi.org/10.1038/s41598-023-33795-8
Download Paper

Comparison of machine learning models applied on anonymized data with different techniques

Published in Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience (CSR), 2023

Abstract Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or l-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as l-diversity, t-closeness and delta-disclosure privacy are also deployed on the well-known adult dataset.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Comparison of machine learning models applied on anonymized data with different techniques, 2023 IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, 2023, pp. 618-623. https://doi.org/10.1109/CSR57506.2023.10224917
Download Paper
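
As a small illustration of two of the metrics named in the abstract, the sketch below computes k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) and distinct l-diversity (the smallest number of distinct sensitive values in any such group). The function names and the toy table are hypothetical and are not taken from the paper or from any library.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on all quasi-identifiers."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({row[sensitive] for row in group}) for group in groups.values())


# Toy table with already-generalized quasi-identifiers (made-up values).
table = [
    {"age": "30-39", "zip": "390**", "income": ">50K"},
    {"age": "30-39", "zip": "390**", "income": "<=50K"},
    {"age": "40-49", "zip": "391**", "income": "<=50K"},
    {"age": "40-49", "zip": "391**", "income": "<=50K"},
]
```

On this toy table, every combination of `age` and `zip` appears twice, so the table is 2-anonymous, but the second group contains a single income value, so its distinct l-diversity is only 1.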

Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data

Published in Water Research, 2023

Abstract Monitoring the concentration of pigments like chlorophyll (Chl) in water-bodies is a key task to contribute to their conservation. However, with the existing sensor technology, measurement in real-time and with enough frequency to ensure proper risk management is not completely feasible. In this work, with the concept of data-driven soft-sensing, three hydrophysical features are used together with three meteorological ones to estimate the concentration of Chl in two tributaries of the River Thames. Data driven models, specifically neural networks, are used with three learning approaches: individual, centralized and federated. Data reduction scenarios are proposed in order to analyze the performance of each approach when less data is available. The best results in the training are usually obtained with the individual approach. However, the federated learning provides better generalization ability. It was also observed that in most of the cases the results of the federated learning approach improve those of the centralized one.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., & López García, Á. (2023). Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data. Water Research, 120726. https://doi.org/10.1016/j.watres.2023.120726
Download Paper

Making Federated Learning Accessible to Scientists: The AI4EOSC Approach

Published in Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, 2024

Abstract Access to computing resources is a critical requirement for researchers in a wide diversity of areas. This has become even more important with the rise of artificial intelligence techniques through the training of machine learning and deep learning models. In this sense, the AI4EOSC project aims to respond to this need by delivering an enhanced set of advanced services and tools for the development of artificial intelligence, machine and deep models, such as federated learning, in the European Open Science Cloud (EOSC). Federated learning is a technology in the field of privacy-preserving machine learning techniques that has revolutionized the current state of the art, evolving from classical centralized approaches to allow training models in a decentralized way, without sharing raw data. In this work, we present the production implementation of a federated learning system based on the Flower framework that allows users, without a technological background, to exploit this technique, performing federated learning training within the AI4EOSC platform. The objective is to be able to train this type of architecture in an intuitive way; for this purpose, a user-friendly dashboard has been implemented, whose development will be reviewed. The frameworks and technologies used for this implementation will be exposed together with an example of use from scratch, in order to demonstrate the use of this functionality of the platform. Finally, two scenarios concerning client availability are analyzed.

Recommended citation: Judith Sáinz-Pardo Díaz, Andrés Heredia Canales, Ignacio Heredia Cachá, Viet Tran, Giang Nguyen, Khadijeh Alibabaei, Marta Obregón Ruiz, Susana Rebolledo Ruiz, and Álvaro López García. 2024. Making Federated Learning Accessible to Scientists: The AI4EOSC Approach. In Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec 24). Association for Computing Machinery, New York, NY, USA, 253–264. https://doi.org/10.1145/3658664.3659642
Download Paper

Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas

Published in Earth Science Informatics, 2024

Abstract The increasing generation of data in different areas of life, such as the environment, highlights the need to explore new techniques for processing and exploiting data for useful purposes. In this context, artificial intelligence techniques, especially through deep learning models, are key tools to be used on the large amount of data that can be obtained, for example, from weather radars. In many cases, the information collected by these radars is not open, or belongs to different institutions, thus needing to deal with the distributed nature of this data. In this work, the applicability of a personalized federated learning architecture, which has been called adapFL, on distributed weather radar images is addressed. To this end, given a single available radar covering 400 km in diameter, the captured images are divided in such a way that they are disjointly distributed into four different federated clients. The results obtained with adapFL are analyzed in each zone, as well as in a central area covering part of the surface of each of the previously distributed areas. The ultimate goal of this work is to study the generalization capability of this type of learning technique for its extrapolation to use cases in which a representative number of radars is available, whose data can not be centralized due to technical, legal or administrative concerns. The results of this preliminary study indicate that the performance obtained in each zone with the adapFL approach allows improving the results of the federated learning approach, the individual deep learning models and the classical Continuity Tracking Radar Echoes by Correlation approach.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., Bartok, J., Heredia Cachá, I., Malkin Ondík, I., Martynovskyi, I., Alibabaei, K., Berberi, L., Kozlov, V. & López García, Á. (2024). Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas. Earth Sci Inform (2024). https://doi.org/10.1007/s12145-024-01438-9
Download Paper

An Open Source Python Library for Anonymizing Sensitive Data

Published in Scientific Data, 2024

Abstract Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests.

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. An Open Source Python Library for Anonymizing Sensitive Data. Sci Data 11, 1289 (2024). https://doi.org/10.1038/s41597-024-04019-z
Download Paper

Landscape of machine learning evolution: privacy-preserving federated learning frameworks and tools

Published in Artificial Intelligence Review, 2024

Abstract Machine learning is one of the most widely used technologies in the field of Artificial Intelligence. As machine learning applications become increasingly ubiquitous, concerns about data privacy and security have also grown. The work in this paper presents a broad theoretical landscape concerning the evolution of machine learning and deep learning from centralized to distributed learning, first in relation to privacy-preserving machine learning and secondly in the area of privacy-enhancing technologies. It provides a comprehensive landscape of the synergy between distributed machine learning and privacy-enhancing technologies, with federated learning being one of the most prominent architectures. Various distributed learning approaches to privacy-aware techniques are structured in a review, followed by an in-depth description of relevant frameworks and libraries, more particularly in the context of federated learning. The paper also highlights the need for data protection and privacy addressed from different approaches, key findings in the field concerning AI applications, and advances in the development of related tools and techniques.

Recommended citation: Nguyen, G., Sáinz-Pardo Díaz, J., Calatrava, A. et al. Landscape of machine learning evolution: privacy-preserving federated learning frameworks and tools. Artif Intell Rev 58, 51 (2025). https://doi.org/10.1007/s10462-024-11036-2
Download Paper

Machine learning operations landscape: platforms and tools

Published in Artificial Intelligence Review, 2025

Abstract As the field of machine learning advances, managing and monitoring intelligent models in production, also known as machine learning operations (MLOps), has become essential. Organizations are increasingly adopting artificial intelligence as a strategic tool, thus increasing the need for reliable and scalable MLOps platforms. Consequently, every aspect of the machine learning life cycle, from workflow orchestration to performance monitoring, presents both challenges and opportunities that require sophisticated, flexible, and scalable technological solutions. This research addresses this demand by providing a comprehensive assessment framework of MLOps platforms highlighting the key features necessary for a robust MLOps solution. The paper examines 16 MLOps tools widely used, which revolve around capabilities within AI infrastructure management, including but not limited to experiment tracking, model deployment, and model inference. Our three-step evaluation framework starts with a feature analysis of the MLOps platforms, then GitHub stars growth assessment for adoption and prominence, and finally, a weighted scoring method to single out the most influential platforms. From this process, we derive valuable insights into the essential components of effective MLOps systems and provide a decision-making flowchart that simplifies platform selection. This framework provides hands-on guidance for organizations looking to initiate or enhance their MLOps strategies, whether they require an end-to-end solution or specialized tools.

Recommended citation: Berberi, L., Kozlov, V., Nguyen, G., Sáinz-Pardo Díaz, J. et al. Machine learning operations landscape: platforms and tools. Artif Intell Rev 58, 167 (2025). https://doi.org/10.1007/s10462-025-11164-3
Download Paper

AI4EOSC: Artificial Intelligence for the European Open Science Cloud

Published in International Symposium on Grids and Clouds (ISGC2025), 2025

Abstract The AI4EOSC (Artificial Intelligence for the European Open Science Cloud) project aims at contributing to the landscape of Artificial Intelligence (AI) research with a comprehensive and user-friendly suite of tools and services within the framework of the European Open Science Cloud (EOSC). This innovative platform is specifically designed to empower researchers by enabling the development, deployment, and management of advanced AI solutions. Key features of the platform include support for federated learning, which facilitates collaborative model training across distributed datasets while ensuring data privacy; zero-touch model deployment, which streamlines the transition from development to production environments; MLOps tools, which optimize the lifecycle management of AI models; model serving on serverless computing platforms, and the visual design of composite AI pipelines, which integrate multiple AI techniques for enhanced analytical capabilities. By offering a robust and flexible infrastructure, the platform not only supports domain-specific customization but also fosters interdisciplinary collaboration, reflecting the ethos of the European Open Science Cloud. We will also discuss the foundational frameworks and technologies that constitute the backbone of the platform, emphasizing their scalability, interoperability, and adherence to open science principles. These technologies enable seamless integration with existing research workflows and ensure that the platform remains accessible and sustainable for the scientific community.

Recommended citation: Tran, V., López, A., Nguyen, G., Sáinz-Pardo, J., & Moltó, G. (2025, March). AI4EOSC: Artificial Intelligence for the European Open Science Cloud. In International Symposium on Grids and Clouds (ISGC2025) (Vol. 16, p. 21).
Download Paper

Exploring Federated Learning for Thermal Urban Feature Segmentation - A Comparison of Centralized and Decentralized Approaches

Published in Computational Science and Its Applications, 2025

Abstract Federated Learning (FL) is an approach for training a shared Machine Learning (ML) model with distributed training data and multiple participants. FL allows bypassing limitations of the traditional Centralized Machine Learning (CL) if data cannot be shared or stored centrally due to privacy or technical restrictions – the participants train the model locally with their training data and do not need to share it among the other participants. This paper investigates the practical implementation and effectiveness of FL in a real-world scenario, specifically focusing on unmanned aerial vehicle (UAV)-based thermal images for common thermal feature detection in urban environments. The distributed nature of the data arises naturally and makes it suitable for FL applications, as images captured in two German cities are available. This application presents unique challenges due to non-identical distribution and feature characteristics of data captured at both locations. The study makes several key contributions by evaluating FL algorithms in real deployment scenarios rather than simulation. We compare several FL approaches with a centralized learning baseline across key performance metrics such as model accuracy, training time, communication overhead, and energy usage. This paper also explores various FL workflows, comparing client-controlled workflows and server-controlled workflows. The findings of this work serve as a valuable reference for understanding the practical application and limitations of the FL methods in segmentation tasks in UAV-based imaging.

Recommended citation: Duda, L. et al. (2025). Exploring Federated Learning for Thermal Urban Feature Segmentation - A Comparison of Centralized and Decentralized Approaches. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2025. ICCSA 2025. Lecture Notes in Computer Science, vol 15648. Springer, Cham. https://doi.org/10.1007/978-3-031-97000-9_18
Download Paper

Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

Published in 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), 2026

Abstract The development of deep learning techniques is a leading field applied to cases in which medical data is used, particularly in cases of image diagnosis. This type of data has privacy and legal restrictions that in many cases prevent it from being processed from central servers. However, in this area collaboration between different research centers, in order to create models as robust as possible, trained with the largest quantity and diversity of data available, is a critical point to be taken into account. In this sense, the application of privacy aware distributed architectures, such as federated learning arises. When applying this type of architecture, the server aggregates the different local models trained with the data of each data owner to build a global model. This point is critical and therefore it is fundamental to analyze different ways of aggregation according to the use case, taking into account the distribution of the clients, the characteristics of the model, etc. In this paper we propose a novel aggregation strategy and we apply it to a use case of cerebral magnetic resonance image classification. In this use case the aggregation function proposed manages to improve the convergence obtained over the rounds of the federated learning process in relation to different aggregation strategies classically implemented and applied.

Recommended citation: J. Sáinz-Pardo Díaz and Á. López García, "Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data," 2025 3rd International Conference on Federated Learning Technologies and Applications (FLTA), Dubrovnik, Croatia, 2025, pp. 9-16, doi: 10.1109/FLTA67013.2025.11336682.
Download Paper

Assessing metadata privacy in neuroimaging

Published in Imaging Neuroscience, 2026

Abstract The ethical and legal imperative to share research data without causing harm requires careful attention to privacy risks. While mounting evidence demonstrates that data sharing benefits science, legitimate concerns persist regarding the potential leakage of personal information that could lead to reidentification and subsequent harm. We reviewed metadata accompanying neuroimaging datasets from heterogeneous studies openly available on OpenNeuro, involving participants across the lifespan—from children to older adults—with and without clinical diagnoses, and including associated clinical score data. Using metaprivBIDS (https://github.com/CPernet/metaprivBIDS), a software application for BIDS-compliant tsv/json files that computes and reports different privacy metrics (k-anonymity, k-global, l-diversity, SUDA, PIF), we found that privacy is generally well maintained, with serious vulnerabilities being rare. Nonetheless, issues were identified in nearly all datasets and warrant mitigation. Notably, clinical score data (e.g., neuropsychological results) posed minimal reidentification risk, whereas demographic variables — age, sex assigned at birth, sexual orientations, race, income, and geolocation — represented the principal privacy vulnerabilities. We outline practical measures to address these risks, enabling safer data sharing practices.

Recommended citation: Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. Pernet; Assessing metadata privacy in neuroimaging. Imaging Neuroscience 2026; doi: https://doi.org/10.1162/IMAG.a.11442
Download Paper

Metric-privacy-inspired noise calibration in federated learning: Improving convergence and preventing client inference attacks

Published in Knowledge-Based Systems, 2026

Abstract Federated learning (FL) enables the training of a global model across multiple data owners (clients) without sharing raw data. This distributed architecture is orchestrated by a central server that aggregates the local models from the clients. In cases where the server is trusted but not all network nodes, differential privacy (DP) can be used to privatize the aggregated model by adding noise. However, this may affect convergence across the FL rounds. In this work, we build on the notion of metric-privacy as a design principle to calibrate the noise added by the server under a global-DP setting, with the objective of mitigating its impact on the convergence of the aggregated model. We do not enforce metric-privacy as a formal guarantee, but rather use it to guide noise calibration. We compare our approach with vanilla FL and global-DP by analyzing the impact on six aggregation strategies and applying it to a medical imaging use case, simulating different scenarios with homogeneous and non-i.i.d. clients. Finally, we introduce the client inference attack (CIA), where a semi-honest client tries to find whether another client participated in the training and study how it can be mitigated using DP and metric-aware noise calibration. Our experiments show that metric-privacy aware noise calibration strategy improves the accuracy compared to standard DP in all the scenarios analyzed, while achieving a comparable success rate against CIA. These results indicate that metric-privacy inspired noise calibration can deliver a superior utility-privacy trade-off in medical-imaging federated settings.

Recommended citation: Sáinz-Pardo Díaz, J., Athanasiou, A., Jung, K., Palamidessi, C., & López García, Á. (2026). Metric-privacy-inspired noise calibration in federated learning: Improving convergence and preventing client inference attacks. Knowledge-Based Systems, 115993. DOI: https://doi.org/10.1016/j.knosys.2026.115993
Download Paper
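
For context, the global-DP baseline that the abstract compares against can be sketched as follows: the server clips each client update to a maximum L2 norm, averages the clipped updates, and adds Gaussian noise scaled to that norm. This is a generic sketch under stated assumptions. The function names, the noise scaling, and the parameter values are illustrative, and the paper's metric-privacy-inspired calibration is not reproduced here.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def noisy_average(updates, max_norm, noise_multiplier, rng):
    """Average clipped client updates and add Gaussian noise on the server.

    The standard deviation noise_multiplier * max_norm / n is a common
    choice for central (global) DP averaging; it is an assumption here.
    """
    n = len(updates)
    clipped = [clip(u, max_norm) for u in updates]
    sigma = noise_multiplier * max_norm / n
    return [
        sum(u[i] for u in clipped) / n + rng.gauss(0.0, sigma)
        for i in range(len(updates[0]))
    ]


# Hypothetical round with two clients and a seeded generator.
rng = random.Random(0)
aggregated = noisy_average([[3.0, 4.0], [0.0, 0.0]], max_norm=1.0,
                           noise_multiplier=1.0, rng=rng)
```

A larger `noise_multiplier` gives stronger privacy at the cost of a noisier aggregated model, which is the convergence issue the abstract refers to.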

talks

AI4EOSC users workshop

Talk: Federated Learning with Flower. The workshop aimed to bring AI4EOSC platform users, supporters, and developers together to share their experiences and discuss upcoming platform updates.

Flower Monthly (January)

Talk: “Federated Learning with Flower in the European Open Science Cloud”. Talk given jointly with Álvaro López García.

Data in research: challenges and opportunities.

Event: Data in research: challenges and opportunities (original Spanish title: Los datos en investigación: retos y oportunidades). Summer courses of the Universidad Internacional Menéndez Pelayo (UIMP).

NeuroAI: from neurons and connectomes to engrams.

Event: NeuroAI: from neurons and connectomes to engrams (original Spanish title: NeuroAI: desde las neuronas y conectomas a los engramas). Summer courses of the Universidad Internacional Menéndez Pelayo (UIMP).

teaching