Posts by Collection

conferences

publications

A mathematical justification for metronomic chemotherapy in oncology

Published in Mathematical Modelling of Natural Phenomena, 2022

Abstract We mathematically justify metronomic chemotherapy as the best strategy to apply many cytotoxic drugs in oncology for both curative and palliative approaches, assuming the classical pharmacokinetic model together with the Emax pharmacodynamic and the Norton-Simon hypothesis. From the mathematical point of view, we will consider two mixed-integer nonlinear optimization problems, where the unknowns are the number of the doses and the quantity of each one, adjusting the administration times a posteriori.

Recommended citation: Fernández, L. A., Pola, C., & Sáinz-Pardo, J. (2022). A mathematical justification for metronomic chemotherapy in oncology. Mathematical Modelling of Natural Phenomena, 17, 12.

A Python library to check the level of anonymity of a dataset

Published in Scientific Data, 2022

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. A Python library to check the level of anonymity of a dataset. Sci Data 9, 785 (2022). https://doi.org/10.1038/s41597-022-01894-2

Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Published in Neurocomputing, 2023

Abstract Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy, loss and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner. The results of considering three or ten clients are exposed and compared between them and against the centralized case. Two different problems related to intermittent clients are discussed, together with two approaches to be followed for each of them. Specifically, this type of problems may occur because in a real scenario some clients may leave the training, and others enter it, and on the other hand because of client technical or connectivity problems. Finally, improvements and future work in the field are proposed.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Study of the performance and scalability of federated learning for medical imaging with intermittent clients. Neurocomputing, 518, 142-154. https://doi.org/10.1016/j.neucom.2022.11.011

Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study

Published in Scientific Reports, 2023

Abstract In this work the applicability of an ensemble of population and machine learning models to predict the evolution of the COVID-19 pandemic in Spain is evaluated, relying solely on public datasets. Firstly, using only incidence data, we trained machine learning models and adjusted classical ODE-based population models, especially suited to capture long term trends. As a novel approach, we then made an ensemble of these two families of models in order to obtain a more robust and accurate prediction. We then proceed to improve machine learning models by adding more input features: vaccination, human mobility and weather conditions. However, these improvements did not translate to the overall ensemble, as the different model families had also different prediction patterns. Additionally, machine learning models degraded when new COVID variants appeared after training. We finally used Shapley Additive Explanation values to discern the relative importance of the different input features for the machine learning models’ predictions. The conclusion of this work is that the ensemble of machine learning models and population models can be a promising alternative to SEIR-like compartmental models, especially given that the former do not need data from recovered patients, which are hard to collect and generally unavailable.

Recommended citation: Heredia Cacha, I., Sáinz-Pardo Díaz, J., Castrillo, M. et al. Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study. Sci Rep 13, 6750 (2023). https://doi.org/10.1038/s41598-023-33795-8

Comparison of machine learning models applied on anonymized data with different techniques

Published in Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience (CSR), 2023

Abstract Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or l-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as l-diversity, t-closeness and delta-disclosure privacy are also deployed on the well-known adult dataset.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Comparison of machine learning models applied on anonymized data with different techniques, 2023 IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, 2023, pp. 618-623. https://doi.org/10.1109/CSR57506.2023.10224917

Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data

Published in Water Research, 2023

Abstract Monitoring the concentration of pigments like chlorophyll (Chl) in water-bodies is a key task to contribute to their conservation. However, with the existing sensor technology, measurement in real-time and with enough frequency to ensure proper risk management is not completely feasible. In this work, with the concept of data-driven soft-sensing, three hydrophysical features are used together with three meteorological ones to estimate the concentration of Chl in two tributaries of the River Thames. Data driven models, specifically neural networks, are used with three learning approaches: individual, centralized and federated. Data reduction scenarios are proposed in order to analyze the performance of each approach when less data is available. The best results in the training are usually obtained with the individual approach. However, the federated learning provides better generalization ability. It was also observed that in most of the cases the results of the federated learning approach improve those of the centralized one.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., & López García, Á. (2023). Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data. Water Research, 120726. https://doi.org/10.1016/j.watres.2023.120726

Making Federated Learning Accessible to Scientists: The AI4EOSC Approach

Published in Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, 2024

Abstract Access to computing resources is a critical requirement for researchers in a wide diversity of areas. This has become even more important with the rise of artificial intelligence techniques through the training of machine learning and deep learning models. In this sense, the AI4EOSC project aims to respond to this need by delivering an enhanced set of advanced services and tools for the development of artificial intelligence, machine and deep models, such as federated learning, in the European Open Science Cloud (EOSC). Federated learning is a technology in the field of privacy-preserving machine learning techniques that has revolutionized the current state of the art, evolving from classical centralized approaches to allow training models in a decentralized way, without sharing raw data. In this work, we present the production implementation of a federated learning system based on the Flower framework that allows users, without a technological background, to exploit this technique, performing federated learning training within the AI4EOSC platform. The objective is to be able to train this type of architecture in an intuitive way; for this purpose, a user-friendly dashboard has been implemented, whose development will be reviewed. The frameworks and technologies used for this implementation will be exposed together with an example of use from scratch, in order to demonstrate the use of this functionality of the platform. Finally, two scenarios concerning client availability are analyzed.

Recommended citation: Judith Sáinz-Pardo Díaz, Andrés Heredia Canales, Ignacio Heredia Cachá, Viet Tran, Giang Nguyen, Khadijeh Alibabaei, Marta Obregón Ruiz, Susana Rebolledo Ruiz, and Álvaro López García. 2024. Making Federated Learning Accessible to Scientists: The AI4EOSC Approach. In Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec 24). Association for Computing Machinery, New York, NY, USA, 253–264. https://doi.org/10.1145/3658664.3659642

Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas

Published in Earth Science Informatics, 2024

Abstract The increasing generation of data in different areas of life, such as the environment, highlights the need to explore new techniques for processing and exploiting data for useful purposes. In this context, artificial intelligence techniques, especially through deep learning models, are key tools to be used on the large amount of data that can be obtained, for example, from weather radars. In many cases, the information collected by these radars is not open, or belongs to different institutions, thus needing to deal with the distributed nature of this data. In this work, the applicability of a personalized federated learning architecture, which has been called adapFL, on distributed weather radar images is addressed. To this end, given a single available radar covering 400 km in diameter, the captured images are divided in such a way that they are disjointly distributed into four different federated clients. The results obtained with adapFL are analyzed in each zone, as well as in a central area covering part of the surface of each of the previously distributed areas. The ultimate goal of this work is to study the generalization capability of this type of learning technique for its extrapolation to use cases in which a representative number of radars is available, whose data can not be centralized due to technical, legal or administrative concerns. The results of this preliminary study indicate that the performance obtained in each zone with the adapFL approach allows improving the results of the federated learning approach, the individual deep learning models and the classical Continuity Tracking Radar Echoes by Correlation approach.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., Bartok, J., Heredia Cachá, I., Malkin Ondík, I., Martynovskyi, I., Alibabaei, K., Berberi, L., Kozlov, V. & López García, Á. (2024). Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas. Earth Science Informatics. https://doi.org/10.1007/s12145-024-01438-9

An Open Source Python Library for Anonymizing Sensitive Data

Published in Scientific Data, 2024

Abstract Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests.

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. An Open Source Python Library for Anonymizing Sensitive Data. Sci Data 11, 1289 (2024). https://doi.org/10.1038/s41597-024-04019-z

talks

AI4EOSC users workshop

Talk: Federated Learning with Flower. The workshop aimed to bring together AI4EOSC platform users, supporters, and developers to share their experiences and upcoming platform updates.

Flower Monthly (January)

Talk: “Federated Learning with Flower in the European Open Science Cloud”. Talk given jointly with Álvaro López García.

Data in research: challenges and opportunities.

Event: Data in research: challenges and opportunities. Summer courses of the Menéndez Pelayo International University (UIMP). Given in Spanish; original title: Los datos en investigación: retos y oportunidades.

teaching