Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Posts

conferences

publications

A mathematical justification for metronomic chemotherapy in oncology

Published in Mathematical Modelling of Natural Phenomena, 2022

Abstract We mathematically justify metronomic chemotherapy as the best strategy to apply many cytotoxic drugs in oncology for both curative and palliative approaches, assuming the classical pharmacokinetic model together with the Emax pharmacodynamic and the Norton-Simon hypothesis. From the mathematical point of view, we will consider two mixed-integer nonlinear optimization problems, where the unknowns are the number of the doses and the quantity of each one, adjusting the administration times a posteriori.

Recommended citation: Fernández, L. A., Pola, C., & Sáinz-Pardo, J. (2022). A mathematical justification for metronomic chemotherapy in oncology. Mathematical Modelling of Natural Phenomena, 17, 12.
Download Paper

A Python library to check the level of anonymity of a dataset

Published in Scientific Data, 2022

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. A Python library to check the level of anonymity of a dataset. Sci Data 9, 785 (2022). https://doi.org/10.1038/s41597-022-01894-2
Download Paper

Study of the performance and scalability of federated learning for medical imaging with intermittent clients

Published in Neurocomputing, 2023

Abstract Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy, loss and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner. The results of considering three or ten clients are presented and compared with each other and against the centralized case. Two different problems related to intermittent clients are discussed, together with two approaches to be followed for each of them. Specifically, this type of problem may occur because in a real scenario some clients may leave the training, and others enter it, and on the other hand because of client technical or connectivity problems. Finally, improvements and future work in the field are proposed.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Study of the performance and scalability of federated learning for medical imaging with intermittent clients. Neurocomputing, 518, 142-154. https://doi.org/10.1016/j.neucom.2022.11.011
Download Paper

Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study

Published in Scientific Reports, 2023

Abstract In this work the applicability of an ensemble of population and machine learning models to predict the evolution of the COVID-19 pandemic in Spain is evaluated, relying solely on public datasets. Firstly, using only incidence data, we trained machine learning models and adjusted classical ODE-based population models, especially suited to capture long term trends. As a novel approach, we then made an ensemble of these two families of models in order to obtain a more robust and accurate prediction. We then proceed to improve machine learning models by adding more input features: vaccination, human mobility and weather conditions. However, these improvements did not translate to the overall ensemble, as the different model families had also different prediction patterns. Additionally, machine learning models degraded when new COVID variants appeared after training. We finally used Shapley Additive Explanation values to discern the relative importance of the different input features for the machine learning models’ predictions. The conclusion of this work is that the ensemble of machine learning models and population models can be a promising alternative to SEIR-like compartmental models, especially given that the former do not need data from recovered patients, which are hard to collect and generally unavailable.

Recommended citation: Heredia Cacha, I., Sáinz-Pardo Díaz, J., Castrillo, M. et al. Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain’s case study. Sci Rep 13, 6750 (2023). https://doi.org/10.1038/s41598-023-33795-8
Download Paper

Comparison of machine learning models applied on anonymized data with different techniques

Published in Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience (CSR), 2023

Abstract Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or l-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity, and additional tools such as l-diversity, t-closeness and delta-disclosure privacy are also deployed on the well-known adult dataset.

Recommended citation: Sáinz-Pardo Díaz, J., & López García, Á. (2023). Comparison of machine learning models applied on anonymized data with different techniques, 2023 IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, 2023, pp. 618-623. https://doi.org/10.1109/CSR57506.2023.10224917
Download Paper

Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data

Published in Water Research, 2023

Abstract Monitoring the concentration of pigments like chlorophyll (Chl) in water-bodies is a key task to contribute to their conservation. However, with the existing sensor technology, measurement in real-time and with enough frequency to ensure proper risk management is not completely feasible. In this work, with the concept of data-driven soft-sensing, three hydrophysical features are used together with three meteorological ones to estimate the concentration of Chl in two tributaries of the River Thames. Data driven models, specifically neural networks, are used with three learning approaches: individual, centralized and federated. Data reduction scenarios are proposed in order to analyze the performance of each approach when less data is available. The best results in the training are usually obtained with the individual approach. However, the federated learning provides better generalization ability. It was also observed that in most of the cases the results of the federated learning approach improve those of the centralized one.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., & López García, Á. (2023). Deep learning based soft-sensor for continuous chlorophyll estimation on decentralized data. Water Research, 120726. https://doi.org/10.1016/j.watres.2023.120726
Download Paper

Making Federated Learning Accessible to Scientists: The AI4EOSC Approach

Published in Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, 2024

Abstract Access to computing resources is a critical requirement for researchers in a wide diversity of areas. This has become even more important with the rise of artificial intelligence techniques through the training of machine learning and deep learning models. In this sense, the AI4EOSC project aims to respond to this need by delivering an enhanced set of advanced services and tools for the development of artificial intelligence, machine and deep learning models, such as federated learning, in the European Open Science Cloud (EOSC). Federated learning is a technology in the field of privacy-preserving machine learning techniques that has revolutionized the current state of the art, evolving from classical centralized approaches to allow training models in a decentralized way, without sharing raw data. In this work, we present the production implementation of a federated learning system based on the Flower framework that allows users, without a technological background, to exploit this technique, performing federated learning training within the AI4EOSC platform. The objective is to be able to train this type of architecture in an intuitive way; for this purpose, a user-friendly dashboard has been implemented, whose development will be reviewed. The frameworks and technologies used for this implementation are presented together with an example of use from scratch, in order to demonstrate the use of this functionality of the platform. Finally, two scenarios concerning client availability are analyzed.

Recommended citation: Judith Sáinz-Pardo Díaz, Andrés Heredia Canales, Ignacio Heredia Cachá, Viet Tran, Giang Nguyen, Khadijeh Alibabaei, Marta Obregón Ruiz, Susana Rebolledo Ruiz, and Álvaro López García. 2024. Making Federated Learning Accessible to Scientists: The AI4EOSC Approach. In Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec 24). Association for Computing Machinery, New York, NY, USA, 253–264. https://doi.org/10.1145/3658664.3659642
Download Paper

Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas

Published in Earth Science Informatics, 2024

Abstract The increasing generation of data in different areas of life, such as the environment, highlights the need to explore new techniques for processing and exploiting data for useful purposes. In this context, artificial intelligence techniques, especially through deep learning models, are key tools to be used on the large amount of data that can be obtained, for example, from weather radars. In many cases, the information collected by these radars is not open, or belongs to different institutions, thus needing to deal with the distributed nature of this data. In this work, the applicability of a personalized federated learning architecture, which has been called adapFL, on distributed weather radar images is addressed. To this end, given a single available radar covering 400 km in diameter, the captured images are divided in such a way that they are disjointly distributed into four different federated clients. The results obtained with adapFL are analyzed in each zone, as well as in a central area covering part of the surface of each of the previously distributed areas. The ultimate goal of this work is to study the generalization capability of this type of learning technique for its extrapolation to use cases in which a representative number of radars is available, whose data cannot be centralized due to technical, legal or administrative concerns. The results of this preliminary study indicate that the performance obtained in each zone with the adapFL approach allows improving the results of the federated learning approach, the individual deep learning models and the classical Continuity Tracking Radar Echoes by Correlation approach.

Recommended citation: Sáinz-Pardo Díaz, J., Castrillo, M., Bartok, J., Heredia Cachá, I., Malkin Ondík, I., Martynovskyi, I., Alibabaei, K., Berberi, L., Kozlov, V. & López García, Á. (2024). Personalized federated learning for improving radar based precipitation nowcasting on heterogeneous areas. Earth Sci Inform (2024). https://doi.org/10.1007/s12145-024-01438-9
Download Paper

An Open Source Python Library for Anonymizing Sensitive Data

Published in Scientific Data, 2024

Abstract Open science is a fundamental pillar to promote scientific progress and collaboration, based on the principles of open data, open source and open access. However, the requirements for publishing and sharing open data are in many cases difficult to meet in compliance with strict data protection regulations. Consequently, researchers need to rely on proven methods that allow them to anonymize their data without sharing it with third parties. To this end, this paper presents the implementation of a Python library for the anonymization of sensitive tabular data. This framework provides users with a wide range of anonymization methods that can be applied on the given dataset, including the set of identifiers, quasi-identifiers, generalization hierarchies and allowed level of suppression, along with the sensitive attribute and the level of anonymity required. The library has been implemented following best practices for integration and continuous development, as well as the use of workflows to test code coverage based on unit and functional tests.

Recommended citation: Sáinz-Pardo Díaz, J., López García, Á. An Open Source Python Library for Anonymizing Sensitive Data. Sci Data 11, 1289 (2024). https://doi.org/10.1038/s41597-024-04019-z
Download Paper

Landscape of machine learning evolution: privacy-preserving federated learning frameworks and tools

Published in Artificial Intelligence Review, 2024

Abstract Machine learning is one of the most widely used technologies in the field of Artificial Intelligence. As machine learning applications become increasingly ubiquitous, concerns about data privacy and security have also grown. The work in this paper presents a broad theoretical landscape concerning the evolution of machine learning and deep learning from centralized to distributed learning, first in relation to privacy-preserving machine learning and secondly in the area of privacy-enhancing technologies. It provides a comprehensive landscape of the synergy between distributed machine learning and privacy-enhancing technologies, with federated learning being one of the most prominent architectures. Various distributed learning approaches to privacy-aware techniques are structured in a review, followed by an in-depth description of relevant frameworks and libraries, more particularly in the context of federated learning. The paper also highlights the need for data protection and privacy addressed from different approaches, key findings in the field concerning AI applications, and advances in the development of related tools and techniques.

Recommended citation: Nguyen, G., Sáinz-Pardo Díaz, J., Calatrava, A. et al. Landscape of machine learning evolution: privacy-preserving federated learning frameworks and tools. Artif Intell Rev 58, 51 (2025). https://doi.org/10.1007/s10462-024-11036-2
Download Paper

talks

AIHUB CSIC Summer School 2022

Published: July 04, 2022

Event: AIHUB CSIC Summer School 2022.

2nd Inria-DFKI European Summer School on Artificial Intelligence 2022 (IDESSAI 2022)

Published: September 01, 2022

Poster: “pyCANON: A Python library to check the level of anonymity of a dataset”.

AI4EOSC project kick off meeting

Published: October 05, 2022

Attendance at the kick-off meeting of the AI4EOSC project.

X jornadas doctorales y V jornadas de divulgación grupo de Universidades del G9

Published: June 01, 2023

Poster: “Privacy preserving techniques for data science”.

AIHUB CSIC Summer School 2023

Published: July 05, 2023

Poster: “Application of federated learning to medical imaging scenarios”.

IEEE International Conference on Cyber Security and Resilience 2023

Published: August 02, 2023

Presentation of the paper: “Comparison of machine learning models applied on anonymized data with different techniques”.

AI4EOSC users workshop

Published: November 15, 2023

Talk: “Federated Learning with Flower”. The workshop was intended to bring AI4EOSC platform users, supporters, and developers together to share their experiences and upcoming updates of the platform.

Flower Monthly (January)

Published: January 03, 2024

Talk: “Federated Learning with Flower in the European Open Science Cloud”. Talk given jointly with Álvaro López García.

EOSC SIESTA project kick off meeting

Published: January 25, 2024

Attendance at the kick-off meeting of the EOSC SIESTA project.

Flower AI Summit 2024

Published: March 15, 2024

Talk: “Federated AI in the European Open Science Cloud”.

AI4EOSC webinars (3)

Published: April 22, 2024

Talk: “Demo: FL in AI4EOSC”.

Zero-code tools & BMZ Community partners

Published: June 11, 2024

Talk: “Advanced AI for scientists: the AI4EOSC platform approach” (given jointly with Ignacio Heredia).

12th ACM Workshop on Information Hiding and Multimedia Security

Published: June 26, 2024

Presentation of the paper: “Making Federated Learning Accessible to Scientists: The AI4EOSC Approach”.

Data in research: challenges and opportunities.

Published: August 27, 2024

Event: Data in research: challenges and opportunities. Summer courses of the International University Menéndez Pelayo (UIMP). Spanish title: “Los datos en investigación: retos y oportunidades”, Cursos de verano de la Universidad Internacional Menéndez Pelayo (UIMP).

Joint Workshop PTI Digital Science - PTI Green Horizon.

Published: September 17, 2024

Event: Joint Workshop PTI Digital Science - PTI Green Horizon. Spanish title: “Workshop conjunto de la PTI Ciencia Digital y la PTI Horizonte Verde”.

The eyes of AI. Topic: Federated learning applications.

Published: January 19, 2025

Event: University of Doha for Science and Technology (UDST).

EOSC SIESTA All Hands Meeting

Published: February 04, 2025

Event: EOSC SIESTA All Hands Meeting.

Exploring AI4EOSC: AI and LLMs from Theory to Practice

Published: March 07, 2025

Talk: “Why AI4EOSC? Take advantage of the platform”.

teaching

Master in Data Science (2021-2022)

Master course, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2022

Teaching in the course “Security, privacy and legal aspects”. 4 sessions (2 hours each). Topics: differential privacy, federated learning. Check the website of the master’s program.

Master in Data Science (2022-2023)

Master course, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2023

Teaching in the course “Security, privacy and legal aspects”. 5 sessions (2 hours each). Topics: anonymity, differential privacy, federated learning. Check the website of the master’s program.

Final master thesis supervision: Analyzing the performance of machine learning models on anonymized data

Final master thesis supervision, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2023

Title of the master thesis supervised: “Analyzing the performance of machine learning models on anonymized data”.

Final master thesis supervision: Building a Python library for anonymizing sensitive data

Final master thesis supervision, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2023

Title of the master thesis supervised: “Building a Python library for anonymizing sensitive data”.

Master in Data Science (2023-2024)

Master course, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2024

Teaching in the course “Security, privacy and legal aspects”. 7 sessions (2 hours each). Topics: privacy, anonymity, differential privacy, federated learning. Check the website of the master’s program.

Final master thesis supervision: Comparison of distributed machine learning techniques applied to openly available medical data

Final master thesis supervision, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2024

Title of the master thesis supervised: “Comparison of distributed machine learning techniques applied to openly available medical data”.

Master in Data Science (2024-2025)

Master course, International University Menéndez Pelayo (UIMP) and University of Cantabria (UC), 2025

Teaching in the course “Security, privacy and legal aspects”. 8 sessions (2 hours each). Topics: privacy, anonymity, differential privacy, federated learning. Check the website of the master’s program.

Judith Sáinz-Pardo Díaz