Conn Health 2022;1:98-100. 10.20517/ch.2022.15 © The Author(s) 2022.
Open Access Editorial

Clinical data sharing using Generative Adversarial Networks

1Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran 1417744361, Iran.

2Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4M6, Canada.

Correspondence to: Dr. Marzieh Esmaeili, Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, 3rd Floor, No #17, Farredanesh Alley, Ghods St, Enghelab Ave, Tehran, Iran. E-mail: marzieh.esmaeili@gmail.com

    Academic Editor: Stefano Omboni | Copy Editor: Peng-Juan Wen | Production Editor: Peng-Juan Wen

    © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

    Abstract

    Obtaining data is challenging for researchers, especially when it comes to medical data. Moreover, using medical data requires specific considerations because of concerns about privacy and confidentiality. Generative models aim to learn the data distribution via various statistical learning approaches. Among generative models, a machine learning-based approach named Generative Adversarial Networks (GANs) has proved its potential in the implicit density estimation of high-dimensional data. Therefore, we suggest an approach in which each healthcare organization, especially hospitals, could create and share its own GAN model, termed a Hospital-Based GAN (H-GAN), instead of sharing patients’ raw data.

    MEDICAL DATA SHARING PROBLEM

    Obtaining data is challenging for researchers, especially when it comes to medical data. Using medical data requires specific considerations because of concerns about privacy and confidentiality. At the same time, sharing these data is necessary to verify experiments and to extract more knowledge from the data[1]. One potential solution for sharing data while preserving privacy is de-identification. The main concern with this approach is that the process could be reversed, unveiling the real patients’ identities. Another solution is to encourage patient populations to share their data by rewarding them or benefiting their communities[2]. While this can be feasible for small health ecosystems, its scalability is questionable: the many stakeholders, including each individual patient, could hold different viewpoints, so reaching a consensus might be challenging.

    In this paper, we propose a new solution to the medical data sharing problem. The main idea can be demonstrated by a simple example: assume that we want to share the heights of individuals without disclosing their identities. In this case, we could share the distribution of the heights (for a normal distribution, the mean and standard deviation). Having the parameters of this distribution enables others to reuse the data by generating samples of heights. The cornerstone of this approach is identifying the distribution of the data. Estimating the data distribution becomes a very complicated task for high-dimensional data such as medical images. A well-studied branch of machine learning called generative models has emerged to address this problem.
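
    The height example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "private" heights are simulated here (mean 170 cm, standard deviation 8 cm, 1000 individuals are made-up values), and a normal distribution is assumed to describe them. The point is that only two numbers leave the data holder, and a recipient can still draw new samples from them.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical private data: simulated heights (cm) of 1000 individuals.
private_heights = [rng.gauss(170.0, 8.0) for _ in range(1000)]

# The data holder shares only these two parameters, not the records.
shared_mean = statistics.mean(private_heights)
shared_std = statistics.stdev(private_heights)

# A recipient rebuilds the (assumed normal) distribution and samples from it.
synthetic_heights = [rng.gauss(shared_mean, shared_std) for _ in range(1000)]

print(f"shared mean = {shared_mean:.1f} cm, shared std = {shared_std:.1f} cm")
print(f"synthetic mean = {statistics.mean(synthetic_heights):.1f} cm")
```

    None of the synthetic values corresponds to a real individual, yet their summary statistics stay close to those of the original data. For high-dimensional data such as images, two parameters no longer suffice, which is where a learned generative model takes over this role.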

    GENERATIVE MODELS AS A SAFE WAY TO SHARE PRIVATE DATA

    The underlying assumption in most machine learning tasks is that data samples are drawn from a single data-generating distribution[3]. Generative models aim to learn this distribution via various statistical learning approaches. Once the data-generating distribution has been learned, we can generate new samples that are not necessarily the same as the input data. Hence, generative models can be viewed as a secure tool for sharing new data while preserving patients’ privacy. Generative models fall into two categories: implicit density estimation and explicit density estimation[4]. Here, we are interested in generating new samples from the data distribution rather than in obtaining an explicit parametric form of that distribution, so implicit density estimation suffices. Among generative models, Generative Adversarial Networks (GANs) have proved their potential in the implicit density estimation of high-dimensional data.

    STATE OF THE ART IN GENERATIVE MODELS: GANS

    Recently, deep learning has outperformed traditional methods in different areas, including computer vision, natural language processing, and image processing. Deep learning models are powerful at learning highly nonlinear mappings, and GANs can be viewed as the marriage of deep learning and generative models. A GAN is composed of two neural networks: a generator and a discriminator[5]. The generator tries to fool the discriminator by generating realistic data that are close to the distribution of the real data, while the discriminator tries to distinguish these so-called fake data from the real data. In other words, the training process is a minimax game. Note that once the GAN has been trained, only the generator network is required to generate new samples, and the discriminator can be discarded. As a result, the generator creates samples that approximately follow the same distribution as the training data. GANs have been successfully used to generate samples by learning the data-generating distribution from a limited amount of data[6]. Currently, GANs are widely used to generate new texts and images for different purposes.

    One important application of GANs is to enhance the performance of classifiers trained on imbalanced datasets. An imbalanced dataset can severely degrade the performance of a classifier, and such datasets are prevalent in medical applications. For example, in breast cancer datasets, the number of mammography images showing malignancy is much smaller than the number of benign ones, which biases the classifier towards the benign class[4]. To solve this problem, GANs can be used to balance such datasets: we can train a GAN on the malignant images and then generate new samples of malignant cases.
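
    The minimax game between the generator and the discriminator mentioned above is usually written as the objective below, following the standard GAN formulation. The notation is introduced here for illustration and is not used elsewhere in this article: G is the generator, D is the discriminator, p_data is the distribution of the real data, and p_z is the noise distribution from which the generator's input z is drawn.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

    The discriminator maximizes V by assigning high probability to real samples x and low probability to generated samples G(z), while the generator minimizes V by making D(G(z)) as large as possible.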

    INTRODUCING HOSPITAL-BASED GANS

    We suggest an approach in which each healthcare organization, especially hospitals, could create and share its own GAN, a Hospital-Based GAN (H-GAN), instead of sharing patients’ raw data. This solution provides a framework for sharing hospital data without violating patients’ privacy by providing a data generator instead of the patients’ records. In summary, this solution offers three major advantages. First and foremost, it preserves patients’ privacy. Second, it enables researchers to generate an unlimited amount of data for training complex models that require huge amounts of data, such as deep learning classifiers, which also mitigates the imbalanced dataset issue. Third, it reduces the storage and bandwidth required for storing and transferring the data, because the model is shared instead of the whole set of images. For example, a dataset consisting of 5000 mammography images requires around 100 GB, while the GAN model created from this dataset is around 100 MB, a compression ratio of about 1:1000. At the next level, H-GANs could theoretically be combined to create multi-hospital, national, regional, and even global GANs, and these models could cover a comprehensive range of samples.
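
    The storage figures quoted above can be checked with simple arithmetic. The short sketch below uses only the approximate numbers given in the text and assumes decimal units (1 GB = 1000 MB); the variable names are illustrative.

```python
# Approximate figures quoted in the text (decimal units assumed).
n_images = 5000
dataset_size_mb = 100 * 1000  # about 100 GB of mammography images
model_size_mb = 100           # about 100 MB for the trained GAN model

# How many times smaller the shared model is than the raw dataset.
compression_ratio = dataset_size_mb / model_size_mb

# Implied average size of one image in the dataset.
avg_image_mb = dataset_size_mb / n_images

print(f"model : data = 1:{compression_ratio:.0f}")
print(f"average image size = {avg_image_mb:.0f} MB")
```

    Under these assumptions, the shared model is about the size of five average images, regardless of how many samples are later generated from it.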

    DECLARATIONS

    Authors’ contributions

    Made substantial contributions to the conception and design of the study and performed data analysis, interpretation and data acquisition, as well as providing administrative, technical, and material support: Ayyoubzadeh SM (Seyed Mohammad Ayyoubzadeh), Ayyoubzadeh SM (Seyed Mehdi Ayyoubzadeh), Esmaeili M

    Availability of data and materials

    Not applicable.

    Financial support and sponsorship

    None.

    Conflicts of interest

    All authors declared that there are no conflicts of interest.

    Ethical approval and consent to participate

    Not applicable.

    Consent for publication

    Not applicable.

    Copyright

    © The Author(s) 2022.

    Cite This Article

    Ayyoubzadeh SM, Ayyoubzadeh SM, Esmaeili M. Clinical data sharing using Generative Adversarial Networks. Conn Health 2022;1:98-100. http://dx.doi.org/10.20517/ch.2022.15

    © 2016-2022 OAE Publishing Inc., except certain content provided by third parties