The growth of Machine Learning (ML) algorithms offers predictive power that is useful in many applications. Deep Learning (DL) is a subset of ML algorithms based on neural networks. A deep learning model is used in two phases: first, we train the model on a training dataset (training phase). Once the model is trained, we can feed it new data and it computes predictions (inference phase). However, the training or inference data can be sensitive, especially in healthcare. Solving the data privacy problem can open up many opportunities for data sharing and collaborative training of algorithms.
In this blog post, we will learn how to preserve the privacy of the training and inference data by using homomorphic encryption and a distributed training technique called “split learning”.
Homomorphic Encryption (HE) is a form of encryption based on a public/secret key pair that allows us to perform computations on encrypted data without decrypting it. HE addition and multiplication are demonstrated in Figure 1.
In the client-server setting, a client can encrypt her data with the public key and send the encrypted data to the server. The server can perform computations on the encrypted data and return the results, still in encrypted form, to the client. The client can then decrypt the results with her secret key. Using HE, the client can outsource computations (including training or inference with neural networks) to the server while keeping her data secret.
Even though we can perform addition and multiplication on encrypted data, HE cannot directly evaluate functions that are not built from these two operations, such as the non-linear activation functions (for example sigmoid or ReLU) used in neural networks. Deep learning models make extensive use of such non-linear functions in their architectures, which makes it difficult to train or do inference on homomorphically encrypted data.
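To make this client-server flow concrete, here is a minimal sketch using a toy version of the Paillier cryptosystem. This is an illustrative choice on our part, not the scheme used in the work discussed in this post. Paillier is only additively homomorphic: it supports adding two ciphertexts and multiplying a ciphertext by a plaintext constant, whereas schemes such as BFV or CKKS also support multiplying ciphertexts together. The key size, the function names and the sample readings are assumptions made for illustration, and the parameters are far too small for real use.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key generation (two Mersenne primes; NOT secure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)            # valid because we use g = n + 1
    return n, (lam, mu)             # (public key, secret key)

def encrypt(n, m):
    """Encrypt integer m under public key n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def he_add(n, c1, c2):
    """Enc(a) * Enc(b) mod n^2 decrypts to a + b."""
    return c1 * c2 % (n * n)

def he_mul_plain(n, c, k):
    """Enc(a) ** k mod n^2 decrypts to k * a (k is a plaintext integer)."""
    return pow(c, k, n * n)

def server_weighted_sum(n, ciphertexts, weights):
    """Server side: uses only the public key and never sees the plaintexts."""
    acc = encrypt(n, 0)
    for c, w in zip(ciphertexts, weights):
        acc = he_add(n, acc, he_mul_plain(n, c, w))
    return acc

# Client side: encrypt, outsource the computation, decrypt the result.
n, sk = keygen()
readings = [72, 85, 64]                      # hypothetical sensitive values
encrypted = [encrypt(n, m) for m in readings]
encrypted_result = server_weighted_sum(n, encrypted, [2, 1, 3])
result = decrypt(n, sk, encrypted_result)
print(result)  # 2*72 + 1*85 + 3*64 = 421
```

The weighted sum computed by the server is the same kind of operation a linear layer of a neural network performs, which is why linear layers fit HE naturally.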
One way to solve this problem is to approximate the non-linear functions with polynomials, which can be evaluated using only additions and multiplications. This is essentially the core idea of our paper “Blind Faith: Privacy-Preserving Machine Learning using Function Approximation”, which proposes a protocol to do inference on HE-encrypted data for an image classification task.
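As a small illustration of the idea, the sketch below replaces the sigmoid function with its degree-3 Taylor polynomial around zero, 1/2 + x/4 - x^3/48, and evaluates it with Horner's rule so that only additions and multiplications are used. The choice of sigmoid, the Taylor expansion and the helper names are assumptions made for this example; the paper may use a different approximation method.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Degree-3 Taylor expansion of sigmoid around 0 (illustrative choice):
# sigmoid(x) ~ 1/2 + x/4 - x^3/48, coefficients listed from degree 0 upward.
COEFFS = [0.5, 0.25, 0.0, -1.0 / 48.0]

def poly_eval(coeffs, x):
    """Horner's rule: uses only additions and multiplications,
    i.e. the operations that are available on HE ciphertexts."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def max_error(lo, hi, steps=200):
    """Largest absolute gap between sigmoid and the polynomial on [lo, hi]."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(sigmoid(x) - poly_eval(COEFFS, x)))
    return worst

print("max error on [-1, 1]:", max_error(-1.0, 1.0))
print("max error on [-4, 4]:", max_error(-4.0, 4.0))
```

The approximation is accurate close to zero and degrades quickly for larger inputs, so the range of values entering the activation function has to be kept under control when a model is designed for encrypted inference.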
The basic idea of split learning is to cut the neural network into multiple parts, which the client and the server train jointly. This idea is demonstrated in Figure 2: the client computes the first part of the network and sends the output of the cut layer (the activation maps) to the server. The server then runs his part of the network and sends his output back to the client, who compares it to the ground-truth output data.
With split learning, the client never needs to expose her raw data to the server. However, it has been shown that the activation maps can still reveal sensitive information about the client's raw input data. More specifically, the authors of the paper “Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training?” have shown that the activation maps can leak information about the client's heartbeat data. To mitigate this privacy leakage, we can use HE to encrypt the activation maps before sending them to the server. While this enhances the privacy of the client's data, encrypting the activation maps with HE introduces new problems and constraints on the whole model architecture, which is the active focus of our research.
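The exchange described above can be sketched in plain Python with a tiny network. Everything here is an assumption made for illustration: the layer sizes, the toy dataset, the squared-error loss and the class names. The backward pass, in which the client returns the loss gradient to the server and the server returns the gradient at the cut layer, follows the usual split learning setup in which the labels stay with the client; it is not spelled out in the text above.

```python
import math
import random

class Client:
    """Holds the raw data, the labels and the first part of the network."""
    def __init__(self, n_in, n_hidden, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden

    def forward(self, x):
        self.x = x
        self.a = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
                  for row, b in zip(self.w, self.b)]
        return self.a                      # activation map sent to the server

    def loss_and_grad(self, y_pred, y_true):
        err = y_pred - y_true              # labels never leave the client
        return err * err, 2.0 * err

    def backward(self, grad_a, lr):
        for j, g in enumerate(grad_a):
            gz = g * (1.0 - self.a[j] ** 2)    # derivative of tanh
            for i, xi in enumerate(self.x):
                self.w[j][i] -= lr * gz * xi
            self.b[j] -= lr * gz

class Server:
    """Holds the second part of the network; sees only activation maps."""
    def __init__(self, n_hidden, rng):
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.c = 0.0

    def forward(self, a):
        self.a = a
        return sum(vj * aj for vj, aj in zip(self.v, a)) + self.c

    def backward(self, grad_out, lr):
        grad_a = [grad_out * vj for vj in self.v]  # gradient at the cut layer
        for j, aj in enumerate(self.a):
            self.v[j] -= lr * grad_out * aj
        self.c -= lr * grad_out
        return grad_a                      # sent back to the client

def train(epochs=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    client, server = Client(2, 3, rng), Server(3, rng)
    # Hypothetical toy task: learn y = x1 + x2.
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 2.0)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            a = client.forward(x)                 # client -> server
            y_pred = server.forward(a)            # server -> client
            loss, g = client.loss_and_grad(y_pred, y)
            grad_a = server.backward(g, lr)       # client -> server -> client
            client.backward(grad_a, lr)
            total += loss
        history.append(total / len(data))
    return history

history = train()
print("first epoch loss:", history[0], "last epoch loss:", history[-1])
```

Only the activation map, the prediction and the gradients cross the boundary between the two parties; the raw inputs `x` and the labels `y` stay inside `Client`.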
In this blog post, we briefly walked through two technologies that have the potential to enhance the privacy of data in deep learning for healthcare, namely homomorphic encryption and split learning. Combining these two techniques offers a greater improvement in privacy for deep learning models but also gives rise to new problems. At ASCLEPIOS, we are actively studying these problems with the goal of constructing effective privacy-preserving neural network training and inference algorithms, especially for healthcare data.
Tanveer Khan, Alexandros Bakas, Antonis Michalas, “Blind Faith: Privacy-Preserving Machine Learning using Function Approximation”, 26th IEEE Symposium on Computers and Communications (ISCC 2021).
Sharif Abuadbba et al., “Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training?”, ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS 2020).