Defense Mechanisms for Weights Adjustment Attack

5.7 Discussion

5.7.3 Defense Mechanisms for Weights Adjustment Attack

(1) Existing Defense Methods

There are a few mitigation schemes we can explore for the weights adjustment attack. For PLMs contributed by reputable sources, e.g., Google, contributors can attach a digital signature that PLM users verify before use. However, the efficiency of signature generation and verification needs to be explored, since some RNN models comprise tens of millions of parameters. In addition, for PLMs contributed by untrusted sources, as suggested in [170], users can perform outlier detection using the training set. If a feature extractor PLM generates unusually large variations in the feature vectors belonging to a group of similar inputs, then this PLM warrants further investigation.
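The outlier check described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the dispersion measure (mean Euclidean distance to the group centroid), the per-group baseline `reference_dispersion`, the `tolerance` factor, and all function names are hypothetical choices, not the procedure specified in [170].

```python
import math
from statistics import mean


def group_dispersion(vectors):
    """Mean Euclidean distance of each feature vector to the group centroid."""
    dim = len(vectors[0])
    centroid = [mean(v[i] for v in vectors) for i in range(dim)]
    return mean(math.dist(v, centroid) for v in vectors)


def flag_suspicious_groups(features_by_group, reference_dispersion, tolerance=2.0):
    """Return the groups whose dispersion exceeds tolerance times a baseline.

    features_by_group: hypothetical mapping {group_name: [feature_vector, ...]},
        obtained by running the PLM under inspection on groups of similar
        training inputs.
    reference_dispersion: hypothetical per-group baseline (assumption), for
        example measured with a feature extractor the user already trusts.
    """
    flagged = []
    for name, vectors in features_by_group.items():
        if group_dispersion(vectors) > tolerance * reference_dispersion[name]:
            flagged.append(name)
    return flagged
```

A PLM for which `flag_suspicious_groups` returns a non-empty list would be a candidate for the further investigation mentioned above; an empty list does not by itself establish that the PLM is benign.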

(2) An End-to-End Training Strategy

Table 5.12: End-to-End Defense Method Against Weights Adjustment Attack for T-LSTM

Dataset | # of Features | # of Classes | Accuracy (Without Attack) | Accuracy (Attack on Important Features with Θ = 70) | Accuracy (Attack on Important Features with Θ = 80)
SD | 36 | 2 | 78.1% | 73.7% | 74.5%
PD | 26 | 2 | 77.5% | 72.5% | 75%
PPMI | 306 | 2 | 60.9% | 56.3% | 57.8%
PPMI | 306 | 6 | 43.8% | 39.1% | 40.6%

Since the T-LSTM model trains the representations and the classifier separately, a possible defense that yields a more robust model is to train the classification task end to end, adding an extra component to the loss function that measures the distance between a victim class and a target class. Our preliminary results in Table 5.12 suggest that such a strategy makes it hard for an adversary to produce a malicious PLM by merely adjusting the weights of important features, which can prevent the weights adjustment attack from succeeding. However, this method is not general enough to defend against the weights adjustment attack for all RNN-based models.
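One way such a combined objective could look is sketched below. The text only states that the extra loss component measures the distance between a victim class and a target class; the hinge form, the use of class centroids in representation space, the `margin`, and the weight `lam` are all assumptions made for illustration and should not be read as the exact loss used in the experiments.

```python
import math


def cross_entropy(probs, label):
    """Standard classification loss for one sample with predicted probabilities."""
    return -math.log(probs[label])


def separation_penalty(victim_centroid, target_centroid, margin):
    """Hinge penalty (assumed form): positive while the two class centroids
    are closer than `margin`, zero once they are at least `margin` apart."""
    return max(0.0, margin - math.dist(victim_centroid, target_centroid))


def end_to_end_loss(probs, label, victim_centroid, target_centroid,
                    margin=1.0, lam=0.1):
    """Classification loss plus a weighted class-separation term.

    victim_centroid / target_centroid: hypothetical mean representations of
        the victim and target classes, computed from the jointly trained model.
    margin, lam: hypothetical hyperparameters (assumptions).
    """
    penalty = separation_penalty(victim_centroid, target_centroid, margin)
    return cross_entropy(probs, label) + lam * penalty
```

Under this assumed form, minimizing the loss pushes the victim and target class representations apart during joint training, which is one plausible reading of why weight adjustments on important features alone would become less effective.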

5.8 Summary

Identifying the security and privacy risks of machine learning models is an active research area. In this chapter, we have presented two potential attacks: (i) an adversarial samples attack and (ii) a new weights adjustment (PLM-based) attack, both of which can force an RNN-based model to make wrong predictions. We have also designed low-cost detection and defense mechanisms against such adversarial attacks. Finally, we have conducted extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed schemes.

Chapter 6 Conclusions and Discussions

In this chapter, we summarize our research findings and discuss future research directions.

6.1 Summary

Nowadays, the cloud has become a popular platform for data storage and processing. With the availability of cloud resources, many organizations have outsourced their data to the cloud, and healthcare companies have followed the same trend. For example, Personal Health Records (PHR) services allow patients to create, manage, and control their data in a centralized place through the web, which has made the storage, retrieval, and sharing of medical information more efficient. Furthermore, affordable wearables and powerful smartphones with embedded sensors allow users' health status to be monitored and useful sensor data to be uploaded to the cloud easily. Thus, with this exponential growth of stored large-scale data and the growing need for personalized care, researchers are keen on developing data mining methodologies that identify the critical factors affecting prediction results and use such information to help healthcare professionals make better treatment decisions.

While remarkable progress has been made in the healthcare domain, many challenges and open questions remain. The first obstacle is that, in order to prevent information leakage, sensitive medical data needs to be encrypted before it is outsourced to the cloud, which makes effective data utilization a big challenge. The second challenge is that, unlike other data sources, medical data is highly ambiguous and noisy, which makes it difficult to generate predictive clinical models for real-world applications. The third obstacle is that it is hard to gather a large collection of high-quality clinical data, since institutions or hospitals may not be interested in sharing their useful data and healthcare-related information also varies across data sources. The fourth obstacle is that most machine learning models only provide predictions without explanations, which prevents medical personnel and patients from adopting such healthcare learning systems. Finally, despite the efficiency of machine learning systems and their outstanding prediction performance, reusing pre-trained models is still risky, since most machine learning models that are contributed and maintained by third parties lack proper checking to ensure that they are robust to various adversarial attacks.

In this thesis, we address these challenges by designing an accurate and secure personalized cloud-assisted healthcare system, which allows patients to conduct searches for disease diagnosis based on their own personalized profiles. In terms of methodologies, the contributions of this thesis can be summarized as follows:

• In Chapter 2, we have proposed a Privacy-Preserving Disease Treatment, Complication Prediction Scheme (PDTCPS), which allows users to conduct privacy-aware searches for health-related questions based on their individual profiles and lab test results. Our design also allows healthcare providers and the public cloud to collectively generate aggregated training models to diagnose diseases, predict complications, and offer possible treatment options. In addition, to enrich search functionality and protect the clients' privacy, we also design an encrypted index tree, which supports fuzzy keyword search and query unlinkability. Moreover, PDTCPS hides access patterns and hence addresses the security threat posed by exposed access patterns in existing searchable encryption schemes.

• In Chapter 3, we have proposed useful learning models for Amyotrophic Lateral Sclerosis (ALS), Right Heart Catheterization (RHC), and Depression Disorder Relapse (STAR*D) predictions, which can be used to aid efficient clinical care. We also design an incentive mechanism (IHESS) to encourage participants to share more truthful and higher-quality medical data so that aggregated training models can yield high accuracy.

• In Chapter 4, we first design a medical knowledge extraction framework to collect useful data from multiple sources and produce an aggregated dataset, which can be used to generate comprehensive medical diagnosis models. Then, we propose a deep-learning-based medical diagnosis system (DL-MDS), which allows authorized users to conduct searches for medical diagnosis based on their personalized queries.

• In Chapter 5, we have presented two potential attacks: (i) an adversarial samples attack and (ii) a new weights adjustment attack, both of which can force an RNN-based learning model to make wrong predictions. We also propose low-cost detection and defense mechanisms to defend against such adversarial attacks.

For all the schemes we have designed above, we conduct extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed methods.

In document Effective and Secure Healthcare Machine Learning System with Explanations Based on High Quality Crowdsourcing Data (Page 151-155)