This paper discusses the European-funded iToBoS project, tasked by the European Commission with developing an AI diagnostic platform for the early detection of skin melanoma. The paper outlines the project, provides an overview of the data being processed, describes the impact assessment processes, and explains the AI privacy risk mitigation methods being deployed. It then offers a brief discussion of some of the more complex aspects: (1) the relatively small clinical trial study cohort, which poses risks associated with data distinguishability and limits the masking ability of the applied anonymisation tools; (2) the project's ability to obtain informed consent from the study cohort given the complexity of the technologies; (3) the project's commitment to an open research data strategy and the additional privacy risk mitigations required to protect the multi-modal study data; and (4) the project's ability to adequately explain the outputs of the algorithmic components to a broad range of stakeholders. The paper discusses how these complexities have created tensions that reflect wider tensions in the health domain. One project-level solution is collaboration with a melanoma patient network, as an avenue for fair and representative qualification of risks and benefits with the patient stakeholder group. However, it is unclear how scalable this process is, given the relentless pursuit of innovation within the health domain, accentuated by the continued proliferation of artificial intelligence, open data strategies, and the integration of multi-modal data sets inclusive of genomics.
AI privacy toolkit. Goldsteen, Abigail; Saadi, Ola; Shmelkin, Ron; et al. SoftwareX, Volume 22, 20/May. Journal article; peer reviewed; open access.
The need to analyse personal data to drive business alongside the requirement to preserve the privacy of data subjects creates a known tension. Data protection regulations such as GDPR and CCPA define strict restrictions and obligations on the collection and processing of personal data. These are also relevant for machine learning models, which can be used to derive personal information about their training sets. The open-source ai-privacy-toolkit is designed to help organizations navigate this challenging area and build more trustworthy AI solutions, with tools that protect privacy and help ensure the compliance of AI models.
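One of the core privacy-protection techniques in this area is training models on anonymized rather than raw data. The sketch below is a minimal, self-contained illustration of k-anonymity by generalization and suppression; it is not the ai-privacy-toolkit's actual API, and the field names, bucket sizes, and records are invented for the example.

```python
# Illustrative k-anonymity sketch (not the toolkit's actual API):
# generalize quasi-identifiers, then suppress groups smaller than k.
from collections import Counter

def generalize_age(age, bucket=10):
    """Coarsen an exact age into a 10-year range, e.g. 37 -> '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"

def k_anonymize(records, k=2):
    """Return only records whose generalized quasi-identifier group has >= k members."""
    generalized = [
        {"age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**"}
        for r in records
    ]
    counts = Counter((g["age"], g["zip"]) for g in generalized)
    return [g for g in generalized if counts[(g["age"], g["zip"])] >= k]

records = [
    {"age": 34, "zip": "08001"},
    {"age": 37, "zip": "08002"},
    {"age": 62, "zip": "40001"},  # unique group: suppressed at k=2
]
print(k_anonymize(records, k=2))
```

After generalization, the two thirty-something records share a group and survive, while the unique record is suppressed, so no individual is distinguishable within a group of fewer than k.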
Artificial Intelligence (AI) has proven effective in classifying skin cancers using dermoscopy images. In experimental settings, algorithms have outperformed expert dermatologists in classifying melanoma and keratinocyte cancers. However, clinical application is limited when algorithms are presented with 'untrained' or out-of-distribution lesion categories, often misclassifying benign lesions as malignant, or malignant lesions as benign. Another commonly raised limitation is the lack of clinical context (e.g., medical history) used as input to the AI decision process. The increasing use of Total Body Photography (TBP) in clinical examinations presents new opportunities for AI to perform holistic analysis of the whole patient, rather than a single lesion. Currently, there is a lack of literature or standards on image annotation for TBP, or on preserving patient privacy during the machine learning process.
This protocol describes the methods for the acquisition of patient data, including TBP, medical history, and genetic risk factors, to create a comprehensive dataset for machine learning. A total of 500 patients of various risk profiles will be recruited from two clinical sites (Australia and Spain) to undergo temporal total body imaging, complete surveys on sun behaviors and medical history, and provide a DNA sample. This patient-level metadata is applied to image datasets using DICOM labels. Anonymization and masking methods are applied to preserve patient privacy. A two-step annotation process is followed to label skin images for lesion detection and classification using deep learning models. Skin phenotype characteristics are extracted from images, including innate and facultative skin color, nevi distribution, and UV damage. Several algorithms will be developed relating to skin lesion detection, segmentation and classification, 3D mapping, change detection, and risk profiling. Simultaneously, explainable AI (XAI) methods will be incorporated to foster clinician and patient trust. Additionally, a dataset of anonymized, annotated TBP images will be publicly released for an international challenge to advance the development of new algorithms using this type of data.
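The anonymization step on DICOM-labelled data typically removes direct identifiers and replaces stable identifiers with irreversible pseudonyms. The sketch below illustrates that idea over a plain dictionary of DICOM-style tag keywords; the tag set, salt, and pipeline are hypothetical and stand in for a real DICOM de-identification tool.

```python
# Hypothetical sketch of DICOM-tag anonymization: drop direct identifiers,
# replace the patient ID with a salted, irreversible pseudonym.
import hashlib

DIRECT_IDENTIFIERS = {"PatientName", "PatientBirthDate", "PatientAddress"}

def anonymize_tags(tags, secret="project-salt"):
    out = {}
    for key, value in tags.items():
        if key in DIRECT_IDENTIFIERS:
            continue  # remove direct identifiers entirely
        if key == "PatientID":
            # salted hash: same patient maps to the same pseudonym,
            # but the real ID cannot be recovered without the salt
            out[key] = hashlib.sha256((secret + value).encode()).hexdigest()[:16]
        else:
            out[key] = value  # keep clinically useful metadata
    return out

tags = {
    "PatientName": "Doe^Jane",
    "PatientID": "A-1042",
    "PatientBirthDate": "19800101",
    "BodyPartExamined": "SKIN",
}
print(anonymize_tags(tags))
```

Keeping the pseudonymous ID stable across visits is what allows temporal total body imaging to be linked per patient without exposing identity.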
The anticipated results from this protocol are validated AI-based tools to provide holistic risk assessment for individual lesions, and risk stratification of patients to assist clinicians in monitoring for skin cancer.
Large organizations often face difficult tradeoffs in balancing the need to share information with the need to safeguard sensitive data. A prominent way to deal with this tradeoff is on-the-fly screen masking of sensitive data in applications. A proposed hybrid approach for masking Web application screens combines the advantages of the context available at the presentation layer with the flexibility and low overhead of masking at the network layer. This solution can identify sensitive information in the visual context of the application screen and then automatically generate the masking rules to enforce at run time. The approach supports the creation of highly expressive masking rules while keeping rule authoring easy and intuitive, resulting in an easy-to-use, effective system. This article is part of a special issue on Security and Privacy on the Web. The first Web extra (https://youtu.be/4u2FLqjaIiI) is a short demonstration of the proposed hybrid approach; the second (https://youtu.be/-Hz3P_H0UnU) is a full-length demonstration.
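At run time, masking rules of this kind reduce to pattern/transform pairs applied to outgoing content. The sketch below is an invented, minimal illustration of such rules using regular expressions; the patterns and masking formats are examples, not the article's actual rule language.

```python
# Minimal sketch of run-time masking rules: each rule pairs a field pattern
# (in the real system, derived from the screen's visual context) with a
# masking transform. Patterns here are illustrative only.
import re

MASKING_RULES = [
    # SSN-like numbers: mask completely
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "***-**-****"),
    # 16-digit card-like numbers: keep first and last groups
    (re.compile(r"\b(\d{4})(?:[ -]?\d{4}){2}[ -]?(\d{4})\b"), r"\1 **** **** \2"),
]

def mask(text):
    """Apply every masking rule in order to the outgoing text."""
    for pattern, replacement in MASKING_RULES:
        text = pattern.sub(replacement, text)
    return text

print(mask("SSN 123-45-6789, card 4111 1111 1111 1234"))
# -> SSN ***-**-****, card 4111 **** **** 1234
```

Applying the rules at the network layer keeps them application-agnostic, while deriving them from the presentation layer keeps them aligned with what users actually see.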
Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.
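The ensemble idea can be sketched with a deliberately tiny toy: each per-subset "attack model" below is just a confidence threshold calibrated on that subset, and the ensemble takes a majority vote. All numbers are synthetic and the thresholding attack stands in for the paper's actual attack models.

```python
# Toy sketch of an ensemble membership-inference attack: one threshold
# "attack model" per data subset, combined by majority vote.

def fit_threshold(member_confs, nonmember_confs):
    """Midpoint between mean member and mean non-member confidence."""
    m = sum(member_confs) / len(member_confs)
    n = sum(nonmember_confs) / len(nonmember_confs)
    return (m + n) / 2

def ensemble_predict(conf, thresholds):
    """Majority vote across the per-subset attack models."""
    votes = sum(conf >= t for t in thresholds)
    return votes > len(thresholds) / 2  # True = predicted member

# Synthetic calibration data: target models are typically more
# confident on training-set (member) examples.
subsets = [
    ([0.95, 0.92], [0.60, 0.55]),
    ([0.88, 0.91], [0.70, 0.65]),
    ([0.97, 0.93], [0.50, 0.58]),
]
thresholds = [fit_threshold(m, n) for m, n in subsets]
print(ensemble_predict(0.94, thresholds), ensemble_predict(0.52, thresholds))
# -> True False
```

Specializing attack models per subset lets each one exploit the confidence distribution of its slice of the data, which is the intuition behind the reported accuracy gains.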
Analyzing time-series data that may contain personal information, particularly in the medical field, presents serious privacy concerns. Sensitive health data from patients is often used to train machine-learning models for diagnostics and ongoing care. Assessing the privacy risk of such models is crucial to making knowledgeable decisions on whether to use a model in production, share it with third parties, or deploy it in patients' homes. Membership Inference Attacks (MIA) are a key method for this kind of evaluation; however, time-series prediction models have not been thoroughly studied in this context. We explore existing MIA techniques on time-series models, and introduce new features focusing on the seasonality and trend components of the data. Seasonality is estimated using a multivariate Fourier transform, and a low-degree polynomial is used to approximate trends. We applied these techniques to various types of time-series models, using datasets from the health domain. Our results demonstrate that these new features enhance the effectiveness of MIAs in identifying membership, improving the understanding of privacy risks in medical data applications.
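The two feature families can be illustrated on a synthetic univariate series: a trend estimate from a low-degree (here degree-1) polynomial fit, and a seasonality estimate from a discrete Fourier transform of the detrended series. This stdlib-only sketch is an illustration of the general technique, not the paper's implementation.

```python
# Sketch of trend and seasonality feature extraction for a time series:
# trend via a degree-1 least-squares fit, seasonality via a discrete
# Fourier transform of the detrended series. Illustration only.
import cmath
import math

def linear_trend(series):
    """Least-squares slope of the series (degree-1 polynomial fit)."""
    n = len(series)
    mx, my = (n - 1) / 2, sum(series) / n
    num = sum((t - mx) * (y - my) for t, y in enumerate(series))
    den = sum((t - mx) ** 2 for t in range(n))
    return num / den

def dominant_period(series):
    """Period (in samples) of the strongest non-DC Fourier component."""
    n = len(series)
    mags = [
        abs(sum(y * cmath.exp(-2j * math.pi * k * t / n) for t, y in enumerate(series)))
        for k in range(1, n // 2 + 1)
    ]
    return n / (mags.index(max(mags)) + 1)

# Synthetic series: period-8 seasonality plus a slow upward trend.
series = [math.sin(2 * math.pi * t / 8) + 0.1 * t for t in range(32)]
slope = linear_trend(series)
detrended = [y - slope * t for t, y in enumerate(series)]
print(dominant_period(detrended))  # -> 8.0
```

The recovered slope and dominant period are exactly the kind of compact per-sample features an attack model can compare between candidate members and non-members.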
We present a first-of-its-kind end-to-end framework for running privacy risk assessments of AI models, enabling assessment of models from multiple ML frameworks using a variety of low-level privacy attacks and metrics. The tool automatically selects which attacks and metrics to run based on answers to questions, runs the attacks, and summarizes and visualizes the results in an easy-to-consume manner.
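The question-driven selection step can be pictured as a lookup from answers to applicable attacks. The mapping below is entirely hypothetical (invented question keys and attack names) and only illustrates the dispatch pattern, not the framework's real decision logic.

```python
# Hypothetical sketch of question-driven attack selection: answers about
# the model and available data determine which attacks/metrics to run.
QUESTIONS_TO_ATTACKS = {
    ("classification", "data_available"): ["membership_inference", "attribute_inference"],
    ("classification", "no_data"): ["model_confidence_metrics"],
    ("regression", "data_available"): ["membership_inference"],
}

def select_attacks(model_type, have_training_data):
    """Map questionnaire answers to the list of applicable attacks."""
    key = (model_type, "data_available" if have_training_data else "no_data")
    return QUESTIONS_TO_ATTACKS.get(key, [])

print(select_attacks("classification", True))
# -> ['membership_inference', 'attribute_inference']
```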
Retrieval Augmented Generation (RAG) systems have shown great promise in natural language processing. However, their reliance on data stored in a retrieval database, which may contain proprietary or sensitive information, introduces new privacy concerns. Specifically, an attacker may be able to infer whether a certain text passage appears in the retrieval database by observing the outputs of the RAG system, an attack known as a Membership Inference Attack (MIA). Despite the significance of this threat, MIAs against RAG systems have so far remained under-explored. This study addresses this gap by introducing an efficient and easy-to-use method for conducting MIA against RAG systems. We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models, showing that the membership of a document in the retrieval database can be efficiently determined through the creation of an appropriate prompt in both black-box and gray-box settings. Moreover, we introduce an initial defense strategy based on adding instructions to the RAG template, which shows high effectiveness for some datasets and models. Our findings highlight the importance of implementing security countermeasures in deployed RAG systems and developing more advanced defenses to protect the privacy and security of retrieval databases.
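The attack intuition can be sketched with a fully mocked pipeline: prompt the system with a candidate passage, and if the retriever surfaces that passage, the generator's output closely matches it. Every component below (the database, the "RAG" pipeline, the prompt, the 0.9 threshold) is an invented stand-in, not the paper's method.

```python
# Toy sketch of a prompt-based MIA against a RAG system. The retriever
# and generator are mocked; output similarity serves as the membership signal.
import difflib

RETRIEVAL_DB = [
    "Patient follow-up imaging is scheduled every six months.",
    "Lesion segmentation uses a deep convolutional network.",
]

def rag_answer(prompt):
    """Mock RAG pipeline: retrieve the closest passage and 'quote' it."""
    return max(RETRIEVAL_DB,
               key=lambda d: difflib.SequenceMatcher(None, prompt, d).ratio())

def membership_probe(candidate, threshold=0.9):
    """High output similarity suggests the candidate is in the database."""
    answer = rag_answer(f"Repeat exactly: {candidate}")
    sim = difflib.SequenceMatcher(None, candidate, answer).ratio()
    return sim >= threshold  # True = likely member of the retrieval database

print(membership_probe("Patient follow-up imaging is scheduled every six months."))  # -> True
print(membership_probe("This sentence never appears in the database."))              # -> False
```

The sketched defense direction from the abstract (template instructions telling the model not to repeat retrieved text verbatim) would directly lower the similarity signal this probe relies on.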