Cloud computing enables efficient and scalable utilization of servers through virtualization technology. In the Infrastructure-as-a-Service (IaaS) Cloud model, many virtualized servers (instances) can be created on a single physical machine. Many Cloud providers offering such capabilities are now in widespread use. However, Cloud computing incurs overheads and can constrain scalability and flexibility, especially when diverse users with different needs wish to use the Cloud resources. To accommodate such communities, an alternative to Cloud computing and the virtualization of whole servers that is gaining widespread adoption is micro-hosting services and container-based solutions. Container-based technologies such as Docker allow hosting of micro-services on Cloud infrastructures. They enable applications and data to be bundled in a manner that allows their easy deployment and subsequent utilization. Docker is just one of many such solutions that have been put forward. The purpose of this paper is to compare and contrast a range of existing container-based technologies for the Cloud and to evaluate their pros and cons and overall performance. The OpenStack-based, Australia-wide National eResearch Collaboration Tools and Resources (NeCTAR) Research Cloud (www.nectar.org.au) was used for this purpose. We describe the design of the experiments and benchmarks that were chosen and relate these to findings from the literature review.
•The key features of micro-service hosting technologies for the Cloud were identified.
•We performed test cases to evaluate the virtualization performance of these technologies.
•The examined technologies imposed negligible overheads on memory utilization or CPU.
•I/O and operating system interactions incurred some overheads (see the sketch below).
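To make the benchmark design concrete, the following is a minimal sketch of one such comparison: timing an identical CPU-bound workload natively and inside a Docker container. The workload, the python:3 image, and the timing approach are illustrative assumptions rather than the paper's actual benchmark suite, and Docker must be installed for the containerized run.

```python
# Minimal sketch: time the same CPU-bound workload natively and in Docker.
# Assumes Docker and the "python:3" image are available locally.
import subprocess
import time

WORKLOAD = "import math; print(sum(math.sqrt(i) for i in range(10**7)))"

def timed(cmd):
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

native = timed(["python3", "-c", WORKLOAD])
containerized = timed(["docker", "run", "--rm", "python:3",
                       "python3", "-c", WORKLOAD])

# Container start-up dominates the difference for short runs, so a real
# benchmark would separate start-up latency from steady-state throughput.
print(f"native: {native:.2f}s, containerized: {containerized:.2f}s")
print(f"relative overhead: {(containerized - native) / native:.1%}")
```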
Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support the implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge that is not fully addressed. This is due to an incomplete understanding of the reproducibility requirements and assumptions of workflow definition approaches. Provenance information should be tracked and used to capture all of these requirements, supporting the reusability of existing workflows.
We have implemented a complex but widely deployed bioinformatics workflow using three representative approaches to workflow definition and execution. Through this implementation, we identified assumptions implicit in these approaches that ultimately produce insufficient documentation of workflow requirements, resulting in failed execution of the workflow. This study proposes a set of recommendations that aims to mitigate these assumptions and guide the scientific community towards reproducible science, hence addressing the reproducibility crisis.
Reproducing, adapting or even repeating a bioinformatics workflow in any environment requires substantial technical knowledge of the workflow execution environment, resolution of analysis assumptions and rigorous compliance with reproducibility requirements. Towards these goals, we propose concrete recommendations that, along with an explicit declaration of the workflow specification, would result in enhanced reproducibility of computational genomic analyses.
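As an illustration of what an explicit workflow specification might look like, here is a minimal sketch in Python; the field names, tools, and checks are illustrative assumptions, not a schema defined in the study.

```python
# Minimal sketch of an explicit workflow specification: each step pins the
# exact tool version and container image digest so the execution
# environment is documented rather than implicitly assumed.
import shutil

SPEC = {
    "name": "variant-calling-demo",
    "steps": [
        {"tool": "bwa", "version": "0.7.17",
         "container": "biocontainers/bwa@sha256:<digest>"},
        {"tool": "samtools", "version": "1.9",
         "container": "biocontainers/samtools@sha256:<digest>"},
    ],
    "reference_genome": {"build": "GRCh38", "source_url": "<pinned URL>"},
}

def check_spec(spec):
    """Fail fast when an implicit assumption (unpinned image, missing
    tool) would otherwise surface only at execution time."""
    for step in spec["steps"]:
        if "<digest>" in step["container"]:
            print(f"warning: {step['tool']} image digest not pinned")
        if shutil.which(step["tool"]) is None:
            print(f"warning: {step['tool']} not found on PATH")

check_spec(SPEC)
```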
The idea of big data has gained extensive attention from governments and academia all over the world. It is especially relevant for the establishment of a smart city environment combining complex heterogeneous data with data analytics and artificial intelligence (AI) technology. Big data is generated by the many facilities and sensor networks in smart cities and is often streamed to and stored on cloud storage platforms. Ensuring the integrity and subsequent auditability of such big data is essential for the performance of AI-driven data analysis. Recent years have witnessed the emergence of many big data auditing schemes that are often characterized by third-party auditors (TPAs). However, the TPA is a centralized entity, which is vulnerable to many security threats from both inside and outside the cloud. To avoid this centralized dependency, we propose a decentralized big data auditing scheme for smart city environments featuring blockchain capabilities that support improved reliability and stability without the need for a centralized TPA. To support this, we have designed an optimized blockchain instantiation and conducted a comprehensive comparison between existing schemes and the proposed scheme through both theoretical analysis and experimental evaluation. The comparison shows that lower communication and computation costs are incurred with our scheme than with existing schemes.
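The abstract does not detail the optimized blockchain instantiation, so the following is a minimal sketch of one primitive that such auditing schemes commonly rely on: a Merkle root over outsourced data blocks, which lets an auditor verify integrity against a small on-chain commitment rather than re-reading the full data. The block contents are hypothetical.

```python
# Minimal sketch: Merkle-root commitment over data blocks for auditing.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list[bytes]) -> bytes:
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# The 32-byte root is what would be recorded on-chain; tampering with any
# block changes the recomputed root and is detected at audit time.
blocks = [f"sensor-record-{i}".encode() for i in range(8)]
print(merkle_root(blocks).hex())
```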
Abstract
Context:
Pheochromocytomas and paragangliomas (PPGLs) in children are often hereditary and may present with different characteristics compared with adults. Hereditary PPGLs can be separated into cluster 1 and cluster 2 tumors due to mutations impacting hypoxia and kinase receptor signaling pathways, respectively.
Objective:
To identify differences in presentation of PPGLs between children and adults.
Design:
A retrospective cross-sectional clinical study.
Setting:
Seven tertiary medical centers.
Patients:
The study included 748 patients with PPGLs, including 95 with a first presentation during childhood. Genetic testing was available in 611 patients. Other data included locations of primary tumors, presence of recurrent or metastatic disease, and plasma concentrations of metanephrines and 3-methoxytyramine.
Results:
Children showed higher (P < 0.0001) prevalence than adults of hereditary (80.4% vs 52.6%), extra-adrenal (66.3% vs 35.1%), multifocal (32.6% vs 13.5%), metastatic (49.5% vs 29.1%), and recurrent (29.5% vs 14.2%) PPGLs. Tumors due to cluster 1 mutations were more prevalent among children than adults (76.1% vs 39.3%; P < 0.0001), and this paralleled a higher prevalence of noradrenergic tumors, characterized by relative lack of increased plasma metanephrine, in children than in adults (93.2% vs 57.3%; P < 0.0001).
Conclusions:
The higher prevalence of hereditary, extra-adrenal, multifocal, and metastatic PPGLs in children than adults represents interrelated features that, in part, reflect the lower age of disease presentation of noradrenergic cluster 1 than adrenergic cluster 2 tumors. The differences in disease presentation are important to consider in children at risk for PPGLs due to a known mutation or previous history of tumor.
This study links hereditary, extra-adrenal, multifocal, metastatic, and recurrent PPGLs to a higher prevalence of noradrenergic and cluster 1 tumors in children than in adults.
Cloud computing has emerged as a mainstream paradigm for hosting various types of applications by supporting easy-to-use computing services. Among the many different forms of cloud computing, hybrid clouds, which mix on-premises private cloud and third-party public cloud services to deploy applications, have gained broad acceptance. They are particularly relevant for applications requiring large volumes of computing power exceeding the computational capacity within the premises of a single organization. However, the use of hybrid clouds introduces the challenge of deciding how many public cloud resources should be added to the resource pool and when, especially when the quality of service requirements of applications with deadline constraints must be supported. These resource provisioning decisions are far from trivial if scheduling involves data-intensive applications using voluminous amounts of data. Issues such as the impact of network latency, bandwidth constraints, and the location of data must be taken into account in order to minimize the execution cost while meeting the deadline for such applications. In this paper, we propose a new resource provisioning algorithm to support the deadline requirements of data-intensive applications in hybrid cloud environments. To evaluate our proposed algorithm, we implement it in Aneka, a platform for developing scalable applications on the Cloud. Experimental results from a real case study, executing a data-intensive application that measures the walkability index on a hybrid cloud platform consisting of dynamic resources from the Microsoft Azure cloud, show that our proposed provisioning algorithm allocates resources more efficiently than existing methods.
•A new data-aware provisioning algorithm is proposed to meet user-defined deadline requirements for data-intensive applications. The proposed algorithm takes into account available bandwidth and data transfer time (see the sketch after this list).
•The proposed provisioning algorithm is integrated into the Aneka platform. Aneka is extended to support the Microsoft Azure Resource Manager (ARM) deployment service model.
•In an actual hybrid cloud environment, we evaluate the proposed algorithm's ability to meet deadlines for a case study data-intensive application in a smart cities context.
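The following is a minimal sketch of the kind of data-aware sizing decision such an algorithm must make: estimating how many public VMs to lease so that a deadline is met once data transfer time over the available bandwidth is paid. The linear cost model and parameters are illustrative assumptions, not the algorithm integrated into Aneka.

```python
# Minimal sketch of a deadline- and data-aware provisioning decision.
import math

def vms_needed(tasks, task_cpu_s, data_mb, bandwidth_mbps, deadline_s,
               private_vms):
    """Estimate how many public VMs to lease so the deadline is met.

    Transfer time for the input data is paid before public tasks can run,
    which is what makes the decision data-aware rather than CPU-only.
    """
    transfer_s = (data_mb * 8) / bandwidth_mbps   # move data to public cloud
    total_work = tasks * task_cpu_s               # CPU-seconds of demand
    private_capacity = private_vms * deadline_s   # CPU-seconds on premises
    spill = max(0.0, total_work - private_capacity)
    if spill == 0:
        return 0                                  # private cloud suffices
    usable = deadline_s - transfer_s              # time left after transfer
    if usable <= 0:
        raise ValueError("deadline infeasible: data transfer alone too slow")
    return math.ceil(spill / usable)

# e.g. 400 tasks x 30 CPU-s, 2 GB of input over 100 Mbps, 10 min deadline:
print(vms_needed(400, 30, 2048, 100, 600, private_vms=8))   # -> 17
```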
In recent years, one of the most popular techniques in the computer vision community has been deep learning. As a data-driven technique, deep models require enormous amounts of accurately labelled training data, which is often inaccessible in many real-world applications. A data-space solution is Data Augmentation (DA), which can artificially generate new images from the original samples. Image augmentation strategies can vary by dataset, as different data types might require different augmentations to facilitate model training. However, the design of DA policies has largely been left to human experts with domain knowledge, which is highly subjective and error-prone. To mitigate this problem, a novel direction is to automatically learn image augmentation policies from the given dataset using Automated Data Augmentation (AutoDA) techniques. The goal of AutoDA models is to find the optimal DA policies that maximize model performance gains. This survey discusses the underlying reasons for the emergence of AutoDA technology from the perspective of image classification. We identify three key components of a standard AutoDA model: a search space, a search algorithm and an evaluation function. Based on their architecture, we provide a systematic taxonomy of existing image AutoDA approaches. This paper presents the major works in the AutoDA field, discussing their pros and cons, and proposes several potential directions for future improvements.
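A minimal sketch of the three components named above is given below, using plain random search as the simplest baseline search algorithm. The operations, policy shape, and the toy score function are illustrative assumptions; in a real AutoDA model the evaluation function would train and validate a child network under each candidate policy.

```python
# Minimal sketch of an AutoDA loop: search space, search algorithm,
# and evaluation function.
import random

# Search space: (operation, magnitude in [0, 1]) pairs, two ops per policy.
OPERATIONS = ["rotate", "shear", "color_jitter", "cutout", "flip"]

def sample_policy():
    return [(random.choice(OPERATIONS), round(random.random(), 2))
            for _ in range(2)]

def evaluate(policy):
    """Hypothetical evaluation function: a real AutoDA model would return
    validation accuracy of a model trained under this policy."""
    return sum(m for _, m in policy) / len(policy)  # toy placeholder score

# Search algorithm: keep the best of N randomly sampled policies.
best = max((sample_policy() for _ in range(100)), key=evaluate)
print("best policy found:", best)
```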
As replications of individual studies are resource intensive, techniques for predicting replicability are required. We introduce the repliCATS (Collaborative Assessments for Trustworthy Science) process, a new method for eliciting expert predictions about the replicability of research. This process is a structured expert elicitation approach based on a modified Delphi technique applied to the evaluation of research claims in the social and behavioural sciences. The utility of processes that predict replicability lies in their capacity to test scientific claims without the costs of full replication. Experimental data support the validity of this process, with a validation study producing a classification accuracy of 84% and an Area Under the Curve of 0.94, meeting or exceeding the accuracy of other techniques used to predict replicability. The repliCATS process provides other benefits. It is highly scalable and can be deployed both for rapid assessment of small numbers of claims and for assessment of high volumes of claims over an extended period through an online elicitation platform, having been used to assess 3000 research claims over an 18-month period. It can be implemented in a range of ways, and we describe one such implementation. An important advantage of the repliCATS process is that it collects qualitative data that has the potential to provide insight into the limits of generalizability of scientific claims. The primary limitation of the repliCATS process is its reliance on human-derived predictions, with consequent costs in terms of participant fatigue, although careful design can minimise these costs. The repliCATS process has potential applications in alternative peer review and in the allocation of effort for replication studies.
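The abstract does not specify how individual judgments are pooled, so the following is a minimal sketch of one standard aggregation for elicited replication probabilities: averaging in log-odds space, which keeps the pooled value in (0, 1) and damps the influence of extreme individual estimates. The expert values are hypothetical and this is not presented as the repliCATS aggregation method.

```python
# Minimal sketch: pooling expert probability judgments via mean log-odds.
import math

def pool_log_odds(probs):
    """Aggregate expert probabilities via the mean of their log-odds."""
    eps = 1e-6                              # guard against 0/1 estimates
    clipped = (min(max(p, eps), 1 - eps) for p in probs)
    logits = [math.log(p / (1 - p)) for p in clipped]
    mean = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-mean))

# Post-discussion second-round estimates from five hypothetical experts:
print(round(pool_log_odds([0.6, 0.7, 0.55, 0.8, 0.65]), 3))
```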
Research investigating problem drinking often relies on retrospective measures to assess alcohol consumption behaviour. Limitations associated with such instruments can, however, distort reports of actual consumption levels and patterns. We developed a smartphone application (app), CNLab-A, to assess alcohol intake behaviour in real time.
Healthy individuals (N = 671, mean age 23.12) completed demographic questions plus the Alcohol Use Questionnaire and a 21-day Timeline Followback before using CNLab-A for 21 days. The app asked participants to record alcohol consumption details in real time. We compared data reported via the retrospective measures with data captured using CNLab-A.
On average, participants submitted data on 20.27 days using CNLab-A. Compared with Timeline Followback, a significantly greater percentage of drinking days (24.79% vs. 26.44%) and a significantly higher total intake (20.30 vs. 24.26 standard drinks) were recorded via the app. CNLab-A captured a substantially greater number of high-intake occasions than Timeline Followback at all levels from 8 or more drinks. Additionally, relative to the Alcohol Use Questionnaire, a significantly faster rate of consumption was recorded via the app.
CNLab-A provided more nuanced information regarding the quantity and pattern of alcohol intake than the retrospective measures. In particular, it revealed higher levels of drinking than retrospective reporting. This has implications for how particular at-risk alcohol consumption patterns are identified in future, and it might enable a more sophisticated exploration of the causes and consequences of drinking behaviour.
•Real-time assessment of drinking behaviour using a smartphone app was explored.
•Participants logged more drinking days via the app than via Timeline Followback.
•Total intake was higher when recorded using the app relative to Timeline Followback (the summary measures are sketched below).
•The app captured a greater number of high-intake episodes than Timeline Followback.
•The app showed a faster rate of consumption than the Alcohol Use Questionnaire.
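For concreteness, here is a minimal sketch of how the compared summary measures (percentage of drinking days, total intake, high-intake days) could be computed from timestamped app records. The record format and values are hypothetical, not the CNLab-A data model.

```python
# Minimal sketch: summary drinking measures from real-time app records.
from datetime import date

# (day, standard drinks) entries logged in real time, hypothetical data:
records = [(date(2023, 5, 1), 4.0), (date(2023, 5, 1), 2.0),
           (date(2023, 5, 4), 9.5)]
days_observed = 21

drinking_days = {day for day, _ in records}
total_drinks = sum(n for _, n in records)
daily_totals = {d: sum(n for dd, n in records if dd == d)
                for d in drinking_days}
high_intake_days = sum(1 for t in daily_totals.values() if t >= 8)

print(f"% drinking days: {len(drinking_days) / days_observed:.2%}")
print(f"total intake: {total_drinks} standard drinks")
print(f"high-intake days (8+ drinks): {high_intake_days}")
```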
•Proposing a semantic privacy-preserving framework for secure record linkage.
•Proposing access control policy formalisation using semantic web technologies.
•Detecting privacy leakage by leveraging semantic reasoning.
•Refining authorisation by enforcing privacy requirements via obligations.
The combination of digitized health information and web-based technologies offers many possibilities for data analysis and business intelligence. In the healthcare and biomedical research domains, applications depending on electronic health records (EHRs) raise privacy preservation as a major concern. Existing solutions cannot always satisfy evolving research demands, such as linking patient records across organizational boundaries, due to the potential for patient re-identification. In this work, we show how semantic methods can be applied to support the formulation and enforcement of access control policy whilst ensuring that privacy leakage can be detected and prevented. The work is illustrated through a case study associated with the Australasian Diabetes Data Network (ADDN – www.addn.org.au), the national paediatric type-1 diabetes data registry, and the Australian Urban Research Infrastructure Network (AURIN – www.aurin.org.au) platform that supports Australia-wide access to urban and built environment data sets. We demonstrate that by extending the eXtensible Access Control Markup Language (XACML) with semantic capabilities, finer-grained access control encompassing data risk disclosure mechanisms can be supported. We discuss the contributions this approach can make to socio-economic development and political management within business systems, especially in situations where secure data access and data linkage are required.
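The following is a minimal sketch of the access control pattern described above: an attribute-based decision that, instead of returning a bare permit or deny, attaches obligations enforcing privacy requirements (here, suppressing quasi-identifiers before linked records are released). The rule logic and attribute names are illustrative assumptions; the paper's mechanism extends XACML with semantic (ontology-based) reasoning rather than the literal attribute matching shown here.

```python
# Minimal sketch: attribute-based access decision with privacy obligations.
def evaluate_request(subject, resource):
    if resource["type"] != "linked_patient_record":
        return {"decision": "Deny", "obligations": []}
    if subject["role"] == "clinician" and subject["org"] == resource["org"]:
        return {"decision": "Permit", "obligations": []}
    if subject["role"] == "researcher" and subject["ethics_approval"]:
        # Permit cross-organisation research use, but oblige the enforcement
        # point to de-identify fields that enable re-identification.
        return {"decision": "Permit",
                "obligations": ["suppress:name", "suppress:date_of_birth",
                                "generalise:postcode"]}
    return {"decision": "Deny", "obligations": []}

subject = {"role": "researcher", "org": "ADDN", "ethics_approval": True}
resource = {"type": "linked_patient_record", "org": "AURIN"}
print(evaluate_request(subject, resource))
```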