Satellite systems have attracted great attention from academic and industrial communities in recent years. Many satellites have been launched for weather forecasting, environment monitoring, and target surveillance. One important task of these satellites is to download the data they collect in space to ground servers via earth stations (ESs). Since satellites move at high speed along their own orbits and have very limited contact time with ESs, they may not be able to download all of their data to the ground on time. In this paper, we propose a collaborative scheme that allows satellites to offload data among themselves using inter-satellite links (ISLs) before they come into contact with the ES, so that each satellite carries an amount of data matched to the length of its contact time with the ES and the data downloading throughput at the ES is maximized. We develop an iterative optimization algorithm that jointly schedules data offload among the satellites and data downloading from satellites to the ES. Extensive simulations have been conducted to evaluate the effectiveness of the proposed method. The results show that ISL data offload can increase the data downloading throughput significantly; in many cases, the throughput reaches close to 100% of the ES capacity.
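The intuition behind the scheme can be illustrated with a small sketch. The snippet below is not the paper's iterative optimization algorithm; it simply assumes idealized ISLs that can rebalance the backlog freely before the passes and loads each satellite in proportion to what its contact window can downlink. All function and parameter names are hypothetical.

```python
# Illustrative sketch only (not the paper's iterative algorithm): assuming
# ISLs can rebalance the backlog freely before the passes, load each satellite
# in proportion to what its contact window can downlink. Names are hypothetical.
def rebalance_for_downlink(backlog_mb, contact_s, downlink_mbps):
    """Return per-satellite volumes (MB) to carry after ISL offload, and the total downloaded."""
    pass_capacity_mb = [t * downlink_mbps / 8 for t in contact_s]  # Mbit/s * s / 8 = MB
    total_data, total_capacity = sum(backlog_mb), sum(pass_capacity_mb)
    fill = min(1.0, total_data / total_capacity) if total_capacity else 0.0
    carried = [cap * fill for cap in pass_capacity_mb]             # proportional loading
    return carried, min(total_data, total_capacity)

if __name__ == "__main__":
    carried, downloaded = rebalance_for_downlink(
        backlog_mb=[900.0, 60.0, 15.0], contact_s=[60, 300, 300], downlink_mbps=100.0)
    print(carried, downloaded)
```

Without offloading, the first satellite in this toy example could deliver at most 750 MB of its 900 MB backlog during its 60 s pass; after rebalancing, the whole 975 MB backlog is downloaded.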
Today, the prominence of data science within organizations has given rise to teams of data science workers collaborating on extracting insights from data, as opposed to individual data scientists working alone. However, we still lack a deep understanding of how data science workers collaborate in practice. In this work, we conducted an online survey with 183 participants who work in various aspects of data science. We focused on their reported interactions with each other (e.g., managers with engineers) and with different tools (e.g., Jupyter Notebook). We found that data science teams are extremely collaborative and work with a variety of stakeholders and tools during the six common steps of a data science workflow (e.g., cleaning data and training models). We also found that the collaborative practices workers employ, such as documentation, vary according to the kinds of tools they use. Based on these findings, we discuss design implications for supporting data science team collaborations and future research directions.
Here, we present Party Facts (www.partyfacts.org), a modern online database of political parties worldwide. With this project, we provide a comprehensive database of political parties across time and world regions, link party information from some of the core social science data sets, and offer a platform to link political parties across data sets. An initial list of 4,000 core parties in 212 countries is based mainly on four major data sets. The core parties in Party Facts are linked with party information from some of the key social science data sets, currently 26. From these data sets, we have included and linked about 15,000 party observations. Party Facts is an important step toward a more coherent operationalization of political parties across time and space, and a gateway to existing data sets on political parties. It allows researchers to answer innovative party research questions that require the combination of multiple data sets.
The exponential growth in data generated by satellites, radars, sensors, and analyses and reanalyses of model outputs in the hydrological domain requires efficient real-time data management and distribution mechanisms. This paper introduces HydroRTC, a web-based data transfer and communication library designed to accelerate large-scale data sharing and analysis. Leveraging next-generation web technologies such as WebSockets, WebRTC, and Node.js, the library enables seamless peer-to-peer sharing, smart data transmission, and streaming of large datasets. Three primary scenarios are presented as use cases, demonstrating HydroRTC's potential for server-to-peer transfer with intelligent data scheduling and large-data streaming, peer-to-peer data sharing, and peer-to-server data exchange. HydroRTC offers a promising solution for collaborative infrastructures in the hydrological and environmental domain, allowing real-time, high-throughput data sharing and transfer to enhance research efficiency and collaboration.
•HydroRTC accelerates large-scale data sharing with next-gen web technologies.
•Three primary scenarios: server-to-peer, peer-to-peer, peer-to-server data exchange.
•Promising solution for collaborative infrastructures in hydrological and environmental domains.
•Exponential growth in data necessitates efficient real-time management mechanisms.
•Leverages WebSockets, WebRTC, Node.js for seamless peer-to-peer sharing.
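As a rough illustration of the large-dataset streaming pattern mentioned above (and not HydroRTC's actual Node.js API), the following Python sketch pushes a file to a server in bounded-size WebSocket frames with a simple per-chunk acknowledgement; the URI, file name, and chunk size are placeholders.

```python
# Generic chunked-streaming sketch in Python (NOT HydroRTC's Node.js API):
# push a large file to a server in bounded WebSocket frames with a simple
# per-chunk acknowledgement. The URI, file name, and chunk size are placeholders.
import asyncio
import websockets

CHUNK = 256 * 1024  # 256 KiB frames keep memory bounded for multi-GB datasets

async def stream_dataset(uri: str, path: str) -> None:
    async with websockets.connect(uri) as ws:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                await ws.send(chunk)   # binary frame
                await ws.recv()        # wait for an ack before the next chunk
        await ws.send(b"")             # empty frame signals end of stream

if __name__ == "__main__":
    asyncio.run(stream_dataset("ws://localhost:8080/upload", "radar_scan.nc"))
```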
Mobile edge computing (MEC) can use the radio access network (RAN) to provide users' information technology (IT) and cloud computing services nearby, creating a high-performance, low-latency service environment. Performing task offloading and data caching at access points (APs) in a cooperative manner can reduce the heavy backhaul load and the retransmission of downloaded content. However, in edge networks (ENs), how to maximize storage utilization while reducing service latency and energy consumption remains a key issue, because the heterogeneity of ENs and the uneven distribution of users make it difficult to determine which MEC server should cache which data. In this paper, we study a two-tier MEC system in which a joint data caching and computation offloading policy minimizes the network cost at the user equipment (UE) side while satisfying constraints on the task offloading deadline, the cache capacity at APs, and the computing capability of the MEC servers. The optimization problem is formulated as a mixed-integer nonlinear programming (MINLP) problem. To solve it, we transform it into an equivalent convex task offloading problem by fixing one optimization variable, and we solve the cache placement subproblem with a dynamic programming (DP) algorithm. We then propose a distributed collaborative data caching and computing offloading (CDCCO) iterative algorithm. Simulation results demonstrate that the proposed CDCCO algorithm significantly reduces the network cost and achieves better performance than existing schemes.
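The cache placement subproblem can be pictured as a knapsack-style choice. The minimal dynamic programming sketch below uses that assumed formulation, which is not the paper's exact DP; content sizes, backhaul savings, and the AP capacity are illustrative numbers.

```python
# Knapsack-style sketch of the cache-placement subproblem (assumed formulation,
# not the paper's exact DP): choose which contents to cache at one AP, within
# its capacity, so as to maximize the backhaul cost avoided.
def cache_placement(sizes, savings, capacity):
    """sizes[i]: integer content size (MB); savings[i]: backhaul cost avoided if cached."""
    best = [0.0] * (capacity + 1)
    for size, save in zip(sizes, savings):
        for c in range(capacity, size - 1, -1):   # backwards scan keeps the 0/1 choice
            best[c] = max(best[c], best[c - size] + save)
    return best[capacity]

if __name__ == "__main__":
    print(cache_placement(sizes=[30, 20, 50], savings=[8.0, 5.0, 9.0], capacity=60))  # -> 13.0
```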
Cloud computing has developed rapidly and is applied in many areas. However, privacy remains the most challenging problem obstructing its adoption in privacy-sensitive fields such as finance and government. Advanced cryptographic algorithms provide data privacy through encryption and can also support computation on the encrypted data. However, a new challenge arises when such ciphertexts come from different parties. In particular, how to collaboratively execute data mining on encrypted data from different parties is a key issue from the cloud service point of view. This paper focuses on the privacy problem in an outsourced k-means clustering scheme for two parties. In particular, each party’s data are encrypted only once and then stored in the cloud. The proposed privacy-preserving k-means collaborative clustering protocol is executed mainly at the cloud, with O(k(m+n)) rounds of interaction among the two parties and the cloud, where m and n denote the total numbers of records held by the two parties, respectively. The protocol is shown to be secure in the semi-honest security model, and in the malicious model in which only one party is corrupted during the centroid re-computation process. Both theoretical and experimental analyses of the proposed scheme are provided.
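For reference, the plaintext k-means loop below marks the step that drives the protocol's O(k(m+n)) interaction rounds: the re-computation of the k centroids, which in the outsourced scheme is carried out jointly by the cloud and the two parties over ciphertexts. This is a plain NumPy sketch of standard k-means, not the encrypted protocol.

```python
# Plaintext NumPy sketch of the k-means loop (not the encrypted protocol).
# The centroid re-computation marked below is the step that, in the outsourced
# scheme, the cloud and the two parties perform jointly over ciphertexts.
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assignment step: each record picks its nearest centroid.
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        # Centroid re-computation step (the interactive part of the protocol).
        centroids = np.array([X[labels == j].mean(0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return centroids, labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    print(kmeans(X, k=2)[0])
```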
In recent years, the development of technologies for causal inference with privacy preservation of distributed data has gained considerable attention. Many existing methods for distributed data focus on resolving the lack of subjects (samples) and can only reduce random errors in estimating treatment effects. In this study, we propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing both random errors and biases in the estimation. Our method involves constructing dimensionality-reduced intermediate representations from the local parties' private data, sharing these intermediate representations instead of the private data for privacy preservation, estimating propensity scores from the shared intermediate representations, and finally estimating the treatment effects from the propensity scores. Through numerical experiments on both artificial and real-world data, we confirm that our method leads to better estimation results than individual analyses. Although dimensionality reduction loses some information in the private data and causes performance degradation, we observe that sharing intermediate representations among many parties to resolve the lack of subjects and covariates improves performance enough to overcome the degradation caused by dimensionality reduction. Although external validity is not necessarily guaranteed, our results suggest that DC-QE is a promising method. With widespread use of the method, intermediate representations could be published as open data to help researchers find causal relationships and accumulate a knowledge base.
•A privacy-preserving statistical causal inference method on distributed data.
•Our method can reduce both random errors and biases in treatment-effect estimation.
•Privacy of data is preserved by sharing only the intermediate representations.
•Numerical experiments showed good estimation results in artificial and real data.
•Intermediate representations can be accumulated as a knowledge base.
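A compressed sketch of the pipeline described above follows: each party reduces its own covariates, the reduced representations are pooled, propensity scores are fitted on the pooled representations, and the treatment effect is estimated by inverse-probability weighting. The choice of PCA, logistic regression, and IPW here is an assumption for illustration, and the collaborative step that aligns the parties' representations into a common space is omitted.

```python
# Sketch of the DC-QE flow (assumed components: PCA, logistic regression, IPW).
# In the full method the parties' representations are aligned into a common
# space via a collaboration step; that alignment is omitted here for brevity.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def local_representation(X, dim=5):
    """Run locally by each party; only the reduced representation is shared."""
    return PCA(n_components=dim).fit_transform(X)

def dc_qe_ate(reps, treatments, outcomes):
    """Estimate the average treatment effect from pooled intermediate representations."""
    Z = np.vstack(reps)
    t = np.concatenate(treatments)
    y = np.concatenate(outcomes)
    e = LogisticRegression(max_iter=1000).fit(Z, t).predict_proba(Z)[:, 1]
    # Inverse-probability weighting using propensity scores fitted on Z.
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
```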
Patient and Public Involvement (PPI) in mental health research is increasing, especially in early (pre-funding) stages. PPI is less consistent in later stages, including in analysing qualitative data. The aims of this study were to develop a methodology for involving PPI co-researchers in collaboratively analysing qualitative mental health research data with academic researchers, to pilot and refine this methodology, and to create a best practice framework for collaborative data analysis (CDA) of qualitative mental health research.
In the context of the RECOLLECT Study of Recovery Colleges, a critical literature review of collaborative data analysis studies was conducted, to identify approaches and recommendations for successful CDA. A CDA methodology was developed and then piloted in RECOLLECT, followed by refinement and development of a best practice framework.
From 10 included publications, four CDA approaches were identified: (1) consultation, (2) development, (3) application, and (4) development and application of a coding framework. Four characteristics of successful CDA were found: the CDA process is co-produced; the CDA process is realistic regarding time and resources; the demands of the CDA process are manageable for PPI co-researchers; and group expectations and dynamics are effectively managed. A four-meeting CDA process was piloted to co-produce a coding framework based on qualitative data collected in RECOLLECT and to create a mental health service user-defined change model relevant to Recovery Colleges. Formal and informal feedback demonstrated active involvement. The CDA process required an extra 80 person-days of time (40 from PPI co-researchers, 40 from academic researchers). The process was refined into a best practice framework comprising Preparation, CDA and Application phases.
This study has developed a typology of approaches to collaborative analysis of qualitative data in mental health research, identified from available evidence the characteristics of successful involvement, and developed, piloted and refined the first best practice framework for collaborative analysis of qualitative data. This framework has the potential to support meaningful PPI in data analysis in the context of qualitative mental health research studies, a previously neglected yet central part of the research cycle.
•An interpretable distributed data analysis that shares only intermediate representations.
•A practical supplement to federated learning systems.
•The obtained interpretable model is based on the whole features of the distributed data.
•Each party can individually select an interpretable model according to its own needs.
•The proposed method achieves good recognition performance for artificial and real-world data.
This paper proposes an interpretable, non-model-sharing collaborative data analysis method as a federated learning system, an emerging technology for analyzing distributed data. Analyzing distributed data is essential in many applications, such as medicine, finance, and manufacturing, due to privacy and confidentiality concerns. In addition, the interpretability of the obtained model plays an important role in practical applications of federated learning systems. By centralizing intermediate representations that are individually constructed by each party, the proposed method obtains an interpretable model, achieving collaborative analysis without revealing either the individual data or the learning models held by the local parties. Numerical experiments indicate that the proposed method achieves better recognition performance than individual analysis and comparable performance to centralized analysis for both artificial and real-world problems.
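One way to picture the interpretability claim is sketched below, under assumed details that are not the paper's exact algorithm: each party shares only a low-dimensional projection of its data, a linear model is fitted on the pooled projections, and each party pulls the fitted weights back through its own projection matrix so that it can read them in terms of its original features. The cross-party alignment of representations is again omitted for brevity.

```python
# Hedged sketch (assumed mechanism, not the paper's exact algorithm): fit a
# ridge model on pooled low-dimensional representations and pull the weights
# back into each party's original feature space for interpretation. The
# cross-party alignment of representations is omitted for brevity.
import numpy as np

def party_projection(X, dim=5, seed=0):
    """Each party builds its own projection and shares only X @ P, never X."""
    rng = np.random.default_rng(seed)
    P, _ = np.linalg.qr(rng.standard_normal((X.shape[1], dim)))  # orthonormal columns
    return P, X @ P

def fit_shared_model(Z_parts, y_parts, lam=1.0):
    """Ridge regression on the pooled intermediate representations."""
    Z, y = np.vstack(Z_parts), np.concatenate(y_parts)
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)

def local_interpretation(P, beta):
    """Map the shared-model weights back to this party's original features."""
    return P @ beta
```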
With the rapid development of artificial intelligence technology, unmanned surface vehicles (USVs) in the marine Internet of Things (MIoT) have become an important paradigm for marine environment exploration. However, when collecting environmental information in the MIoT, USVs face a series of threats such as engine failure, grounding, and collision, which can damage shipboard memory or even break and sink the vessel, causing loss or damage of stored data. USV fleets consisting of multiple USVs have recently been advocated to enable collaborative communication and storage resource sharing. In this paper, a USV fleet-assisted data backup scheme for USVs at risk of damage is proposed to guarantee the availability of stored data. First, a data backup framework for USV fleets is designed in which USVs are classified into high-risk and low-risk USVs according to their damage-risk probability while sailing. Within the fleet, high-risk USVs (i.e., requesters) back up data to low-risk USVs (i.e., assistants) in emergencies. Second, a coalition game based on cost sharing is used to incentivize individual USVs to form optimal fleets by maximizing their expected revenues, where the cost-sharing mechanism effectively ensures the stability of the coalitions. Finally, the joint optimization of the requesters' data allocation decisions and the assistants' data reception decisions is formulated to maximize the average amount of backed-up data. The predictor-corrector interior point method (PIPM) and Q-learning are leveraged to solve the formulated problem, yielding the optimal data allocation and data reception decisions. Extensive simulation results demonstrate that the proposed scheme outperforms benchmark schemes in terms of individual expected revenue, participation degree, and the average amount of data backup.
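To make the Q-learning component concrete, here is a minimal tabular sketch of one assistant USV's data reception decision, with an illustrative state space (discretized free storage), action space (number of accepted data chunks), and reward (backup value minus a communication/energy cost); none of these modeling choices come from the paper.

```python
# Minimal tabular Q-learning sketch for one assistant USV's receive-data
# decision (illustrative MDP, not the paper's): state = free storage units,
# action = number of data chunks accepted from requesters this step.
import random

STATES, ACTIONS = 11, 4                 # 0..10 free storage units; accept 0..3 chunks
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1       # learning rate, discount, exploration
Q = [[0.0] * ACTIONS for _ in range(STATES)]

def step(state, action):
    accepted = min(action, state)       # cannot accept more than the free storage
    reward = accepted - 0.2 * action    # backup value minus communication/energy cost
    return state - accepted, reward

for episode in range(2000):
    s = STATES - 1                      # start each episode with all storage free
    for _ in range(20):
        a = (random.randrange(ACTIONS) if random.random() < EPS
             else max(range(ACTIONS), key=lambda x: Q[s][x]))
        s_next, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next
```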