Coronavirus Disease 2019 (COVID-19) has affected day-to-day life and slowed down the global economy. Most countries enforced strict quarantine to control the havoc of this highly contagious disease. Since the outbreak of COVID-19, many data analyses have been carried out to provide close support to decision-makers. We propose a method combining data analytics and machine learning classification for evaluating the effectiveness of lockdown regulations. Lockdown regulations should be reviewed by governments on a regular basis to enable reasonable control over the outbreak. The model aims to measure the efficiency of lockdown procedures for various countries, and it shows a direct correlation between lockdown procedures and the infection rate. Lockdown efficiency is measured by finding the correlation coefficient between lockdown attributes and the infection rate. The lockdown attributes include mobility in retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, residential areas, and schools. Our results show that combining all the independent attributes in our study yielded a higher correlation (0.68) with the dependent variable, the third quartile (Q3). The Mean Absolute Error (MAE) was lowest when all attributes were combined.
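As a rough illustration of the measures involved, the sketch below computes a Pearson correlation between a combined mobility signal and the infection rate, along with the Mean Absolute Error; the combination rule (a simple mean across attributes) and all data are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def lockdown_correlation(mobility: np.ndarray, infection_rate: np.ndarray) -> float:
    """Pearson correlation between a combined mobility signal and infection rate.

    mobility: shape (n_days, n_attributes), e.g. retail, grocery, parks, ...
    infection_rate: shape (n_days,)
    """
    combined = mobility.mean(axis=1)  # illustrative combination of all attributes
    return float(np.corrcoef(combined, infection_rate)[0, 1])

def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MAE between observed and predicted infection rates."""
    return float(np.abs(y_true - y_pred).mean())
```

A higher absolute correlation for the combined signal, together with a lower MAE, is what the abstract reports for the all-attributes case.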
One of the biggest concerns with big data and analytics is privacy. We believe forthcoming frameworks and theories will establish several solutions for privacy protection. One of the known solutions is k-anonymity, which was introduced for traditional data. Recently, two major frameworks have leveraged big data processing and applications: MapReduce and Spark. Spark data processing has been attracting more attention due to its crucial impact on a wide range of big data applications. One of the predominant big data applications is data analytics and anonymization. We previously proposed an anonymization method for implementing k-anonymity in the MapReduce processing framework. In this paper, we investigate Spark's performance in processing data anonymization. Spark is a fast processing framework that has been applied in several domains, such as SQL, multimedia, and data streams. Our focus is Spark SQL, which is adequate for big data anonymization. Since Spark operates in-memory, we need to observe its limitations, speed, and fault tolerance as data size increases, and to compare MapReduce with Spark in processing anonymity. Spark introduces an abstraction called resilient distributed datasets (RDDs), which read and serialize a collection of objects partitioned across a set of machines. Its developers claim that Spark can outperform MapReduce by 10 times in iterative machine learning jobs. Our experiments in this paper compare MapReduce and Spark. The overall results show better performance for Spark's processing time in anonymity operations. However, in some limited cases, when the cluster resources are limited and the network is non-congested, we prefer to implement the older MapReduce framework.
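The Spark pipeline itself requires a cluster, but the k-anonymity property that both frameworks enforce can be sketched in plain Python; the record layout and quasi-identifier choice below are illustrative assumptions, not the paper's dataset.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True iff every combination of quasi-identifier values occurs in at
    least k records, i.e. each record hides in a group of size >= k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())
```

In the distributed setting, the same grouping step maps naturally onto a group-by over RDDs or a MapReduce shuffle, which is where the two frameworks' performance differences appear.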
Big data is predominantly associated with data retrieval, storage, and analytics. Data analytics is prone to privacy violations and data disclosures, which can be partly attributed to the multi-user characteristics of big data environments. Adversaries may link data to external resources, try to access confidential data, or deduce private information from the large number of data pieces that they can obtain. Data anonymization can address some of these concerns by providing tools to mask and conceal vulnerable data. Currently available anonymization methods, however, cannot accommodate big data scalability, granularity, and performance requirements efficiently. In this paper, we introduce a novel framework that builds on SQL-like Hadoop ecosystems, incorporating Pig Latin with additional splitting of data. The splitting reduces data masking and increases the information gained from the anonymized data. Our solution provides fine-grained masking and concealment based on the access-level privileges of the user. We also introduce a simple classification technique that can accurately measure the extent of anonymization in any anonymized data. The results of testing this classification technique and the proposed sensitivity-based anonymization method on different samples are also discussed. These results show the significant benefits of the proposed approach, particularly the reduced information loss associated with the anonymization processes.
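A minimal sketch of masking driven by access-level privileges, assuming a per-field sensitivity map and a fixed mask token; the field names and levels are illustrative assumptions, not the framework's actual schema.

```python
def mask_by_access_level(record: dict, access_level: int, sensitivity: dict) -> dict:
    """Return a copy of `record` in which every field whose sensitivity
    exceeds the caller's access level is replaced by a mask token."""
    return {
        field: (value if sensitivity.get(field, 0) <= access_level else "***")
        for field, value in record.items()
    }
```

A low-privilege analyst thus sees only low-sensitivity fields, while higher access levels progressively reveal more of the record.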
During the coronavirus disease (COVID-19) pandemic, different technologies, including telehealth, have been maximised to mitigate the risks and consequences of the disease. Telehealth has been widely utilised because of its usability and safety in providing healthcare services during the COVID-19 pandemic. However, a systematic literature review that provides extensive evidence on the impact of COVID-19 through telehealth and covers multiple directions in large-scale research remains lacking. This study aims to review the telehealth literature comprehensively since the pandemic started. It also aims to map the research landscape into a coherent taxonomy and characterise this emerging field in terms of motivations, open challenges and recommendations. Articles related to telehealth during the COVID-19 pandemic were systematically searched in the WOS, IEEE, Science Direct, Springer and Scopus databases. The final set included (n = 86) articles discussing telehealth applications with respect to (i) control (n = 25), (ii) technology (n = 14) and (iii) medical procedure (n = 47). Since the beginning of the pandemic, telehealth has been applied in diverse cases, yet it still warrants further attention. Regardless of category, the articles focused on the challenges that hinder the maximisation of telehealth in such times and on how to address them. With the rapid increase in the utilisation of telehealth in different specialised hospitals and clinics, a potential framework reflecting the authors' implications for the future application and opportunities of telehealth has been established. This article improves our understanding and reveals the full potential of telehealth during these difficult times and beyond.
• State-of-the-art literature categorisation for telehealth utilisation during COVID-19.
• Challenges, motivations and recommended solutions identified for telehealth during COVID-19.
• Different applications of telehealth during the COVID-19 pandemic.
COVID-19 as a global pandemic has had an unprecedented impact on the entire world. Projecting the future spread of the virus, in relation to its characteristics, for a specific suite of countries against a temporal trend can provide public health guidance to governments and organizations. Therefore, this paper presents an epidemiological comparison of the traditional SEIR model with an extended and modified version of the same model, obtained by splitting the infected compartment into asymptomatic-mild and symptomatic-severe. We then exposed our derived layered model to two distinct case studies with variations in mitigation strategies and non-pharmaceutical interventions (NPIs) as a matter of benchmarking and comparison. We focused on exploring the United Arab Emirates, a small yet urban centre where clear, sequentially staged NPIs were implemented. Further, we extended the models by utilizing the effective reproductive number (Rt) estimated against time, which is more realistic than the static R0, to assess the potential impact of NPIs within each case study. Compared to the traditional SEIR model, the results supported the modified model as being more sensitive in terms of peaks of simulated cases and flattening determinations.
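A rough sketch of the layered model's forward simulation, assuming a standard SEIR extension in which the infected compartment splits into asymptomatic-mild (Ia) and symptomatic-severe (Is); the equations, parameter names, and values below are illustrative assumptions and do not reproduce the paper's exact formulation or its fitted, time-varying Rt.

```python
def layered_seir(days, N, beta_a, beta_s, sigma, p_severe, gamma_a, gamma_s,
                 E0=1.0, dt=0.1):
    """Forward-Euler integration of an SEIR model whose infected compartment
    is split into asymptomatic-mild (Ia) and symptomatic-severe (Is):

        S'  = -(beta_a*Ia + beta_s*Is) * S / N
        E'  =  (beta_a*Ia + beta_s*Is) * S / N - sigma*E
        Ia' =  (1 - p_severe)*sigma*E - gamma_a*Ia
        Is' =  p_severe*sigma*E       - gamma_s*Is
        R'  =  gamma_a*Ia + gamma_s*Is
    """
    S, E, Ia, Is, R = N - E0, E0, 0.0, 0.0, 0.0
    history = []
    for _ in range(round(days / dt)):
        force = (beta_a * Ia + beta_s * Is) * S / N  # force of infection
        dS = -force
        dE = force - sigma * E
        dIa = (1 - p_severe) * sigma * E - gamma_a * Ia
        dIs = p_severe * sigma * E - gamma_s * Is
        dR = gamma_a * Ia + gamma_s * Is
        S, E, Ia, Is, R = (S + dt * dS, E + dt * dE, Ia + dt * dIa,
                           Is + dt * dIs, R + dt * dR)
        history.append((S, E, Ia, Is, R))
    return history
```

Because the five derivatives sum to zero, the total population is conserved at every step, which is a quick sanity check for any such compartmental implementation.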
Big data is predominantly associated with data retrieval, storage, and analytics. The world is creating massive amounts of data, and the volume increases exponentially. From the dawn of time until 2015, humans had created 7.9 zettabytes of data, a figure projected to grow exponentially to 40.9 zettabytes by 2020. Analytics in big data is maturing and moving towards mass adoption. The emergence of analytics increases the need for innovative tools and methodologies to protect data against privacy violation. Data analytics is prone to privacy violations and data disclosures, which can be partly attributed to the multi-user characteristics of big data environments. Adversaries may link data to external resources, try to access confidential data, or deduce private information from the large number of data pieces that they can obtain. Many data anonymisation methods have been proposed to provide some degree of privacy protection by applying data suppression and other distortion techniques. However, currently available methods suffer from poor scalability and performance, low granularity, and a lack of framework standardisation. Current anonymisation methods are unable to cope with the processing of massive data sizes. Some of these methods were proposed specifically for the MapReduce framework to operate on big data; however, they still rely on conventional data management approaches, so no remarkable performance gains have been achieved. To fill this gap, this thesis introduces a sensitivity-based anonymity framework that can operate in a MapReduce environment to benefit from its advantages, as well as from those of Hadoop ecosystems. The framework provides granular user access that can be tuned to different authorisation levels. The proposed solution provides fine-grained alteration based on the user's authorisation level when accessing a domain for analytics.
The framework's core concept is derived from the k-anonymisation technique proposed by Sweeney in 1998 for data protection. Using well-developed role-based access control approaches, the framework is capable of assigning roles to users and mapping them to the relevant data attributes. Moreover, the thesis introduces a simple classification technique that can properly measure the extent of anonymisation in any anonymised data. Various experiments showed promising results for the framework proposed in this thesis. The anonymisation experiments demonstrate fine granularity, good parallel-processing performance with high scalability, and low distortion. To examine the effectiveness of the proposed framework in protecting privacy and reducing data loss, a diverse range of experimental studies was carried out. These studies aimed to demonstrate the framework's fine granularity by applying granular levels of anonymisation for data analysers, and to compare the proposed anonymisation framework with currently available frameworks. All experiments were conducted using big data operational tools, such as Hadoop and Spark, and the comparison was made in both systems. The results showed higher performance output, in general, when anonymisation was conducted in Spark. However, in some limited cases, MapReduce is preferable, namely when the cluster resources are limited and the network is non-congested. The experiments also unveiled several facts about big data behaviour. For instance, records in big data tend to become more equivalent as the data size increases. Moreover, the major concern in big data is security; hence, the security side should be the primary focus. The few obfuscated records do not have a major impact on the overall statistical results. Therefore, the trade-off between security and information gain tends to give security the higher priority.
Big data access is expected to be requested by a great number of users, and this massive demand has recently increased with the blossoming of social media over the Internet. Personal and contextual information is publicly available online; thus, personal re-identification has never been easier than it is now. For this reason, we believe that security should be the major focus of anonymisation algorithms. The experiments have also shown high processing performance and average information loss for the proposed anonymisation framework. The anonymised data attained a low classification error with the Bayesian classifier; compared with current anonymisation methods, the proposed framework's classification error is slightly lower, by 0.12%. From the performance perspective, the proposed framework runs up to 40% faster than current anonymisation frameworks. On the security side, protection was strengthened by increasing the k-anonymity value and assigning granularity to users' access.
Analytics in big data is maturing and moving towards mass adoption. The emergence of analytics increases the need for innovative tools and methodologies to protect data against privacy violation. Many data anonymization methods have been proposed to provide some degree of privacy protection by applying data suppression and other distortion techniques. However, currently available methods suffer from poor scalability, poor performance, and a lack of framework standardization. Current anonymization methods are unable to cope with the processing of massive data sizes. Some of these methods were proposed specifically for the MapReduce framework to operate on big data; however, they still rely on conventional data management approaches, so no remarkable performance gains have been achieved. We introduce a framework that can operate in a MapReduce environment to benefit from its advantages, as well as from those of Hadoop ecosystems. Our framework provides granular user access that can be tuned to different authorization levels. The proposed solution provides fine-grained alteration based on the user's authorization level when accessing the MapReduce domain for analytics. Using well-developed role-based access control approaches, the framework is capable of assigning roles to users and mapping them to the relevant data attributes.
Datasets containing private and sensitive information are useful for data analytics. Data owners cautiously release such sensitive data using privacy-preserving publishing techniques. The possibility of personal re-identification is much greater than ever before; social media, for instance, has dramatically increased exposure to privacy violation. The well-known k-anonymity technique proposes a protection approach against privacy exposure. K-anonymity seeks groups of k equivalent data records over a set of chosen attributes known as quasi-identifiers. This approach may reduce personal re-identification; however, it may also lessen the usefulness of the information gained. The value of k should be carefully determined to balance security and information gain. Unfortunately, there is no standard procedure for defining the value of k, and the problem of optimal k-anonymization is NP-hard. In this paper, we propose a greedy-based heuristic approach that provides an optimal value for k. The approach evaluates the empirical risk with respect to our Sensitivity-Based Anonymization method, and is derived from the fine-grained access and business-role anonymization for big data that forms our framework.
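The paper's exact heuristic is not reproduced here; the sketch below only shows the general shape of a greedy search that grows k while an empirical risk estimate stays under a threshold. The risk proxy used (the fraction of records falling in equivalence classes smaller than k) and the threshold are illustrative assumptions.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers, k):
    """Fraction of records in equivalence classes smaller than k, i.e. records
    that would need suppression or generalization to reach k-anonymity."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    violating = sum(c for c in groups.values() if c < k)
    return violating / len(records)

def greedy_k(records, quasi_identifiers, max_risk=0.2, k_max=50):
    """Greedily grow k, returning the largest value whose empirical risk
    does not exceed max_risk (illustrative stopping rule)."""
    best = 1
    for k in range(2, k_max + 1):
        if reidentification_risk(records, quasi_identifiers, k) <= max_risk:
            best = k
        else:
            break
    return best
```

Larger k strengthens anonymity but raises the risk proxy, so the search stops at the point where further growth would force too much distortion.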
Sensitivity-Based Anonymization of Big Data. Al-Zobbi, Mohammed; Shahrestani, Seyed; Ruan, Chun. 2016 IEEE 41st Conference on Local Computer Networks Workshops (LCN Workshops), November 2016. Conference proceeding.
Data analytics is widely used as a means of extracting useful information from available data, so it is only natural that it is increasingly adopted for processing big data. The rapidly growing demand for big data analytics has several undesirable side effects; perhaps the most significant of these relates to increased risks of data disclosure and privacy violations. Data anonymization can provide promising solutions for minimizing such risks. In this paper, we discuss some of the specific requirements of the anonymization process when dealing with big data. We show that, in general, information loss is the result of avoidable generalization of similar or equivalent data. Using these analyses, we propose a novel framework for data anonymization, which expands the k-anonymity properties and concepts and takes the data class values and the sensitivity of data into account. As such, the proposed process can utilize a bottom-up approach, in contrast to most other anonymization methods. Top-down approaches usually generalize all records, the equivalent and the non-equivalent ones alike. Ours is more methodical, as it avoids the generalization of the equivalent records. With the inclusion of sensitivity levels, we demonstrate that our framework can reduce the iteration steps and the time required to finalize the anonymization, and therefore enhance the overall efficiency of the process.
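A minimal sketch of the bottom-up idea, assuming equivalence classes are formed over quasi-identifiers and only undersized classes are coarsened; the `generalize` callback and record layout are illustrative assumptions, not the framework's actual algorithm.

```python
from collections import defaultdict

def bottom_up_anonymize(records, quasi_identifiers, k, generalize):
    """Leave equivalence classes that already hold >= k records untouched;
    apply `generalize` (a user-supplied coarsening function) only to the
    records in undersized classes."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r)
    out = []
    for members in classes.values():
        if len(members) >= k:
            out.extend(members)  # already k-equivalent: no information loss
        else:
            out.extend(generalize(m) for m in members)
    return out
```

Skipping the already-equivalent classes is precisely what distinguishes this from a top-down pass that generalizes every record, and it is where the reduced information loss comes from.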