Data mining in distributed environment: a survey Gan, Wensheng; Lin, Jerry Chun‐Wei; Chao, Han‐Chieh ...
Wiley interdisciplinary reviews. Data mining and knowledge discovery,
November/December 2017, Volume 7, Issue 6
Journal Article
Peer reviewed
Due to the rapid growth of resource sharing, distributed systems have been developed to make use of the available computational resources. Data mining (DM) provides powerful techniques for finding meaningful and useful information from very large amounts of data, and has a wide range of real‐world applications. However, traditional DM algorithms assume that the data is centrally collected, memory‐resident, and static. It is challenging to manage large‐scale data and process it with very limited resources. For example, large amounts of data are quickly produced and stored at multiple locations, and it becomes increasingly expensive to centralize them in a single place. Moreover, traditional DM algorithms generally face problems such as memory limits, low processing ability, and inadequate hard disk space. To solve the above problems, DM in a distributed computing environment, also called distributed data mining (DDM), has been emerging as a valuable alternative in many applications. In this study, a survey of state‐of‐the‐art DDM techniques is provided, including distributed frequent itemset mining, distributed frequent sequence mining, distributed frequent graph mining, distributed clustering, and privacy preservation in distributed data mining. We finally summarize the opportunities of data mining tasks in distributed environments. WIREs Data Mining Knowl Discov 2017, 7:e1216. doi: 10.1002/widm.1216
This article is categorized under:
Application Areas > Business and Industry
Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
Technologies > Computer Architectures for Data Mining
An overview of distributed data mining.
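As a rough illustration of distributed frequent itemset mining, the following minimal sketch assumes a simple count-distribution scheme: each site counts itemsets in its own partition, and only the counts (not the raw transactions) are merged centrally. The function names, the toy data, and the `max_size` limit are assumptions made for this example and are not taken from the survey.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items within one site's partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def merge_and_filter(per_site_counts, min_support):
    """Sum the per-site counts and keep itemsets meeting the global support threshold."""
    total = Counter()
    for counts in per_site_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical data held at two separate sites.
site_a = [["bread", "milk"], ["bread", "eggs"]]
site_b = [["bread", "milk", "eggs"], ["milk"]]

frequent = merge_and_filter(
    [local_counts(site_a), local_counts(site_b)], min_support=3
)
print(frequent)
```

Only the count tables cross site boundaries here, which is the point of avoiding centralization; practical algorithms additionally prune candidates rather than enumerating all itemsets.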
The fatty liver index (FLI) is an algorithm involving the waist circumference, body mass index, and serum levels of triglyceride and gamma-glutamyl transferase to identify fatty liver. Although some studies have attempted to validate the FLI, few studies have been conducted for external validation among Asians. We attempted to validate the FLI to predict ultrasonographic fatty liver in Taiwanese subjects.
We enrolled consecutive subjects who received health check-up services at the Taipei Veterans General Hospital from 2002 to 2009. Ultrasonography was applied to diagnose fatty liver. The ability of the FLI to detect ultrasonographic fatty liver was assessed by analyzing the area under the receiver operating characteristic (AUROC) curve.
Among the 29,797 subjects enrolled in this study, fatty liver was diagnosed in 44.5% of the population. Subjects with ultrasonographic fatty liver had a significantly higher FLI than those without fatty liver by multivariate analysis (odds ratio 1.045; 95% confidence interval [CI], 1.044-1.047; p < 0.001). Moreover, FLI had the best discriminative ability to identify patients with ultrasonographic fatty liver (AUROC, 0.827; 95% CI, 0.822-0.831). An FLI < 25 for males (negative likelihood ratio [LR-] 0.32) and < 10 for females (LR- 0.26) rules out ultrasonographic fatty liver. Moreover, an FLI ≥ 35 for males (positive likelihood ratio [LR+] 3.12) and ≥ 20 for females (LR+ 4.43) rules in ultrasonographic fatty liver.
The FLI could accurately identify ultrasonographic fatty liver in a large-scale population in Taiwan, but with lower cut-off values than in Western populations. The cut-off values were also lower in females than in males.
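To make the reported likelihood ratios concrete, the sketch below applies the standard odds form of Bayes' rule, using the 44.5% prevalence from the abstract as the pre-test probability, and encodes the sex-specific cut-offs as a simple classifier. The function and variable names are assumptions for this example, and the "indeterminate" label for values between the cut-offs is an interpretation, not a term from the study.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Cut-offs reported in the abstract: (rule-out below, rule-in at or above).
FLI_CUTOFFS = {"male": (25, 35), "female": (10, 20)}


def classify_fli(fli, sex):
    """Apply the sex-specific rule-out / rule-in cut-offs to an FLI value."""
    rule_out_below, rule_in_from = FLI_CUTOFFS[sex]
    if fli < rule_out_below:
        return "rule out"
    if fli >= rule_in_from:
        return "rule in"
    return "indeterminate"  # label assumed for this sketch


PREVALENCE = 0.445  # fatty liver prevalence in the study population

# Male with FLI < 25 (LR- 0.32) and male with FLI >= 35 (LR+ 3.12).
p_after_negative = post_test_probability(PREVALENCE, 0.32)
p_after_positive = post_test_probability(PREVALENCE, 3.12)
print(round(p_after_negative, 3), round(p_after_positive, 3))
```

Under these assumptions, a male below the rule-out cut-off drops from a 44.5% pre-test probability to roughly 20%, while one at or above the rule-in cut-off rises to roughly 71%.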
A Survey of Utility-Oriented Pattern Mining Gan, Wensheng; Lin, Jerry Chun-Wei; Fournier-Viger, Philippe ...
IEEE transactions on knowledge and data engineering,
April 1, 2021, Volume 33, Issue 4
Journal Article
Peer reviewed
Open access
The main purpose of data mining and analytics is to find novel, potentially useful patterns that can be utilized in real-world applications to derive beneficial knowledge. For identifying and evaluating the usefulness of different kinds of patterns, many techniques and constraints have been proposed, such as support, confidence, sequence order, and utility parameters (e.g., weight, price, profit, quantity, satisfaction, etc.). In recent years, there has been an increasing demand for utility-oriented pattern mining (UPM, also called utility mining). UPM is a vital task, with numerous high-impact applications, including cross-marketing, e-commerce, finance, medical, and biomedical applications. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods of UPM. First, we provide an in-depth introduction to UPM, including concepts, examples, and comparisons with related concepts. A taxonomy of the most common and state-of-the-art approaches for mining different kinds of high-utility patterns is presented in detail, including Apriori-based, tree-based, projection-based, vertical-/horizontal-data-format-based, and other hybrid approaches. A comprehensive review of advanced topics of existing high-utility pattern mining techniques is offered, with a discussion of their pros and cons. Finally, we present several well-known open-source software packages for UPM. We conclude our survey with a discussion on open and practical challenges in this field.
Timely and reliable information sharing among autonomous vehicles (AVs) provides a promising approach for reducing traffic congestion and improving traffic efficiency in future intelligent transportation systems. In this paper, we consider millimeter-wave (mmWave) based multi-hop vehicle-to-vehicle (V2V) communications to facilitate ultra-reliable low-latency information sharing among AVs. We propose a novel framework for performance analysis and design of relay selection schemes in mmWave multi-hop V2V communications, while taking into account the mmWave signal propagation characteristics, road topology, and traffic conditions. In particular, considering the minimum tracking distance requirement of road traffic, the headway, i.e., the distance between adjacent AVs, is modeled as a shifted-exponential distribution. Moreover, we model the communication path losses using the Manhattan distance metric in the taxicab geometry, which can more accurately capture the characteristics of mmWave signal propagation in urban grid roads than conventional Euclidean distance geometry. Based on the proposed model, we investigate the latency and reliability of mmWave multi-hop V2V communications for three widely adopted relay selection schemes, i.e., random with forward progress (RFP), most forward with fixed radius (MFR), and nearest with forward progress (NFP), respectively. Furthermore, we propose a novel relay selection scheme for joint optimization of the single-hop forward progress (FP) and single-hop latency according to the AVs' instantaneous locations and an estimate of the residual multi-hop latency. Simulation results show that, by balancing the current single-hop latency and the residual multi-hop latency for the multi-hop V2V network, the proposed relay selection scheme significantly outperforms the MFR, NFP and RFP in both multi-hop transmission latency and reliability of mmWave V2V communications.
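The two modelling choices named in the abstract can be sketched in a few lines: a shifted-exponential headway (a fixed minimum gap plus an exponentially distributed extra gap) and the Manhattan distance on a grid compared with the Euclidean distance. All numeric parameters and function names below are assumptions for illustration only; the paper's actual parameter values are not given in the abstract.

```python
import math
import random


def sample_headways(n, min_distance_m, mean_extra_gap_m, seed=0):
    """Draw n headways: minimum tracking distance plus an exponential extra gap."""
    rng = random.Random(seed)
    rate = 1.0 / mean_extra_gap_m  # expovariate takes the rate, not the mean
    return [min_distance_m + rng.expovariate(rate) for _ in range(n)]


def manhattan_distance(a, b):
    """Taxicab distance between two (x, y) points on a street grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical values: 10 m minimum gap, 25 m mean extra gap.
headways = sample_headways(1000, min_distance_m=10.0, mean_extra_gap_m=25.0)

transmitter, receiver = (0.0, 0.0), (300.0, 400.0)
print(manhattan_distance(transmitter, receiver), euclidean_distance(transmitter, receiver))
```

The shift guarantees that no sampled headway falls below the minimum distance, and the grid path between the two example points is 700 m against a 500 m straight line.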
Summary
Background
Globally, chronic hepatitis B (CHB) is a major public health concern. Timely and effective management can prevent disease progression to cirrhosis and reduce the risk of hepatocellular carcinoma (HCC). Currently, there is no consensus on the clinical management of CHB in East Asia.
Aim
To establish an East Asia expert opinion on treatment initiation for CHB based on alanine aminotransferase (ALT) level, hepatitis B virus (HBV) deoxyribonucleic acid (DNA) level, cirrhosis and HCC risk scores.
Methods
A meeting was held online with a panel of 10 experts from East Asia to discuss ALT, HBV DNA, cirrhosis and HCC risk scores. Indications for CHB treatment in the latest international guidelines were reviewed. Consensus was summarised to provide recommendations on the initiation of treatment for CHB.
Results
Anti‐viral therapy is recommended for CHB patients with (a) HBV DNA ≥ 2000 IU/mL and ALT ≥ 1× upper limit of normal (ULN); (b) HBV DNA ≥ 2000 IU/mL, ALT < 1× ULN and ≥ F2 fibrosis and/or ≥ A2 necroinflammation; (c) cirrhosis and detectable HBV DNA; or (d) HBV DNA ≥ 2000 IU/mL, ALT < 1× ULN and a family history of cirrhosis or HCC, extrahepatic manifestations or age > 40 years. Patients with cirrhosis and/or HCC should be treated regardless of ALT levels if HBV DNA is detectable. Initiating anti‐viral therapy or close monitoring at 3‐month intervals is recommended for CHB patients with at least two HCC risk factors.
Conclusions
These expert recommendations will contribute to a new standard of daily clinical practice in East Asia.
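The treatment-initiation criteria (a) to (d) in the Results can be read as a small set of boolean rules. The sketch below encodes them purely to illustrate the logic; the `Patient` fields, the function name, and the returned labels are assumptions for this example, the HCC risk-factor recommendation is not modelled, and the code is not a clinical tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    hbv_dna_iu_ml: float           # quantified HBV DNA level
    hbv_dna_detectable: bool
    alt_ratio_to_uln: float        # ALT divided by the upper limit of normal
    fibrosis_stage: int = 0        # F0-F4
    necroinflammation_grade: int = 0  # A0-A3
    cirrhosis: bool = False
    hcc: bool = False
    family_history: bool = False   # family history of cirrhosis or HCC
    extrahepatic: bool = False     # extrahepatic manifestations
    age_years: int = 0


def matched_criteria(p: Patient) -> List[str]:
    """Return the labels of the stated criteria that the patient meets."""
    matched = []
    high_dna = p.hbv_dna_iu_ml >= 2000
    normal_alt = p.alt_ratio_to_uln < 1.0
    if high_dna and not normal_alt:
        matched.append("a")
    if high_dna and normal_alt and (p.fibrosis_stage >= 2 or p.necroinflammation_grade >= 2):
        matched.append("b")
    if p.cirrhosis and p.hbv_dna_detectable:
        matched.append("c")
    if high_dna and normal_alt and (p.family_history or p.extrahepatic or p.age_years > 40):
        matched.append("d")
    # Separate statement in the abstract: HCC with detectable HBV DNA, regardless of ALT.
    if p.hcc and p.hbv_dna_detectable:
        matched.append("hcc")
    return matched


example = Patient(hbv_dna_iu_ml=5000, hbv_dna_detectable=True,
                  alt_ratio_to_uln=0.8, age_years=45)
print(matched_criteria(example))
```

Returning the matched labels rather than a single yes/no keeps the reason for a recommendation visible, which mirrors how the criteria are listed in the abstract.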
Background & Aim
Anthropometric data are associated with nonalcoholic fatty liver disease (NAFLD) development and progression. We investigated whether the quantity and quality of muscle and visceral fat assessed by computed tomography (CT) are associated with fibrosis severity in NAFLD.
Methods
In a prospective biopsy‐confirmed NAFLD cohort of 521 patients, we measured skeletal muscle index (SMI), muscle attenuation (MA) and visceral adipose tissue index (VATI) via CT. Low skeletal muscle mass (LSMM) was defined using previously validated cut‐offs. Myosteatosis and visceral adiposity were defined as the lowest and highest quartile, respectively. Significant fibrosis was defined as F2‐F4 in liver histology.
Results
Patients with significant fibrosis had lower SMI and MA and higher VATI than those without. The significant fibrosis prevalence was significantly higher in subjects with LSMM (45.1% vs 30.8%, P = .005), myosteatosis (46.1% vs 29.7%, P = .001) and visceral adiposity (46.9% vs 29.9%, P = .001) than those without. The significant fibrosis risk increased with increasing numbers of body composition components (24.5%, 35.6%, 53.0% and 72.7% in patients with 0, 1, 2 and 3 components respectively). Multivariable analysis revealed that LSMM (OR, 1.72; 95% CI, 1.05‐2.84), myosteatosis (OR, 1.65; 95% CI, 1.01‐2.68) and visceral adiposity (OR, 1.75; 95% CI, 1.09‐2.83) were independent predictors of significant fibrosis. Subjects with sarcopenia had a higher risk of significant fibrosis (OR, 2.17; 95% CI, 1.03‐4.56).
Conclusion
Muscle alterations and visceral adiposity assessed by CT are associated with significant fibrosis in NAFLD. LSMM and myosteatosis have additive values in prediction of significant fibrosis.
Summary
Background
Non‐alcoholic fatty liver disease (NAFLD) and non‐alcoholic steatohepatitis (NASH) account for an increasing proportion of liver disease in the Asia‐Pacific region. Many areas in the region are experiencing epidemics of metabolic syndrome among rapidly ageing populations.
Aims
To estimate, using modelling, the growth in NAFLD populations, including cases with significant fibrosis that are most likely to experience advanced liver disease and related mortality.
Methods
A disease progression model was used to summarise and project fibrosis progression among the NAFLD populations of Hong Kong, Singapore, South Korea and Taiwan. For each area, changes in the adult prevalence of obesity were used to extrapolate long‐term trends in NAFLD incidence.
Results
In the areas studied, prevalent NAFLD cases were projected to increase 6%‐20% during 2019‐2030, while prevalent NASH cases increase 20%‐35%. Incident cases of hepatocellular carcinoma are projected to increase by 65%‐85%, while incident decompensated cirrhosis cases increase 65%‐100% by 2030. Likewise, NAFLD‐related mortality is projected to increase between 65% and 100% from 2019 to 2030. NAFLD disease burden is expected to increase alongside rising trends in metabolic syndrome and obesity among populations in the region. This leads to more cases of advanced liver disease and associated mortality.
Conclusions
Preventing the growth of diabetic and obese populations will be a key factor in reducing ongoing increases in NAFLD‐related disease burden in the Asia‐Pacific region.
A survey of incremental high‐utility itemset mining Gan, Wensheng; Lin, Jerry Chun‐Wei; Fournier‐Viger, Philippe ...
Wiley interdisciplinary reviews. Data mining and knowledge discovery,
March/April 2018, Volume 8, Issue 2
Journal Article
Peer reviewed
Open access
Traditional association rule mining has been widely studied, but it is unsuitable for real‐world applications where factors such as unit profits of items and purchase quantities must be considered. High‐utility itemset mining (HUIM) is designed to find highly profitable patterns by considering both the purchase quantities and unit profits of items. However, most HUIM algorithms are designed to be applied to static databases, whereas in real‐world applications such as market basket analysis and business decision‐making, databases are often dynamically updated by inserting new data such as customer transactions. Several researchers have proposed algorithms to discover high‐utility itemsets (HUIs) in dynamically updated databases. Unlike batch algorithms, which always process a database from scratch, incremental high‐utility itemset mining (iHUIM) algorithms incrementally update and output HUIs, thus reducing the cost of discovering HUIs. This paper provides an up‐to‐date survey of the state‐of‐the‐art iHUIM algorithms, including Apriori‐based, tree‐based, and utility‐list‐based approaches. To the best of our knowledge, this is the first survey on the mining task of incremental high‐utility itemset mining. The paper also identifies several important issues and research challenges for iHUIM. WIREs Data Mining Knowl Discov 2018, 8:e1242. doi: 10.1002/widm.1242
This article is categorized under:
Algorithmic Development > Association Rules
Application Areas > Data Mining Software Tools
Fundamental Concepts of Data and Knowledge > Knowledge Representation
Utility‐oriented pattern mining
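The notions of itemset utility (purchase quantity times unit profit, summed over the transactions containing the itemset) and of incremental updating can be shown with a deliberately naive sketch. It keeps a running utility for every itemset seen and processes only each newly inserted transaction; it is not one of the Apriori-based, tree-based, or utility-list-based algorithms covered by the survey, and the class name, profits, and threshold are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations


class NaiveIncrementalUtilityMiner:
    """Brute-force running utilities; exponential in transaction length."""

    def __init__(self, unit_profit, min_utility):
        self.unit_profit = unit_profit
        self.min_utility = min_utility
        self.utility = defaultdict(int)  # itemset (sorted tuple) -> total utility

    def insert(self, transaction):
        """Add one transaction {item: quantity} without rescanning earlier ones."""
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                self.utility[itemset] += sum(
                    transaction[item] * self.unit_profit[item] for item in itemset
                )

    def high_utility_itemsets(self):
        """Return itemsets whose accumulated utility meets the threshold."""
        return {s: u for s, u in self.utility.items() if u >= self.min_utility}


# Hypothetical unit profits and threshold.
miner = NaiveIncrementalUtilityMiner({"a": 5, "b": 2, "c": 1}, min_utility=20)
miner.insert({"a": 1, "b": 2})
miner.insert({"a": 2, "c": 3})
before = miner.high_utility_itemsets()

miner.insert({"a": 1, "b": 3, "c": 1})  # new customer transaction arrives
after = miner.high_utility_itemsets()
print(before, after)
```

In this toy run the itemset (a, b) reaches a utility of 20 while its subset (b) only reaches 10, which illustrates that utility does not decrease monotonically with itemset size the way support does.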
Development of specific antiviral agents is an urgent unmet need for SARS-coronavirus 2 (SARS-CoV-2) infection. This study focuses on host proteases that proteolytically activate the SARS-CoV-2 spike protein, critical for its fusion after binding to angiotensin-converting enzyme 2 (ACE2), as antiviral targets. We first validate cleavage at a putative furin substrate motif at SARS-CoV-2 spikes by expressing it in VeroE6 cells and find prominent syncytium formation. Cleavage and the syncytium are abolished by treatment with the furin inhibitors decanoyl-RVKR-chloromethylketone (CMK) and naphthofluorescein, but not by the transmembrane protease serine 2 (TMPRSS2) inhibitor camostat. CMK and naphthofluorescein show antiviral effects on SARS-CoV-2-infected cells by decreasing virus production and cytopathic effects. Further analysis reveals that, similar to camostat, CMK blocks virus entry, but it further suppresses cleavage of spikes and the syncytium. Naphthofluorescein acts primarily by suppressing viral RNA transcription. Therefore, furin inhibitors may be promising antiviral agents for prevention and treatment of SARS-CoV-2 infection.
• The furin cleavage site in the SARS-CoV-2 spike protein mediates syncytium formation
• The SARS-CoV-2 spike-mediated syncytium is suppressed by specific furin inhibitors
• Furin inhibitors block SARS-CoV-2 virus entry and virus replication
• Furin inhibitors are potential antiviral agents for SARS-CoV-2 infection and pathogenesis
Development of effective antiviral agents is an urgent unmet need for SARS-CoV-2 infection. Cheng et al. find that cleavage of the furin substrate site in the viral spike protein is critical for virus production and cytopathic effects. Two inhibitors targeting furin are potential antiviral agents to control SARS-CoV-2 infection and pathogenesis.
Background and Aim
The severity of liver dysfunction in hepatocellular carcinoma (HCC) is often estimated with the Child–Turcotte–Pugh (CTP) classification or the model for end‐stage liver disease (MELD) score. We aim to investigate the performance of the albumin‐bilirubin (ALBI) and platelet‐albumin‐bilirubin (PALBI) grades, which were recently reported to be simple and objective measurements of liver reserve in HCC.
Methods
Between 2002 and 2014, 3182 consecutive HCC patients were enrolled and followed up for survival. The area under the receiver‐operator‐characteristic curve (AUC) was calculated to test the discriminatory power for 1‐year, 3‐year, and 5‐year survival.
Results
Significant survival differences were found across all ALBI and PALBI grades (both P < 0.001). The majority (73%) of patients were CTP class A. Within CTP class A, ALBI revealed two prognostic groups while PALBI segregated three prognostic groups. The PALBI grade also identified three different survival groups for patients undergoing resection, ablation, and chemoembolization. Both ALBI and PALBI grade were capable of discerning survival among different HCC stages. The PALBI grade had significantly higher AUC compared with CTP classification and ALBI grade at 1, 3, and 5 years. For CTP class A patients, the PALBI grade was also associated with significantly higher AUC compared with ALBI grade at 1‐year and 3‐year intervals. The MELD score had the lowest AUC compared with other systems.
Conclusions
Both ALBI and PALBI grade are adequate models to assess liver dysfunction in HCC. The PALBI grade is consistently better in all patients, in patients with minimally decreased liver function, and in patients receiving different aggressive therapies.
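For readers unfamiliar with the ALBI grade, the sketch below computes it from serum bilirubin and albumin. The abstract itself does not state the formula; the coefficients and grade cut-offs used here are the commonly cited ALBI definition and should be checked against the original ALBI publication before any use. The function names and example values are assumptions, and the PALBI grade is not reproduced.

```python
import math


def albi_score(bilirubin_umol_l, albumin_g_l):
    """Commonly cited ALBI linear predictor (bilirubin in umol/L, albumin in g/L)."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score):
    """Map the ALBI score to grade 1-3 using the commonly cited cut-offs."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


# Hypothetical patients: (bilirubin umol/L, albumin g/L).
patients = [(10.0, 40.0), (50.0, 30.0), (100.0, 25.0)]
grades = [albi_grade(albi_score(b, a)) for b, a in patients]
print(grades)
```

Because the score uses only two routine laboratory values, it avoids the subjective components (ascites, encephalopathy) of the CTP classification, which is the sense in which the abstract calls it simple and objective.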