Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
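The "cluster with several parameterizations, then rank by quality" idea can be illustrated with a small sketch. It is not Clustervision's implementation: the abstract does not name its algorithms or its five metrics, so this example assumes a tiny hand-written k-means and a single metric (the mean silhouette coefficient) as stand-ins, and the names `kmeans`, `silhouette`, `rank_clusterings`, and `POINTS` are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means with a deterministic init (evenly spaced picks)."""
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to the other members of each cluster.
        mean_to = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if ds:
                mean_to[c] = sum(ds) / len(ds)
        own = labels[i]
        if own not in mean_to or len(mean_to) < 2:
            continue  # singleton cluster (or only one cluster overall) scores 0
        a = mean_to[own]
        b = min(d for c, d in mean_to.items() if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def rank_clusterings(points, ks):
    """Run one clustering per parameter value and sort results best-first."""
    results = [(k, silhouette(points, kmeans(points, k))) for k in ks]
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy data (made up for illustration): three well-separated groups.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
ranked = rank_clusterings(POINTS, [2, 3, 4])
```

On this toy data, `ranked[0][0]` is the parameter value with the highest silhouette. A real tool would sweep several algorithms and combine several metrics rather than rely on one.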
Social network analysis (SNA) has emerged as a powerful method for understanding the importance of relationships in networks. However, interactive exploration of networks is currently challenging because: (1) it is difficult to find patterns and comprehend the structure of networks with many nodes and links, and (2) current systems are often a medley of statistical methods and overwhelming visual output, which leaves many analysts uncertain about how to explore in an orderly manner. This results in exploration that is largely opportunistic. Our contributions are techniques to help structural analysts understand social networks more effectively. We present SocialAction, a system that uses attribute ranking and coordinated views to help users systematically examine numerous SNA measures. Users can (1) flexibly iterate through visualizations of measures to gain an overview, filter nodes, and find outliers, (2) aggregate networks using link structure, find cohesive subgroups, and focus on communities of interest, and (3) untangle networks by viewing different link types separately, or find patterns across different link types using a matrix overview. For each operation, a stable node layout is maintained in the network visualization so users can make comparisons. SocialAction offers analysts a strategy beyond opportunism, as it provides systematic, yet flexible, techniques for exploring social networks.
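As a rough illustration of ranking nodes by an SNA measure and then filtering on that ranking, the sketch below computes degree centrality on a toy undirected graph. The abstract does not say which measures SocialAction computes or how, so the measure, the function names (`build_adjacency`, `degree_centrality`, `rank_nodes`), and the edge list are all assumptions made for this example.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degree_centrality(adj):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def rank_nodes(scores):
    """Node names sorted by score, highest first; ties broken by name."""
    return sorted(scores, key=lambda node: (-scores[node], node))

# Toy network (made up for illustration).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
scores = degree_centrality(build_adjacency(EDGES))
ranking = rank_nodes(scores)

# Filtering step: keep only nodes at or above a chosen threshold.
hubs = [node for node in ranking if scores[node] >= 0.5]
```

Swapping `degree_centrality` for another measure leaves the ranking and filtering steps unchanged, which is the property an iterate-through-measures workflow depends on.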
• Zika and other mosquito-borne flaviviruses persist in wild primates.
• High biodiversity and low data availability prevent targeted surveillance.
• Imputation and machine learning confront data sparsity to predict primate hosts.
• Hosts with highest risk of Zika positivity are in close proximity to humans.
• Targeted surveillance of predicted hosts and vectors may mitigate spillover risk.
The recent Zika virus (ZIKV) epidemic in the Americas ranks among the largest outbreaks in modern times. Like other mosquito-borne flaviviruses, ZIKV circulates in sylvatic cycles among primates that can serve as reservoirs of spillover infection to humans. Identifying sylvatic reservoirs is critical to mitigating spillover risk, but relevant surveillance and biological data remain limited for this and most other zoonoses. We confronted this data sparsity by combining a machine learning method, Bayesian multi-label learning, with a multiple imputation method on primate traits. The resulting models distinguished flavivirus-positive primates with 82% accuracy and suggest that species posing the greatest spillover risk are also among the best adapted to human habitations. Given pervasive data sparsity describing animal hosts, and the virtual guarantee of data sparsity in scenarios involving novel or emerging zoonoses, we show that computational methods can be useful in extracting actionable inference from available data to support improved epidemiological response and prevention.
Fluctuating symptoms and side effects are common during outpatient cancer treatment, and approaches to monitoring symptoms vary widely across providers, patients, and clinical settings. To design a remote symptom monitoring system that patients and providers find to be useful, it may be helpful to understand current clinical approaches to monitoring and managing chemotherapy-related symptoms among patients and providers and assess how more frequent and systematic assessment and sharing of data could improve patient and provider experiences.
The goals of this study were to learn about patient and provider perspectives on monitoring symptoms during chemotherapy, understand barriers and challenges to effective symptom monitoring at one institution, and explore the potential value of remote symptom monitoring between provider visits.
A total of 15 patients who were currently undergoing or had recently completed chemotherapy and 7 oncology providers participated in semistructured interviews. Interviews were transcribed and coded using an iterative thematic analysis approach. The study was conducted at a National Cancer Institute-Designated Comprehensive Cancer Center.
Four main themes were discussed by patients and providers: (1) asynchronous nature of current methods for tracking and managing symptoms, (2) variability in reported symptoms due to patient factors, (3) limitations of existing communication channels, and (4) potential value of real-time remote symptom monitoring during chemotherapy. Current asynchronous methods and existing communication channels resulted in a disconnect between when symptoms are most severe and when conversations about symptoms happen, a situation further complicated by memory impairments during chemotherapy. Patients and providers both highlighted improvements in patient-provider communication as a potential benefit of remote real-time symptom monitoring. Providers also emphasized the value of temporal data regarding when symptoms first emerge and how they progress over time, as well as the potential value of concurrent activity or other data about daily activities and functioning. Patients noted that symptom monitoring could result in better preparation for subsequent treatment cycles.
Both patients and providers highlighted significant challenges of asynchronous, patient-initiated, phone-dependent symptom monitoring and management. Oncology patients and providers reported that more routine remote monitoring of symptoms between visits could improve patient-provider communication, prepare patients for subsequent chemotherapy cycles, and facilitate provider insight and clinical decision-making with regard to symptom management.
Many researchers across diverse disciplines aim to analyze the behavior of cohorts whose behaviors are recorded in large event databases. However, extracting cohorts from databases is a difficult yet important step, often overlooked in many analytical solutions. This is especially true when researchers wish to restrict their cohorts to exhibit a particular temporal pattern of interest. In order to fill this gap, we designed COQUITO, a visual interface that assists users in defining cohorts with temporal constraints. COQUITO was designed to be comprehensible to domain experts with no prior knowledge of database queries and also to encourage exploration. We then demonstrate the utility of COQUITO via two case studies, involving medical and social media researchers.
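To make "a cohort restricted to a temporal pattern" concrete, here is a minimal sketch of one such constraint: keep the people who have one event type followed by another within a maximum gap. COQUITO's actual query model is not described in the abstract, so the function name `cohort_with_pattern`, the `(person, event_type, day)` record layout, and the event log are assumptions for illustration only.

```python
from collections import defaultdict

def cohort_with_pattern(events, first, then, max_gap_days):
    """People with a `first` event followed by a `then` event within max_gap_days."""
    by_person = defaultdict(list)
    for person, kind, day in events:
        by_person[person].append((day, kind))
    cohort = set()
    for person, timeline in by_person.items():
        first_days = [d for d, k in timeline if k == first]
        then_days = [d for d, k in timeline if k == then]
        # The constraint holds if any (first, then) pair is ordered and close enough.
        if any(0 < t - f <= max_gap_days for f in first_days for t in then_days):
            cohort.add(person)
    return cohort

# Toy event log (made up): (person, event_type, day offset).
EVENTS = [
    ("p1", "diagnosis", 0), ("p1", "treatment", 10),
    ("p2", "diagnosis", 0), ("p2", "treatment", 90),   # gap too long
    ("p3", "treatment", 5), ("p3", "diagnosis", 20),   # wrong order
]
cohort = cohort_with_pattern(EVENTS, "diagnosis", "treatment", max_gap_days=30)
```

A visual interface would let a domain expert assemble the same constraint graphically; the sketch only shows the kind of filter that sits underneath.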
• We extract sequences of events from EMRs to correlate with patient outcome.
• We propose Care Pathway Explorer that combines sequence mining with visualizations.
• We support the integration of data-driven insights into care pathway discovery.
• We analyze the diagnoses and treatments of hyperlipidemic patients.
• We demonstrate the clinical relevance of patterns mined from EMR data.
In order to derive data-driven insights, we develop Care Pathway Explorer, a system that mines and visualizes a set of frequent event sequences from patient EMR data. The goal is to utilize historical EMR data to extract common sequences of medical events such as diagnoses and treatments, and investigate how these sequences correlate with patient outcome.
The Care Pathway Explorer uses a frequent sequence mining algorithm adapted to handle the real-world properties of EMR data, including techniques for handling event concurrency, multiple levels-of-detail, temporal context, and outcome. The mined patterns are then visualized in an interactive user interface consisting of novel overview and flow visualizations.
We use the proposed system to analyze the diagnoses and treatments of a cohort of hyperlipidemic patients with hypertension and diabetes pre-conditions, and demonstrate the clinical relevance of patterns mined from EMR data. The patterns that were identified corresponded to clinical and published knowledge, some of it unknown to the physician at the time of discovery.
Care Pathway Explorer, which combines frequent sequence mining techniques with advanced visualizations, supports the integration of data-driven insights into care pathway discovery.
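The core idea of mining frequent event sequences and relating them to outcome can be sketched in a few lines. This is a deliberately simplified stand-in, not the adapted algorithm described above: it only counts ordered event pairs (gaps allowed) and ignores concurrency, levels of detail, and temporal context. The names `ordered_pairs`, `frequent_pairs`, and `outcome_rate`, the event labels, and the patient records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def ordered_pairs(seq):
    """All (earlier, later) event pairs in a sequence, gaps allowed."""
    return set(combinations(seq, 2))

def frequent_pairs(sequences, min_support):
    """Pairs that occur in at least `min_support` sequences."""
    support = Counter()
    for seq in sequences:
        support.update(ordered_pairs(seq))  # each pair counted once per sequence
    return {pair: n for pair, n in support.items() if n >= min_support}

def outcome_rate(patients, pattern):
    """Share of positive outcomes among patients whose sequence contains the pattern."""
    hits = [outcome for seq, outcome in patients if pattern in ordered_pairs(seq)]
    return sum(hits) / len(hits) if hits else None

# Toy records (made up): (event sequence, outcome flag).
PATIENTS = [
    (["Dx", "Rx", "Lab"], True),
    (["Dx", "Rx"], True),
    (["Dx", "Lab", "Rx"], False),
    (["Lab", "Dx"], False),
]
frequent = frequent_pairs([seq for seq, _ in PATIENTS], min_support=2)
rate = outcome_rate(PATIENTS, ("Dx", "Rx"))
```

Here `frequent` holds the pairs seen in at least two records, and `rate` is the outcome share for one of them. Comparing such rates across patterns is the kind of correlation a visualization layer would then expose.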
As datasets grow and analytic algorithms become more complex, the typical workflow of analysts launching an analytic, waiting for it to complete, inspecting the results, and then re-launching the computation with adjusted parameters is not realistic for many real-world tasks. This paper presents an alternative workflow, progressive visual analytics, which enables an analyst to inspect partial results of an algorithm as they become available and interact with the algorithm to prioritize subspaces of interest. Progressive visual analytics depends on adapting analytical algorithms to produce meaningful partial results and enable analyst intervention without sacrificing computational speed. The paradigm also depends on adapting information visualization techniques to incorporate the constantly refining results without overwhelming analysts and provide interactions to support an analyst directing the analytic. The contributions of this paper include: a description of the progressive visual analytics paradigm; design goals for both the algorithms and visualizations in progressive visual analytics systems; an example progressive visual analytics system (Progressive Insights) for analyzing common patterns in a collection of event sequences; and an evaluation of Progressive Insights and the progressive visual analytics paradigm by clinical researchers analyzing electronic medical records.
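One simple way to picture "meaningful partial results" is an algorithm written as a generator that yields a snapshot after every batch of work, so a front end can redraw and the analyst can stop early. The sketch below assumes that design; it is not how Progressive Insights is implemented, and `progressive_counts`, the batch size, and the sequences are made up for illustration.

```python
from collections import Counter

def progressive_counts(sequences, batch_size):
    """Yield (fraction_done, event_support_so_far) after each batch of sequences."""
    counts = Counter()
    total = len(sequences)
    for start in range(0, total, batch_size):
        for seq in sequences[start:start + batch_size]:
            counts.update(set(seq))  # support: each event counted once per sequence
        done = min(start + batch_size, total)
        yield done / total, dict(counts)  # a copy, so earlier snapshots stay stable

# Toy event sequences (made up).
SEQUENCES = [["a", "b"], ["a", "c"], ["b"], ["a", "b", "c"]]
snapshots = list(progressive_counts(SEQUENCES, batch_size=2))
```

A consumer that iterates with a `for` loop instead of `list(...)` can render each snapshot as it arrives and `break` once the partial result is good enough, which is the interaction the paradigm aims for. Prioritizing subspaces of interest would additionally require reordering the remaining work, which this sketch leaves out.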
Neural sequence-to-sequence models have proven to be accurate and robust for many sequence prediction tasks, and have become the standard approach for automatic translation of text. The models work with a five-stage blackbox pipeline that begins with encoding a source sequence to a vector space and then decoding out to a new target sequence. This process is now standard, but like many deep learning methods remains quite difficult to understand or debug. In this work, we present a visual analysis tool that allows interaction and "what if"-style exploration of trained sequence-to-sequence models through each stage of the translation process. The aim is to identify which patterns have been learned, to detect model errors, and to probe the model with counterfactual scenarios. We demonstrate the utility of our tool through several real-world sequence-to-sequence use cases on large-scale models.
• Differences in patient progression can significantly impact outcomes.
• We present a methodology for interactive pattern mining and analysis of patient data.
• Our approach combines ad hoc visual queries, mining, and interactive visualization.
• Our methods uncover key event patterns and their associations with outcome over time.
• Prototype implementation applied to population of 32,000 cardiology patients.
Patients’ medical conditions often evolve in complex and seemingly unpredictable ways. Even within a relatively narrow and well-defined episode of care, variations between patients in both their progression and eventual outcome can be dramatic. Understanding the patterns of events observed within a population that most correlate with differences in outcome is therefore an important task in many types of studies using retrospective electronic health data. In this paper, we present a method for interactive pattern mining and analysis that supports ad hoc visual exploration of patterns mined from retrospective clinical patient data. Our approach combines (1) visual query capabilities to interactively specify episode definitions, (2) pattern mining techniques to help discover important intermediate events within an episode, and (3) interactive visualization techniques that help uncover event patterns that most impact outcome and how those associations change over time. In addition to presenting our methodology, we describe a prototype implementation and present use cases highlighting the types of insights or hypotheses that our approach can help uncover.
Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability of users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.
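Comparing how features are ranked across algorithms and cross-validation folds comes down to collecting one ranked list per (algorithm, fold) pair and summarizing each feature's positions. The sketch below assumes a mean-rank summary plus a rank spread as a crude stability signal; the abstract does not say how INFUSE aggregates rankings, and `aggregate_ranks`, the algorithm labels, and the feature names are hypothetical.

```python
from statistics import mean

def aggregate_ranks(rankings):
    """Summarize per-run rankings as (feature, mean_rank, rank_spread), best first.

    `rankings` maps (algorithm, fold) to a list of features ordered best-first.
    """
    positions = {}
    for ordered in rankings.values():
        for rank, feature in enumerate(ordered, start=1):
            positions.setdefault(feature, []).append(rank)
    summary = [(f, mean(r), max(r) - min(r)) for f, r in positions.items()]
    return sorted(summary, key=lambda s: (s[1], s[0]))

# Toy rankings (made up): two selection algorithms, two folds each.
RANKINGS = {
    ("algo_a", 0): ["age", "bmi", "sbp"],
    ("algo_a", 1): ["age", "sbp", "bmi"],
    ("algo_b", 0): ["bmi", "age", "sbp"],
    ("algo_b", 1): ["age", "bmi", "sbp"],
}
summary = aggregate_ranks(RANKINGS)
```

A feature with a low mean rank and a small spread is ranked highly and consistently. A large spread flags disagreement between algorithms or folds, which is where an analyst's domain expertise is most useful.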