Abstract
Background
Summary data furnishing a two-sample Mendelian randomization (MR) study are often visualized with the aid of a scatter plot, in which single-nucleotide polymorphism (SNP)–outcome associations are plotted against the SNP–exposure associations to provide an immediate picture of the causal-effect estimate for each individual variant. It is also convenient to overlay the standard inverse-variance weighted (IVW) estimate of causal effect as a fitted slope, to see whether an individual SNP provides evidence that supports, or conflicts with, the overall consensus. Unfortunately, the traditional scatter plot is not the most appropriate means to achieve this aim whenever SNP–outcome associations are estimated with varying degrees of precision and this is reflected in the analysis.
Methods
We propose instead to use a small modification of the scatter plot—the Galbraith Radial plot—for the presentation of data and results from an MR study, which enjoys many advantages over the original method. On a practical level, it removes the need to recode the genetic data and enables a more straightforward detection of outliers and influential data points. Its use extends beyond the purely aesthetic, however, to suggest a more general modelling framework to operate within when conducting an MR study, including a new form of MR-Egger regression.
Results
We illustrate the methods using data from a two-sample MR study to probe the causal effect of systolic blood pressure on coronary heart disease risk, allowing for the possible effects of pleiotropy. The Radial plot is shown to aid the detection of a single outlying variant that is responsible for large differences between IVW and MR-Egger regression estimates. Several additional plots are also proposed for informative data visualization.
Conclusions
The Radial plot should be considered in place of the scatter plot for visualizing, analysing and interpreting data from a two-sample summary data MR study. Software is provided to help facilitate its use.
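As a concrete illustration of the quantities the abstract refers to, the IVW estimate and the Radial plot coordinates can both be computed directly from summary statistics. The sketch below uses invented summary data for five hypothetical SNPs; on the radial scale, each variant is plotted at (square root of its weight, ratio estimate times that square root), and the slope of the fit through the origin recovers the IVW estimate.

```python
import numpy as np

# Hypothetical summary statistics for 5 SNPs (values invented for
# illustration): SNP-exposure associations beta_x, SNP-outcome
# associations beta_y, and the outcome standard errors se_y.
beta_x = np.array([0.12, 0.09, 0.15, 0.08, 0.11])
beta_y = np.array([0.030, 0.021, 0.090, 0.019, 0.026])
se_y = np.array([0.010, 0.012, 0.011, 0.009, 0.010])

# IVW estimate: weighted regression of beta_y on beta_x through the
# origin, with inverse-variance weights 1/se_y^2.
w = 1.0 / se_y**2
ivw = np.sum(w * beta_x * beta_y) / np.sum(w * beta_x**2)

# Radial plot coordinates: x is the square root of each SNP's weight
# (beta_x / se_y) and y is the ratio estimate scaled by that square
# root, which simplifies to beta_y / se_y.
x_radial = beta_x / se_y
y_radial = beta_y / se_y

# The slope of the unweighted fit through the origin on the radial
# scale equals the IVW estimate, which is what makes the Radial plot
# a faithful visual companion to the IVW analysis.
slope = np.sum(x_radial * y_radial) / np.sum(x_radial**2)
```

Outlying variants (such as the third SNP above, whose ratio estimate is far from the others) sit visibly far from the fitted line on the radial scale, which is the outlier-detection advantage described in the Results.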
Big Data has already drawn enormous attention from researchers in the information sciences and from policy and decision makers in government and enterprise. As the growth of information outpaces Moore's Law at the beginning of this new century, the sheer excess of data poses serious difficulties, yet highly useful value lies hidden in these huge volumes. A new scientific paradigm has emerged: data-intensive scientific discovery (DISD), also known as Big Data problems. A wide range of fields and sectors, from economic and business activity to public administration, and from national security to scientific research in many areas, involve Big Data problems. On the one hand, Big Data is extremely valuable for raising productivity in business and enabling evolutionary breakthroughs in scientific disciplines, offering many opportunities for progress in many fields. There is little doubt that future competition in business productivity and technology will converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. This paper aims to present a close-up view of Big Data, including Big Data applications, opportunities and challenges, as well as the state-of-the-art techniques and technologies currently adopted to deal with Big Data problems. We also discuss several underlying methodologies for handling the data deluge, for example granular computing, cloud computing, bio-inspired computing and quantum computing.
Parallel coordinate plots (PCPs) have been widely used for high-dimensional (HD) data storytelling because they allow for presenting a large number of dimensions without distortions. The axes ordering in PCP presents a particular story from the data based on the user perception of PCP polylines. Existing works focus on directly optimizing for PCP axes ordering based on some common analysis tasks like clustering, neighborhood, and correlation. However, direct optimization for PCP axes based on these common properties is restrictive because it does not account for multiple properties occurring between the axes, and for local properties that occur in small regions in the data. Also, many of these techniques do not support the human-in-the-loop (HIL) paradigm, which is crucial (i) for explainability and (ii) in cases where no single reordering scheme fits the users' goals. To alleviate these problems, we present PC-Expo, a real-time visual analytics framework for all-in-one PCP line pattern detection and axes reordering. We studied the connection of line patterns in PCPs with different data analysis tasks and datasets. PC-Expo expands prior work on PCP axes reordering by developing real-time, local detection schemes for the 12 most common analysis tasks (properties). Users can choose the story they want to present with PCPs by optimizing directly over their choice of properties. These properties can be ranked, or combined using individual weights, creating a custom optimization scheme for axes reordering. Users can control the granularity at which they want to work with their detection scheme in the data, allowing exploration of local regions. PC-Expo also supports HIL axes reordering via local-property visualization, which shows the regions of granular activity for every axis pair. Local-property visualization is helpful for PCP axes reordering based on multiple properties, when no single reordering scheme fits the user goals.
A comprehensive evaluation with real users and diverse datasets confirms the efficacy of PC-Expo in data storytelling with PCPs.
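To make the idea of property-driven axes reordering concrete, the sketch below optimizes a PCP axis order for a single property, pairwise correlation. This is a generic illustration, not PC-Expo's actual detection or weighting scheme: the data are synthetic, and the score simply sums the absolute correlations of adjacent axes.

```python
import numpy as np
from itertools import permutations

# Synthetic 4-dimensional data (invented for illustration); dimensions
# 0 and 1 are made strongly correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=50)

# Absolute pairwise correlation matrix between dimensions.
corr = np.abs(np.corrcoef(data, rowvar=False))

def order_score(order):
    # A PCP only juxtaposes adjacent axes, so score an ordering by the
    # total correlation between each adjacent axis pair.
    return sum(corr[a, b] for a, b in zip(order, order[1:]))

# Exhaustive search over axis orders (feasible only for few dimensions;
# real systems use heuristics or greedy schemes instead).
best = max(permutations(range(4)), key=order_score)
```

Weighted combinations of several such property scores, as PC-Expo supports, would replace `order_score` with a user-configured mixture, while keeping the same search structure.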
The Mantid framework is a software solution developed for the analysis and visualization of neutron scattering and muon spin measurements. The framework is jointly developed by a large team of software engineers and scientists at the ISIS Neutron and Muon Facility and the Oak Ridge National Laboratory. The objective of the development is to improve software quality, both in terms of performance and ease of use, for the user community of large scale facilities. The functionality and novel design aspects of the framework are described.
With the rapid development of sensing technologies, massive spatiotemporal data have been acquired from the urban space with respect to different domains, such as transportation and environment. Numerous co-occurrence patterns (e.g., traffic speed < 10 km/h, weather = foggy, and air quality = unhealthy) between the transportation data and other types of data can be obtained with given spatiotemporal constraints (e.g., within 3 kilometers and lasting for 2 hours) from these heterogeneous data sources. Such patterns present valuable implications for many urban applications, such as traffic management, pollution diagnosis, and transportation planning. However, extracting and understanding these patterns is beyond manual capability because of the scale, diversity, and heterogeneity of the data. To address this issue, a novel visual analytics system called CorVizor is proposed to identify and interpret these co-occurrence patterns. CorVizor comprises two major components. The first component is a co-occurrence mining framework involving three steps, namely, spatiotemporal indexing, co-occurring instance generation, and pattern mining. The second component is a visualization technique called CorView that implements a level-of-detail mechanism by integrating tailored visualizations to depict the extracted spatiotemporal co-occurrence patterns. Case studies and expert interviews were conducted to demonstrate the effectiveness of CorVizor.
Fostering data visualization literacy (DVL) as part of childhood education could lead to a more data literate society. However, most work in DVL for children relies on a more formal educational context (i.e., a teacher-led approach) that limits children's engagement with data to classroom-based environments and, consequently, children's ability to ask questions about and explore data on topics they find personally meaningful. We explore how a curiosity-driven, child-led approach can provide more agency to children when they are authoring data visualizations. This paper explores how informal learning with crafting physicalizations through play and curiosity may foster increased literacy and engagement with data. Employing a constructionist approach, we designed a do-it-yourself toolkit made out of everyday materials (e.g., paper, cardboard, mirrors) that enables children to create, customize, and personalize three different interactive visualizations (bar, line, pie). We used the toolkit as a design probe in a series of in-person workshops with 5 children (6 to 11 years old) and interviews with 5 educators. Our observations reveal that the toolkit helped children creatively engage and interact with visualizations. Children with prior knowledge of data visualization reported that the toolkit served more as an authoring tool that they envision using in their daily lives, while children with little to no experience found the toolkit an engaging introduction to data visualization. Our study demonstrates the potential of using the constructionist approach to cultivate children's DVL through curiosity and play.
Digital marketing is leading the way in offering new features to reach, inform, engage, offer, and sell products and services to customers, and is expected to continue to be at the forefront of the technological revolution. The purpose of this study is to identify influential cited works in digital marketing communication (DMC) research, to determine the current status of the research on DMC, and to indicate the extent to which influential works have shaped it. This bibliometric study assesses articles published over a 12-year period in core DMC-related journals. The analysis examines 5865 citations of 141 digital-related articles in the targeted journals using both citation and co-citation analyses. After a broad disciplinary review of key cited DMC works, this study suggests thematic insights and implications for academics and practitioners that are promising avenues for creating effective DMC.
The clinical use of molecular targeted therapy is rapidly evolving but has primarily focused on genomic alterations. Transcriptomic analysis offers an opportunity to dissect the complexity of tumors, including the tumor microenvironment (TME), a crucial mediator of cancer progression and therapeutic outcome. TME classification by transcriptomic analysis of >10,000 cancer patients identifies four distinct TME subtypes conserved across 20 different cancers. The TME subtypes correlate with patient response to immunotherapy in multiple cancers, with patients possessing immune-favorable TME subtypes benefiting the most from immunotherapy. Thus, the TME subtypes act as a generalized immunotherapy biomarker across many cancer types due to the inclusion of malignant and microenvironment components. A visual tool integrating transcriptomic and genomic data provides a global tumor portrait, describing the tumor framework, mutational load, immune composition, anti-tumor immunity, and immunosuppressive escape mechanisms. Integrative analyses plus visualization may aid in biomarker discovery and the personalization of therapeutic regimens.
•Development of a holistic transcriptomic-based TME classification platform
•Detection of four immune/fibrotic TME subtypes conserved in a broad array of cancers
•The four TME subtypes are predictive of response to immunotherapy in multiple cancers
•Integration of genomics and transcriptomics into a visual tool with a planetary view
Bagaev et al. identify four tumor microenvironment (TME) subtypes that are conserved across diverse cancers and correlate with immunotherapy response in melanoma, bladder, and gastric cancers. A visual tool revealing the TME subtypes integrated with targetable genomic alterations provides a planetary view of each tumor that can aid in oncology clinical decision making.
The fields of machine learning and causal inference have developed many concepts, tools, and theory that are potentially useful for each other. Through exploring the possibility of extracting causal interpretations from black-box machine-trained models, we briefly review the languages and concepts in causal inference that may be interesting to machine learning researchers. We start with the curious observation that Friedman's partial dependence plot has exactly the same formula as Pearl's back-door adjustment and discuss three requirements to make causal interpretations: a model with good predictive performance, some domain knowledge in the form of a causal diagram and suitable visualization tools. We provide several illustrative examples and find some interesting and potentially causal relations using visualization tools for black-box models.
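The observation that Friedman's partial dependence plot shares its formula with Pearl's back-door adjustment can be made concrete in a few lines. The sketch below uses an invented black-box model and invented background data; the partial dependence at a point is the model's prediction averaged over the empirical distribution of the remaining features, the same form as the back-door adjustment sum over the adjustment set.

```python
import numpy as np

# Hypothetical "black-box" model of two features (invented for
# illustration); in practice this would be a trained model's predict().
def model(x1, x2):
    return 2.0 * x1 + 3.0 * x2 + 0.5 * x1 * x2

# Observed background data giving the empirical distribution of the
# remaining feature x2.
rng = np.random.default_rng(1)
x2_obs = rng.normal(loc=1.0, scale=0.5, size=1000)

def partial_dependence(x1_value):
    # Friedman's PDP: average the prediction over the empirical
    # distribution of the other features, i.e. sum_z f(x, z) P(z),
    # which is exactly the form of Pearl's back-door adjustment.
    return model(x1_value, x2_obs).mean()

grid = np.linspace(-2, 2, 5)
pdp = [partial_dependence(v) for v in grid]
```

Whether this averaged curve carries a causal reading depends on the three requirements listed above, in particular on a causal diagram certifying that the averaged-over features form a valid back-door adjustment set.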