Big data analytics (BDA) is of paramount importance in healthcare aspects such as patient diagnostics, fast epidemic recognition, and improvement of patient management. The objective of this profiling study is (a) to provide an overview of the BDA publication dynamics in the healthcare domain and (b) to discuss this scientific field through related examples. A sampling literature review was conducted. A total of 804 papers were identified, and content analysis was performed to mine knowledge in the domain for the years 2000-2016. The findings show that co-authors' backgrounds are in the subject areas of medicine and computer science. Most articles are experimental in nature and use modeling and machine learning techniques to exploit clinical data for health monitoring and prediction purposes. Many articles are relevant to the medical specialties of neurology/neurosurgery/neuropsychiatry, medical oncology, and cardiology. Well-cited papers investigate the identification and management of high-risk/cost patients, the use of big data, Hadoop, and cloud computing in genomics, and the development of mobile applications for disease management. Also important is research on improving disease prediction by investigating patients' medical results using advanced analysis (such as segmentation and predictive modeling, machine learning, and visualization).
Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two-step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA) and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerotic cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.
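The smoothing idea mentioned above can be illustrated with a minimal sketch. It is not the SIDANet implementation: the abstract does not give the exact penalty, so the quadratic form beta' L beta used here is an assumption chosen for illustration, and the function names `normalized_laplacian` and `smoothness_penalty` are hypothetical. The sketch builds the normalized Laplacian L = I - D^{-1/2} A D^{-1/2} of a small graph with NumPy and shows that coefficients which vary smoothly over connected predictors receive a smaller penalty than coefficients that change sign along an edge.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    connected = degrees > 0
    # Isolated nodes keep a zero scaling factor to avoid dividing by zero.
    inv_sqrt = np.zeros_like(degrees)
    inv_sqrt[connected] = 1.0 / np.sqrt(degrees[connected])
    identity_part = np.diag(connected.astype(float))
    return identity_part - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def smoothness_penalty(beta, laplacian):
    """Quadratic penalty beta' L beta (an assumed, illustrative form)."""
    beta = np.asarray(beta, dtype=float)
    return float(beta @ laplacian @ beta)


# Hypothetical graph: three predictors connected in a path 0 - 1 - 2.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
laplacian = normalized_laplacian(adjacency)

# Coefficients proportional to sqrt(degree) are perfectly smooth on this graph.
smooth_beta = np.array([1.0, np.sqrt(2.0), 1.0])
# Flipping the sign of the middle coefficient breaks smoothness on both edges.
rough_beta = np.array([1.0, -np.sqrt(2.0), 1.0])

print(smoothness_penalty(smooth_beta, laplacian))
print(smoothness_penalty(rough_beta, laplacian))
```

With this form, the penalty equals the sum over edges of the squared difference between degree-scaled coefficients, so adding it to an objective favors solutions in which connected predictors enter the model together.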
Bayesian hierarchical models produce shrinkage estimators that can be used as the basis for integrating supplementary data into the analysis of a primary data source. Established approaches should be considered limited, however, because posterior estimation either requires prespecification of a shrinkage weight for each source or relies on the data to inform a single parameter, which determines the extent of influence or shrinkage from all sources, risking considerable bias or minimal borrowing. We introduce multisource exchangeability models (MEMs), a general Bayesian approach for integrating multiple, potentially non-exchangeable, supplemental data sources into the analysis of a primary data source. Our proposed modeling framework yields source-specific smoothing parameters that can be estimated in the presence of the data to facilitate a dynamic multi-resolution smoothed estimator that is asymptotically consistent while reducing the dimensionality of the prior space. When compared with competing Bayesian hierarchical modeling strategies, we demonstrate that MEMs achieve approximately 2.2 times larger median effective supplemental sample size when the supplemental data sources are exchangeable as well as a 56% reduction in bias when there is heterogeneity among the supplemental sources. We illustrate the application of MEMs using a recently completed randomized trial of very low nicotine content cigarettes, which resulted in a 30% improvement in efficiency compared with the standard analysis.
Recently, researchers have begun using online labor markets to recruit participants for experimental studies examining the judgments and decisions of nonprofessional investors. This study investigates the quality and generalizability of data collected from these sources by replicating an experimental task from Elliott, Hodge, Kennedy, and Pronk (2007; hereafter EHKP) using nonprofessional investor participants from two popular online labor markets: Amazon's Mechanical Turk (MTurk) and Qualtrics Online Sample (Qualtrics). We find that, compared with Qualtrics participants, MTurk participants pay greater attention to the experimental materials and better acquire and recall information. Further, the MTurk sample more closely replicates EHKP's investment club member results on measures of information integration than does the Qualtrics sample. These results provide some evidence that many interesting research questions can be satisfactorily answered using nonprofessional investor participants from MTurk. We believe further investigation is needed before Qualtrics can be endorsed as a high-quality source of nonprofessional investor participants.
Trial investigators often have a primary interest in the estimation of the survival curve in a population for which there exists acceptable historical information from which to borrow strength. However, borrowing strength from a historical trial that is non-exchangeable with the current trial can result in biased conclusions. In this article we propose a fully Bayesian semiparametric method for attenuating bias and increasing efficiency when jointly modeling time-to-event data from two possibly non-exchangeable sources of information. We illustrate the mechanics of our methods by applying them to a pair of post-market surveillance datasets regarding adverse events in persons on dialysis who had either a bare metal or drug-eluting stent implanted during a cardiac revascularization surgery. We finish with a discussion of the advantages and limitations of this approach to evidence synthesis, as well as directions for future work in this area. The article's Supplementary Materials offer simulations to show our procedure's bias, mean squared error, and coverage probability properties in a variety of settings.
The aim of this article is to highlight how limited the capacity of criminology is to study quantitatively the exercise of police powers in Spain using the data available from official sources. To this end, two police powers, identification and arrest, are selected, and the available data on them are presented, along with the possibilities and shortcomings of those data for use in criminological research. Two international examples are then presented by way of contrast, namely police stops in the United Kingdom and arrests in the United States. Finally, the article notes the precautions that must be taken when working with secondary data and the implications that the lack of data and of a firm transparency policy have, not only for criminological research but also for the necessary monitoring of police activity in a democracy.
This article investigates how a set of statistical indicators can support book publishing as an economic activity. The existing set of statistical indicators is found not to meet the needs of researchers and practitioners, which is the case not only in Ukraine but in book publishing globally. Ukrainian book publishing is taken as a case for analysis to identify the core problems faced by this industry. It is emphasized that a comprehensive study of the book publishing industry, and the presentation of statistical information with a high level of quality and aggregation, requires new alternative sources of data, among which big data should be highlighted. The scientific novelty is that an updated system of statistical indicators is proposed for the first time, with eight modules of alternative sources of statistical information: questionnaires, electronic books, digital libraries, websites of publishers and bookstores, online reading diaries (Goodreads as an example), social networks (Instagram, Facebook, and open Telegram channels), video hosting platforms (YouTube being the most popular one), and blogs. It is stressed that all the modules of alternative data must be used to obtain reliable data, where output data will be processed anew and have direct and reverse links, which will require the use of neural networks with efferent-type links. This statistical support for the book publishing industry is an innovation designed to meet the urgent needs of the public and of official statistics.
We examine different aspects of nuptiality and fertility in the Länder of the Austrian Empire using the Tafeln zur Statistik der Österreichischen Monarchie (Statistical Tables of the Austrian Monarchy). This source, published from 1829 to 1871, contains data on population and natural movement. After discussing its quality, we study marriage and birth rates, as well as age at marriage, the illegitimacy ratio, and marital fertility. We find meaningful differences between the regions of the Empire: low and late nuptiality in some central Länder, which generally has consequences for birth rates. The frequency of illegitimacy and marital fertility rates are also examined for the 15 Länder.
Geolocated data from the social network Twitter are analyzed to assess their potential for studying daily mobility patterns, applied to the case of the urban area of Valencia, Spain. Based on this analysis, a methodology is proposed for processing and exploiting the data, focused on detecting the place of residence of its users, which is basic information in mobility analysis. The good fit of the results with several validation sources confirms the suitability of the methodology and the broad potential of the source analyzed.
This teaching note describes the design and implementation of an activity in a 90-minute teaching session that was developed to introduce a diverse cohort of first-year criminology and sociology students to the use of documents as sources of data. This approach was contextualized in real-world research through scaffolded, student-centered tasks focused on archival material and contemporary estate agents' brochures so as to investigate changes in the suburbs that surround a university in north London. To contribute to the growing discussion on pedagogic dialogical spaces in teaching research methods, we provide empirical evidence of students' greater engagement via group work and the opportunity to draw on experiential knowledge in analyzing sources. Beyond stimulating students' engagement with research skills and methods, the data also show the value of our approach in helping students develop their analytical skills, particularly through a process of comparison and contrast.