Using Semantic Web Technologies for Exploratory OLAP: A Survey
Abello, Alberto; Romero, Oscar; Bach Pedersen, Torben ...
IEEE transactions on knowledge and data engineering,
2015-02-01, Volume: 27, Issue: 2
Journal Article, Publication
Peer reviewed
Open access
This paper describes the convergence of some of the most influential technologies of the last few years, namely data warehousing (DW), on-line analytical processing (OLAP), and the Semantic Web (SW). OLAP is used by enterprises to derive important business-critical knowledge from data inside the company. However, the most interesting OLAP queries can no longer be answered on internal data alone; external data must also be discovered (most often on the web), acquired, integrated, and (analytically) queried, resulting in a new type of OLAP: exploratory OLAP. When using external data, an important issue is knowing the precise semantics of the data. Here, SW technologies come to the rescue, as they allow semantics (ranging from very simple to very complex) to be specified for web-available resources. SW technologies not only support capturing the "passive" semantics, but also support active inference and reasoning on the data. The paper first presents a characterization of DW/OLAP environments, followed by an introduction to the relevant SW foundation concepts. Then, it describes the relationship of multidimensional (MD) models and SW technologies, including the relationship between MD models and SW formalisms. Next, the paper surveys the use of SW technologies for data modeling and data provisioning, including semantic data annotation and semantic-aware extract, transform, and load (ETL) processes. Finally, all the findings are discussed and a number of directions for future research are outlined, including SW support for intelligent MD querying, using SW technologies for providing context to data warehouses, and scalability issues.
Diffuse alveolar damage is the histological hallmark of acute respiratory distress syndrome (ARDS). However, the chronology of histological lesions is not well established. We aimed to determine the time to onset of exudative or proliferative changes and end-stage fibrosis in ARDS.
We analysed all patients who died between Jan 1, 1991, and Dec 31, 2010, in the intensive-care unit at the Hospital Universitario de Getafe, Madrid, Spain, and who had a clinical autopsy. Patients had to have clinical criteria for ARDS at time of death and histological features of diffuse alveolar damage at autopsy examination. Capillary congestion and intra-alveolar oedema characterised the exudative phase whereas proliferation of alveolar cell type 2 or fibroblasts, or fibrosis characterised the proliferative phase.
We analysed 159 patients. The prevalence of exudative changes decreased over time, being reported in 74 (90%) of 82 patients with ARDS of less than 1 week's duration, 40 (74%) of 54 patients with disease of 1-3 weeks' duration, and only four (17%) of 23 patients with disease of longer than 3 weeks' duration (p<0·0001). The incidence of proliferative changes increased over time, and was reported in 44 (54%) of 82 patients with ARDS of less than 1 week's duration, 42 (78%) of 54 patients with disease of 1-3 weeks' duration, and 23 (100%) of 23 patients with disease of longer than 3 weeks' duration (p<0·0001). Fibrosis was noted in three (4%) of 82 patients with disease of less than 1 week's duration, 13 (24%) of 54 patients with disease of 1-3 weeks' duration, and 14 (61%) of 23 patients with disease of longer than 3 weeks' duration (p<0·0001). Fibrosis was more frequent in ARDS of pulmonary origin than in that of extrapulmonary origin.
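The percentages reported above can be rechecked from the raw counts. The following is a minimal sketch for that arithmetic only; the dictionary layout and the helper name `pct` are illustrative choices, not part of the study.

```python
# Counts copied from the findings above: (patients with the lesion, group size,
# reported percentage) for ARDS lasting <1 week, 1-3 weeks, and >3 weeks.
REPORTED = {
    "exudative":     [(74, 82, 90), (40, 54, 74), (4, 23, 17)],
    "proliferative": [(44, 82, 54), (42, 54, 78), (23, 23, 100)],
    "fibrosis":      [(3, 82, 4), (13, 54, 24), (14, 23, 61)],
}


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Recompute every percentage and compare it with the reported value.
mismatches = [
    (lesion, n, d, reported)
    for lesion, rows in REPORTED.items()
    for n, d, reported in rows
    if pct(n, d) != reported
]

# The three duration groups should add up to the full cohort.
cohort_size = sum(d for _, d, _ in REPORTED["exudative"])

print("mismatches:", mismatches)
print("cohort size:", cohort_size)
```

With the counts as transcribed, every recomputed percentage agrees with the reported one, and the three duration groups sum to the 159 patients analysed.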
Histological features of the lungs were related to duration of ARDS. Within the first week of evolution, exudative changes were predominant and fibrosis was rarely noted. Beyond the third week of evolution, proliferative changes were noted in all patients and fibrosis in two-thirds of them. Treatments with a potential effect on inflammation or fibrosis, or both, should probably focus on the first week after the onset of ARDS.
None.
This paper presents a novel author profiling method specially aimed at classifying social network users into the multidimensional perspectives for social business intelligence (SBI) applications. In this scenario, since user profiles are defined on demand for each particular SBI application, we cannot assume the existence of labelled datasets for training purposes. Thus, we propose an unsupervised method to obtain the required labelled datasets for training the profile classifiers. Contrary to other author profiling approaches in the literature, we only make use of the users’ descriptions, which are usually part of the post metadata. We exhaustively evaluated the proposed method on four different tasks for multidimensional author profiling along with state-of-the-art text classifiers. We achieved F1 scores of around 88% and 98% on a gold-standard and a silver-standard dataset, respectively. Additionally, we compare our results to other supervised approaches previously proposed for two of our tasks, obtaining very close performance despite using an unsupervised method. To the best of our knowledge, this is the first method designed to label user profiles in an unsupervised way for training profile classifiers with a performance similar to that of fully supervised ones.
This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query and retrieve web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semi-structured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources and the XML extensions of On-Line Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities that the combination of the DW and Web fields offers, as well as to identify open research lines.
A revised definition of clinical criteria for acute respiratory distress syndrome (ARDS), the Berlin definition, was recently established to classify patients according to their severity.
To evaluate the accuracy of these clinical criteria using diffuse alveolar damage (DAD) at autopsy as the reference standard.
All patients who died and had a clinical autopsy in our intensive care unit over a 20-year period (1991-2010) were included. Patients with clinical criteria for ARDS were identified from the medical charts and were classified as mild, moderate, or severe according to the Berlin definition using PaO2/FiO2 oxygenation criteria. Microscopic analysis from each pulmonary lobe was performed by two pathologists.
Among 712 autopsies analyzed, 356 patients had clinical criteria for ARDS at time of death, classified as mild (n = 49, 14%), moderate (n = 141, 40%), and severe (n = 166, 46%). Sensitivity was 89% and specificity 63% to identify ARDS using the Berlin definition. DAD was found in 159 of 356 (45%) patients with clinical criteria for ARDS (in 12, 40, and 58% of patients with mild, moderate, and severe ARDS, respectively). DAD was more frequent in patients who met clinical criteria for ARDS during more than 72 hours and was found in 69% of those with severe ARDS for 72 hours or longer.
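The reported accuracy figures can be related to each other through a 2x2 table. The sketch below is a back-of-the-envelope consistency check, not the authors' analysis: the abstract gives the total autopsies (712), the patients meeting clinical criteria (356), and those among them with DAD (159), but it does not give the number of patients with DAD who did not meet clinical criteria. That count is inferred here from the reported 89% sensitivity and is therefore an assumption.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-standard positives that the criteria identify."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of reference-standard negatives that the criteria exclude."""
    return tn / (tn + fp)


# Figures stated in the abstract.
total_autopsies = 712
clinical_ards = 356           # met clinical criteria for ARDS
tp = 159                      # clinical ARDS with DAD at autopsy
fp = clinical_ards - tp       # clinical ARDS without DAD

# ASSUMPTION: false negatives are not reported; this value is inferred from
# the reported sensitivity via fn = tp * (1 - sens) / sens.
reported_sensitivity = 0.89
implied_fn = round(tp * (1 - reported_sensitivity) / reported_sensitivity)

# Everyone else had neither clinical ARDS nor DAD.
tn = total_autopsies - tp - fp - implied_fn

sens = sensitivity(tp, implied_fn)
spec = specificity(tn, fp)
dad_share = tp / clinical_ards  # proportion of clinical ARDS with DAD

print(f"implied FN={implied_fn}, TN={tn}")
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, DAD share={dad_share:.3f}")
```

Under this assumption the implied table gives a specificity that rounds to the reported 63%, so the three published figures are mutually consistent.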
Histopathological findings were correlated with severity and duration of ARDS. Using clinical criteria, the revised Berlin definition for ARDS allowed the identification of severe ARDS of more than 72 hours as a homogeneous group of patients characterized by a high proportion of DAD.
In this study we present the results of a baseline study designed to assess the status of the raccoon (Procyon lotor) throughout Spain. The species was reported in 28 localities, mostly consisting of sporadic observations of single individuals. In central Spain an apparently thriving population of raccoons has been recently discovered. Our data confirmed the spread of feral raccoons throughout this region, where the species has already colonized about 100 km of streams and rivers. Predation on local fauna was also proved, and the first approximation for spatial movement and habitat use analyses in Spain is presented. Our results suggest that deliberate releases of raccoons by pet owners are an important cause for the existence of feral raccoons in Spain. Further research should focus on monitoring established individuals to collect detailed data on their population and reproductive parameters. Meanwhile, urgent actions should be taken to stop releases into the wild and to control and eradicate this unwelcome invasive species.
Recent work in social network analysis has shown the usefulness of analysing and predicting outcomes from user-generated data in the context of Public Health Surveillance (PHS). Most of the proposals have focused on dealing with static datasets gathered from social networks, which are processed and mined off-line. However, little work has been done on providing a general framework to analyse the highly dynamic data of social networks from a multidimensional perspective. In this paper, we claim that such a framework is crucial for including social data in PHS systems.
We propose a dynamic multidimensional approach to deal with social data streams. In this approach, dynamic dimensions are continuously updated by applying unsupervised text mining methods. More specifically, we analyse the semantics and temporal patterns in posts for identifying relevant events, topics and users. We also define quality metrics to detect relevant user profiles. In this way, the incoming data can be further filtered to cope with the goals of PHS systems.
We have evaluated our approach over a long-term stream of Twitter. We show how the proposed quality metrics allow us to filter out the users that are out-of-domain as well as those with low quality in their messages. We also explain how specific user profiles can be identified through their descriptions. Finally, we illustrate how the proposed multidimensional model can be used to identify main events and topics, as well as to analyse their audience and impact.
The results show that the proposed dynamic multidimensional model is able to identify relevant events and topics and analyse them from different perspectives, which is especially useful for PHS systems.
Abstract
Social media platforms have become a new source of useful information for companies. Ensuring the business value of social media first requires an analysis of the quality of the relevant data and then the development of practical business intelligence solutions. This paper aims at building high-quality datasets for social business intelligence (SoBI). The proposed method offers an integrated and dynamic approach to identify the relevant quality metrics for each analysis domain. This method employs a novel multidimensional data model for the construction of cubes with impact measures for various quality metrics. In this model, quality metrics and indicators are organized along two main axes. The first one concerns the kind of facts to be extracted, namely: posts, users, and topics. The second axis refers to the quality perspectives to be assessed, namely: credibility, reputation, usefulness, and completeness. Additionally, quality cubes include a user-role dimension so that quality metrics can be evaluated in terms of the user business roles. To demonstrate the usefulness of this approach, the authors have applied their method to two separate domains: automotive business and natural disaster management. Results show that the trade-off between quantity and quality for social media data is focused on a small percentage of relevant users. Thus, data filtering can be easily performed by simply ranking the posts according to the quality metrics identified with the proposed method. As far as the authors know, this is the first approach that integrates both the extraction of analytical facts and the assessment of social media data quality in the same framework.
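The two-axis organisation described above (kind of fact, quality perspective) plus a user-role dimension can be pictured as a small cube that is rolled up and then used to rank posts. The following is only an illustrative sketch under stated assumptions: the class `QualityCell`, the functions `roll_up` and `rank_posts`, the role names, and all scores are hypothetical and do not reproduce the authors' model or metrics.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Axis values named in the abstract.
FACT_KINDS = ("post", "user", "topic")
PERSPECTIVES = ("credibility", "reputation", "usefulness", "completeness")


@dataclass(frozen=True)
class QualityCell:
    """One cube cell: a quality score for a fact, seen from one perspective."""
    fact_kind: str    # one of FACT_KINDS
    fact_id: str      # hypothetical identifier
    perspective: str  # one of PERSPECTIVES
    user_role: str    # hypothetical business role of the author
    score: float      # hypothetical metric value in [0, 1]


def roll_up(cells, *dims):
    """Average the scores over every combination of the chosen dimensions."""
    groups = defaultdict(list)
    for cell in cells:
        groups[tuple(getattr(cell, d) for d in dims)].append(cell.score)
    return {key: mean(scores) for key, scores in groups.items()}


def rank_posts(cells):
    """Order post identifiers by mean quality score, best first."""
    per_post = roll_up([c for c in cells if c.fact_kind == "post"], "fact_id")
    ordered = sorted(per_post, key=per_post.get, reverse=True)
    return [key[0] for key in ordered]


# Made-up example data.
cells = [
    QualityCell("post", "p1", "credibility", "journalist", 0.9),
    QualityCell("post", "p1", "usefulness", "journalist", 0.7),
    QualityCell("post", "p2", "credibility", "customer", 0.4),
    QualityCell("post", "p2", "usefulness", "customer", 0.5),
    QualityCell("user", "u1", "reputation", "journalist", 0.8),
]

print(rank_posts(cells))
print(roll_up(cells, "perspective", "user_role"))
```

Filtering then amounts to keeping the top of the ranked list; averaging is just one possible aggregation, chosen here for brevity.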