Self-tuning is a feature of autonomic databases that includes the problem of automatic schema design. It aims at providing an optimized schema that increases the overall database performance. While in relational databases automatic schema design focuses on the automated design of the physical schema, in NoSQL databases all levels of representation are considered: conceptual, logical, and physical. This is mainly because NoSQL databases are mostly schema-less and lack a standard schema design procedure such as exists for SQL databases. In this work, we carry out a systematic literature survey on automatic schema design in both SQL and NoSQL databases. We identify the levels of representation and the methods that are used for the schema design problem, and we present a novel taxonomy to classify and compare different schema design solutions. Our comprehensive analysis demonstrates that, despite substantial progress, schema design is still a developing field and considerable challenges remain to be addressed, notably for NoSQL databases. We highlight the most important findings from our analysis and identify areas for future research work.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, SAZU, UL, UM, UPUK
Information Modeling and Relational Databases, Second Edition, provides an introduction to ORM (Object-Role Modeling) and much more. In fact, it is the only book to go beyond introductory coverage and provide all of the in-depth instruction you need to transform knowledge from domain experts into a sound database design. This book is intended for anyone with a stake in the accuracy and efficacy of databases: systems analysts, information modelers, database designers and administrators, and programmers. Terry Halpin, a pioneer in the development of ORM, blends conceptual information with practical instruction that will let you begin using ORM effectively as soon as possible. Supported by examples, exercises, and useful background information, his step-by-step approach teaches you to develop a natural-language-based ORM model and then, where needed, abstract ER and UML models from it. This book will quickly make you proficient in the modeling technique that is proving vital to the development of accurate and efficient databases that best meet real business objectives.
* Presents the most in-depth coverage of Object-Role Modeling available anywhere, including a thorough update of the book for ORM2, as well as UML2 and ER (Entity-Relationship) modeling
* Includes clear coverage of relational database concepts and the latest developments in SQL and XML, including a new chapter on the impact of XML on information modeling, exchange, and transformation
* Provides new and improved case studies and exercises for many topics
Cleaning Denial Constraint Violations through Relaxation Giannakopoulou, Stella; Karpathiotakis, Manos; Ailamaki, Anastasia
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data,
06/2020
Conference Proceeding
Open access
Data cleaning is a time-consuming process that depends on the data analysis that users perform. Existing solutions treat data cleaning as a separate offline process that takes place before analysis begins. Applying data cleaning before analysis assumes a priori knowledge of the inconsistencies and the query workload, thereby requiring effort on understanding and cleaning the data that is unnecessary for the analysis. We propose an approach that performs probabilistic repair of denial constraint violations on-demand, driven by the exploratory analysis that users perform. We introduce Daisy, a system that seamlessly integrates data cleaning into the analysis by relaxing query results. Daisy executes analytical query-workloads over dirty data by weaving cleaning operators into the query plan. Our evaluation shows that Daisy adapts to the workload and outperforms traditional offline cleaning on both synthetic and real-world workloads.
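A denial constraint forbids any pair of tuples that jointly satisfies all of its predicates. The following toy checker illustrates that notion only; it is not Daisy's repair algorithm, and the employee schema and predicates are invented for the example:

```python
# A denial constraint (DC) is a conjunction of predicates over tuple pairs;
# a violation is any ordered pair of tuples satisfying every predicate.
from itertools import permutations

def violations(rows, predicates):
    """Return ordered tuple pairs that satisfy every predicate of the DC."""
    return [(a, b) for a, b in permutations(rows, 2)
            if all(p(a, b) for p in predicates)]

# Example DC: no employee may earn more than another while paying a lower tax rate.
rows = [{"name": "ann", "salary": 90, "tax": 10},
        {"name": "bob", "salary": 60, "tax": 20}]
dc = [lambda a, b: a["salary"] > b["salary"],
      lambda a, b: a["tax"] < b["tax"]]
print(len(violations(rows, dc)))  # → 1 (the (ann, bob) pair violates the DC)
```

An on-demand cleaner in Daisy's spirit would run such checks only over the tuples a query actually touches, rather than over the whole dirty relation up front.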
Choose the right Azure data service and the correct model design for a successful implementation of your data model with the help of this hands-on guide.
Key Features:
• Design a cost-effective, performant, and scalable database in Azure
• Choose and implement the most suitable design for a database
• Discover how your database can scale with growing data volumes, concurrent users, and query complexity
Book Description: Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to a successful implementation. Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse and explore dimensional modeling and data vault modeling, along with designing and implementing a data lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory. By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.
What you will learn:
• Model a relational database using normalization, dimensional, or Data Vault modeling
• Provision and implement Azure SQL DB and Azure Synapse SQL Pools
• Discover how to model a Data Lake and implement it using Azure Storage
• Model a NoSQL database and provision and implement an Azure Cosmos DB
• Use Azure Data Factory to implement ETL/ELT processes
• Create a star schema model using dimensional modeling
Who this book is for: This book is for business intelligence developers and consultants who work on (modern) cloud data warehousing and design and implement databases. Beginner-level knowledge of cloud data management is expected.
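The star schema mentioned above separates measures (facts) from descriptive context (dimensions). A minimal sketch, with invented table and column names and using Python's built-in sqlite3 rather than any Azure service:

```python
# Toy star schema: one fact table referencing two dimension tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INT, month INT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
""")
con.execute("INSERT INTO dim_date VALUES (20240101, 2024, 1)")
con.execute("INSERT INTO dim_product VALUES (1, 'widget')")
con.execute("INSERT INTO fact_sales VALUES (20240101, 1, 9.99)")

# Typical star-schema query: aggregate facts filtered by a dimension attribute.
total, = con.execute("""
    SELECT SUM(f.amount) FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    WHERE d.year = 2024
""").fetchone()
print(total)  # → 9.99
```

The same fact/dimension split carries over directly to Azure Synapse SQL Pool, where the dimensions are typically replicated and the fact table hash-distributed.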
Pathway analysis of PTM data sets is typically performed at a gene-centric level because of the lack of appropriately curated PTM signature databases. We have developed a PTM signatures database (PTMsigDB) providing curated phosphorylation signatures of kinases, perturbations, and signaling pathways to enable site-specific PTM signature enrichment analysis (PTM-SEA). Application of PTM-SEA to phosphoproteomes of several cell lines perturbed with growth factors, cell cycle inhibitors, or a specific PI3K inhibitor demonstrated the potential of our site-centric approach to study dysregulated pathways in cancers.
Highlights
• Database of PTM site-specific phosphorylation signatures of kinases, perturbations, and signaling pathways (PTMsigDB).
• PTM signature enrichment analysis (PTM-SEA) outperformed gene-centric analysis in detection of EGF-induced phospho-signaling events.
• PI3K perturbation signatures were readily detected in PI3Kα-inhibited human breast cancer cells.
• PTMsigDB and PTM-SEA can be freely accessed at https://github.com/broadinstitute/ssGSEA2.0.
Signaling pathways are orchestrated by post-translational modifications (PTMs) such as phosphorylation. However, pathway analysis of PTM data sets generated by mass spectrometry (MS)-based proteomics is typically performed at a gene-centric level because of the lack of appropriately curated PTM signature databases and bioinformatic tools that leverage PTM site-specific information. Here we present the first version of PTMsigDB, a database of modification site-specific signatures of perturbations, kinase activities, and signaling pathways curated from more than 2,500 publications. We adapted the widely used single-sample Gene Set Enrichment Analysis approach to utilize PTMsigDB, enabling PTM Signature Enrichment Analysis (PTM-SEA) of quantitative MS data. We used a well-characterized data set of epidermal growth factor (EGF)-perturbed cancer cells to evaluate our approach and demonstrated better representation of signaling events compared with gene-centric methods. We then applied PTM-SEA to analyze the phosphoproteomes of cancer cells treated with cell-cycle inhibitors and detected mechanism-of-action-specific signatures of cell cycle kinases. We also applied our methods to analyze the phosphoproteomes of PI3K-inhibited human breast cancer cells and detected signatures of compounds inhibiting PI3K as well as targets downstream of PI3K (AKT, MAPK/ERK) covering a substantial fraction of the PI3K pathway. PTMsigDB and PTM-SEA can be freely accessed at https://github.com/broadinstitute/ssGSEA2.0.
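The enrichment idea can be illustrated with a toy running-sum score in the spirit of GSEA: walk down the ranked list of PTM sites, step up on signature members and down otherwise, and take the extreme of the running sum as the score. This is a simplified sketch, not the actual PTM-SEA algorithm, and the site names are invented:

```python
# Toy rank-based set enrichment: signature hits concentrated near the top
# of the ranking yield a large positive score.
def enrichment_score(ranked_sites, signature):
    hits = sum(1 for s in ranked_sites if s in signature)
    hit_step = 1.0 / hits
    miss_step = 1.0 / (len(ranked_sites) - hits)
    running, best = 0.0, 0.0
    for site in ranked_sites:
        running += hit_step if site in signature else -miss_step
        if abs(running) > abs(best):
            best = running
    return best

ranked = ["AKT1_S473", "ERK2_T185", "TP53_S15", "GAPDH_S122"]  # most to least regulated
sig = {"AKT1_S473", "ERK2_T185"}
score = enrichment_score(ranked, sig)
print(score)  # → 1.0 (both signature sites sit at the very top)
```

Working on ranked PTM sites rather than collapsing to genes is what lets a site-centric score distinguish, say, an activating from an inhibitory phosphosite on the same protein.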
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPUK, ZAGLJ, ZRSKP
Designing a Low-Code CRUD framework Văduva, Bogdan; Vălean, Honoriu
Carpathian journal of electronic and computer engineering,
09/2021, Volume: 14, Issue: 1
Journal Article
Peer reviewed
Open access
Nowadays, programmers write source code for inserting, editing, and deleting records of a relational table. The majority of commercial relational databases include a management tool that offers such possibilities, and most database programmers take this ability for granted. In real-life applications, programmers use the Object-Oriented (OO) paradigm to build user-friendly windows/screens/forms for database operations. The current work shows a different approach using a low-code CRUD (Create, Read, Update, Delete) framework. Views and guidelines on how to design a low-code CRUD framework are detailed. The "low-code" designation reflects the fact that the new framework provides the ability to use less code to build fast and efficient complex applications. It is left to the reader to envision a specific framework.
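The core low-code idea, deriving generic CRUD statements from table metadata instead of hand-writing them per table, can be sketched as follows. This is an illustrative toy using sqlite3, not the framework the paper designs:

```python
# Generate CRUD operations for any table from its name, columns, and key,
# so no per-table SQL has to be written by hand.
import sqlite3

def crud(con, table, columns, key="id"):
    cols = ", ".join(columns)
    marks = ", ".join("?" * len(columns))
    sets = ", ".join(f"{c} = ?" for c in columns)
    return {
        "create": lambda *v: con.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", v),
        "read": lambda k: con.execute(
            f"SELECT {cols} FROM {table} WHERE {key} = ?", (k,)).fetchone(),
        "update": lambda k, *v: con.execute(
            f"UPDATE {table} SET {sets} WHERE {key} = ?", (*v, k)),
        "delete": lambda k: con.execute(
            f"DELETE FROM {table} WHERE {key} = ?", (k,)),
    }

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
ops = crud(con, "person", ["id", "name"])
ops["create"](1, "Ada")
print(ops["read"](1))  # → (1, 'Ada')
```

A full framework would additionally generate the forms/screens from the same metadata, which is where most of the saved code lies.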
While a significant number of databases are deployed in cloud environments, pushing part or all data storage and querying planes closer to their sources (i.e., to the edge) can provide advantages in latency, connectivity, privacy, energy, and scalability. This article dissects the advantages provided by databases in edge and fog environments by surveying application domains and discussing the key drivers for pushing database systems to the edge. At the same time, it also identifies the main challenges faced by developers in this new environment and analyzes the mechanisms employed to deal with them. By providing an overview of the current state of edge and fog databases, this survey provides valuable insights into future research directions.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, SAZU, UL, UM, UPUK
The identification of microplastics becomes increasingly challenging with decreasing particle size and increasing sample heterogeneity. The analysis of microplastic samples by Fourier transform infrared (FTIR) spectroscopy is a versatile, bias-free tool to succeed at this task. In this study, we provide an adaptable reference database, which can be applied to single-particle identification as well as methods like chemical imaging based on FTIR microscopy. The large datasets generated by chemical imaging can be further investigated by automated analysis, which does, however, require a carefully designed database. The novel database design is based on the hierarchical cluster analysis of reference spectra in the spectral range from 3600 to 1250 cm⁻¹. The hereby generated database entries were optimized for the automated analysis software with defined reference datasets. The design was further tested for its customizability with additional entries. The final reference database was extensively tested on reference datasets and environmental samples. Data quality by means of correct particle identification and depiction significantly increased compared to that of previous databases, proving the applicability of the concept and highlighting the importance of this work. Our novel database provides a reference point for data comparison with future and previous microplastic studies that are based on different databases.
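The database-matching step described above amounts to a nearest-neighbor search over reference spectra. The following toy example ranks database entries by cosine similarity; the polymer names and spectral vectors are invented for illustration and are not from the study:

```python
# Identify a particle as the database entry whose reference spectrum has
# the highest cosine similarity to the measured spectrum.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

reference_db = {                       # invented 4-point "spectra"
    "polyethylene":  [0.9, 0.1, 0.0, 0.4],
    "polypropylene": [0.1, 0.8, 0.3, 0.0],
}
sample = [0.85, 0.15, 0.05, 0.35]      # measured spectrum of unknown particle
best = max(reference_db, key=lambda name: cosine(sample, reference_db[name]))
print(best)  # → polyethylene
```

In practice the reference vectors span thousands of wavenumbers in the 3600 to 1250 cm⁻¹ window, and the hierarchical clustering in the study groups redundant reference spectra so the search stays both fast and discriminative.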
Graphical abstract
Full text
Available for:
DOBA, EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, IZUM, KILJ, KISLJ, MFDPS, NLZOH, NUK, OBVAL, OILJ, PILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UILJ, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
Data Platform for Machine Learning Agrawal, Pulkit; Arya, Rajat; Bindal, Aanchal ...
Proceedings of the 2019 International Conference on Management of Data,
06/2019
Conference Proceeding
Open access
In this paper, we present a purpose-built data management system, MLdp, for all machine learning (ML) datasets. ML applications pose some unique requirements different from common conventional data processing applications, including but not limited to: data lineage and provenance tracking, rich data semantics and formats, integration with diverse ML frameworks and access patterns, trial-and-error driven data exploration and evolution, rapid experimentation, reproducibility of the model training, and strict compliance and privacy regulations. Current ML systems/services, often named MLaaS, to date focus on the ML algorithms and offer no integrated data management system. Instead, they require users to bring their own data and to manage that data on either blob storage or file systems. The burdens of data management tasks, such as versioning and access control, fall onto the users, and not all compliance features, such as terms of use, privacy measures, and auditing, are available. MLdp offers a minimalist and flexible data model for all varieties of data, strong version management to guarantee reproducibility of ML experiments, and integration with major ML frameworks. MLdp also maintains data provenance to help users track lineage and dependencies among data versions and models in their ML pipelines. In addition to table-stakes features, such as security, availability, and scalability, MLdp's internal design choices are strongly influenced by the goal of supporting rapid ML experiment iterations, which cycle through data discovery, data exploration, feature engineering, model training, model evaluation, and back to data discovery.
The contributions of this paper are: 1) to recognize the needs and to call out the requirements of an ML data platform, 2) to share our experiences in building MLdp by adopting existing database technologies to the new problem as well as by devising new solutions, and 3) to call for actions from our communities on future challenges.
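One of the requirements above, version management for reproducibility, can be sketched with a content-addressed store: an experiment pinned to a version hash always sees exactly the same records. This is an illustrative toy, not MLdp's actual design:

```python
# Content-addressed dataset versioning: a version's id is a hash of its
# contents, and each version records its parent, giving a lineage chain.
import hashlib
import json

class DatasetStore:
    def __init__(self):
        self.versions = {}   # hash -> records
        self.parents = {}    # hash -> parent hash (lineage/provenance)

    def commit(self, records, parent=None):
        digest = hashlib.sha256(
            json.dumps(records, sort_keys=True).encode()).hexdigest()[:12]
        self.versions[digest] = records
        self.parents[digest] = parent
        return digest

store = DatasetStore()
v1 = store.commit([{"x": 1}])
v2 = store.commit([{"x": 1}, {"x": 2}], parent=v1)
print(store.parents[v2] == v1)  # → True: v2's lineage points back to v1
```

Because identical contents hash to the same id, re-committing an unchanged dataset is a no-op, and a model trained against `v1` can be re-trained bit-for-bit later.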
In social network big data scheduling, it is easy for target data to conflict in the same data node. Of the different kinds of entropy measures, this paper focuses on the optimization of target entropy. Therefore, this paper presents an optimized method for the scheduling of big data in social networks and also takes into account each task's amount of data communication during target data transmission to construct a big data scheduling model. Firstly, the task scheduling model is constructed to solve the problem of conflicting target data in the same data node. Next, the necessary conditions for the scheduling of tasks are analyzed. Then, the aperiodic task distribution function is calculated. Finally, each task is scheduled on the resource that minimizes the product of the corresponding resource level and the task's execution time. Experimental results show that our optimized scheduling model quickly optimizes the scheduling of social network data and solves the problem of strong data collision.
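The final selection rule can be sketched directly: place each task on the resource that minimizes resource level × execution time. The resource fields and the work/speed time model below are invented for illustration and are not from the paper:

```python
# Greedy placement by the minimum (resource level x execution time) product.
def schedule(tasks, resources):
    plan = {}
    for t in tasks:
        best = min(resources,
                   key=lambda r: r["level"] * t["work"] / r["speed"])
        plan[t["name"]] = best["name"]
    return plan

resources = [{"name": "r1", "level": 1, "speed": 1.0},
             {"name": "r2", "level": 3, "speed": 4.0}]
tasks = [{"name": "t1", "work": 8.0}]
print(schedule(tasks, resources))  # → {'t1': 'r2'}, since 3*8/4=6 beats 1*8/1=8
```

The product penalizes high-level (scarce or costly) resources unless the speed gain outweighs the level, which is how the rule spreads tasks away from a single contended node.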
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK