Data Mining: Concepts and Techniques provides the concepts and techniques for processing gathered data or information, which will be used in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data; this process is referred to as knowledge discovery from data (KDD). The book focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After describing data mining, this edition explains the methods for getting to know, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology, followed by the methods for mining frequent patterns, associations, and correlations in large data sets. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
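The frequent-pattern mining surveyed in the book can be illustrated with a minimal, level-wise Apriori-style sketch in Python. The market baskets and support threshold below are invented for illustration; the book itself presents its algorithms in pseudo-code.

```python
def apriori(transactions, min_support):
    """Return all itemsets whose support (fraction of transactions
    containing them) is at least min_support, mapped to their support."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    # Level 1: candidate singleton itemsets.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # Candidate (k+1)-itemsets: unions of surviving k-itemsets.
        current = {a | b for a in survivors for b in survivors if len(a | b) == k + 1}
        k += 1
    return frequent

# Invented example baskets: support threshold 0.5 means "in at least half the baskets".
baskets = [{"milk", "bread"}, {"milk", "diapers"},
           {"milk", "bread", "diapers"}, {"bread"}]
freq = apriori(baskets, min_support=0.5)
```

Each pass prunes candidates below the support threshold before the next level is generated, which is the key idea that keeps level-wise mining tractable on large data sets.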
Web Mining Kumbhar, V. S.; Oza, K. S.; Kamat, R. K.
2016, 2022-09-01, 2017-01-31
eBook
Web mining is the application of data mining strategies to extract knowledge from web information, i.e. web content, web structure, and web usage data. With the emergence of the web as the predominant and converging platform for communication, business and scholastic information dissemination, especially in the last five years, there are ever-increasing research groups working on different aspects of web mining, mainly in three directions: mining of web content, web structure and web usage. In this context there are a good number of frameworks and benchmarks related to website metrics, which are certainly weighty for B2B, B2C and, in general, any e-commerce paradigm. Owing to the popularity of this topic there are a few books on the market dealing with such performance metrics and other related issues. This book, however, omits all such routine topics and lays more emphasis on the classification and clustering aspects of websites in order to arrive at a true perception of a website in light of its usability. In a nutshell, Web Mining: A Synergic Approach Resorting to Classifications and Clustering showcases an effective methodology for classification and clustering of websites from their usability point of view. While the clustering and classification are accomplished using the open source tool WEKA, the basic dataset for the selected websites has been generated using the free tool site-analyzer. As a case study, several commercial websites have been analyzed. The dataset preparation using site-analyzer and classification through WEKA by embedding different algorithms is one of the unique selling points of this book. This text projects a complete spectrum of web mining from its very inception through data mining and takes the reader up to the application level.
Salient features of the book include:
* Literature review of research work in the area of web mining
* Business websites domain researched, and data collected using site-analyzer to
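The book performs its clustering of websites in WEKA over usability features derived from site-analyzer. As a rough stand-in, here is a minimal pure-Python k-means sketch over invented (load-time, accessibility-score) feature pairs; this is purely illustrative and is not the book's actual dataset, tool, or algorithm settings.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: partition feature vectors into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Recompute each centroid as the mean of its assigned points.
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Invented (load-time seconds, accessibility score) pairs for six hypothetical sites:
# three fast/low-score sites and three slow/high-score sites.
sites = [(0.9, 0.2), (1.1, 0.3), (0.8, 0.25),
         (4.0, 0.9), (4.2, 0.85), (3.9, 0.95)]
centroids, clusters = kmeans(sites, k=2)
```

On this well-separated toy data the two clusters recover the fast and slow site groups, which is the kind of usability grouping the book pursues with WEKA's clusterers.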
Abstract: A multimedia mining method based on deep learning is presented, built on image content features. It is used for various computer vision tasks such as segmentation, classification and object detection. It was tested on a standard multimedia dataset.
Mining the Web Chakrabarti, Soumen
2002, 2002-10-16
eBook
Mining the Web: Discovering Knowledge from Hypertext Data is the first book devoted entirely to techniques for producing knowledge from the vast body of unstructured Web data. Building on an initial survey of infrastructural issues, including Web crawling and indexing, Chakrabarti examines low-level machine learning techniques as they relate specifically to the challenges of Web mining. He then devotes the final part of the book to applications that unite infrastructure and analysis to bring machine learning to bear on systematically acquired and stored data. Here the focus is on results: the strengths and weaknesses of these applications, along with their potential as foundations for further progress. From Chakrabarti's painstaking, critical, and forward-looking work, readers will gain the theoretical and practical understanding they need to contribute to the Web mining effort.
* A comprehensive, critical exploration of statistics-based attempts to make sense of the Web.
* Details the special challenges associated with analyzing unstructured and semi-structured data.
* Looks at how classical Information Retrieval techniques have been modified for use with Web data.
* Focuses on today's dominant learning methods: clustering and classification, hyperlink analysis, and supervised and semi-supervised learning.
* Analyzes current applications for resource discovery and social network analysis.
* An excellent way to introduce students to especially vital applications of data mining and machine learning technology.
Purpose
The purpose of this paper is to present a concept of a protocol for public registries based on blockchain. The new database protocol aims to use the benefits of blockchain technologies and ensure their interoperability.
Design/methodology/approach
This paper is framed with design science research (DSR). The primary method is exaptation, i.e. the adoption of solutions from other fields. The research looks into existing technologies which are applied here as elements of the protocol: Name-Value Storage (NVS), Berkeley DB, and the RAID protocol, among others. The choice of NVS as a reference technology for creating a database over blockchain is based on the analysis and comparison with two other similar technologies, Bigchain and Amazon QLDB.
Findings
The proposed mechanism allows creating a standard database over a bundle of distributed ledgers. It ensures a blockchain-agnostic approach and uses the benefits of various blockchain technologies in one ecosystem. In this scheme, blockchains play the role of journal storage (an immutable log), whereas the overlaid database is the indexed storage. The distinctive feature of such a system is that users can perform peer-to-peer transactions directly in the ledger using the blockchain's native mechanism of user access management with public-key cryptography (the blockchain does not require an administrator for its database).
Originality/value
This paper presents a new method of creating a public peer-to-peer database across a bundle of distributed ledgers.
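The journal-storage / indexed-storage split the findings describe can be sketched as follows: an append-only, hash-chained log stands in for the blockchain ledger, and a key-value view is replayed from it as the overlaid indexed database. All class names, field names, and records here are invented for illustration; the paper's actual protocol builds on NVS and real distributed ledgers.

```python
import hashlib
import json

class Journal:
    """Append-only log standing in for a blockchain ledger: each entry is
    hash-chained to its predecessor, so history is tamper-evident."""
    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"prev": prev, "record": record, "hash": h})

class IndexedView:
    """Overlaid database: a name -> value index replayed from the journal,
    mirroring the journal-storage / indexed-storage split above."""
    def __init__(self, journal):
        self.state = {}
        for entry in journal.entries:
            rec = entry["record"]
            self.state[rec["name"]] = rec["value"]  # last write wins

# Hypothetical registry entries: a parcel record and its later transfer.
ledger = Journal()
ledger.append({"name": "parcel-42", "value": "alice"})
ledger.append({"name": "parcel-42", "value": "bob"})
view = IndexedView(ledger)
```

The design point this illustrates is that the ledger remains the single source of truth, while the index is disposable: it can always be rebuilt by replaying the journal.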
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints.
We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
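The parallel-probing idea can be sketched with simulated sources: each autonomous web source is modeled as a function with artificial latency, probes for all candidates are issued concurrently, and attribute scores are combined with a simple sum. The restaurant names, scores, latencies, and combining function are all invented for illustration; this is not the article's algorithm, which additionally respects per-source access constraints.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Invented stand-ins for autonomous web sources: each "probe" returns one
# attribute score for a candidate object, with simulated network latency.
def proximity_source(restaurant):
    time.sleep(0.01)
    return {"A": 0.9, "B": 0.4, "C": 0.7}[restaurant]

def price_source(restaurant):
    time.sleep(0.01)
    return {"A": 0.5, "B": 0.9, "C": 0.8}[restaurant]

def rating_source(restaurant):
    time.sleep(0.01)
    return {"A": 0.8, "B": 0.6, "C": 0.9}[restaurant]

def top_k(candidates, sources, k):
    """Probe every source for every candidate in parallel, then rank by the
    sum of attribute scores (a simple combining function)."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {c: [pool.submit(s, c) for s in sources] for c in candidates}
        scores = {c: sum(f.result() for f in fs) for c, fs in futures.items()}
    return sorted(candidates, key=scores.get, reverse=True)[:k]

best = top_k(["A", "B", "C"], [proximity_source, price_source, rating_source], k=2)
```

Because all nine probes run concurrently rather than one after another, total latency approaches that of a single probe, which is exactly why parallel strategies beat sequential ones when source latency is high and variable.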
Purpose
Several genealogical databases are now publicly available on the Web. The information stored in such databases is not only of interest for genealogical research but might also be used in broader historical studies. As a case study, this paper aims to explore what a crowdsourced genealogical online database can tell about income inequality in Denmark during the First World War.
Design/methodology/approach
The analysis is based on 55,000 family-level records on the payment of local income taxes in a major Danish provincial town (Esbjerg) from a publicly available database on the website of The Esbjerg City Archives combined with official statistics from Statistics Denmark.
Findings
Denmark saw a sharp increase in income inequality during the First World War. The analysis shows that the new riches during the First World War in a harbour city such as Esbjerg were not "goulash barons" or stock-market speculators but fishermen. There were no fishermen in the top 1 per cent of the income distribution in 1913. In 1917, more than 37 per cent of the family heads in this part of the income distribution were fishermen.
Originality/value
The paper illustrates how large-scale microdata from publicly available genealogical Web databases might be used to gain new insights into broader historical issues.
Wolfram, Alström and Bardet-Biedl (WABB) syndromes are rare diseases with overlapping features of multiple sensory and metabolic impairments, including diabetes mellitus, which have caused diagnostic confusion. There are as yet no specific treatments available, little or no access to well characterized cohorts of patients, and limited information on the natural history of the diseases. We aim to establish a Europe-wide registry for these diseases to inform patient care and research.
EURO-WABB is an international multicenter large-scale observational study capturing longitudinal clinical and outcome data for patients with WABB diagnoses. Three hundred participants will be recruited over 3 years from different sites throughout Europe. Comprehensive clinical, genetic and patient experience data will be collated into an anonymized disease registry. Data collection will be web-based, and forms part of the project's Virtual Research and Information Environment (VRIE). Participants who have not undergone genetic diagnostic testing for their condition will be able to do so via the project.
The registry data will be used to increase the understanding of the natural history of WABB diseases, to serve as an evidence base for clinical management, and to aid the identification of opportunities for intervention to stop or delay the progress of the disease. The detailed clinical characterisation will allow inclusion of patients into studies of novel treatment interventions, including targeted interventions in small scale open label studies; and enrolment into multi-national clinical trials. The registry will also support wider access to genetic testing, and encourage international collaborations for patient benefit.