Identifying client needs to provide optimal services is crucial in tourist destination management. The events held in tourist destinations may help to meet those needs and thus contribute to tourist satisfaction. As with product management, the creation of hierarchical catalogs to classify those events can aid event management. The events that can be found on the internet are listed in dispersed, heterogeneous sources, which makes direct classification a difficult, time-consuming task. The main aim of this work is to create a novel process for automatically classifying an eclectic variety of tourist events using a hierarchical taxonomy, which can be applied to support tourist destination management. Leveraging data science methods such as CRISP-DM, supervised machine learning, and natural language processing, the automatic classification process proposed here allows the creation of a normalized catalog across very different geographical regions. Therefore, we can build catalogs with consistent filters, allowing users to find events regardless of the event categories assigned at source, if any. This is very valuable for companies that offer this kind of information across multiple regions, such as airlines, travel agencies, or hotel chains. Ultimately, this tool has the potential to revolutionize the way companies and end users interact with tourist event information.
•Computational techniques are used to classify tourism destination events.
•A Large Language Model (BERT) is used to obtain vector representations of events.
•A method to automatically classify events is proposed to ease the adoption of standards.
•There is great scope for extending this methodology to other applications.
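The classification step described above can be sketched as nearest-centroid assignment over embedding vectors. This is a minimal illustration, not the authors' implementation: the vectors below are tiny made-up placeholders standing in for BERT sentence embeddings, and the category names are hypothetical.

```python
# Sketch: given vector representations of event descriptions (in practice
# produced by a BERT encoder), assign each event to the taxonomy category
# whose centroid is most similar. Embeddings and labels are invented.
import numpy as np

def classify_events(event_vecs, centroids, labels):
    """Assign each event vector to the label of the most cosine-similar centroid."""
    ev = event_vecs / np.linalg.norm(event_vecs, axis=1, keepdims=True)
    ce = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = ev @ ce.T                      # cosine similarity matrix
    return [labels[i] for i in sims.argmax(axis=1)]

# Toy 3-dimensional "embeddings" standing in for 768-d BERT vectors.
centroids = np.array([[1.0, 0.0, 0.0],   # "music"
                      [0.0, 1.0, 0.0],   # "sports"
                      [0.0, 0.0, 1.0]])  # "gastronomy"
labels = ["music", "sports", "gastronomy"]
events = np.array([[0.9, 0.1, 0.0],
                   [0.1, 0.0, 0.8]])
print(classify_events(events, centroids, labels))  # → ['music', 'gastronomy']
```

In a real pipeline the centroids would be learned by a supervised classifier over the BERT embeddings rather than fixed by hand.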
The beginning of the coronavirus disease (COVID-19) epidemic dates back to December 31, 2019, when the first cases were reported in the People's Republic of China. In the Czech Republic, the first three cases of infection with the novel coronavirus were confirmed on March 1, 2020. The joint effort of state authorities and researchers gave rise to a unique team, which combines methodical knowledge of real-world processes with the know-how needed for effective processing, analysis, and online visualization of data.
Due to an urgent need for a tool that presents important reports based on valid data sources, a team of government experts and researchers focused on the design and development of a web app intended to provide a regularly updated overview of COVID-19 epidemiology in the Czech Republic to the general population.
The cross-industry standard process for data mining model was chosen for the complex solution of analytical processing and visualization of data that provides validated information on the COVID-19 epidemic across the Czech Republic. Great emphasis was put on the understanding and a correct implementation of all six steps (business understanding, data understanding, data preparation, modelling, evaluation, and deployment) needed in the process, including the infrastructure of a nationwide information system; the methodological setting of communication channels between all involved stakeholders; and data collection, processing, analysis, validation, and visualization.
The web-based overview of the current spread of COVID-19 in the Czech Republic has been developed as an online platform providing a set of outputs in the form of tables, graphs, and maps intended for the general public. On March 12, 2020, the first version of the web portal, containing fourteen overviews divided into five topical sections, was released. The web portal's primary objective is to publish a well-arranged visualization and clear explanation of basic information: the overall numbers of performed tests, confirmed COVID-19 cases, and COVID-19-related deaths; daily and cumulative overviews of confirmed cases and performed tests; the location and country of infection of confirmed cases; hospitalizations of patients with COVID-19; and the distribution of personal protective equipment.
The online interactive overview of the current spread of COVID-19 in the Czech Republic was launched on March 11, 2020, and immediately became the primary communication channel employed by the health care sector to present the current situation regarding the COVID-19 epidemic. This complex reporting of the COVID-19 epidemic in the Czech Republic also shows an effective way to interconnect knowledge held by various specialists, such as regional and national methodology experts (who report positive cases of the disease on a daily basis), with knowledge held by developers of central registries, analysts, developers of web apps, and leaders in the health care sector.
Coronavirus Disease 2019 (COVID-19) is a novel virus that can cause respiratory tract infections. The virus originated in animals and can be transmitted to humans through droplets. According to epidemiological data, patients infected with this virus are on average between 15 and 80 years old. The virus has an incubation period of 3-14 days, with early symptoms of high fever, shortness of breath, cough, and runny nose. Indonesia recorded its first two cases on March 2, 2020; COVID-19 then increased steadily, and by December 29, 2020, the data showed 719,219 people confirmed to have contracted COVID-19. The problem addressed in this study is how to classify the risk of contracting COVID-19 from the symptoms presented. The aim of this study is to determine the accuracy of classifying the risk of contracting COVID-19 based on the instrument used, following the Cross Industry Standard Process for Data Mining (CRISP-DM) method. The dataset used was taken from the website http://github.com/nshomron/covidpred. This study used a Neural Network (NN) algorithm implemented with Python tools; the NN algorithm achieved an accuracy of 95%, indicating good classification results. The researchers also tested a Logistic Regression algorithm, whose accuracy did not differ much from the NN algorithm: Logistic Regression achieved an accuracy of 94%.
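The logistic-regression baseline mentioned above can be sketched in a few lines. This is a minimal NumPy illustration trained by gradient descent on synthetic symptom indicators, not the study's code or the covidpred dataset; the feature names and labeling rule are invented for the example.

```python
# Minimal logistic-regression sketch of symptom-based risk classification.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: each row holds three 0/1 symptom indicators
# (fever, cough, shortness of breath); the toy ground-truth rule labels
# a case "at risk" when at least two symptoms are present.
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = (X.sum(axis=1) >= 2).astype(float)

w, b, lr = np.zeros(3), 0.0, 0.5
for _ in range(2000):                        # batch gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

Because the toy rule is linearly separable, this sketch reaches near-perfect training accuracy; the 94-95% figures in the study come from its real, noisier symptom data.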
Nowadays, organizations generate large amounts of information every day, which can be useful for acquiring knowledge and for decision-making. This project focuses on the construction of a data warehouse containing the relevant variables for applying descriptive statistics and thus visualizing patterns of behavior in academic dropout among graduate students at the LANIA teaching center; these variables were obtained from a review of the state of the art. The work gives the reader an overview of the construction of the data warehouse, with particular emphasis on the ETL process, which refers to the extraction, transformation, and loading of the data. Also noteworthy is the application of the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology, of which four of its six phases were carried out. The results show that 57.14% of the students who dropped out of the Master's in Applied Computing (MCA) are in the 23-25 age range, and 40.47% of the students who dropped out of the Master's in Networks and Integrated Systems (MRySI) are in the 27-31 age range.
Currently, data mining is applied to detect important patterns that support decision-making regarding cervical cancer, a disease that affects women from the age of 24 onward. For this purpose, the RapidMiner Studio tool was used to analyze the data by age. The analysis followed the knowledge discovery in databases (KDD) methodology through its stages: data selection, data preparation, data mining, and evaluation and interpretation. In addition, a comparison of methodologies is presented, covering the Cross Industry Standard Process for Data Mining (CRISP-DM), KDD, and Sample, Explore, Modify, Model, Assess (SEMMA); the comparison is organized by dimensions, and the methodologies are compared within each dimension. On this basis, a graph was created comparing algorithmic models such as naive Bayes, decision tree, and rule induction. It is concluded that the most notable result was -1.424, located in cluster 4 of the attribute 'result date'.
Calculus is one of the basic subjects that must be studied in the informatics engineering study program of the computer science faculty. For some students, especially in the Faculty of Informatics Engineering, calculus is considered quite difficult, even though it is important for them, and as a result some students have to repeat the subject. For this reason, calculus learning outcomes are predicted by applying the data mining process, using the C5.0 method for a classification-based prediction. This study applies the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology with the C5.0 algorithm. The results take the form of a decision tree and its rules, using the attributes of guardian, number of family members, residence status, internet access, activity, desire to continue studying, parents' last education (father and mother), parents' occupations, assignment grades, and final (UAS) and midterm (UTS) exam scores. The C5.0 algorithm is able to predict calculus learning outcomes; the evaluation shows that the applied C5.0 algorithm has an accuracy of 95%.
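C5.0 builds its tree by repeatedly choosing the attribute whose split most reduces entropy (via information gain and its ratio). The sketch below computes information gain for two candidate attributes on a toy student table, illustrating how the root split of a tree like the one described above would be chosen; the records and attribute values are invented, and this is plain information gain rather than the full C5.0 gain-ratio machinery.

```python
# Entropy-based attribute selection, the core idea behind C5.0 splits.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target="passed"):
    """Entropy reduction obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Toy student records (hypothetical attribute values).
rows = [
    {"internet": "yes", "assignment": "high", "passed": "yes"},
    {"internet": "yes", "assignment": "high", "passed": "yes"},
    {"internet": "yes", "assignment": "low",  "passed": "no"},
    {"internet": "no",  "assignment": "high", "passed": "yes"},
    {"internet": "no",  "assignment": "low",  "passed": "no"},
    {"internet": "no",  "assignment": "low",  "passed": "no"},
]
for attr in ("internet", "assignment"):
    print(attr, round(information_gain(rows, attr), 3))
# "assignment" perfectly separates the labels, so it wins the split.
```

In this toy table the `assignment` attribute yields an information gain of 1.0 and would be chosen as the root, matching the intuition that assignment grades are the strongest predictor among the listed attributes.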
The S&R Baby Store is a small and medium-sized enterprise (SME) selling baby equipment. There is a lot of competition among SMEs in the same field, so not all of the products sold sell out; some are less in demand. The S&R Baby Store therefore needs a good sales strategy in order to increase sales profit. This study discusses the application of data mining, using the K-Means clustering algorithm with the CRISP-DM method. The implementation, in RapidMiner 9.10, entered sales transaction data with a total of 4 attributes and formed 4 clusters: very in demand, in demand, moderately in demand, and less in demand; the second cluster contained 944 products, the third cluster 2 products, and the fourth cluster 43 products. The resulting clusters identify the best-selling product categories, and the clustering was validated using the Davies-Bouldin Index, with a DBI value of 0.560.
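The clustering-and-validation pipeline described above can be sketched as a minimal k-means followed by a hand-rolled Davies-Bouldin Index (lower is better). The study used RapidMiner 9.10 on real sales transactions; the 2-D points below are synthetic placeholders, and this NumPy code is an illustration of the technique, not the study's implementation.

```python
# Minimal k-means plus Davies-Bouldin Index on synthetic 2-D data.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels, centers

def davies_bouldin(X, labels, centers):
    k = len(centers)
    # S_i: mean distance of each cluster's members to their centroid.
    s = np.array([np.linalg.norm(X[labels == i] - centers[i], axis=1).mean()
                  for i in range(k)])
    dbi = 0.0
    for i in range(k):
        # R_ij = (S_i + S_j) / distance(center_i, center_j); take worst j.
        r = [(s[i] + s[j]) / np.linalg.norm(centers[i] - centers[j])
             for j in range(k) if j != i]
        dbi += max(r)
    return dbi / k

# Two well-separated synthetic blobs stand in for product-demand groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, centers = kmeans(X, k=2)
print("DBI:", round(davies_bouldin(X, labels, centers), 3))
```

Well-separated, compact clusters drive the index toward zero, which is why the study's DBI of 0.560 indicates a reasonably good 4-cluster partition.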
Data mining techniques have gained widespread adoption over the past decades, particularly in the financial services domain. To achieve sustained benefits from these techniques, organizations have adopted standardized processes for managing data mining projects, most notably CRISP-DM. Research has shown that these standardized processes are often not used as prescribed, but instead, they are extended and adapted to address a variety of requirements. To improve the understanding of how standardized data mining processes are extended and adapted in practice, this paper reports on a case study in a financial services organization, aimed at identifying perceived gaps in the CRISP-DM process and characterizing how CRISP-DM is adapted to address these gaps. The case study was conducted based on documentation from a portfolio of data mining projects, complemented by semi-structured interviews with project participants. The results reveal 18 perceived gaps in CRISP-DM alongside their perceived impact and mechanisms employed to address these gaps. The identified gaps are grouped into six categories. Next, they were triangulated and augmented with the gaps discovered in the other studies. Then, the requirements for adapting CRISP-DM to address the gaps were derived, and the directions for the potential adaptations were outlined.
The study presents a two-fold contribution. It provides practitioners with a structured set of gaps to be considered when applying CRISP-DM, or similar processes, in the financial services sector. Additionally, the study elicits the requirements and sketches the potential solutions to address these gaps. Moreover, a number of the identified gaps are generic and applicable to other sectors with similar concerns (e.g. privacy), such as telecom or e-commerce.
•Proposes an applicable data-driven framework for SMMEs.
•Leveraging IDEF0 functional modelling and CRISP-DM business analytics method.
•Real-world case study validates the proposed framework.
•Identifies how SMMEs can leverage data-driven techniques effectively.
This paper aims to explore the challenges and opportunities for Small and Medium-sized Manufacturing Enterprises (SMMEs) in implementing data-driven techniques in their operations. SMMEs are often considered to be low and medium–low tech companies, even if they have machinery, as they still rely on traditional processes and manpower and lack any digital technology. Previous research has shown that medium–high and high-tech companies perform better, with higher rates of growth, than low and medium–low ones by a sustainable and significant margin. Therefore, there is a need for further research on the implementation of data-driven analytical methods and technologies in SMMEs that are both cost-effective and easy to use. This study proposes a conceptual analytical system that combines Integration Definition for Function Modeling 0 (IDEF0) and the Cross Industry Standard Process for Data Mining (CRISP-DM) business analytics method to develop a practical and widely applicable framework for data-driven techniques in manufacturing. We then developed a case study of an Indonesian company, where we collected real and direct information about specific objects, events, and activities related to particular aspects, including showing their key performance indicators (KPIs) through data dashboards, to evaluate the effectiveness of the proposed conceptual analytical system in improving operational management in SMMEs. The findings of this study provide valuable insights that can be used to develop effective solutions for SMMEs to leverage data-driven techniques and improve their operations. We also highlight implications of the findings for future research and practical applications. The final framework can be converted into a system that can be continuously and flexibly updated and customized, based on specific needs.