Classifying e-mails into distinct labels can have a great impact on customer support. By using machine learning to label e-mails, the system can set up queues containing e-mails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve a manually defined rule-based algorithm, currently implemented at a large telecom company, by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce the administrative and maintenance work and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct a number of experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and to investigate how they work together. A long short-term memory (LSTM) network showed the best classification performance, with an F1-score of 0.91. The authors conclude that long short-term memory networks outperform non-sequential models such as support vector machines and AdaBoost when predicting labels for e-mails. Further, the study presents a Web-based interface that was implemented around the LSTM network, which can classify e-mails into 33 different labels.
• A decision support system for residential burglary analysis is presented.
• A systematic data collection method for residential burglaries is introduced.
• Clustering is used to group residential burglaries, better than a random guesser.
• Target characteristics or spatial distance are the best performing distance metrics.
According to the Swedish National Council for Crime Prevention, law enforcement agencies solved approximately three to five percent of the reported residential burglaries in 2012. Internationally, studies suggest that a large proportion of crimes are committed by a minority of offenders. Law enforcement agencies, consequently, are required to detect series of crimes, or linked crimes. Comparison of crime reports today is difficult, as no systematic or structured way of reporting crimes exists and there is no ability to search across multiple crime reports.
This study presents a systematic data collection method for residential burglaries. A decision support system for comparing and analysing residential burglaries is also presented. The decision support system consists of an advanced search tool and a plugin-based analytical framework. In order to find similar crimes, law enforcement officers have to review a large number of crimes. The study investigates whether the cut-clustering algorithm can group crimes based on their characteristics and thereby reduce the number of crimes to review in residential burglary analysis. The characteristics used are modus operandi, residential characteristics, stolen goods, spatial similarity, and temporal similarity.
Clustering quality is measured using the modularity index and accuracy is measured using the Rand index. The clustering solutions with the best quality scores were based on residential characteristics, spatial proximity, and modus operandi, suggesting that the choice of which characteristic to use when grouping crimes can positively affect the end result. The results suggest that a high quality clustering solution performs significantly better than a random guesser. In terms of practical significance, the presented clustering approach is capable of reducing the number of cases to review while keeping most connected cases. While the approach might miss some connections, it is also capable of suggesting new connections. The results also suggest that while crime series clustering is feasible, further investigation is needed.
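As an illustration of the accuracy measure mentioned above, the following is a minimal sketch of the (unadjusted) Rand index: the fraction of item pairs on which two groupings agree, that is, pairs placed together in both or apart in both. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # together in partition A?
        same_b = labels_b[i] == labels_b[j]  # together in partition B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy data: five burglaries, known series vs. a clustering.
known_series = [0, 0, 1, 1, 2]
clustering = [0, 0, 1, 2, 2]
score = rand_index(known_series, clustering)
print(score)  # 8 of the 10 pairs agree, so this prints 0.8
```

A score of 1.0 means the clustering reproduces the known series exactly; a random grouping typically scores lower, which is the kind of baseline comparison the abstract refers to.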
We investigated the role of the drug resistance‐related proteins LRP, MRP and Pgp and the apoptotic suppressor, bcl‐2, in relation to other clinical characteristics, with respect to response and survival in 91 patients with newly diagnosed AML, treated with standard chemotherapy. Multivariate analysis showed that poor response to chemotherapy was associated with increasing age (P = 0.0004), LRP expression (P = 0.0001) and Pgp function (P = 0.015). The significant predictors of both leukaemia‐free survival (LFS) and overall survival (OS) were LRP (LFS, P = 0.01; OS, P = 0.0001), Pgp function (LFS, P = 0.0001; OS, P = 0.0003) and cytogenetic abnormalities (LFS, P = 0.0001; OS, P = 0.0005). Patients with the lowest expression of LRP and Pgp function and favourable karyotype (group I) had an LFS of 30.2 months compared to 8.5 months in the group with the highest expression of LRP and Pgp and poor prognosis karyotype (group III, P = 0.002). OS decreased from 75.4 months in group I to 7.9 months in group III patients (P < 0.0001). Neither MRP nor bcl‐2 was significantly associated with chemotherapy response or survival. Correlations were found between increasing expression of LRP and older age (P = 0.05) and an unfavourable karyotype (P = 0.005), but these variables were independent of each other in the analysis of treatment response and patient survival. Our findings suggest that both LRP and Pgp are clinically relevant drug‐resistance proteins and that it may be necessary to modulate both LRP and Pgp functions in order to reverse the multidrug resistance phenotype in AML.
Streaming data services, such as video-on-demand, are becoming increasingly popular, and they are expected to account for more than 80% of all Internet traffic in 2020. In this context, it is important for streaming service providers to detect deviations in service requests due to issues or changing end-user behaviors in order to ensure that end-users experience high quality in the provided service. Therefore, in this study we investigate to what extent sequence-based Markov models can be used for anomaly detection by means of the end-users’ control sequences in the video streams, i.e., event sequences such as play, pause, resume and stop. This anomaly detection approach is further investigated over three different temporal resolutions in the data, more specifically: 1 h, 1 day and 3 days. The proposed approach supports anomaly detection in ongoing streaming sessions, as it recalculates the probability that a specific session is anomalous for each new streaming control event that is received. Two experiments are used to measure the potential of the approach, which gives promising results in terms of precision, recall, F1-score and Jaccard index when compared to k-means clustering of the sessions.
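The idea of rescoring an ongoing session with every new control event can be sketched as follows. This is only an illustrative first-order Markov chain with additive smoothing; the abstract does not state the model order, the smoothing, or the scoring rule, so those choices, along with all names and the toy training data, are assumptions.

```python
import math
from collections import defaultdict

EVENTS = ["play", "pause", "resume", "stop"]


def fit_transitions(sessions, alpha=1.0):
    """Estimate smoothed first-order transition probabilities (alpha is assumed)."""
    counts = defaultdict(lambda: defaultdict(float))
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1.0
    probs = {}
    for prev in EVENTS:
        total = sum(counts[prev].values()) + alpha * len(EVENTS)
        probs[prev] = {nxt: (counts[prev][nxt] + alpha) / total for nxt in EVENTS}
    return probs


class SessionScorer:
    """Keeps a running mean log-probability for one ongoing session."""

    def __init__(self, probs):
        self.probs = probs
        self.prev = None
        self.log_prob = 0.0
        self.n = 0

    def update(self, event):
        """Add one control event and return the updated mean log-probability."""
        if self.prev is not None:
            self.log_prob += math.log(self.probs[self.prev][event])
            self.n += 1
        self.prev = event
        return self.log_prob / self.n if self.n else 0.0


# Hypothetical training sessions, not data from the study.
training = [["play", "pause", "resume", "stop"]] * 20 + [["play", "stop"]] * 10
probs = fit_transitions(training)

normal = SessionScorer(probs)
for event in ["play", "pause", "resume", "stop"]:
    normal_score = normal.update(event)

odd = SessionScorer(probs)
for event in ["play", "play", "play", "play"]:
    odd_score = odd.update(event)
```

In this sketch, a session whose running score drops below a chosen threshold would be flagged as anomalous; how that threshold is set, and how the 1 h, 1 day and 3 day resolutions enter the model, is not specified here.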
Law enforcement agencies, as well as researchers, rely on temporal analysis methods in many crime analyses, e.g., spatio-temporal analyses. A number of temporal analysis methods are being used, but a structured comparison in different configurations is yet to be done. This study aims to fill this research gap by comparing the accuracy of five existing, and one novel, temporal analysis methods in approximating offense times for residential burglaries that often lack precise time information. The temporal analysis methods are evaluated in eight different configurations with varying temporal resolution, as well as varying amounts of data (number of crimes) available during analysis. A dataset of all Swedish residential burglaries reported between 2010 and 2014 is used (N = 103,029). From that dataset, a subset of burglaries with known precise offense times is used for evaluation. The accuracy of the temporal analysis methods in approximating the distribution of burglaries with known precise offense times is investigated. The aoristic and the novel aoristic_ext method perform significantly better than three of the traditional methods. Experiments show that the novel aoristic_ext method was most suitable for estimating crime frequencies in the day-of-the-year temporal resolution when reduced numbers of crimes were available during analysis. In the other configurations investigated, the aoristic method showed the best results. The results also show the potential of temporal analysis methods in approximating the temporal distributions of residential burglaries in situations when limited data are available.
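For readers unfamiliar with the aoristic method, the sketch below shows its basic idea at an hour-of-day resolution: a burglary known only to have occurred within a time window contributes an equal share of one crime to every hour in that window. The function name, the input encoding and the toy intervals are hypothetical, and the novel aoristic_ext variant is not covered because the abstract does not describe it.

```python
def aoristic_hour_weights(intervals):
    """Spread each crime evenly over the hours of its possible time window.

    intervals: list of (start_hour, end_hour) pairs with hours in 0..23,
    both inclusive; a window may wrap past midnight (e.g. 22 to 1).
    """
    weights = [0.0] * 24
    for start, end in intervals:
        span = (end - start) % 24 + 1  # number of hours in the window
        for k in range(span):
            weights[(start + k) % 24] += 1.0 / span
    return weights


# Hypothetical toy data: three burglaries with uncertain offense times.
intervals = [(22, 1), (8, 8), (8, 11)]
weights = aoristic_hour_weights(intervals)
print(weights[8])  # 1.0 from the exact-hour crime plus 0.25, so 1.25
```

The weights always sum to the number of crimes, so the result can be read as an estimated crime frequency per hour.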
The importance of cellular networks continuously increases as we assume ubiquitous connectivity in our daily lives. As a result, the underlying core telecom systems have very high reliability and availability requirements that are sometimes hard to meet. This study presents a proactive approach that could aid in satisfying these high requirements on reliability and availability by predicting future base station alarms. A data set containing 231 internal performance measures from cellular (4G) base stations is correlated with a data set containing base station alarms. Next, two experiments are used to investigate (i) the alarm prediction performance of six machine learning models, and (ii) how different predict-ahead times (ranging from 10 min to 48 hours) affect the predictive performance. A 10-fold cross-validation evaluation approach and statistical analysis suggested that the Random Forest models showed the best performance. Further, the results indicate the feasibility of predicting severe alarms one hour in advance with a precision of 0.812 (±0.022, 95% CI), recall of 0.619 (±0.027) and F1-score of 0.702 (±0.022). A model interpretation package, ELI5, was used to identify the most influential features in order to gain model insight. Overall, the results are promising and indicate the potential of an early-warning system that enables a proactive means for achieving high reliability and availability requirements.
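The reported metrics can be related through the standard definition of the F1-score as the harmonic mean of precision and recall. The short check below applies that definition to the reported point estimates; it assumes nothing about how the study aggregated scores across cross-validation folds, which may differ from this simple calculation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.812, 0.619)
print(round(f1, 3))  # 0.702
```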
OBJECTIVES: The present study aims to extend current research on how offenders’ modus operandi (MO) can be used in crime linkage, by investigating the possibility of automatically estimating offenders’ risk exposure and level of pre-crime preparation for residential burglaries. Such estimations can assist law enforcement agencies when linking crimes into series and thus provide a more comprehensive understanding of offenders and targets, based on the combined knowledge and evidence collected from different crime scenes. METHODS: Two criminal profilers manually rated offenders’ risk exposure and level of pre-crime preparation for 50 burglaries each. In an experiment we then analyzed to what extent 16 machine-learning algorithms could generalize both offenders’ risk exposure and preparation scores from the criminal profilers’ ratings onto 15,598 residential burglaries. All included burglaries contain structured and feature-rich crime descriptions from which learning algorithms can generalize offenders’ risk and preparation scores. RESULTS: Two models created by Naïve Bayes-based algorithms showed the best performance, with an AUC of 0.79 and 0.77 for estimating offenders’ risk and preparation scores respectively. These algorithms were significantly better than most, but not all, algorithms. Both scores showed promising distinctiveness between linked series, as well as consistency for crimes within series compared to randomly sampled crimes. CONCLUSIONS: Estimating offenders’ risk exposure and pre-crime preparation can complement traditional MO characteristics in the crime linkage process. The results also indicate that the estimations could work for cross-category crimes that otherwise lack comparable MO. Future work could focus on increasing the number of manually rated offenses as well as fine-tuning the Naïve Bayes algorithm to increase its estimation performance.