•We tackle the stock market prediction using both technical and fundamental analysis.•We combine technical and fundamental analysis through data science.•Result on the twenty most capitalized ...NASDAQ100 companies supports our proposal.
Stock market prediction is one of the most challenging problems which has been distressing both researchers and financial analysts for more than half a century. To tackle this problem, two completely opposite approaches, namely technical and fundamental analysis, emerged. Technical analysis bases its predictions on mathematical indicators constructed on the stocks price, while fundamental analysis exploits the information retrieved from news, profitability, and macroeconomic factors. The competition between these schools of thought has led to many interesting achievements, however, to date, no satisfactory solution has been found. Our work aims to combine both technical and fundamental analysis through the application of data science and machine learning techniques. In this paper, the stock market prediction problem is mapped in a classification task of time series data. Indicators of technical analysis and the sentiment of news articles are both exploited as input. The outcome is a robust predictive model able to forecast the trend of a portfolio composed by the twenty most capitalized companies listed in the NASDAQ100 index. As a proof of real effectiveness of our approach, we exploit the predictions to run a high frequency trading simulation reaching more than 80% of annualized return. This project represents a step forward to combine technical and fundamental analysis and provides a starting point for developing new trading strategies.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Applying computational statistics or machine learning methods to data is a key component of many scientific studies, in any field, but alone might not be sufficient to generate robust and reliable ...outcomes and results. Before applying any discovery method, preprocessing steps are necessary to prepare the data to the computational analysis. In this framework, data cleaning and feature engineering are key pillars of any scientific study involving data analysis and that should be adequately designed and performed since the first phases of the project. We call "feature" a variable describing a particular trait of a person or an observation, recorded usually as a column in a dataset. Even if pivotal, these data cleaning and feature engineering steps sometimes are done poorly or inefficiently, especially by beginners and unexperienced researchers. For this reason, we propose here our quick tips for data cleaning and feature engineering on how to carry out these important preprocessing steps correctly avoiding common mistakes and pitfalls. Although we designed these guidelines with bioinformatics and health informatics scenarios in mind, we believe they can more in general be applied to any scientific area. We therefore target these guidelines to any researcher or practitioners wanting to perform data cleaning or feature engineering. We believe our simple recommendations can help researchers and scholars perform better computational analyses that can lead, in turn, to more solid outcomes and more reliable discoveries.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
This work presents the Transition-Aware Human Activity Recognition (TAHAR) system architecture for the recognition of physical activities using smartphones. It targets real-time classification with a ...collection of inertial sensors while addressing issues regarding the occurrence of transitions between activities and unknown activities to the learning algorithm. We propose two implementations of the architecture which differ in their prediction technique as they deal with transitions either by directly learning them or by considering them as unknown activities. This is accomplished by combining the probabilistic output of consecutive activity predictions of a Support Vector Machine (SVM) with a heuristic filtering approach. The architecture is validated over three case studies that involve data from people performing a broad spectrum of activities (up to 33), while carrying smartphones or wearable sensors. Results show that TAHAR outperforms state-of-the-art baseline works and reveal the main advantages of the architecture.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UL, UM, UPCLJ, UPUK, ZRSKP
Sepsis is a life-threatening clinical condition that happens when the patient's body has an excessive reaction to an infection, and should be treated in one hour. Due to the urgency of sepsis, ...doctors and physicians often do not have enough time to perform laboratory tests and analyses to help them forecast the consequences of the sepsis episode. In this context, machine learning can provide a fast computational prediction of sepsis severity, patient survival, and sequential organ failure by just analyzing the electronic health records of the patients. Also, machine learning can be employed to understand which features in the medical records are more predictive of sepsis severity, of patient survival, and of sequential organ failure in a fast and non-invasive way.
In this study, we analyzed a dataset of electronic health records of 364 patients collected between 2014 and 2016. The medical record of each patient has 29 clinical features, and includes a binary value for survival, a binary value for septic shock, and a numerical value for the sequential organ failure assessment (SOFA) score. We disjointly utilized each of these three factors as an independent target, and employed several machine learning methods to predict it (binary classifiers for survival and septic shock, and regression analysis for the SOFA score). Afterwards, we used a data mining approach to identify the most important dataset features in relation to each of the three targets separately, and compared these results with the results achieved through a standard biostatistics approach.
Our results showed that machine learning can be employed efficiently to predict septic shock, SOFA score, and survival of patients diagnoses with sepsis, from their electronic health records data. And regarding clinical feature ranking, our results showed that Random Forests feature selection identified several unexpected symptoms and clinical components as relevant for septic shock, SOFA score, and survival. These discoveries can help doctors and physicians in understanding and predicting septic shock. We made the analyzed dataset and our developed software code publicly available online.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK
Liver cancer kills approximately 800 thousand people annually worldwide, and its most common subtype is hepatocellular carcinoma (HCC), which usually affects people with cirrhosis. Predicting ...survival of patients with HCC remains an important challenge, especially because technologies needed for this scope are not available in all hospitals. In this context, machine learning applied to medical records can be a fast, low-cost tool to predict survival and detect the most predictive features from health records. In this study, we analyzed medical data of 165 patients with HCC: we employed computational intelligence to predict their survival, and to detect the most relevant clinical factors able to discriminate survived from deceased cases. Afterwards, we compared our data mining results with those obtained through statistical tests and scientific literature findings. Our analysis revealed that blood levels of alkaline-phosphatase (ALP), alpha-fetoprotein (AFP), and hemoglobin are the most effective prognostic factors in this dataset. We found literature supporting association of these three factors with hepatoma, even though only AFP has been used in a prognostic index. Our results suggest that ALP and hemoglobin can be candidates for future HCC prognostic indexes, and that physicians could focus on ALP, AFP, and hemoglobin when studying HCC records.
Activity-Based Computing 1 aims to capture the state of the user and its environment by exploiting heterogeneous sensors in order to provide adaptation to exogenous computing resources. When these ...sensors are attached to the subject’s body, they permit continuous monitoring of numerous physiological signals. This has appealing use in healthcare applications, e.g. the exploitation of Ambient Intelligence (AmI) in daily activity monitoring for elderly people. In this paper, we present a system for human physical Activity Recognition (AR) using smartphone inertial sensors. As these mobile phones are limited in terms of energy and computing power, we propose a novel hardware-friendly approach for multiclass classification. This method adapts the standard Support Vector Machine (SVM) and exploits fixed-point arithmetic for computational cost reduction. A comparison with the traditional SVM shows a significant improvement in terms of computational costs while maintaining similar accuracy, which can contribute to develop more sustainable systems for AmI.
Full text
Available for:
FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NUK, OBVAL, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
In this paper, we deal with the classical Statistical Learning Theory's problem of bounding, with high probability, the true risk R(h) of a hypothesis
chosen from a set H of
hypotheses. The Union ...Bound (UB) allows one to state that PLR^(h),δqh≤R(h)≤UR^(h),δph≥1-δ where R^(h) is the empirical errors, if it is possible to prove that P{R(h)≥L(R^(h),δ)}≥1-δ and P{R(h)≤U(R^(h),δ)}≥1-δ, when
, qh, and ph are chosen before seeing the data such that qh,ph∈0,1 and ∑h∈H(qh+ph)=1. If no
information is available qh and ph are set to 12m, namely equally distributed. This approach gives poor results since, as a matter of fact, a learning procedure targets just particular hypotheses, namely hypotheses with small empirical error, disregarding the others. In this work we set the qh and ph in a distribution-dependent way increasing the probability of being chosen to function with small true risk. We will call this proposal Distribution-Dependent Weighted UB (DDWUB) and we will retrieve the sufficient conditions on the choice of qh and ph that state that DDWUB outperforms or, in the worst case, degenerates into UB. Furthermore, theoretical and numerical results will show the applicability, the validity, and the potentiality of DDWUB.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK
In many decision-making scenarios, ranging from recreational activities to healthcare and policing, the use of artificial intelligence coupled with the ability to learn from historical data is ...becoming ubiquitous. This widespread adoption of automated systems is accompanied by the increasing concerns regarding their ethical implications. Fundamental rights, such as the ones that require the preservation of privacy, do not discriminate based on sensible attributes (e.g., gender, ethnicity, political/sexual orientation), or require one to provide an explanation for a decision, are daily undermined by the use of increasingly complex and less understandable yet more accurate learning algorithms. For this purpose, in this work, we work toward the development of systems able to ensure trustworthiness by delivering privacy, fairness, and explainability by design. In particular, we show that it is possible to simultaneously learn from data while preserving the privacy of the individuals thanks to the use of Homomorphic Encryption, ensuring fairness by learning a fair representation from the data, and ensuring explainable decisions with local and global explanations without compromising the accuracy of the final models. We test our approach on a widespread but still controversial application, namely face recognition, using the recent FairFace dataset to prove the validity of our approach.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK