Background: Accurate methods to preoperatively characterize adnexal tumors are pivotal for optimal patient management. A recent meta-analysis concluded that the International Ovarian Tumor Analysis algorithms such as the Simple Rules are the best approaches to preoperatively classify adnexal masses as benign or malignant. Objective: We sought to develop and validate a model to predict the risk of malignancy in adnexal masses using the ultrasound features in the Simple Rules. Study Design: This was an international cross-sectional cohort study involving 22 oncology centers, referral centers for ultrasonography, and general hospitals. We included consecutive patients with an adnexal tumor who underwent a standardized transvaginal ultrasound examination and were selected for surgery. Data on 5020 patients were recorded in 3 phases from 2002 through 2012. The 5 Simple Rules features indicative of a benign tumor (B-features) and the 5 features indicative of malignancy (M-features) are based on the presence of ascites, tumor morphology, and degree of vascularity at ultrasonography. The gold standard was the histopathologic diagnosis of the adnexal mass (pathologist blinded to ultrasound findings). Logistic regression analysis was used to estimate the risk of malignancy based on the 10 ultrasound features and type of center. The diagnostic performance was evaluated by the area under the receiver operating characteristic curve, sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR–), positive predictive value (PPV), negative predictive value (NPV), and calibration curves. Results: Data on 4848 patients were analyzed. The malignancy rate was 43% (1402/3263) in oncology centers and 17% (263/1585) in other centers. The area under the receiver operating characteristic curve on validation data was very similar in oncology centers (0.917; 95% confidence interval, 0.901–0.931) and other centers (0.916; 95% confidence interval, 0.873–0.945).
Risk estimates showed good calibration. In all, 23% of patients in the validation data set had a very low estimated risk (<1%) and 48% had a high estimated risk (≥30%). For the 1% risk cutoff, sensitivity was 99.7%, specificity 33.7%, LR+ 1.5, LR– 0.010, PPV 44.8%, and NPV 98.9%. For the 30% risk cutoff, sensitivity was 89.0%, specificity 84.7%, LR+ 5.8, LR– 0.13, PPV 75.4%, and NPV 93.9%. Conclusion: Quantification of the risk of malignancy based on the Simple Rules has good diagnostic performance both in oncology centers and in other centers. A simple classification based on these risk estimates may form the basis of a clinical management system. Patients with a high risk may benefit from surgery by a gynecological oncologist, while patients with a lower risk may be managed locally.
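The workflow described above, a logistic regression on binary ultrasound features followed by triage at the 1% and 30% risk cutoffs, can be sketched with scikit-learn. The data below are synthetic and the feature effects are invented for illustration; only the overall shape of the approach (10 binary features plus a center-type indicator, then risk-based triage) mirrors the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: 5 benign (B) and 5 malignant (M) binary
# ultrasound features plus an oncology-center indicator. The effect
# sizes in `logit` are invented, not taken from the study.
n = 2000
X = rng.integers(0, 2, size=(n, 11)).astype(float)
logit = -2.0 + 1.2 * X[:, 5:10].sum(axis=1) - 1.0 * X[:, 0:5].sum(axis=1) + 0.5 * X[:, 10]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
risk = model.predict_proba(X)[:, 1]   # estimated probability of malignancy

# Triage by the two cutoffs used in the abstract: <1% and >=30%.
very_low = risk < 0.01
high = risk >= 0.30
print(f"very low risk: {very_low.mean():.0%}, high risk: {high.mean():.0%}")
```

The printed fractions depend entirely on the synthetic effect sizes; on real data the cutoffs partition patients as reported in the Results.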
The most commonly used statistical models of civil war onset fail to correctly predict most occurrences of this rare event in out-of-sample data. Statistical methods for the analysis of binary data, such as logistic regression, even in their rare-event and regularized forms, perform poorly at prediction. We compare the performance of Random Forests with three versions of logistic regression (classic logistic regression, Firth rare-events logistic regression, and L1-regularized logistic regression), and find that the algorithmic approach provides significantly more accurate predictions of civil war onset in out-of-sample data than any of the logistic regression models. The article discusses these results and the ways in which algorithmic statistical methods like Random Forests can be useful for more accurately predicting rare events in conflict data.
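A comparison of this kind can be set up in a few lines of scikit-learn. The sketch below uses synthetic rare-event data (about 5% positives) in place of the civil war replication data, and substitutes an L1-penalized model for Firth's correction, which is not available in scikit-learn; out-of-sample AUC is one of several metrics one could report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic rare-event data (~5% positives) standing in for the
# civil-war-onset data used in the article.
X, y = make_classification(n_samples=4000, n_features=20, n_informative=5,
                           weights=[0.95], flip_y=0.02, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "logistic":      LogisticRegression(max_iter=1000),
    # Firth's rare-events correction needs a specialized package; an L1
    # penalty stands in here for the regularized variant.
    "L1 logistic":   LogisticRegression(penalty="l1", solver="liblinear"),
}
aucs = {name: roc_auc_score(yte, m.fit(Xtr, ytr).predict_proba(Xte)[:, 1])
        for name, m in models.items()}
print(aucs)
```

On real conflict data the article's point is precisely that the ranking favors the Random Forest; on this synthetic set the ordering may vary.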
Population-based computational approaches have been developed in recent years and have helped to gain insight into arrhythmia mechanisms and intra- and inter-patient variability (e.g., in drug responses). Here, we illustrate the use of multivariable logistic regression to analyze the factors that enhance or reduce the susceptibility to cellular arrhythmogenic events. As an example, we generate 1000 model variants by randomly modifying ionic conductances and maximal rates of ion transports in our atrial myocyte model and simulate an arrhythmia-provoking protocol that enhances early afterdepolarization (EAD) proclivity. We then treat EAD occurrence as a categorical, yes-or-no variable, and perform logistic regression to relate perturbations in model parameters to the presence/absence of EADs. We find that EAD formation is sensitive to the conductance of the voltage-gated Na+ channel, the acetylcholine-sensitive and ultra-rapid K+ channels, and the Na+/Ca2+ exchange current, which matches our mechanistic understanding of the process and preliminary sensitivity analysis.
The described technique:
•allows investigating the factors underlying dichotomous outcomes, and is therefore a useful tool to improve our understanding of arrhythmic risk;
•is valid for analyzing both deterministic and stochastic models, and various phenomena (e.g., delayed afterdepolarizations and Ca2+ sparks);
•is computationally more efficient than one-at-a-time parameter sensitivity analysis.
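The population-of-models regression described above can be sketched as follows. Here the 1000 "model variants" are random log-normal scalings of a few conductances, and the EAD "simulation" is replaced by an invented threshold rule (a real study would run the myocyte model for each variant); the fitted coefficients then play the role of parameter sensitivities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
params = ["gNa", "gKur", "gKACh", "gNCX", "gK1", "gCaL"]

# 1000 model variants: each conductance scaled by a log-normal factor.
scales = rng.lognormal(mean=0.0, sigma=0.2, size=(1000, len(params)))

# Stand-in for the EAD-provoking simulation: an invented rule in which
# raising gNa promotes EADs and raising gKur suppresses them.
score = 1.5 * np.log(scales[:, 0]) - 1.0 * np.log(scales[:, 1]) \
        + rng.normal(0, 0.3, 1000)
ead = (score > 0.2).astype(int)  # 1 = EAD observed, 0 = none

# Regress EAD occurrence on the log parameter scalings; the signs and
# magnitudes of the coefficients indicate each parameter's influence.
lr = LogisticRegression(max_iter=1000).fit(np.log(scales), ead)
for name, coef in zip(params, lr.coef_[0]):
    print(f"{name:6s} {coef:+.2f}")
```

As the technique's third bullet notes, one regression over the whole population replaces many one-at-a-time perturbation runs.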
•A new method is proposed for evaluating the suitability of land for urban development.
•The method avoids the effects of subjectivity on the evaluation process and is highly accurate.
•Logistic regression, PCA, K-means, kriging interpolation, and Geodetector were combined.
•The results provide a scientific reference to support land allocation decisions.
Ensuring the suitability of urban development land is essential for delineating spatial growth boundaries and urban spatial layouts. However, subjective uncertainty in the suitability evaluation process significantly reduces the reliability of the evaluation results. Thus, in this study, we developed a new method to address this issue and improve the accuracy of the evaluation results. Zhengzhou, China, was selected as the research area, and the data were obtained from the following primary sources: Landsat TM/ETM/OLI image data, land use data, digital elevation model data, spatial primary geographical data, and digital map data. A new method for evaluating the suitability of urban development land was developed by combining logistic regression, principal component analysis, kriging interpolation, K-means, and the Geodetector method to evaluate and classify the suitability of urban development land in Zhengzhou City during 2013. Logistic regression allowed us to accurately evaluate the effect of each single factor, thereby avoiding subjective assessments. Principal component analysis was used to reduce the dimensionality of the single-factor evaluation results, with the weight of each principal component determined from its cumulative contribution rate, in order to obtain a comprehensive evaluation result. Kriging interpolation was used to predict the evaluation results over the grid surface from the comprehensively evaluated sample points. K-means was used to automatically classify the evaluation results for the grid surface. Geodetector was used to detect the spatial differentiation of the results in order to confirm the validity of the spatial partition results. These methods avoid interference from human factors and yield more objective and accurate evaluation results.
The results indicated that the proposed evaluation method can avoid the subjective influence of the evaluation index classification and the determination of index weights, yielding highly accurate and effective evaluations. The suitability grades and evaluation values were highly consistent with the spatial pattern, demonstrating the applicability of the evaluation results. The method and evaluation results may provide a scientific reference to support decisions regarding land resource allocation during urban development.
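The middle steps of the pipeline above (PCA-weighted composite scoring followed by K-means grading) can be sketched compactly. The single-factor scores below are random stand-ins for the per-factor logistic regression outputs, and the kriging step (which would interpolate the composite score to a grid, e.g. with pykrige) is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Stand-in single-factor suitability scores (e.g. slope, road access,
# land use) for 500 sample points; in the study these come from
# per-factor logistic regressions.
factors = rng.random((500, 5))

# PCA: weight the retained components by their share of the explained
# variance (the "cumulative contribution rate") to form one composite
# suitability value per sample point.
pca = PCA(n_components=3).fit(factors)
components = pca.transform(factors)
weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()
composite = components @ weights

# K-means groups the composite scores into suitability grades
# (kriging interpolation to the grid surface would precede this step).
grades = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    composite.reshape(-1, 1))
print(np.bincount(grades))
```

The number of grades (4 here) is an assumption for illustration; the study determines the classification automatically.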
Forecasting the direction of the daily changes of stock indices is an important yet difficult task for market participants. Advances in data mining and machine learning make it possible to develop more accurate predictions to assist investment decision making. This paper develops a learning architecture, LR2GBDT, for forecasting and trading stock indices, mainly by cascading the logistic regression (LR) model onto the gradient boosted decision trees (GBDT) model. Without any assumption on the underlying data-generating process, raw price data and twelve technical indicators are employed to extract the information contained in the stock indices. The proposed architecture is evaluated by comparing the experimental results with those of the LR, GBDT, SVM (support vector machine), NN (neural network) and TPOT (tree-based pipeline optimization tool) models on three stock indices from two different markets: an emerging market (Shanghai Stock Exchange Composite Index) and a mature market (Nasdaq Composite Index and S&P 500 Composite Stock Price Index). Given the same test conditions, the cascaded model not only outperforms the other models but also shows statistically and economically significant improvements when exploiting simple trading strategies, even when transaction costs are taken into account.
•A cascaded learning architecture LR2GBDT is proposed to predict the direction of the daily changes of stock indices.
•Logistic regression and gradient boosted decision trees are combined in our approach.
•Technical indicators and the output derived from LR are fed as input features.
•The prediction accuracy and trading performance are improved by LR2GBDT.
•The profitability of a simple long–short trading strategy over a daily investment horizon is also discussed.
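The cascade wiring described in the highlights, where the LR output is appended to the feature set seen by the GBDT, can be sketched with scikit-learn. The data, hyperparameters, and train/test split below are illustrative stand-ins for the paper's price series and technical indicators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic up/down labels with technical-indicator-like features.
X, y = make_classification(n_samples=1500, n_features=12, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Stage 1: logistic regression produces a probability feature.
lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
ptr = lr.predict_proba(Xtr)[:, [1]]
pte = lr.predict_proba(Xte)[:, [1]]

# Stage 2: GBDT is trained on the raw features plus the LR probability.
gbdt = GradientBoostingClassifier(random_state=0).fit(
    np.hstack([Xtr, ptr]), ytr)
acc = gbdt.score(np.hstack([Xte, pte]), yte)
print(f"cascade accuracy: {acc:.3f}")
```

Note that fitting the stage-1 model and the stage-2 model on the same training fold can leak; the paper's exact fold scheme is not reproduced here.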
This paper demonstrates the use of amplitude variance in combination with only one 4th-power transformation and a fast Fourier transform (FFT) for modulation format identification (MFI) in digital coherent receivers. The incoming signals are first classified into two main categories, polarization-division multiplexing (PDM) m-ary phase-shift keying (mPSK) and m-ary quadrature amplitude modulation (mQAM), based on the amplitude variance after constant modulus algorithm (CMA) equalization. Then, the sub-categories of PDM-mPSK or PDM-mQAM are further identified by applying the logistic regression (LR) algorithm on a two-dimensional (2-D) plane constructed from the mean and maximum value after the 4th-power transformation and FFT. The feasibility is first verified via numerical simulation for 28 GBaud PDM-QPSK/8PSK/16QAM/32QAM signals transmitted through an additive Gaussian noise channel as well as a long-distance standard single-mode fiber (SSMF) channel. The simulation results demonstrate that 100% identification accuracy for all these modulation formats can be achieved at optical signal-to-noise ratios (OSNRs) much lower than their respective 7% FEC thresholds, and identification accuracy as high as 99% can be achieved even at OSNRs much lower than or close to their respective 20% FEC thresholds. 100% identification accuracy is also obtained in the presence of obvious nonlinear impairments. Proof-of-concept experiments are finally implemented to evaluate the MFI performance for 28 GBaud PDM-QPSK/8PSK/16QAM and 21.5 GBaud PDM-32QAM signals, which further confirms the feasibility of the proposed MFI scheme.
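The 2-D (mean, max) feature after the 4th-power FFT can be illustrated for the mPSK sub-classification step. The toy frames below are single-polarization, noise-only symbol sequences (no CMA equalization or fiber effects); the key physics is that QPSK raised to the 4th power collapses to a single tone, producing a dominant FFT peak, while 8PSK raised to the 4th power does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def frame(mpsk_order, n=512, snr_db=15):
    """Noisy m-PSK symbol frame (toy stand-in for the received signal)."""
    phases = rng.integers(0, mpsk_order, n) * 2 * np.pi / mpsk_order
    s = np.exp(1j * phases)
    noise_std = 10 ** (-snr_db / 20) / np.sqrt(2)
    return s + noise_std * (rng.normal(size=n) + 1j * rng.normal(size=n))

def fft_features(x):
    """Mean and max of |FFT| after a single 4th-power transform."""
    spec = np.abs(np.fft.fft(x ** 4))
    return [spec.mean(), spec.max()]

# Labels: 0 = QPSK (dominant FFT peak after ^4), 1 = 8PSK (no peak).
X = np.array([fft_features(frame(4)) for _ in range(100)] +
             [fft_features(frame(8)) for _ in range(100)])
y = np.array([0] * 100 + [1] * 100)

# Logistic regression on the 2-D (mean, max) plane separates the two.
lr = LogisticRegression(max_iter=1000).fit(X, y)
acc = lr.score(X, y)
print(f"training accuracy: {acc:.2f}")
```

The amplitude-variance pre-classification stage (PSK vs. QAM) and the QAM sub-categories are omitted; they follow the same feature-then-classifier pattern.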
•Real-time crash prediction for weaving segments.
•A higher speed difference between the beginning and end of a weaving segment results in higher crash risk.
•Wet pavement surface condition increases crash risk by 77%.
•Maximum length is more important than segment length in crash risk estimation.
•Weaving segment configuration significantly impacts crash risk.
Weaving segments are potential recurrent bottlenecks which affect the efficiency and safety of expressways during peak hours. They are also among the most complicated segments, since on- and off-ramp traffic merges, diverges, and weaves in a limited space. One effective way to improve the safety of weaving segments is to study crash likelihood using real-time crash data, with the objective of identifying hazardous conditions and reducing the risk of crashes through Intelligent Transportation Systems (ITS) traffic control. This study presents a multilevel Bayesian logistic regression model for crashes at expressway weaving segments using crash, geometric, Microwave Vehicle Detection System (MVDS), and weather data. The results show that the mainline speed at the beginning of the weaving segment, the speed difference between the beginning and the end of the weaving segment, and the logarithm of volume have significant impacts on the crash risk of the following 5–10 min for weaving segments. The configuration is also an important factor. A weaving segment in which on- or off-ramp traffic does not need to change lanes has a higher crash risk because it has more traffic interactions and higher speed differences between weaving and non-weaving traffic. Meanwhile, maximum length, which measures the distance at which weaving turbulence no longer has an impact, is found to be positively related to crash risk at the 95% confidence level. In addition to traffic and geometric factors, a wet pavement surface significantly increases crash risk, by 77%. The proposed model, along with ITS measures such as ramp metering, Dynamic Message Signs (DMS), and high-friction surface treatment, can be used to enhance the safety of weaving segments in real time.
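Effect sizes like "wet pavement increases crash risk by 77%" come from exponentiating a fitted logistic coefficient into an odds ratio. The sketch below is a deliberately simplified, non-hierarchical stand-in for the paper's multilevel Bayesian model: it simulates crash outcomes in which a wet surface multiplies the odds by 1.77 and then recovers that odds ratio from the coefficient.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

# Simulated 5-10 min intervals: a standardized speed-difference feature
# and a wet-surface indicator. The true wet-surface odds ratio is set
# to 1.77 to echo the reported 77% increase.
n = 20000
speed_diff = rng.normal(0, 1, n)
wet = rng.integers(0, 2, n).astype(float)
logit = -3.0 + 0.8 * speed_diff + np.log(1.77) * wet
crash = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Near-unpenalized fit (large C) so the coefficient is interpretable.
lr = LogisticRegression(C=1e6, max_iter=2000).fit(
    np.column_stack([speed_diff, wet]), crash)
odds_ratio_wet = np.exp(lr.coef_[0][1])
print(f"estimated odds ratio for wet surface: {odds_ratio_wet:.2f}")
```

The paper's model additionally nests observations within segments (the multilevel structure) and is fit with Bayesian methods, which this sketch does not attempt.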
Background: Despite a concerted policy effort in Europe, social inequalities in health are a persistent problem. Developing a standardised measure of socioeconomic level across Europe will improve the understanding of the underlying mechanisms and causes of inequalities. This will facilitate developing, implementing and assessing new and more effective policies, and will improve the comparability and reproducibility of health inequality studies among countries. This paper presents the extension of the European Deprivation Index (EDI), a standardised measure first developed in France, to four other European countries (Italy, Portugal, Spain and England), using available 2001 and 1999 national census data. Methods and results: The method previously tested and validated to construct the French EDI was used. First, an individual indicator of relative deprivation was constructed, defined by the minimal number of unmet fundamental needs associated with both objective (income) poverty and subjective poverty. Second, variables available at both the individual (European survey) and aggregate (census) levels were identified. Third, an ecological deprivation index was constructed by selecting the set of weighted variables from the second step that best correlated with the individual deprivation indicator. Conclusions: For each country, the EDI is a weighted combination of aggregated variables from the national census that are most highly correlated with a country-specific individual deprivation indicator. This tool will improve the historical and international comparability of studies, our understanding of the mechanisms underlying social inequalities in health, and the implementation of interventions to tackle social inequalities in health.
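The third step, choosing weights for aggregate census variables so that their weighted combination best reproduces the individual deprivation indicator, can be sketched in miniature. The three census shares and the area-level deprivation rates below are synthetic; the real construction selects and weights variables per country.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Toy data: 200 areas, each with aggregated census shares (e.g. % with
# no car, % overcrowded, % unemployed) and an area-level rate of
# individually deprived people, as would come from a survey.
census = rng.random((200, 3))
indiv_depriv_rate = (0.5 * census[:, 0] + 0.3 * census[:, 2]
                     + rng.normal(0, 0.05, 200))

# Fit weights so the weighted census combination best correlates with
# the individual deprivation indicator; the fitted combination is the
# ecological index.
reg = LinearRegression().fit(census, indiv_depriv_rate)
edi = census @ reg.coef_ + reg.intercept_
corr = np.corrcoef(edi, indiv_depriv_rate)[0, 1]
print(f"correlation with individual indicator: {corr:.2f}")
```

The actual EDI construction also involves variable selection and country-specific thresholds for the unmet-needs indicator, which are not modeled here.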
Piecewise Deterministic Monte Carlo algorithms enable simulation from a posterior distribution while only needing to access a sub-sample of data at each iteration. We show how they can be implemented in settings where the parameters live on a restricted domain.
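A minimal illustration of a piecewise deterministic sampler on a restricted domain is a one-dimensional zig-zag process targeting a standard normal truncated to x >= 0: the event rate for velocity v is max(0, v*x), and hitting the boundary x = 0 reflects the velocity. This is a toy sketch of the general idea, not the paper's algorithm; in particular no data sub-sampling is shown.

```python
import numpy as np

def zigzag_half_normal(n_events=50000, seed=5):
    """Zig-zag sampler for N(0,1) restricted to x >= 0.

    With v = +1 the event rate along the trajectory is x + s, so the
    event time solves x*tau + tau^2/2 = Exp(1). With v = -1 the rate
    max(0, -x) is zero on the domain, so the particle drifts to the
    boundary and is reflected there.
    """
    rng = np.random.default_rng(seed)
    x, v = 0.0, 1.0
    total_time = 0.0
    weighted_sum = 0.0          # integral of x(t) dt along the path
    for _ in range(n_events):
        if v > 0:
            u = rng.exponential()
            tau = -x + np.sqrt(x * x + 2.0 * u)  # inverts the rate integral
            x_new = x + tau
            v = -1.0
        else:
            tau = x             # drift down to the boundary x = 0
            x_new = 0.0
            v = 1.0             # reflect
        weighted_sum += 0.5 * (x + x_new) * tau  # trapezoid on a linear leg
        total_time += tau
        x = x_new
    return weighted_sum / total_time

mean_est = zigzag_half_normal()
print(f"estimated E[X]: {mean_est:.3f}  (half-normal mean ~ 0.798)")
```

The time-average over the piecewise linear trajectory estimates expectations under the restricted target; reflection at the boundary is what keeps the sampler on the domain.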
Measuring the indoor temperature of building rooms is a valuable approach for evaluating thermal comfort and providing feedback control for heat substations in district heating systems (DHSs) in China. Previous studies on indoor temperatures have primarily focused on analyzing their overall trends and influencing factors, while research on daily change patterns is lacking. This study used a clustering method to analyze indoor temperature data from an actual DHS in Northeast China. First, a 24-h observation vector was constructed using the deviation between the actual and target values to represent the daily temperature pattern. Second, the k-means method was applied to cluster the vectors, and the quantity distribution and typical characteristics of each cluster were analyzed. Finally, a multinomial logistic regression model was used to analyze the influence of different factors on each cluster. Comparison with four representative clustering algorithms indicated that k-means was the optimal model and that the optimal number of clusters was 4. The trend of each cluster was roughly the same, with the main differences being the fluctuation amplitude and the distance from the target value. The differences between the clusters were related to various influencing features, with the primary return pressure on workdays and the secondary return pressure on holidays being the most significant. This study identified the typical daily variation patterns of indoor temperature and analyzed the important features that affect these patterns, which is beneficial for enhancing the regulatory efficiency of DHSs.
•A deviation-based observation vector represents daily temperature patterns.
•Cluster analysis uncovers distinct patterns in daily indoor temperature change.
•Optimal clustering effectiveness is achieved using k-means (DBI = 0.63).
•Significant differences are observed among clusters through varied characteristics.
•MNLogit is used to assess the influence of different features on clusters.
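The pipeline above, 24-h deviation vectors clustered by k-means (with the Davies-Bouldin index as the quality measure) and then explained by a multinomial logistic regression, can be sketched with scikit-learn. The deviation vectors and the two "operating features" below are synthetic; only the structure of the analysis follows the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(6)

# Synthetic 24-h deviation vectors (actual minus target temperature,
# one value per hour) for 300 rooms, with group-specific amplitudes
# mimicking clusters that differ mainly in fluctuation amplitude.
hours = np.arange(24)
base = np.sin(2 * np.pi * hours / 24)
amps = rng.choice([0.5, 1.0, 2.0, 3.0], size=300)
X = amps[:, None] * base[None, :] + rng.normal(0, 0.1, (300, 24))

# Cluster the daily patterns; the Davies-Bouldin index gauges quality.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
dbi = davies_bouldin_score(X, km.labels_)

# Multinomial logistic regression relates cluster membership to
# hypothetical operating features (standing in for return pressures).
features = np.column_stack([amps + rng.normal(0, 0.2, 300),
                            rng.normal(0, 1, 300)])
mnlogit = LogisticRegression(max_iter=1000).fit(features, km.labels_)
acc = mnlogit.score(features, km.labels_)
print(f"DBI: {dbi:.2f}, MNLogit accuracy: {acc:.2f}")
```

The coefficients of `mnlogit` per cluster would then indicate which operating feature most distinguishes each daily pattern, as the study does for return pressures.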