We introduce a multidimensional, neural network approach to reveal and measure urban segregation phenomena, based on the self-organizing map algorithm (SOM). The multidimensionality of SOM allows one to apprehend a large number of variables simultaneously, defined on census blocks or other types of statistical blocks, and to perform clustering along them. Levels of segregation are then measured through correlations between distances on the neural network and distances on the actual geographical map. Further, the stochasticity of SOM enables one to quantify levels of heterogeneity across census blocks. We illustrate this new method on data available for the city of Paris.
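The correlation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a SOM has already been trained and that each block's best-matching unit is available as grid coordinates (`map_positions`, a hypothetical name), and it uses a plain Pearson correlation over all pairs of blocks. The coordinates are made up.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def map_geo_correlation(map_positions, geo_positions):
    """Correlate pairwise distances on the SOM grid with geographic distances.

    map_positions[i] is the (assumed, precomputed) grid coordinate of the
    unit that block i was assigned to; geo_positions[i] is the block centroid.
    """
    pairs = list(combinations(range(len(map_positions)), 2))
    d_map = [math.dist(map_positions[i], map_positions[j]) for i, j in pairs]
    d_geo = [math.dist(geo_positions[i], geo_positions[j]) for i, j in pairs]
    return pearson(d_map, d_geo)


# Four hypothetical blocks on a unit square.
geo = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Case 1: blocks that are close on the ground are also close on the map.
som_aligned = [(0, 0), (2, 0), (0, 2), (2, 2)]
# Case 2: two blocks swapped on the map, so map and ground disagree.
som_mixed = [(0, 0), (2, 2), (0, 2), (2, 0)]

r_aligned = map_geo_correlation(som_aligned, geo)
r_mixed = map_geo_correlation(som_mixed, geo)
print(r_aligned, r_mixed)
```

Under this reading, a correlation close to 1 means that blocks with similar profiles also sit close together geographically, which is one way to interpret a high level of segregation.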
DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species.
No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods.
The study implies that taxonomists can influence the quality of their analyses either by choosing the method best adapted to the configuration of their sample or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. The latter can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. When nuclear genes are added, they may also have to modify their data analysis method.
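The "one nearest neighbour" rule mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: sequences are already aligned to the same length, the distance is a plain Hamming distance (the abstract does not say which distance was used), and the sequences and species labels are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_species(query, reference):
    """Return the species label of the reference sequence closest to the query.

    reference is a list of (sequence, species) pairs; ties are broken by
    the order of the list.
    """
    _, species = min(reference, key=lambda item: hamming(query, item[0]))
    return species


# Hypothetical reference library: two species, two individuals each.
reference = [
    ("ACGTACGTAC", "species_A"),
    ("ACGTACGTAA", "species_A"),
    ("TCGTTCGTAC", "species_B"),
    ("TCGTTCGAAC", "species_B"),
]

assigned = assign_species("ACGTACGTTC", reference)
print(assigned)
```

The query differs from the first reference sequence at a single site and from every species_B sequence at three or more sites, so it is assigned to species_A.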
Fine-scale data is particularly important for the analysis of multiscalar segregation phenomena. Using disaggregated data from an EU data challenge, we show here how to apply a recently developed method that measures segregation at multiple scales and provides a visualization of the levels of segregation across scale and space. We illustrate the technique with results for two groups of citizen migrants in the city of Paris.
(1) Background: SARS-CoV-2 has infected more than 97 million people worldwide and caused the death of more than 6 million. (2) Methods: Between 1 October and 31 December 2020, 764 patients diagnosed with SARS-CoV-2 infection were selected based on RT-PCR test results. The following parameters were noted: age, gender, origin, days of hospitalization, form of COVID-19 experienced, radiographic imaging features, associated comorbidities, and recommended treatment at discharge. (3) Results: The mean age at the time of COVID-19 infection was 55.2 years for men and 55.3 years for women. There was a similar age distribution among patients, regardless of gender. There was a substantial difference between the average lengths of hospitalization and those with residual symptoms; most patients who reported symptoms after discharge had been admitted with moderately severe forms of illness. Fatigue was the main remaining symptom (36%). (4) Conclusions: To clarify the long-term impact of SARS-CoV-2 infection on patients, further studies are needed to investigate the elements assessed. Well-designed recovery programs will be needed to effectively manage these patients, with multidisciplinary collaboration and a team of professionals involved in all aspects of post-COVID patient health.
This paper proposes a descriptive method for an open problem in time series analysis: determining the number of regimes in a switching autoregressive model. We will translate this problem into a classification one and define a criterion for hierarchically clustering different model fittings. Finally, the method will be tested on simulated examples and real-life data.
The impact of outliers and anomalies on model estimation and data processing is of paramount importance, as evidenced by the extensive body of research spanning various fields over several decades: thousands of research papers have been published on the subject. As a consequence, numerous reviews, surveys, and textbooks have sought to summarize the existing literature, encompassing a wide range of methods from both the statistical and data mining communities. While these endeavors to organize and summarize the research are invaluable, they face inherent challenges due to the pervasive nature of outliers and anomalies in all data-intensive applications, irrespective of the specific application field or scientific discipline. As a result, the resulting collection of papers remains voluminous and somewhat heterogeneous.
To address the need for knowledge organization in this domain, this paper implements the first systematic meta-survey of general surveys and reviews on outlier and anomaly detection. Employing a classical systematic survey approach, the study collects nearly 500 papers using two specialized scientific search engines. From this comprehensive collection, a subset of 56 papers that claim to be general surveys on outlier detection is selected using a snowball search technique to enhance field coverage. A meticulous quality assessment phase further refines the selection to a subset of 25 high-quality general surveys.
Using this curated collection, the paper investigates the evolution of the outlier detection field over a 20-year period, revealing emerging themes and methods. Furthermore, an analysis of the surveys sheds light on the survey writing practices adopted by scholars from different communities who have contributed to this field.
Finally, the paper delves into several topics where consensus has emerged from the literature. These include taxonomies of outlier types, challenges posed by high-dimensional data, the importance of anomaly scores, the impact of learning conditions, difficulties in benchmarking, and the significance of neural networks. Non-consensual aspects are also discussed, particularly the distinction between local and global outliers and the challenges in organizing detection methods into meaningful taxonomies.
•Outlier detection research shows renewed interest since the rise of deep learning.
•High-dimensional outlier detection is still very difficult.
•Benchmarks, visualisation and explanations are open research issues.
We aimed to determine the trend of the antimicrobial resistance pattern of pathogens isolated in samples collected from patients hospitalized in the intensive care unit (ICU) in selected periods before and after COVID-19. A retrospective study of bacterial pathogens was performed on 1267 patients. Positive bacterial culture data from 1695 samples from the pre-COVID-19 period and 1562 samples from the post-COVID-19 period were obtained. The most frequently isolated bacteria in both periods were
and
spp. The resistance rates of
spp. significantly increased against colistin (0.38% to 20.51%), gentamicin (44.62% to 64.85%), and aztreonam (56.35% to 3.60%). There was a significant increase in the resistance rate against colistin for
strains (4.69% to 32.46%) and for Acinetobacter sp. strains (3.37% to 18.09%). More than 50% of the
Staphylococcus aureus strains were MRSA, with statistically significant increases in the antimicrobial resistance rate against doxycycline (40.08% to 51.72%), linezolid (0.22% to 3.13%), rifampicin (53.16% to 64.93%), and teicoplanin (26.31% to 53.40%). The study revealed a significantly increasing trend in the antimicrobial resistance rate of Gram-negative pathogens against certain antibiotics, including those used only in cases where there are no other therapeutic options.
Monitoring pesticide concentration is very important for public authorities given the major concerns for environmental safety and the likelihood of increased public health risks. An important aspect of this process consists in locating abnormal signals in a large amount of collected data. This kind of data is usually complex, since it is subject to limits of quantification, which lead to left‐censored observations, and to a sampling procedure that is irregular in time and space across measuring stations. The present manuscript tackles precisely the issue of detecting spatio‐temporal collective anomalies in pesticide concentration levels, and introduces a novel methodology for dealing with spatio‐temporal heterogeneity. This methodology combines a change‐point detection procedure applied to the series of maximum daily values across all stations, and a clustering step aimed at a spatial segmentation of the stations. Limits of quantification are handled in the change‐point procedure by assuming an underlying piecewise‐stationary, left‐censored parametric model. Spatial segmentation takes into account the geographical conditions, and may be based on the river network, wind directions and so forth. Conditional on the temporal segment and the spatial cluster, one may eventually analyze the data and identify contextual anomalies. The proposed procedure is illustrated in detail on a data set containing the prosulfocarb concentration levels in surface waters in the Centre‐Val de Loire region.
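The idea of handling limits of quantification inside a change‐point procedure can be illustrated with a deliberately simplified sketch. It is not the procedure of the paper: it assumes a Gaussian model with a known standard deviation, a single change point in the mean, and a grid search over the mean instead of a proper optimizer. Values below the limit of quantification are encoded as `None` and contribute the probability of falling below the limit rather than a density value. All names and data are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))


def normal_logcdf(x, mu, sigma):
    p = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return math.log(max(p, 1e-300))  # guard against log(0)


def segment_loglik(segment, loq, sigma, mu_grid):
    """Maximized left-censored Gaussian log-likelihood of one segment."""
    best = -math.inf
    for mu in mu_grid:
        ll = 0.0
        for x in segment:
            if x is None:  # censored: only known to lie below loq
                ll += normal_logcdf(loq, mu, sigma)
            else:
                ll += normal_logpdf(x, mu, sigma)
        best = max(best, ll)
    return best


def best_change_point(series, loq, sigma=1.0, min_len=2):
    """Index k that best splits the series into series[:k] and series[k:]."""
    mu_grid = [i * 0.05 for i in range(-60, 121)]  # candidate means, -3.0 to 6.0
    best_k, best_ll = None, -math.inf
    for k in range(min_len, len(series) - min_len + 1):
        ll = (segment_loglik(series[:k], loq, sigma, mu_grid)
              + segment_loglik(series[k:], loq, sigma, mu_grid))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k


# Made-up series: low, mostly censored values followed by a higher regime.
series = [None, 0.6, None, None, 0.7, None, 3.1, 2.8, 3.4, 2.9, 3.2, 3.0]
change_point = best_change_point(series, loq=0.5)
print(change_point)
```

On this toy series the split falls where the higher regime starts. A realistic version would estimate the variance as well, allow several change points, and work on the series of maximum daily values described above.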
Segregation through the multiscalar lens
Olteanu, Madalina; Randon-Furling, Julien; Clark, William A. V.
Proceedings of the National Academy of Sciences (PNAS), 06/2019, Volume 116, Issue 25
We introduce a mathematical framework that allows one to carry out multiscalar and multigroup spatial exploratory analysis across urban regions. By producing coefficients that integrate information across all scales and that are normalized with respect to theoretical maximally segregated configurations, this framework provides a practical and powerful tool for the comparative empirical analysis of urban segregation. We illustrate our method with a study of ethnic mixing in the Los Angeles metropolitan area.
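One way to picture a coefficient that "integrates information across all scales" is sketched below. The abstract does not spell out the construction, so everything here is an assumption made for illustration: neighbourhoods are grown around a starting unit by adding the nearest units one at a time, the local group mix is compared with the city-wide mix through a Kullback-Leibler divergence, and the resulting values are simply averaged. The normalization against maximally segregated configurations is omitted, and the data are invented.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence of p from q (q must be positive where p is)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def trajectory(start, coords, counts):
    """Divergence from the city-wide mix as the neighbourhood of `start` grows.

    coords[i] is the centroid of unit i and counts[i] its population by group
    (every unit is assumed to have a positive population).
    """
    order = sorted(range(len(coords)),
                   key=lambda i: math.dist(coords[start], coords[i]))
    n_groups = len(counts[0])
    city_totals = [sum(c[g] for c in counts) for g in range(n_groups)]
    city = [t / sum(city_totals) for t in city_totals]
    running = [0] * n_groups
    values = []
    for i in order:
        running = [r + c for r, c in zip(running, counts[i])]
        total = sum(running)
        local = [r / total for r in running]
        values.append(kl_divergence(local, city))
    return values


# Four hypothetical units on a line, two population groups.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
counts = [[90, 10], [80, 20], [20, 80], [10, 90]]

traj = trajectory(0, coords, counts)
crude_coefficient = sum(traj) / len(traj)  # unnormalized average over scales
print(traj, crude_coefficient)
```

The trajectory starts high, because the first unit is far from the city-wide mix, and reaches zero once the neighbourhood covers the whole city. How slowly it converges is the kind of multiscalar information that a single-scale index would miss.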