Many natural combinatorial problems can be expressed as constraint satisfaction problems. This class of problems is known to be NP-complete in general, but certain restrictions on the form of the ...constraints can ensure tractability. The standard way to parameterize interesting subclasses of the constraint satisfaction problem is via finite constraint languages. The main problem is to classify those subclasses that are solvable in polynomial time and those that are NP-complete. It was conjectured that if a constraint language has a weak near-unanimity polymorphism then the corresponding constraint satisfaction problem is tractable; otherwise, it is NP-complete.
In the article, we present an algorithm that solves Constraint Satisfaction Problem in polynomial time for constraint languages having a weak near unanimity polymorphism, which proves the remaining part of the conjecture.
1
Label noise is an important issue in classification, with many potential negative consequences. For example, the accuracy of predictions may decrease, whereas the complexity of inferred models and ...the number of necessary training samples may increase. Many works in the literature have been devoted to the study of label noise and the development of techniques to deal with label noise. However, the field lacks a comprehensive survey on the different types of label noise, their consequences and the algorithms that consider label noise. This paper proposes to fill this gap. First, the definitions and sources of label noise are considered and a taxonomy of the types of label noise is proposed. Second, the potential consequences of label noise are discussed. Third, label noise-robust, label noise cleansing, and label noise-tolerant algorithms are reviewed. For each category of approaches, a short discussion is proposed to help the practitioner to choose the most suitable technique in its own particular field of application. Eventually, the design of experiments is also discussed, what may interest the researchers who would like to test their own algorithms. In this paper, label noise consists of mislabeled instances: no additional information is assumed to be available like e.g., confidences on labels.
Internet of Things (IoT) is a new paradigm that integrates the Internet and physical objects belonging to different domains such as home automation, industrial process, human health and environmental ...monitoring. It deepens the presence of Internet-connected devices in our daily activities, bringing, in addition to many benefits, challenges related to security issues. For more than two decades, Intrusion Detection Systems (IDS) have been an important tool for the protection of networks and information systems. However, applying traditional IDS techniques to IoT is difficult due to its particular characteristics such as constrained-resource devices, specific protocol stacks, and standards. In this paper, we present a survey of IDS research efforts for IoT. Our objective is to identify leading trends, open issues, and future research possibilities. We classified the IDSs proposed in the literature according to the following attributes: detection method, IDS placement strategy, security threat and validation strategy. We also discussed the different possibilities for each attribute, detailing aspects of works that either propose specific IDS schemes for IoT or develop attack detection strategies for IoT threats that might be embedded in IDSs.
Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the ...field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.
Multi-modality (MM) image fusion aims to render fused images that maintain the merits of different modalities, e.g., functional highlight and detailed textures. To tackle the challenge in modeling ...cross-modality features and decomposing desirable modality-specific and modality-shared features, we propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network. Firstly, CDDFuse uses Restormer blocks to extract cross-modality shallow features. We then introduce a dual-branch Transformer-CNN feature extractor with Lite Transformer (LT) blocks leveraging long-range attention to handle low-frequency global features and Invertible Neural Networks (INN) blocks focusing on extracting high-frequency local information. A correlation-driven loss is further proposed to make the low-frequency features correlated while the high-frequency features uncorrelated based on the embedded information. Then, the LT-based global fusion and INN-based local fusion layers output the fused image. Extensive experiments demonstrate that our CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion. We also show that CDDFuse can boost the performance in downstream infrared-visible semantic segmentation and object detection in a unified benchmark. The code is available at https://github.om/haozixiang1228/MMIF-CDDFuse.
Hyperspectral image (HSI) and multispectral image (MSI) fusion, which fuses a low-spatial-resolution HSI (LR-HSI) with a higher resolution multispectral image (MSI), has become a common scheme to ...obtain high-resolution HSI (HR-HSI). This article presents a novel HSI and MSI fusion method (called as CNN-Fus), which is based on the subspace representation and convolutional neural network (CNN) denoiser, i.e., a well-trained CNN for gray image denoising. Our method only needs to train the CNN on the more accessible gray images and can be directly used for any HSI and MSI data sets without retraining. First, to exploit the high correlations among the spectral bands, we approximate the desired HR-HSI with the low-dimensional subspace multiplied by the coefficients, which can not only speed up the algorithm but also lead to more accurate recovery. Since the spectral information mainly exists in the LR-HSI, we learn the subspace from it via singular value decomposition. Due to the powerful learning performance and high speed of CNN, we use the well-trained CNN for gray image denoising to regularize the estimation of coefficients. Specifically, we plug the CNN denoiser into the alternating direction method of multipliers (ADMM) algorithm to estimate the coefficients. Experiments demonstrate that our method has superior performance over the state-of-the-art fusion methods.
Indexing highly repetitive texts—such as genomic databases, software repositories and versioned text collections—has become an important problem since the turn of the millennium. A relevant ...compressibility measure for repetitive texts is
r
, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used
O
(
r
) space and was able to efficiently count the number of occurrences of a pattern of length
m
in a text of length
n
(in
O
(
m
log log
n
) time, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of
r
. In this article, we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the
occ
occurrences efficiently (in
O
(
occ
log log
n
) time) within
O
(
r
) space. By raising the space to
O
(
r
log log
n
), our index counts the occurrences in optimal time,
O
(
m
), and locates them in optimal time as well,
O
(
m
+
occ
). By further raising the space by an
O
(
w
/ log σ) factor, where σ is the alphabet size and
w
= Ω (log
n
) is the RAM machine size in bits, we support count and locate in
O
(⌈
m
log (σ)/
w
⌉) and
O
(⌈
m
log (σ)/
w
⌉ +
occ
) time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using
O
(
r
log (
n
/
r
)) space that replaces the text and extracts any text substring of length ℓ in the almost-optimal time
O
(log (
n
/
r
)+ℓ log (σ)/
w
). Within that space, we similarly provide access to arbitrary suffix array, inverse suffix array, and longest common prefix array cells in time
O
(log (
n
/
r
)), and extend these capabilities to full suffix tree functionality, typically in
O
(log (
n
/
r
)) time per operation. Our experiments show that our
O
(
r
)-space index outperforms the space-competitive alternatives by 1--2 orders of magnitude in time. Competitive implementations of the original FM-index are outperformed by 1--2 orders of magnitude in space and/or 2--3 in time.
Hierarchical Clustering Cohen-addad, Vincent; Kanade, Varun; Mallmann-trenn, Frederik ...
Journal of the ACM,
08/2019, Letnik:
66, Številka:
4
Journal Article
Recenzirano
Odprti dostop
Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on ...providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a “good” hierarchical clustering is one that minimizes a particular cost function 23. He showed that this cost function has certain desirable properties: To achieve optimal cost, disconnected components (namely, dissimilar elements) must be separated at higher levels of the hierarchy, and when the similarity between data elements is identical, all clusterings achieve the same cost.
We take an axiomatic approach to defining “good” objective functions for both similarity- and dissimilarity-based hierarchical clustering. We characterize a set of
admissible
objective functions having the property that when the input admits a “natural” ground-truth hierarchical clustering, the ground-truth clustering has an optimal value. We show that this set includes the objective function introduced by Dasgupta.
Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better and faster algorithms for hierarchical clustering. We also initiate a beyond worst-case analysis of the complexity of the problem and design algorithms for this scenario.