The automatic classification of HEp-2 cell stain patterns from indirect immunofluorescence images has attracted much attention recently. As an image classification problem, it can be well solved by the state-of-the-art bag-of-features (BoF) model as long as a suitable local descriptor is known. Unfortunately, for this special task, we have very limited knowledge of such a descriptor.
In this paper, we explore the possibility of automatically learning the descriptor from the image data itself. Specifically, we assume that a local patch can be well described by a set of linear projections performed on its pixel values. Based on this assumption, both unsupervised and supervised approaches are explored for learning the projections. More importantly, we propose a multi-projection-multi-codebook scheme which creates multiple linear projection descriptors and multiple image representation channels with each channel corresponding to one descriptor. Through our analysis, we show that the image representation obtained by combining these different channels can be more discriminative than that obtained from a single-projection scheme. This analysis is further verified by our experimental study.
We evaluate the proposed approach by strictly following the protocol suggested by the organizer of the 2012 HEp-2 cell classification contest, which was held to compare state-of-the-art methods for HEp-2 cell classification. Our system achieves 66.6% cell-level classification accuracy, which is only slightly lower than the best performance achieved in the contest. This result is impressive and promising considering that we only utilize a single type of feature (namely, linear projection coefficients of patch pixel values) which is learned from the image data.
•We study the classification of HEp-2 cell stain patterns using the BoF model.
•Linear projection coefficients of raw pixel values are used as local descriptors.
•An image representation based on a multi-projection-multi-codebook model is proposed.
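The multi-projection-multi-codebook pipeline described above can be sketched in a few steps: learn several linear projections of raw patch pixels, build one codebook per projection, and concatenate the per-channel histograms. This is a minimal NumPy illustration, not the authors' implementation; the PCA-based projection learning stands in for the unsupervised option the abstract mentions, and the tiny k-means, patch sizes, and codebook sizes are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_projection(patches, dim):
    # Unsupervised projection learning via PCA on raw patch pixel values
    # (a stand-in for the unsupervised option mentioned in the abstract).
    centered = patches - patches.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim]                      # (dim, patch_size)

def build_codebook(descriptors, k, iters=10):
    # Tiny k-means to learn one codebook per projection descriptor.
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        dists = ((descriptors[:, None] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def encode(patches, projections, codebooks):
    # Multi-projection-multi-codebook: one normalized histogram channel
    # per projection, concatenated into the final image representation.
    channels = []
    for P, C in zip(projections, codebooks):
        desc = patches @ P.T
        labels = ((desc[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        hist = np.bincount(labels, minlength=len(C)).astype(float)
        channels.append(hist / hist.sum())
    return np.concatenate(channels)

patches = rng.normal(size=(500, 49))     # toy 7x7 patches, flattened
projections = [learn_projection(patches, d) for d in (4, 8)]
codebooks = [build_codebook(patches @ P.T, 16) for P in projections]
rep = encode(patches, projections, codebooks)
print(rep.shape)                         # two 16-bin channels concatenated
```

Each projection yields its own representation channel; combining the channels is what makes the final representation more discriminative than any single-projection encoding.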
Weakly supervised anomaly detection aims at learning an anomaly detector from a limited amount of labeled data and abundant unlabeled data. Recent works build deep neural networks for anomaly detection by discriminatively mapping the normal samples and abnormal samples to different regions in the feature space or fitting different distributions. However, due to the limited number of annotated anomaly samples, directly training networks with the discriminative loss may not be sufficient. To overcome this issue, this article proposes a novel strategy to transform the input data into a more meaningful representation that could be used for anomaly detection. Specifically, we leverage an autoencoder to encode the input data and utilize three factors, hidden representation, reconstruction residual vector, and reconstruction error, as the new representation for the input data. This representation amounts to encoding a test sample with its projection on the training data manifold, its direction to its projection, and its distance to its projection. In addition to this encoding, we also propose a novel network architecture to seamlessly incorporate those three factors. From our extensive experiments, the benefits of the proposed strategy are clearly demonstrated by its superior performance over the competitive methods. Code is available at: https://github.com/yj-zhou/Feature_Encoding_with_AutoEncoders_for_Weakly-supervised_Anomaly_Detection .
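The three-factor encoding can be sketched concretely: pass a sample through an autoencoder and collect the hidden representation, the reconstruction residual vector, and the reconstruction error. The sketch below uses a single-layer linear autoencoder with random weights purely for illustration; in the paper the autoencoder is trained and the factors feed a dedicated network, so the weights and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical autoencoder weights (these would be trained in practice).
W_enc = rng.normal(scale=0.1, size=(20, 8))
W_dec = rng.normal(scale=0.1, size=(8, 20))

def feature_encoding(x):
    # Three factors from the abstract: the hidden representation (the
    # sample's projection onto the training-data manifold), the residual
    # vector (direction to that projection), and the reconstruction
    # error (distance to that projection).
    h = np.tanh(x @ W_enc)       # hidden representation
    x_hat = h @ W_dec            # reconstruction
    r = x - x_hat                # reconstruction residual vector
    e = np.linalg.norm(r)        # reconstruction error (scalar)
    return np.concatenate([h, r, [e]])

x = rng.normal(size=20)
z = feature_encoding(x)          # 8 + 20 + 1 = 29-dimensional encoding
print(z.shape)
```

The concatenated vector `z` is the "more meaningful representation" an anomaly detector would then be trained on instead of the raw input.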
Humans are capable of learning a new fine-grained concept with very little supervision, e.g., few exemplary images for a species of bird, yet our best deep learning systems need hundreds or thousands of labeled examples. In this paper, we try to reduce this gap by studying the fine-grained image recognition problem in a challenging few-shot learning setting, termed few-shot fine-grained recognition (FSFG). The task of FSFG requires the learning systems to build classifiers for the novel fine-grained categories from few examples (only one, or fewer than five). To solve this problem, we propose an end-to-end trainable deep network, which is inspired by the state-of-the-art fine-grained recognition model and is tailored for the FSFG task. Specifically, our network consists of a bilinear feature learning module and a classifier mapping module: while the former encodes the discriminative information of an exemplar image into a feature vector, the latter maps the intermediate feature into the decision boundary of the novel category. The key novelty of our model is a "piecewise mappings" function in the classifier mapping module, which generates the decision boundary via learning a set of more attainable sub-classifiers in a more parameter-economic way. We learn the exemplar-to-classifier mapping based on an auxiliary dataset in a meta-learning fashion, which is expected to be able to generalize to novel categories. By conducting comprehensive experiments on three fine-grained datasets, we demonstrate that the proposed method achieves superior performance over the competing baselines.
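The two modules above can be sketched numerically: bilinear pooling forms the exemplar feature, and the "piecewise mappings" split that feature into pieces, each mapped by its own sub-classifier. This is a toy NumPy sketch, not the paper's network; the feature-map sizes and the random weight matrices standing in for learned mappings are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_pool(fa, fb):
    # Bilinear feature: outer product of two feature maps, summed over
    # spatial locations, then flattened (standard bilinear pooling).
    return np.einsum('lc,ld->cd', fa, fb).ravel()

def piecewise_mapping(feature, sub_maps):
    # "Piecewise mappings": split the bilinear feature into pieces and
    # map each piece through its own sub-classifier generator; the
    # concatenation forms the novel category's decision boundary.
    pieces = np.split(feature, len(sub_maps))
    return np.concatenate([p @ W for p, W in zip(pieces, sub_maps)])

# Hypothetical feature maps from two streams at 6 spatial locations.
fa = rng.normal(size=(6, 4))
fb = rng.normal(size=(6, 4))
feat = bilinear_pool(fa, fb)               # 4*4 = 16-d bilinear feature
sub_maps = [rng.normal(size=(4, 8)) for _ in range(4)]
classifier = piecewise_mapping(feat, sub_maps)
print(feat.shape, classifier.shape)
```

Mapping four 4-d pieces instead of the full 16-d feature is what makes the scheme parameter-economic: the sub-maps together hold 4x4x8 weights rather than a single 16x32 matrix.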
Tumor-associated macrophage (TAM)-related immunotherapy is a greatly promising strategy that involves altering the immunosuppressive tumor microenvironment with the immunomodulator imiquimod (R837) for enhanced cancer therapy. However, the function of R837 is seriously limited due to poor water solubility and a lack of targeting ability. Here, we developed two types of targeting polymer micelles to separately deliver R837 and the anticancer drug doxorubicin (DOX) to TAMs and tumor cells via intratumoral injection and intravenous injection, respectively, for enhanced cancer chemo-immunotherapy against breast cancer. After these micelles accumulated in the tumor tissues, the immunostimulating micelles released R837, which bound to the TLR-7 receptor on the lysosomal membrane within the TAM, stimulating the maturation of the TAM, thereby causing an antitumor immune response and relieving the immunosuppressive effect in the tumor microenvironment. Simultaneously, the chemotherapeutic micelles released DOX in the cytoplasm of the tumor cells, directly inducing cell death. As a result, a synergistic combination of chemotherapy and immunotherapy was achieved through these nanomedicines, which separately activated the antitumor immune response and inhibited tumor cell growth. Therefore, this strategy is a new avenue for the development of targeting nanomedicines for combination chemo-immunotherapy against malignant cancer.
Clinical chemotherapy confronts a challenge resulting from cancer-related multidrug resistance (MDR), which can directly lead to treatment failure. To address this, an innovative approach is proposed to construct a light-activated reactive oxygen species (ROS)-responsive nanoplatform based on a protoporphyrin (PpIX)-conjugated and dual chemotherapeutics-loaded polymer micelle. This system combines chemotherapy and photodynamic therapy (PDT) to defeat the MDR of tumors. Such an intelligent nanocarrier can prolong the circulation time in blood because of the negative polysaccharide component of chondroitin sulfate, and is subsequently internalized selectively by doxorubicin (DOX)-resistant MCF-7/ADR cells. When exposed to 635 nm red light, this nanoplatform generates sufficient ROS through the photoconversion of PpIX, further triggering the disassociation of the micelles to release the dual cargoes. Afterward, the released apatinib, serving as a reversal inhibitor of MDR, can recover the chemosensitivity of DOX by competitively inhibiting the P-glycoprotein drug pump in drug-resistant tumor cells, and the excessive ROS has a strong capacity to exert its PDT effect to act on the mitochondria or the nuclei, ultimately causing cell apoptosis. As expected, this intelligent nanosystem successfully reverses tumor MDR via the synergism between apatinib-enhanced DOX sensitivity and ROS-mediated PDT performance.
Mask-Aware Networks for Crowd Counting
Jiang, Shengqin; Lu, Xiaobo; Lei, Yinjie
IEEE Transactions on Circuits and Systems for Video Technology, 09/2020, Volume 30, Issue 9
Journal article, peer reviewed, open access
The crowd counting problem aims to count the number of objects within an image or a video frame and is usually solved by estimating the density map generated from the object location annotations. The values in the density map, by nature, take two possible states: zero, indicating no object around, and a non-zero value, indicating the presence of objects, with the value denoting the local object density. In contrast to traditional methods, which do not differentiate the density prediction of these two states, we propose to use a dedicated network branch to predict the object/non-object mask and then combine its prediction with the input image to produce the density map. Our rationale is that the mask prediction could be better modeled as a binary segmentation problem and the difficulty of estimating the density could be reduced if the mask is known. A key to the proposed scheme is the strategy of incorporating the mask prediction into the density map estimator. To this end, we study five possible solutions, and via analysis and experimental validation we identify the most effective one. Through extensive experiments on three public datasets, we demonstrate the superior performance of the proposed approach over the baselines and show that our network achieves state-of-the-art performance.
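The two-branch idea can be sketched numerically: one branch predicts an object/non-object probability mask, and the density branch consumes the input gated by that mask. The abstract compares five incorporation strategies; the sketch below shows only one plausible one (multiplicative gating), with a trivial one-weight "network" per branch, so every weight and the toy regressor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_branch(image, w=2.0):
    # Dedicated branch predicting an object/non-object probability mask;
    # here a hypothetical single-weight map followed by a sigmoid.
    return 1.0 / (1.0 + np.exp(-w * image))

def density_branch(image, mask, scale=0.1):
    # One way of incorporating the mask into the density estimator:
    # multiplicatively gate the input by the predicted mask, then apply
    # a toy density regressor (ReLU + linear scaling).
    gated = np.maximum(image * mask, 0.0)
    return gated * scale

image = rng.random((16, 16))             # toy single-channel crowd image
mask = mask_branch(image)                # values in (0, 1)
density = density_branch(image, mask)    # per-pixel density estimate
count = density.sum()                    # crowd count = density integral
print(density.shape)
```

Summing the density map recovers the count, which is why density estimation is the standard surrogate for the counting objective.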
Zero-shot learning (ZSL) can be formulated as a cross-domain matching problem: after being projected into a joint embedding space, a visual sample will match against all candidate class-level semantic descriptions and be assigned to the nearest class. In this process, the embedding space underpins the success of such matching and is crucial for ZSL. In this paper, we conduct an in-depth study on the construction of embedding space for ZSL and posit that an ideal embedding space should satisfy two criteria: intra-class compactness and inter-class separability. While the former encourages the embeddings of visual samples of one class to distribute tightly close to the semantic description embedding of this class, the latter requires embeddings from different classes to be well separated from each other. Towards this goal, we present a simple but effective two-branch network to simultaneously map semantic descriptions and visual samples into a joint space, on which visual embeddings are forced to regress to their class-level semantic embeddings and the embeddings crossing classes are required to be distinguishable by a trainable classifier. Furthermore, we extend our method to a transductive setting to better handle the model bias problem in ZSL (i.e., samples from unseen classes tend to be categorized into seen classes) with minimal extra supervision. Specifically, we propose a pseudo labeling strategy to progressively incorporate the testing samples into the training process and thus balance the model between seen and unseen classes. Experimental results on five standard ZSL datasets show the superior performance of the proposed method and its transductive extension.
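The two criteria translate directly into two loss terms: a regression loss pulling each visual embedding toward its class semantic embedding (compactness), and a classification loss requiring a trainable classifier to separate classes (separability). The NumPy sketch below illustrates only these objectives on random embeddings; the embedding dimensions, classifier weights, and loss weighting are hypothetical, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)

def compactness_loss(vis_emb, sem_emb, labels):
    # Intra-class compactness: visual embeddings regress to the
    # class-level semantic embedding of their class.
    return ((vis_emb - sem_emb[labels]) ** 2).sum(axis=1).mean()

def separability_loss(vis_emb, labels, W):
    # Inter-class separability: embeddings must be distinguishable by a
    # trainable linear classifier (softmax cross-entropy on its logits).
    logits = vis_emb @ W
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

vis = rng.normal(size=(12, 5))       # visual branch embeddings (toy)
sem = rng.normal(size=(3, 5))        # semantic branch embeddings, 3 classes
labels = rng.integers(0, 3, size=12)
W = rng.normal(size=(5, 3))          # trainable classifier weights
total = compactness_loss(vis, sem, labels) + separability_loss(vis, labels, W)
print(total)
```

Minimizing the sum of the two terms jointly shapes the embedding space so that nearest-semantic-embedding matching at test time becomes reliable.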
In this paper we propose a dilated convolutional model for music melody extraction. Taking variable-Q transforms (VQTs) as inputs, it first uses consecutive layers of convolution to capture local temporal-frequency patterns, and then a single layer of dilated convolution to capture global frequency patterns contributed by the pitches and harmonics of active notes. Compared with the contrast model without dilation, the proposed model can remarkably cut down the computational cost, and at the same time does not compromise the performance. Its advantages over existing models are twofold. First, it performs best on most datasets, for both general and vocal melody extraction. Second, it can achieve the best performance with the least training data.
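The value of dilation here is that spacing the kernel taps far apart lets a small kernel relate frequency bins an octave or a harmonic interval apart, which is how it captures the global pitch/harmonic structure cheaply. A minimal 1-D sketch over a single spectral frame, assuming a toy kernel and a dilation of 12 bins (one octave at semitone resolution):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    # Dilated convolution along the frequency axis: kernel taps are
    # spaced `dilation` bins apart, so a 2-tap kernel here relates bins
    # an octave apart without adding parameters.
    k = len(kernel)
    span = (k - 1) * dilation
    out = np.zeros(len(x) - span)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out

freq_slice = np.arange(24, dtype=float)   # toy VQT frame, 24 bins
out = dilated_conv1d(freq_slice, kernel=[1.0, -1.0], dilation=12)
print(out.shape)                          # 24 - 12 = 12 output bins
```

A stack of ordinary convolutions would need either a 13-tap kernel or many layers to cover the same 12-bin span, which is where the computational saving of the dilated layer comes from.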
Devising a representation suitable for characterizing human actions on the basis of a sequence of pose estimates generated by an RGBD sensor remains a research challenge. We here provide two insights into this challenge. First, we show that discriminative sequences of poses typically occur over a short time window, and thus we propose a simple but effective local descriptor, called a trajectorylet, to capture the static and kinematic information within this interval. Second, we show that state-of-the-art recognition results can be achieved by encoding each trajectorylet using a discriminative trajectorylet detector set which is selected from a large number of candidate detectors trained through exemplar-SVMs. The action-level representation is obtained by pooling trajectorylet encodings. Evaluating on standard datasets acquired from the Kinect sensor, we demonstrate that our method obtains superior results over existing approaches under various experimental setups.
•We design the trajectorylet, which captures static and dynamic pose information within a short interval.
•We obtain discriminative patterns of action instances by encoding each trajectorylet using a discriminative trajectorylet detector set.
•The method generalizes well to different parameter settings and datasets.
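The descriptor and encoding above can be sketched as: stack joint positions over a short window (the static part) with their frame-to-frame differences (the kinematic part), then score the descriptor against a detector set. This is a toy NumPy sketch; the window length, joint layout, and random detector weights standing in for trained exemplar-SVMs are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def trajectorylet(poses):
    # Local descriptor over a short window of pose estimates: static
    # part (joint positions) plus kinematic part (frame-to-frame
    # velocities via finite differences).
    static = poses.ravel()
    velocity = np.diff(poses, axis=0).ravel()
    return np.concatenate([static, velocity])

def encode_with_detectors(desc, detectors):
    # Encode a trajectorylet by its responses to a discriminative
    # detector set (rows stand in for exemplar-SVM weight vectors).
    return detectors @ desc

window = rng.normal(size=(5, 6))     # 5 frames, 3 joints x (x, y) coords
desc = trajectorylet(window)         # static (30) + kinematic (24) = 54-d
detectors = rng.normal(size=(10, desc.size))
scores = encode_with_detectors(desc, detectors)
print(desc.shape, scores.shape)
```

Pooling such detector-response encodings over all windows of a sequence yields the action-level representation described in the abstract.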