Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with a regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on the Flickr30K and MSCOCO datasets.
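As a rough illustration of the fusion step in the similarity network, the sketch below L2-normalizes two branch outputs, multiplies them element-wise, and maps the fused vector to a score. The single linear layer, the sigmoid output, the feature values, and the weights are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def similarity_score(image_feat, text_feat, weights, bias):
    """Fuse two branch outputs by element-wise product and score the result.

    weights and bias stand in for a learned linear layer (hypothetical values).
    """
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feat)
    fused = [a * b for a, b in zip(img, txt)]          # element-wise product
    logit = sum(w * f for w, f in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))              # squash to (0, 1)

image_feat = [0.2, 0.9, 0.4]   # hypothetical image-branch output
text_feat = [0.1, 0.8, 0.5]    # hypothetical text-branch output
weights = [1.0, 1.0, 1.0]      # hypothetical learned weights
bias = 0.0
score = similarity_score(image_feat, text_feat, weights, bias)
```

With all weights set to one and a zero bias, the logit reduces to the cosine similarity of the two inputs; a trained layer would instead weight each fused dimension differently.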
Modern object detectors rely heavily on rectangular bounding boxes, such as anchors, proposals and the final predictions, to represent objects at various recognition stages. The bounding box is convenient to use but provides only a coarse localization of objects and leads to a correspondingly coarse extraction of object features. In this paper, we present RepPoints (representative points), a new, finer representation of objects as a set of sample points useful for both localization and recognition. Given ground truth localization and recognition targets for training, RepPoints learn to automatically arrange themselves in a manner that bounds the spatial extent of an object and indicates semantically significant local areas. They furthermore do not require the use of anchors to sample a space of bounding boxes. We show that an anchor-free object detector based on RepPoints can be as effective as the state-of-the-art anchor-based detection methods, with 46.5 AP and 67.4 AP50 on the COCO test-dev detection benchmark, using a ResNet-101 model. Code is available at https://github.com/microsoft/RepPoints.
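One simple way to relate a point set to a rectangular box is to take the axis-aligned extent of the points. The sketch below does this with a min-max rule. The function name and the sample coordinates are hypothetical, and the abstract does not say that this is the conversion the authors use.

```python
def points_to_box(points):
    """Return (x_min, y_min, x_max, y_max) enclosing a list of (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical sample points scattered over an object.
sample_points = [(12.0, 30.0), (18.5, 22.0), (25.0, 41.0), (15.0, 35.5)]
box = points_to_box(sample_points)
```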
This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large-margin objective that combines cross-view ranking constraints with within-view neighborhood structure preservation constraints inspired by the metric learning literature. Extensive experiments show that our approach achieves significant improvements in accuracy for image-to-text and text-to-image retrieval. Our method achieves new state-of-the-art results on the Flickr30K and MSCOCO image-sentence datasets and shows promise on the new task of phrase localization on the Flickr30K Entities dataset.
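The cross-view ranking constraints can be illustrated with a hinge-style loss that asks each matching image-text pair to be closer than every non-matching pair by a margin. The sketch below is a minimal version under stated assumptions: Euclidean distance, a margin of 0.1, and toy 2-dimensional embeddings are all chosen for illustration, and the within-view neighborhood constraints mentioned in the abstract are left out.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bidirectional_ranking_loss(image_embs, text_embs, margin=0.1):
    """Sum of hinge penalties over all (anchor, positive, negative) triplets.

    image_embs[i] and text_embs[i] are assumed to be a matching pair;
    every other index is treated as a negative.
    """
    n = len(image_embs)
    loss = 0.0
    for i in range(n):
        pos = euclidean(image_embs[i], text_embs[i])
        for k in range(n):
            if k == i:
                continue
            # Image anchor: its matching text should be closer than text k.
            loss += max(0.0, margin + pos - euclidean(image_embs[i], text_embs[k]))
            # Text anchor: its matching image should be closer than image k.
            loss += max(0.0, margin + pos - euclidean(text_embs[i], image_embs[k]))
    return loss

# Toy embeddings, made up for illustration.
images = [[0.0, 0.0], [1.0, 1.0]]
texts = [[0.0, 0.1], [1.0, 0.9]]
loss_separated = bidirectional_ranking_loss(images, texts)        # pairs aligned
loss_swapped = bidirectional_ranking_loss(images, texts[::-1])    # pairs mismatched
```

When the matching pairs are already well separated from the negatives, every hinge term is zero; swapping the text order violates the constraints and produces a positive loss.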
On the Euclidean distance of images. Wang, Liwei; Zhang, Yan; Feng, Jufu. IEEE Transactions on Pattern Analysis and Machine Intelligence, 08/2005, Volume 27, Issue 8. Journal Article. Peer reviewed.
We present a new Euclidean distance for images, which we call image Euclidean distance (IMED). Unlike the traditional Euclidean distance, IMED takes into account the spatial relationships of pixels. Therefore, it is robust to small perturbations of images. We argue that IMED is the only intuitively reasonable Euclidean distance for images. IMED is then applied to image recognition. The key advantage of this distance measure is that it can be embedded in most image classification techniques such as SVM, LDA, and PCA. The embedding is efficient because it involves a transformation referred to as the standardizing transform (ST). We show that ST is a transform domain smoothing. Using the face recognition technology (FERET) database and two state-of-the-art face identification algorithms, we demonstrate a consistent performance improvement of the algorithms embedded with the new metric over their original versions.
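To make the idea of a spatially aware image distance concrete, the sketch below weights each pair of pixel differences by how close the two pixels are on the grid. The Gaussian form of the weight, the value of sigma, and the tiny test images are assumptions for illustration; the abstract itself does not state the formula.

```python
import math

def imed_squared(img_a, img_b, sigma=1.0):
    """Squared spatially weighted distance between two equal-sized grayscale images.

    Images are lists of rows. The weight between pixels i and j decays with
    their distance on the pixel grid (Gaussian form assumed for illustration).
    """
    height, width = len(img_a), len(img_a[0])
    coords = [(r, c) for r in range(height) for c in range(width)]
    diff = [img_a[r][c] - img_b[r][c] for r, c in coords]
    total = 0.0
    for i, (ri, ci) in enumerate(coords):
        for j, (rj, cj) in enumerate(coords):
            spatial_sq = (ri - rj) ** 2 + (ci - cj) ** 2
            g_ij = math.exp(-spatial_sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            total += g_ij * diff[i] * diff[j]
    return total

def euclidean_squared(img_a, img_b):
    """Ordinary squared Euclidean distance, ignoring pixel positions."""
    return sum((a - b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb))

base = [[1, 0, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 1, 0, 0], [0, 0, 0, 0]]   # bright pixel moved by one position
far = [[0, 0, 0, 1], [0, 0, 0, 0]]       # bright pixel moved by three positions

near_plain, far_plain = euclidean_squared(base, shifted), euclidean_squared(base, far)
near_weighted, far_weighted = imed_squared(base, shifted), imed_squared(base, far)
```

The ordinary distance rates both displacements as equally different from the base image, whereas the weighted distance rates the one-pixel shift as closer, which matches the robustness to small perturbations described above.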
We show that the maximal operator associated with multilinear Calderón-Zygmund singular integrals and its commutators are bounded on products of central Morrey spaces with variable exponent. Moreover, some boundedness properties are obtained for the commutators of multilinear Calderón-Zygmund operators as well as for the corresponding fractional integrals.
This review paper covers recent developments in the adsorption of organic dyes by metal-doped porous carbon materials. The primary objective of this paper is to organize the scattered information on metal-doped porous carbon materials widely used in organic dye adsorption. Various metal-doped porous carbon materials that adsorb organic dyes are summarized and discussed here for the first time. Key factors affecting the adsorption process, such as the amount of doped metal, solution pH, and temperature, are also reported and discussed. Adsorption mechanisms such as electrostatic interaction, π-π interaction, hydrogen bonding, and synergistic interaction between metal particles and carbon materials are proposed for organic dye adsorption on metal-doped porous carbon, with the help of related works from the literature. Finally, a few suggestions for future studies on metal-doped porous carbon materials are proposed.
• The adsorption of organic dyes in wastewater by metal-doped porous carbon materials was reviewed for the first time.
• Different types of metal-doped porous carbon were summarized and their development trends for dye adsorption were analyzed.
• The influencing factors and mechanisms of adsorption were discussed.
• A few suggestions for future studies on metal-doped porous carbon materials are proposed.
• Word embeddings trained from clinical notes, literature, Wikipedia, and news are compared.
• Word embeddings trained from clinical notes and literature capture word semantics better.
• There isn’t a consistent global ranking of word embeddings for biomedical NLP applications.
• Word embeddings trained from biomedical domain corpora do not necessarily perform better.
Word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications because the vector representations can capture useful semantic properties and linguistic relationships between words. Different textual resources (e.g., Wikipedia and biomedical literature corpora) have been utilized in biomedical NLP to train word embeddings, and these word embeddings have been commonly leveraged as feature input to downstream machine learning models. However, there has been little work on evaluating the word embeddings trained from different textual resources.
In this study, we empirically evaluated word embeddings trained from four different corpora, namely clinical notes, biomedical publications, Wikipedia, and news. For the former two resources, we trained word embeddings using unstructured electronic health record (EHR) data available at Mayo Clinic and articles (MedLit) from PubMed Central, respectively. For the latter two resources, we used publicly available pre-trained word embeddings, GloVe and Google News. The evaluation was done qualitatively and quantitatively. For the qualitative evaluation, we randomly selected medical terms from three categories (i.e., disorder, symptom, and drug), and manually inspected the five most similar words computed by embeddings for each term. We also analyzed the word embeddings through a 2-dimensional visualization plot of 377 medical terms. For the quantitative evaluation, we conducted both intrinsic and extrinsic evaluation. For the intrinsic evaluation, we evaluated the word embeddings’ ability to capture medical semantics by measuring the semantic similarity between medical terms using four published datasets: Pedersen’s dataset, Hliaoutakis’s dataset, MayoSRS, and UMNSRS. For the extrinsic evaluation, we applied word embeddings to multiple downstream biomedical NLP applications, including clinical information extraction (IE), biomedical information retrieval (IR), and relation extraction (RE), with data from shared tasks.
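An intrinsic evaluation of this kind can be sketched as follows: compute an embedding-based similarity for each rated term pair, then check how well those scores track the human ratings. Cosine similarity and Pearson correlation are assumed here as the similarity and agreement measures, and the 3-dimensional vectors and ratings are made up for illustration; none of these values come from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy vectors and human ratings, made up for illustration.
embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.85, 0.15, 0.05],
    "aspirin": [0.1, 0.9, 0.2],
    "fracture": [0.0, 0.2, 0.95],
}
rated_pairs = [
    ("fever", "pyrexia", 4.0),
    ("fever", "aspirin", 2.0),
    ("fever", "fracture", 1.0),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
human_scores = [rating for _, _, rating in rated_pairs]
correlation = pearson(model_scores, human_scores)
```

A higher correlation would indicate that the embedding's notion of similarity is closer to the human judgments in the rated dataset.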
The qualitative evaluation shows that the word embeddings trained from EHR and MedLit can find more similar medical terms than those trained from GloVe and Google News. The intrinsic quantitative evaluation verifies that the semantic similarity captured by the word embeddings trained from EHR is closer to human experts’ judgments on all four tested datasets. The extrinsic quantitative evaluation shows that the word embeddings trained on EHR achieved the best F1 score of 0.900 for the clinical IE task; no word embeddings improved the performance for the biomedical IR task; and the word embeddings trained on Google News had the best overall F1 score of 0.790 for the RE task.
Based on the evaluation results, we can draw the following conclusions. First, the word embeddings trained from EHR and MedLit can capture the semantics of medical terms better, and find semantically relevant medical terms closer to human experts’ judgments than those trained from GloVe and Google News. Second, there does not exist a consistent global ranking of word embeddings for all downstream biomedical NLP applications. However, adding word embeddings as extra features will improve results on most downstream tasks. Finally, the word embeddings trained from the biomedical domain corpora do not necessarily have better performance than those trained from the general domain corpora for any downstream biomedical NLP task.
The design of high glass transition temperature (Tg) thermoset materials with considerable reparability is a challenge. In this study, a novel biobased triepoxy (TEP) is synthesized and cured with an anhydride monomer in the presence of a zinc catalyst. The cured TEP exhibits a high Tg (187 °C) and strength and modulus comparable to those of the cured bisphenol A epoxy. By adopting vitrimer chemistry, the cross-linked polymer materials are imparted with significant stress relaxation and reparability via dynamic transesterification. The reparability is closely related to the repairing temperature, external force, catalyst content, and the magnitude of the rubbery modulus of the sample. Cracks in the cured TEP can be efficiently repaired within 10 min. This work introduces the first high-Tg biobased epoxy material with excellent reparability and provides a valuable method for the design of high-Tg self-healing materials suitable for high service temperatures.