Spatiotemporal and motion feature representations are key to video action recognition. Typical previous approaches utilize 3D CNNs to capture both spatial and temporal features, but they suffer from heavy computation. Other approaches utilize (1+2)D CNNs to learn spatial and temporal features efficiently, but they neglect the importance of motion representations. To overcome these problems, we propose a novel building block that captures spatial and temporal features more faithfully and learns motion features efficiently. The proposed block comprises Motion Excitation (ME), Multi-view Excitation (MvE), and Densely Connected Temporal Aggregation (DCTA): ME encodes feature-level frame differences, MvE adaptively enriches spatiotemporal features with multiple view representations, and DCTA models long-range temporal dependencies. We inject the proposed building block, which we refer to as the META block (or simply "META"), into a 2D ResNet-50. Through extensive experiments, we demonstrate that the proposed architecture outperforms previous CNN-based methods in validation top-1 accuracy on the Something-Something v1 and Jester datasets, while META yields competitive results on the Moments-in-Time Mini dataset.
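To make the Motion Excitation idea concrete, here is a minimal sketch, assuming a 2D-CNN video backbone whose features are shaped (batch * frames, channels, H, W); the module name, reduction ratio, and convolution sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MotionExcitation(nn.Module):
    """Minimal sketch: excite channels using feature-level frame differences."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = channels // reduction
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1)    # channel reduction
        self.transform = nn.Conv2d(mid, mid, kernel_size=3, padding=1, groups=mid)
        self.expand = nn.Conv2d(mid, channels, kernel_size=1)     # back to full channels

    def forward(self, x, num_frames):
        # x: (batch * num_frames, channels, H, W), as in 2D-CNN video backbones
        nt, c, h, w = x.shape
        n = nt // num_frames
        feat = self.squeeze(x).view(n, num_frames, -1, h, w)
        # feature-level frame difference: transform(frame t+1) minus frame t
        diff = self.transform(feat[:, 1:].reshape(-1, feat.size(2), h, w)).view(
            n, num_frames - 1, -1, h, w) - feat[:, :-1]
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)  # pad last frame
        attn = torch.sigmoid(self.expand(diff.reshape(nt, -1, h, w)))
        return x + x * attn   # residual excitation of the input features

# toy usage: 2 clips x 8 frames, 64 channels, 14x14 feature maps
me = MotionExcitation(64)
video_feat = torch.randn(2 * 8, 64, 14, 14)
out = me(video_feat, num_frames=8)
```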
Generally, users are reserved in describing their search intention when submitting queries to a search engine. As a result, a large fraction of search queries are short, ambiguous, and tend to have multiple interpretations. Given the gigantic size of the web, ignoring the information needs underlying such queries can misguide the search engine. An effective way to mitigate these issues is to diversify the search results according to the query subtopics with diverse intents. The task of identifying the possible subtopics with diverse intents underlying a query is known as subtopic mining. This paper is aimed at mining and diversifying the subtopics underlying a query. Our method first extracts noun phrases containing the query terms from the top-retrieved web documents. We also extract query suggestions and completions from commercial search engines. The extracted candidates that are highly related to the query are then selected as subtopics. We introduce a new relatedness score function to estimate the degree of relatedness between the query and a candidate. To estimate the relevance between the query and a subtopic, this paper introduces a semantic relevance measure using a locally trained sentence embedding model. Finally, we propose a novel coverage-based diversification technique that ranks the subtopics by combining their relevance and the coverage estimated from the web documents. Experimental results on two NTCIR English subtopic mining datasets demonstrate that our proposed method achieves new state-of-the-art performance and significantly outperforms several known related methods in terms of the relevance (D-nDCG) and diversity (D#-nDCG) metrics at a cutoff of 10.
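As an illustration of how a coverage-based diversified ranking could balance relevance and coverage, here is a minimal greedy sketch; the gain function, the weight alpha, and the names are assumptions for exposition, not the paper's exact formulation.

```python
def diversify(candidates, relevance, coverage, k=10, alpha=0.5):
    """Greedy sketch: repeatedly pick the subtopic that best balances its
    relevance to the query with the new document coverage it adds.

    candidates: list of subtopic strings
    relevance:  dict subtopic -> relevance score in [0, 1]
    coverage:   dict subtopic -> set of covered document ids
    """
    selected, covered = [], set()
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def gain(s):
            new_docs = coverage[s] - covered          # novelty: documents not yet covered
            return alpha * relevance[s] + (1 - alpha) * len(new_docs) / max(len(coverage[s]), 1)
        best = max(remaining, key=gain)
        selected.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return selected

# toy example with three candidate subtopics
cands = ["apple pie recipe", "apple iphone price", "apple stock"]
rel = {"apple pie recipe": 0.6, "apple iphone price": 0.9, "apple stock": 0.7}
cov = {"apple pie recipe": {1, 2}, "apple iphone price": {2, 3, 4}, "apple stock": {4, 5}}
print(diversify(cands, rel, cov, k=2))
```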
The upsurge of prolific blogging and microblogging platforms has enabled abusers to spread negativity and threats more widely than ever. Negative and hateful comments deter users from sharing their opinions freely on social media platforms; they often break people's confidence and cause extensive damage to their mental health. Hence, identifying such toxic content and taking appropriate measures against it is crucial to preserving a safe environment on social media. Numerous state-of-the-art approaches classify whole contents as toxic or non-toxic, but they do not distinguish the precise toxic portion from the rest of the content. Detecting the toxic portions is essential, as it substantially aids in moderating toxic content by excluding the abusive parts. This paper describes our proposed approach to detect toxic portions of text content efficiently and accurately. We explore an ensemble of sequence labeling models, including a word embedding-based Spark NLP NER (named entity recognition) deep learning model, a spaCy NER model with custom toxic tags, and an ALBERT NER model, to identify toxic spans. NER-based models are designed to capture the contextual attributes of phrases and spans that are essential for named entity recognition. As the toxic span detection task also requires apprehending the phrasal context, the similarity between the two tasks inspires us to exploit these NER models. Finally, we determine the final toxic spans using a prevalence-based fusion of the predictions generated by these models. The fusion strategy enables us to consolidate the diversity of these models in perceiving the phrasal context from all aspects. Experimental results on the SemEval-2021 toxic spans detection dataset show that our model captures toxic fragments precisely and achieves a competitive result among state-of-the-art methods.
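A minimal sketch of what a prevalence-based fusion over character-offset predictions could look like, assuming each model emits a set of toxic character offsets; the vote threshold and function names are illustrative, not the exact scheme used in the paper.

```python
from collections import Counter

def fuse_spans(model_predictions, min_votes=2):
    """Sketch of prevalence-based fusion: a character offset is kept as toxic
    if at least `min_votes` of the span-prediction models marked it.

    model_predictions: list of lists of character offsets, one list per model.
    """
    votes = Counter(off for spans in model_predictions for off in set(spans))
    return sorted(off for off, count in votes.items() if count >= min_votes)

# toy example: offsets predicted by three span-labelling models
spark_nlp = [5, 6, 7, 8]
spacy_ner = [6, 7, 8, 9]
albert    = [7, 8]
print(fuse_spans([spark_nlp, spacy_ner, albert]))   # -> [6, 7, 8]
```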
Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute similarity beyond a trivial level; moreover, they can only capture textual similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the concept-to-concept relationship corresponding to the words is also utilized for the same purpose. Multiple new semantic similarity measures are introduced based on word-embedding models trained on two different corpora in two different languages. Apart from these, another new semantic similarity measure is introduced using word sense comparison. The similarity score between sentences is then computed by applying a linear ranking approach over all proposed measures, with their importance scores estimated using a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and outperforms several known related methods.
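To illustrate one way word-level embedding semantics can be turned into a sentence-level similarity, here is a sketch using greedy word alignment with cosine scores; the toy vectors and the symmetric averaging are assumptions for exposition, not the exact measures proposed here.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def sentence_similarity(s1, s2, embeddings):
    """Sketch: align every word of one sentence with its most similar word in
    the other (by embedding cosine) and average the alignment scores both ways."""
    def directed(a, b):
        scores = [max(cosine(embeddings[w], embeddings[v]) for v in b) for w in a]
        return sum(scores) / len(scores) if scores else 0.0
    w1 = [w for w in s1.lower().split() if w in embeddings]
    w2 = [w for w in s2.lower().split() if w in embeddings]
    if not w1 or not w2:
        return 0.0
    return 0.5 * (directed(w1, w2) + directed(w2, w1))

# toy vectors stand in for embedding models trained on each language's corpus
emb = {"dog": np.array([1.0, 0.2]), "barks": np.array([0.3, 1.0]),
       "puppy": np.array([0.9, 0.3]), "cries": np.array([0.2, 0.9])}
print(sentence_similarity("dog barks", "puppy cries", emb))
```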
Recent studies have obtained superior performance in image recognition tasks by using, as an image representation, the fully connected layer activations of Convolutional Neural Networks (CNNs) trained on various kinds of images. However, the CNN representation is not well suited to fine-grained image recognition tasks such as food image recognition. To improve the performance of the CNN representation in food image recognition, we propose a novel image representation composed of the covariances of convolutional layer feature maps. In experiments on the ETHZ Food-101 dataset, our method achieved 58.65% average accuracy, outperforming previous methods such as the Bag-of-Visual-Words histogram, the Improved Fisher Vector, and CNN-SVM.
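A minimal sketch of the covariance-of-feature-maps idea, assuming a single convolutional layer's activations of shape (C, H, W); the upper-triangle vectorization is one common choice and is an assumption, not necessarily the exact descriptor used in the paper.

```python
import numpy as np

def covariance_descriptor(feature_maps):
    """Sketch: treat each spatial location of a convolutional layer as a
    C-dimensional sample and return the vectorised upper triangle of the
    channel covariance matrix.

    feature_maps: array of shape (C, H, W) from one convolutional layer.
    """
    c, h, w = feature_maps.shape
    samples = feature_maps.reshape(c, h * w).T        # (H*W, C) observations
    cov = np.cov(samples, rowvar=False)               # (C, C) channel covariance
    iu = np.triu_indices(c)                           # keep the upper triangle only
    return cov[iu]

# toy feature maps: 64 channels on a 7x7 grid -> 64*65/2 = 2080-dim descriptor
maps = np.random.randn(64, 7, 7)
print(covariance_descriptor(maps).shape)
```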
Current action recognition studies enjoy the benefits of two neural network branches, spatial and temporal. This work extends previous work by introducing a fusion of spatial and temporal branches to provide superior action recognition capability for multi-label, multi-class classification problems. In this paper, we propose three fusion models with different fusion strategies. We first build several efficient temporal Gaussian mixture (TGM) layers to form spatial and temporal branches that learn a set of features. In addition to these branches, we introduce a new deep spatio-temporal branch consisting of a series of TGM layers that learns from the features emerging from the existing branches. Each branch produces a temporal-aware feature that helps the model understand the underlying action in a video. To verify the performance of our proposed models, we performed extensive experiments on the well-known MultiTHUMOS benchmark dataset. The results demonstrate the importance of our proposed deep fusion mechanism, which improves the overall score while keeping the number of parameters small.
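The branch-fusion idea can be sketched as follows, with plain temporal 1-D convolutions standing in for TGM layers (an assumption made for brevity); the layer sizes and the concatenation-based fusion are illustrative, not the paper's exact three strategies.

```python
import torch
import torch.nn as nn

class DeepFusion(nn.Module):
    """Minimal sketch of a three-branch fusion: a spatial branch, a temporal
    branch, and a deeper spatio-temporal branch that consumes the concatenated
    outputs of the first two."""
    def __init__(self, feat_dim=512, num_classes=65):   # MultiTHUMOS has 65 action classes
        super().__init__()
        self.spatial = nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1)
        self.temporal = nn.Conv1d(feat_dim, 256, kernel_size=5, padding=2)
        self.spatiotemporal = nn.Sequential(             # deeper branch over fused features
            nn.Conv1d(512, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU())
        self.classifier = nn.Conv1d(256, num_classes, kernel_size=1)

    def forward(self, x):
        # x: (batch, feat_dim, time) per-frame features from a 2D backbone
        s, t = torch.relu(self.spatial(x)), torch.relu(self.temporal(x))
        fused = self.spatiotemporal(torch.cat([s, t], dim=1))
        return self.classifier(fused)                    # per-frame multi-label logits

model = DeepFusion()
logits = model(torch.randn(2, 512, 64))   # 2 clips, 64 time steps
```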
• Build a large-scale 3D shape retrieval benchmark that supports multi-modal queries.
• Evaluate 26 3D shape retrieval methods using 3 types of metrics.
• Solicit and identify state-of-the-art methods and promising related techniques.
• Perform detailed analysis of diverse methods w.r.t. accuracy and efficiency.
• Make the benchmark and evaluation tools freely available to the community.
Large-scale 3D shape retrieval has become an important research direction in content-based 3D shape retrieval. To promote this research area, we organized two Shape Retrieval Contest (SHREC) tracks on large-scale comprehensive and sketch-based 3D model retrieval in 2014. Both tracks were based on a unified large-scale benchmark that supports multimodal queries (3D models and sketches). This benchmark contains 13680 sketches and 8987 3D models, divided into 171 distinct classes. It was compiled to be a superset of existing benchmarks and presents a new challenge to retrieval methods, as it comprises generic models as well as domain-specific model types. Twelve and six distinct 3D shape retrieval methods competed in these two contests, respectively. To measure and compare the performance of the participating and other promising Query-by-Model or Query-by-Sketch 3D shape retrieval methods, and to solicit state-of-the-art approaches, we perform a more comprehensive comparison of twenty-six retrieval methods (eighteen originally participating algorithms and eight additional state-of-the-art or new methods) by evaluating them on the common benchmark. The benchmark, results, and evaluation tools are publicly available at our websites (http://www.itl.nist.gov/iad/vug/sharp/contest/2014/Generic3D/ and http://www.itl.nist.gov/iad/vug/sharp/contest/2014/SBR/).
It has been a formidable task to achieve efficiency and scalability in the alignment of two massive, conceptually similar ontologies. Here we assume that an ontology is given in RDF (Resource Description Framework) or OWL (Web Ontology Language) and can be represented by a directed graph. A straightforward approach to aligning two ontologies entails an O(N^2) computation, comparing every pair of nodes from the two ontologies, where N denotes the average number of nodes in each ontology. Our proposed algorithm, called the Anchor-Flood algorithm, requires O(N log N) computation on average. It starts from an anchor, a pair of "look-alike" concepts from the two ontologies, and gradually explores concepts by collecting neighboring concepts, thereby taking advantage of locality of reference in the graph data structure. It outputs a set of alignments between concepts and properties within semantically connected subsets of the two entire graphs, which we call segments. Starting from the anchor, we iteratively repeat the similarity comparison between pairs of nodes within the neighboring pairs of the two ontologies, until either all the collected concepts are explored or no new aligned pair is found. In this way, we can significantly reduce the computational time for the alignment. Moreover, since we only focus on segment-to-segment comparison, regardless of the overall size of the ontologies, our algorithm not only achieves high performance, but also resolves the scalability problem in aligning ontologies. Our proposed algorithm also reduces the number of seemingly aligned but actually misaligned pairs. Through several examples with large ontologies, we demonstrate the features of our Anchor-Flood algorithm.
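A minimal sketch of the anchor-based neighborhood expansion idea follows; the data structures, the similarity threshold, and the toy similarity function are assumptions for illustration, and the sketch shows only the flooding loop, not the full segment construction.

```python
def anchor_flood(onto_a, onto_b, anchor, similar, threshold=0.8):
    """Sketch: starting from an anchor pair of look-alike concepts, compare only
    neighbouring concepts of already aligned pairs, instead of all N^2 cross-ontology pairs.

    onto_a, onto_b: dict concept -> set of neighbouring concepts (the ontology graphs)
    anchor:         (concept_a, concept_b) starting pair
    similar:        callable returning a similarity score for two concepts
    """
    aligned, frontier, visited = set(), [anchor], set()
    while frontier:                           # stop when no new aligned pair is found
        a, b = frontier.pop()
        if (a, b) in visited:
            continue
        visited.add((a, b))
        if similar(a, b) < threshold:
            continue
        aligned.add((a, b))
        # explore only the neighbourhoods surrounding the newly aligned pair
        for na in onto_a.get(a, ()):
            for nb in onto_b.get(b, ()):
                if (na, nb) not in visited:
                    frontier.append((na, nb))
    return aligned

# toy ontologies and a trivial similarity for illustration
A = {"Person": {"Student", "Teacher"}, "Student": set(), "Teacher": set()}
B = {"Human": {"Pupil", "Instructor"}, "Pupil": set(), "Instructor": set()}
sim = lambda x, y: 1.0 if {x, y} <= {"Person", "Human"} or {x, y} <= {"Student", "Pupil"} else 0.0
print(anchor_flood(A, B, ("Person", "Human"), sim))
```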
Stance detection in Twitter aims at mining the stance expressed in a tweet towards a single target entity or multiple target entities. Detecting and analyzing user stances from massive opinion-oriented Twitter posts provides enormous opportunities for journalists, governments, companies, and other organizations. Most prior studies have explored traditional deep learning models, e.g., long short-term memory (LSTM) and gated recurrent units (GRU), for detecting stance in tweets. However, compared to these traditional approaches, the recently proposed densely connected bidirectional LSTM and nested LSTM architectures effectively address the vanishing-gradient and overfitting problems and deal better with long-term dependencies. In this paper, we propose a neural network model that adopts the strengths of these two LSTM variants to learn better long-term dependencies, where each module is coupled with an attention mechanism that amplifies the contribution of important elements in the final representation. We also employ a multi-kernel convolution on top of them to extract higher-level tweet representations. Results of extensive experiments on single- and multi-target benchmark stance detection datasets show that our proposed method achieves substantial improvement over the current state-of-the-art deep learning-based methods.
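To illustrate the multi-kernel convolution step, here is a minimal sketch that runs 1-D convolutions with several kernel sizes over a sequence representation and max-pools over time; the dimensions and kernel sizes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiKernelConv(nn.Module):
    """Sketch of a multi-kernel convolution block: 1-D convolutions with several
    kernel sizes run over the sequence representation produced by the LSTM
    modules, each followed by max-over-time pooling, then concatenated."""
    def __init__(self, in_dim=256, num_filters=100, kernel_sizes=(3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(in_dim, num_filters, k, padding=k // 2) for k in kernel_sizes])

    def forward(self, x):
        # x: (batch, seq_len, in_dim), e.g. attention-weighted LSTM outputs
        x = x.transpose(1, 2)                              # (batch, in_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return torch.cat(pooled, dim=1)                    # (batch, num_filters * len(kernel_sizes))

block = MultiKernelConv()
tweet_repr = torch.randn(8, 30, 256)       # 8 tweets, 30 tokens, 256-dim hidden states
features = block(tweet_repr)               # -> (8, 300) higher-level representation
```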
Web search queries are usually vague, ambiguous, or tend to have multiple intents; users may have different search intents when issuing the same query. Understanding these intents by mining the subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines capture some intents of the original query; however, suggested queries are often noisy and contain groups of alternative queries with similar meanings. Therefore, identifying the subtopics that cover the possible intents behind a query is a formidable task. Moreover, since both the query and the subtopics are short, it is challenging to estimate the similarity between a pair of short texts and rank them accordingly. In this paper, we propose a method for mining and ranking subtopics in which we introduce multiple semantic and content-aware features, a bipartite graph-based ranking (BGR) method, and a similarity function for short texts. Given a query, we aggregate the suggested queries from search engines as candidate subtopics and estimate their relevance to the given query based on word-embedding and content-aware features by modeling a bipartite graph. To estimate the similarity between two short texts, we propose a Jensen-Shannon divergence-based similarity function over the probability distributions of the terms in the top documents retrieved from a search engine. A diversified ranked list of subtopics covering the possible intents of a query is assembled by balancing relevance and novelty. We evaluated our method on the NTCIR-10 INTENT-2 and NTCIR-12 IMINE-2 subtopic mining test collections. Our proposed method outperforms the baselines, known related methods, and the official participants of the INTENT-2 and IMINE-2 competitions.
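A minimal sketch of a Jensen-Shannon-divergence-based similarity between two short texts, assuming term distributions built from the top retrieved documents for each text; the smoothing and the 1 - JS conversion to a similarity are illustrative assumptions, not the exact function proposed here.

```python
import numpy as np
from collections import Counter

def js_similarity(top_docs_a, top_docs_b):
    """Sketch: build term probability distributions from the top documents
    retrieved for each short text, then convert the Jensen-Shannon divergence
    (base-2 logs, so it lies in [0, 1]) into a similarity score."""
    def term_dist(docs, vocab):
        counts = Counter(w for d in docs for w in d.lower().split())
        freq = np.array([counts[w] for w in vocab], dtype=float) + 1e-12  # smoothing
        return freq / freq.sum()

    vocab = sorted({w for d in top_docs_a + top_docs_b for w in d.lower().split()})
    p, q = term_dist(top_docs_a, vocab), term_dist(top_docs_b, vocab)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    js = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return 1.0 - js

# toy "top retrieved documents" for a query and a candidate subtopic
docs_q = ["jaguar speed in the wild", "jaguar big cat habitat"]
docs_s = ["jaguar car top speed", "jaguar xf engine specs"]
print(js_similarity(docs_q, docs_s))
```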