Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked “What vehicle is the person riding?”, computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) to answer correctly that “the person is riding a horse-drawn carriage.” In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and question-answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
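The carriage example above can be sketched as a small lookup over relationship triples. This is only an illustrative sketch, not the dataset's actual schema or API: the `Relationship` type and the helper functions `objects_of`, `subjects_of`, and `describe_vehicle` are hypothetical names introduced here.

```python
from typing import List, NamedTuple


class Relationship(NamedTuple):
    """One pairwise relationship, written predicate(subject, obj). Hypothetical type."""
    predicate: str
    subject: str
    obj: str


# The two relationships named in the abstract's example.
scene = [
    Relationship("riding", "man", "carriage"),
    Relationship("pulling", "horse", "carriage"),
]


def objects_of(scene: List[Relationship], predicate: str, subject: str) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [r.obj for r in scene if r.predicate == predicate and r.subject == subject]


def subjects_of(scene: List[Relationship], predicate: str, obj: str) -> List[str]:
    """Return every subject linked to `obj` by `predicate`."""
    return [r.subject for r in scene if r.predicate == predicate and r.obj == obj]


def describe_vehicle(scene: List[Relationship], rider: str) -> List[str]:
    """Answer 'What vehicle is <rider> riding?' by combining both relationships."""
    answers = []
    for vehicle in objects_of(scene, "riding", rider):
        pullers = subjects_of(scene, "pulling", vehicle)
        # If something pulls the vehicle, include it in the description.
        answers.append(f"{pullers[0]}-drawn {vehicle}" if pullers else vehicle)
    return answers


print(describe_vehicle(scene, "man"))  # ['horse-drawn carriage']
```

Recognizing the objects alone would yield only "carriage"; the more specific answer requires the second relationship, which is the point the abstract makes about cognitive tasks.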
Image retrieval using scene graphs Johnson, Justin; Krishna, Ranjay; Stark, Michael ...
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
06/2015
Conference Proceeding
This paper develops a novel framework for semantic image retrieval based on the notion of a scene graph. Our scene graphs represent objects ("man", "boat"), attributes of objects ("boat is white"), and relationships between objects ("man standing on boat"). We use these scene graphs as queries to retrieve semantically related images. To this end, we design a conditional random field model that reasons about possible groundings of scene graphs to test images. The likelihoods of these groundings are used as ranking scores for retrieval. We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval. In particular, we evaluate retrieval using full scene graphs and small scene subgraphs, and show that our method outperforms retrieval methods that use only objects or low-level image features. In addition, we show that our full model can be used to improve object localization compared to baseline methods.
Points of interest are an important requirement for location-based services, yet they are editorially curated and maintained, either professionally or by the community. Beyond the laborious manual annotation task, further complications arise as points of interest may appear, relocate, or disappear over time, and may be relevant only to specific communities. To assist, complement, or even replace manual annotation, we propose a novel method for the automatic localization of points of interest depicted in photos taken by people across the world. Our technique exploits the geographic coordinates and the compass direction supplied by modern cameras, while accounting for possible measurement errors due to the variability in accuracy of the sensors that produced them. We statistically demonstrate that our method significantly outperforms techniques from the research literature on the task of estimating the geographic coordinates and geographic footprints of points of interest in various cities, even when the estimation process includes photos that do not show the point of interest at all.
Background. Tenofovir disoproxil fumarate (TDF) is an established nucleotide analogue in the treatment of chronic hepatitis B. Bone mineral density loss has been described in TDF-treated patients with human immunodeficiency virus infection, but limited data exist for patients with chronic hepatitis B. Dual X-ray absorptiometry (DEXA) was used to determine bone mineral density changes in TDF-exposed patients. We evaluated the accuracy of the Fracture Risk Assessment Tool (FRAX) as an alternative to DEXA in clinical practice. Methods. A total of 170 patients were studied: 122 were exposed to TDF, and 48 were controls. All patients underwent DEXA, and demographic details were recorded. FRAX scores (before and after DEXA) were calculated. Results. TDF was associated with a lower hip T score (P = .02). On univariate and multivariate analysis, advancing age, smoking, lower body mass index, and TDF exposure were independent predictors of low bone mineral density. In addition, the pre-DEXA FRAX score was an accurate predictor of the post-DEXA FRAX treatment recommendation (100% sensitivity and 83% specificity), area under the curve 0.93 (95% CI, .87-.97, P<.001). Conclusions. TDF-treated patients with chronic hepatitis have reduced bone mineral density, but the reduction is limited to 1 anatomical site. Age and advanced liver disease are additional contributing factors, underlining the importance of multifactorial fracture risk assessment. FRAX can accurately identify those at greatest risk of osteoporotic fracture.
Traditional cameras and video equipment are gradually losing the race with smart phones and small mobile devices that allow video, photo, and audio capturing on the go. Users are now quickly creating movies and taking photos whenever and wherever they go, particularly at concerts and live events (e.g., shows, sport events). Still, in-situ media capturing with such devices poses constraints to any user, especially amateur ones. In this paper, we present the design and evaluation of a mobile video capture suite that allows for cooperative ad hoc production. Our system relies on ad hoc in-situ collaboration, offering users the ability to switch between streams and cooperate with each other in order to capture better media with mobile devices. Our main contribution is the real-time awareness that users gain of media capturing endeavors around them and the possibility to collect that data for personal use once the event is over. This contribution is further emphasized by the geo-referenced cues that support the overall user interface and the management of the different media streams. As a secondary contribution, we report on lessons and design guidelines that emerged and that apply to the in-situ design of rich collaborative video experiences and to the elicitation of functional and usability requirements related to privacy, social connections, and gamification.
How the human auditory system extracts perceptually relevant acoustic features of speech is unknown. To address this question, we used intracranial recordings from nonprimary auditory cortex in the human superior temporal gyrus to determine what acoustic information in speech sounds can be reconstructed from population neural activity. We found that slow and intermediate temporal fluctuations, such as those corresponding to syllable rate, were accurately reconstructed using a linear model based on the auditory spectrogram. However, reconstruction of fast temporal fluctuations, such as syllable onsets and offsets, required a nonlinear sound representation based on temporal modulation energy. Reconstruction accuracy was highest within the range of spectro-temporal fluctuations that have been found to be critical for speech intelligibility. The decoded speech representations allowed readout and identification of individual words directly from brain activity during single trial sound presentations. These findings reveal neural encoding mechanisms of speech acoustic parameters in higher order human auditory cortex.
A variety of attention-related effects have been demonstrated in primary auditory cortex (A1). However, an understanding of the functional role of higher auditory cortical areas in guiding attention to acoustic stimuli has been elusive. We recorded from neurons in two tonotopic cortical belt areas in the dorsal posterior ectosylvian gyrus (dPEG) of ferrets trained on a simple auditory discrimination task. Neurons in dPEG showed similar basic auditory tuning properties to A1, but during behavior we observed marked differences between these areas. In the belt areas, changes in neuronal firing rate and response dynamics greatly enhanced responses to target stimuli relative to distractors, allowing for greater attentional selection during active listening. Consistent with existing anatomical evidence, the pattern of sensory tuning and behavioral modulation in auditory belt cortex links the spectrotemporal representation of the whole acoustic scene in A1 to a more abstracted representation of task-relevant stimuli observed in frontal cortex.
• Neural activity was recorded in secondary auditory cortical areas during behavior
• Responses to target stimuli were substantially more enhanced than in A1
• Responses to distractors were strongly suppressed relative to targets
• Abstract task-relevant feature representations gradually emerge across brain areas
Exploring the mechanisms of auditory selective attention, Atiani et al. compare neural responses during behavior in primary versus higher-order auditory cortical areas in the ferret, demonstrating selective enhancement of target stimuli relative to distractors in higher-order belt areas during active listening.
CelebrityNet Li, Li-Jia; Shamma, David A.; Kong, Xiangnan ...
ACM transactions on multimedia computing communications and applications,
08/2015, Volume 12, Issue 1
Journal Article
Peer reviewed
Photos are an important information carrier for implicit relationships. In this article, we introduce an image-based social network, called CelebrityNet, built from implicit relationships encoded in a collection of celebrity images. We analyze the social properties reflected in this image-based social network and automatically infer communities among the celebrities. We demonstrate the interesting discoveries of the CelebrityNet. We particularly compare the inferred communities with manually labeled ones and show quantitatively that the automatically detected communities are highly aligned with human interpretation. Inspired by the uniqueness of visual content and tag concepts within each community of the CelebrityNet, we further demonstrate that the constructed social network can serve as a knowledge base for high-level visual recognition tasks. In particular, this social network is capable of significantly improving the performance of automatic image annotation and classification of unknown images.
Editors' Message Lampinen, Airi; Gergle, Darren; Shamma, David A.
Proceedings of the ACM on human-computer interaction,
11/2019, Volume 3, Issue CSCW
Journal Article
Peer reviewed
Open access
It is our great pleasure to welcome you to this issue of the Proceedings of the ACM on Human-Computer Interaction, on the contributions of the Computer-Supported Cooperative Work and Social Computing (CSCW) research community. This issue contains a carefully selected set of papers, accepted through our review process from among the 658 articles submitted from around the world by the Spring 2019 deadline. After the first round of peer review, 325 (49.4%) papers were invited to the Revise and Resubmit phase. After receiving the revised submissions, the external reviewers and the program committee reviewed all second-round contributions. Finally, the program committee came together for a three-day online editorial committee meeting, held to allow for collective deliberation. Ultimately, 205 papers (31.2%) were accepted.
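The two percentages quoted above follow directly from the reported counts. As a quick check, a minimal sketch (the helper name `percent_of` is introduced here only for illustration):

```python
def percent_of(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


submitted = 658          # articles submitted by the Spring 2019 deadline
invited_to_revise = 325  # invited to the Revise and Resubmit phase
accepted = 205           # accepted after the second round

print(percent_of(invited_to_revise, submitted))  # 49.4
print(percent_of(accepted, submitted))           # 31.2
```

Both rates are computed against the full set of 658 submissions, not against the 325 papers that reached the second round.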