The rumour spectrum
Turenne, Nicolas
PloS one, 01/2018, Volume: 13, Issue: 1
Journal Article
Peer reviewed
Open access
Rumour is an old social phenomenon used in politics and other public spaces. It has been studied for only a hundred years by sociologists and psychologists, by qualitative means. Social media platforms open new opportunities to improve quantitative analyses. We scanned all scientific literature to find relevant features. We made a quantitative screening of some specific rumours (in French and in English). Firstly, we identified some sources of information to find them. Secondly, we compiled different reference, rumouring and event datasets. Thirdly, we considered two facets of a rumour: the way it can spread to other users, and the syntagmatic content that may or may not be specific to a rumour. We found 53 features, clustered into six categories, which are able to describe a rumour message. The spread of a rumour is multi-harmonic, having different frequencies and spikes, and can survive several years. Combinations of words (n-grams and skip-grams) are not typical of expressivity between rumours and news, but the study of lexical transition from one time period to the next is consistent with the transmission pattern described by Allport's theory of transmission. A rumour can be interpreted as a speech act but with transmission patterns.
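The word combinations mentioned in this abstract (n-grams and skip-grams) can be illustrated with a minimal sketch. It is not the authors' code: the function names `ngrams` and `skipgrams` and the convention that `k` bounds the total number of skipped tokens are assumptions made for illustration.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, k):
    """Return n-token sequences in original order, allowing at most k
    skipped tokens in total between the first and last token (assumed
    convention for "k-skip-n-grams")."""
    grams = []
    for start in range(len(tokens)):
        # The remaining n-1 tokens are drawn from the next n-1+k positions.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            grams.append((tokens[start],) + tuple(tokens[j] for j in rest))
    return grams


if __name__ == "__main__":
    words = ["the", "rumour", "spreads", "fast"]  # toy sentence
    print(ngrams(words, 2))
    print(skipgrams(words, 2, 1))
```

With `k = 0` the skip-grams reduce to ordinary n-grams, which gives a quick sanity check of the window arithmetic.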
In ruminants, more than 30% of the embryonic loss observed after artificial insemination has an early origin that is coincident with the marked elongation of the conceptus that occurs before implantation. During this developmental phase, physiological interactions are established between the conceptus and the uterus which are essential for the establishment of pregnancy and the elongation process. Our molecular knowledge of elongating conceptuses in cattle has long been focused on its analysis in view of its interactions with the uterus, with the elongating stages being defined, like the uterus stages, by days post insemination or conception. The gene clusters reported so far indicate important pathways, some being shared by the non-elongating conceptuses of other mammals. However, to identify the key components of the elongation process – that could be specific to ungulates – new models are needed. Somatic nuclear transfer could be one of them as it provides complementary insights on differentiation beyond the blastocyst stage. Nonetheless, other models are necessary to convert gene lists or networks into elongating phenotypes. This review partly summarizes information on these topics, but data on the impact of the uterus on the elongation process or on the differentiation of the embryonic tissues are reviewed elsewhere.
Social media is more and more dominant in everyday life for people around the world. YouTube content is a resource that may be useful, in social computational science, for understanding key questions about society. Using this resource, we performed web scraping to create a dataset of 644,575 video transcriptions concerning net activism and whistleblowing. We automatically performed linguistic feature extraction to capture a representation of each video using its title, description and transcription (downloaded metadata). The next step was to clean the dataset using automatic clustering with linguistic representation to identify unmatched videos and noisy keywords. Using these keywords to exclude videos, we finally obtained a dataset that was reduced by 95%, i.e., it contained 35,730 video transcriptions. Then, we again automatically clustered the videos using a lexical representation and split the dataset into subsets, leading to hundreds of clusters that we interpreted manually to identify a hierarchy of topics of interest concerning whistleblowing. We used the dataset to learn a lexical representation for a specific topic and to detect unknown whistleblowing videos for this topic; the accuracy of this detection is 57.4%. We also used the dataset to identify interesting context linguistic markers around the names of whistleblowers. From a given list of names, we automatically extracted all 5-gram word sequences from the dataset and identified interesting markers in the left and right contexts for each name by manual interpretation. The results of our study are the following: a dataset (raw and cleaned collections) concerning whistleblowing, a hierarchy of topics about whistleblowing, the automatic prediction of whistleblowing and the semi-automatic semantic analysis of markers around whistleblower names. This text mining analysis can be exploited for digital sociology and e-democracy studies.
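The extraction of left and right word-sequence contexts around a name, as described in this abstract, could look roughly like the sketch below. It assumes single-token names and a pre-tokenised transcript; the function name `context_ngrams` and the toy sentence are hypothetical and do not come from the dataset.

```python
from collections import Counter


def context_ngrams(tokens, name, n=5):
    """Count n-token sequences that end with `name` (left context)
    and that start with `name` (right context)."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != name:
            continue
        if i - (n - 1) >= 0:  # enough tokens before the name
            left[tuple(tokens[i - n + 1:i + 1])] += 1
        if i + n <= len(tokens):  # enough tokens after the name
            right[tuple(tokens[i:i + n])] += 1
    return left, right


if __name__ == "__main__":
    # Toy, invented sentence; real input would be a video transcription.
    text = "the whistleblower edward snowden leaked classified documents to journalists"
    left, right = context_ngrams(text.split(), "snowden", n=3)
    print(left.most_common(3))
    print(right.most_common(3))
```

The counters only surface frequent contexts; in the abstract the choice of which markers are interesting is made by manual interpretation.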
Text data is often seen as "take-away" material with little noise and
easy-to-process information. The main questions are how to get data and
transform them into a good document format. But data can be sensitive to
noise, often called ambiguities. Ambiguities have long been recognized, mainly
because polysemy is obvious in language and context is required to remove
uncertainty. I claim in this paper that syntactic context is not sufficient to
improve interpretation. I try to explain that, firstly, noise can come from
natural data themselves, even involving high technology; secondly, texts seen
as verified but meaningless can spoil the content of a corpus, which may lead
to contradictions and background noise.
Parallel text datasets are valuable for educational purposes, machine translation, and cross-language information retrieval, but few are domain-oriented. We have created a Chinese–English parallel dataset in the domain of finance technology, using the 'Financial Times' website, from which we grabbed 60,473 news items published between 2007 and 2021. This dataset is a bilingual Chinese–English parallel dataset of news in the domain of finance. It is open access in its original state without transformation, and has been made not for machine translation, as such datasets have usually been used, but for intelligent mining, in which we conducted many experiments using up-to-date text mining techniques: clustering (topic modeling, community detection, k-means), topic prediction (naive Bayes, SVM, LSTM, BERT), and pattern discovery (dictionary-based, time series). We present the usage of these techniques as a framework for other studies, not only as an application but with an interpretation.
This book presents a theory of consciousness which is unique and sustainable in nature, based on physiological and cognitive-linguistic principles controlled by a number of socio-psycho-economic factors. In order to anchor this theory, which draws upon various disciplines, the author presents a number of different theories, all of which have been abundantly studied by scientists from both a theoretical and experimental standpoint, including models of social organization, ego theories, theories of the motivational system in psychology, theories of the motivational system in neurosciences, language modeling and computational modeling of motivation. The theory presented in this book is based on the hypothesis that an individual’s main activities are developed by self-motivation, managed as an informational need. This is described in chapters covering self-motivation on a day-to-day basis, the notion of need, the hypothesis and control of cognitive self-motivation and a model of self-motivation which associates language and physiology. The subject of knowledge extraction is also covered, including the impact of self-motivation on written information, non-transversal and transversal text-mining techniques and the fields of interest of text mining.
Amongst the big questions about humanity, from “Who am I?” to “Where am I?”, there is the subject of consciousness, relayed by philosophy and theology at first and then by biology, psychology, sociology and more recently by cognitive sciences. This book attempts to reconcile these disciplines through the common denominator of consciousness, but on one of its specific aspects. The book presents a concept of motivation of field of activity, as a biological motor of a state of consciousness, whose existence computer systems make it possible to reveal. Although consciousness is not well defined, a limited frame provides a more precise and observable definition despite the complexity of the individual's context: psychological, social and technical. These observable elements are of two natures: that of principal activity and a cognitive-linguistic nature, which are adapted with extrinsic and intrinsic control factors. The argument presented here consists in presenting a state of consciousness as related to the concept of an instinctive (therefore physiological) information need, whose traces on the traditional information channels (i.e. letters and subscriptions) and on the modern information channels (i.e. text messages and web pages) can be analyzed via knowledge extraction.
An abstract is not only a mirror of the full article; it also aims to draw attention to the most important information of the document it summarizes. Many studies have compared abstracts with full texts for their informativeness. In contrast to previous studies, we propose to investigate this relation based not only on the amount of information given by the abstract but also on its importance. The main objective of this paper is to introduce a new metric called GEM to measure the "generosity" or representativeness of an abstract. Schematically speaking, a generous abstract should have the best possible score of similarity for the sections important to the reader. Based on a questionnaire gathering information from 630 researchers, we were able to weight sections according to their importance. In our approach, seven sections were first automatically detected in the full text. The accuracy of this classification into sections was above 80% compared with a dataset of documents where sentences were assigned to sections by experts. Second, each section was weighted according to the questionnaire results. The GEM score was then calculated as the sum of the weights of the sections in the full text corresponding to sentences in the abstract, normalized over the total sum of the weights of the sections in the full text. The correlation between the GEM score and the mean of the scores assigned by annotators was higher than the correlation between scores from different experts. As a case study, the GEM score was calculated for 36,237 articles in environmental sciences (1930–2013) retrieved from the French ISTEX database. The main result was that the GEM score has increased over time. Moreover, this trend depends on subject area and publisher. No correlation was found between GEM score and citation rate or open access status of articles. We conclude that abstracts are more generous in recent publications and cannot be considered as mere teasers.
This research should be pursued in greater depth, particularly by examining structured abstracts. The GEM score could be a valuable indicator for exploring large numbers of abstracts, by guiding the reader in his/her choice of whether or not to obtain and read full texts.
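The GEM calculation described in this abstract can be sketched as a weighted coverage ratio. The sketch assumes that each section counts once however many abstract sentences map to it; that reading, the function name `gem_score`, the section names and the weights are all illustrative assumptions, not the questionnaire values.

```python
def gem_score(section_weights, abstract_sections):
    """Weighted share of full-text sections represented in the abstract.

    section_weights:   dict mapping each section detected in the full
                       text to its importance weight.
    abstract_sections: the section assigned to each abstract sentence.
    """
    total = sum(section_weights.values())
    if total == 0:
        raise ValueError("section weights must not sum to zero")
    # Assumption: a section is counted once, even if several abstract
    # sentences correspond to it.
    covered = set(abstract_sections) & set(section_weights)
    return sum(section_weights[s] for s in covered) / total


if __name__ == "__main__":
    # Hypothetical weights for four illustrative sections.
    weights = {"introduction": 1.0, "methods": 2.0, "results": 3.0, "conclusion": 4.0}
    print(gem_score(weights, ["methods", "results", "results"]))
```

A score of 1 would mean that every weighted section of the full text is echoed by at least one abstract sentence.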
Since processes in well-known model organisms have specific features different from those in Bos taurus, the organism under study, a good way to describe gene regulation in ruminant embryos would be a species-specific consideration of species closely related to cattle: sheep and pig. However, as highlighted by a recent report, gene dictionaries in pig are smaller than in cattle, which risks reducing the gene resources to be mined (and the same holds for sheep dictionaries). Bioinformatics approaches that allow an integration of available information on gene function in model organisms, taking into account their specificity, are thus needed. Besides these closely related and biologically relevant species, there is indeed much more knowledge of (i) trophoblast proliferation and differentiation or (ii) embryogenesis in human and mouse species, which provides opportunities for reconstructing proliferation and/or differentiation processes in other mammalian embryos, including ruminants. The necessary knowledge can be obtained partly from (i) stem cell or cancer research, to supply useful information on molecular agents or molecular interactions at work in cell proliferation, and (ii) mouse embryogenesis, to supply useful information on embryo differentiation. However, the total number of publications for all these topics and species is great and their manual processing would be tedious and time consuming. This is why we used text mining for automated text analysis and automated knowledge extraction. To evaluate the quality of this "mining", we took advantage of studies that reported gene expression profiles during the elongation of bovine embryos and defined a list of transcription factors (or TF, n = 64) that we used as a biological "gold standard". If successful, the "mining" approach would identify them all, as well as novel ones.
To gain knowledge on molecular-genetic regulations in a non-model organism, we offer an approach based on literature mining and score arrangement of data from model organisms. This approach was applied to identify novel transcription factors during bovine blastocyst elongation, a process that is not observed in rodents and primates. As a result, searching through human and mouse corpora, we identified numerous bovine homologs, of which 11 to 14% were transcription factors, including the gold standard TF as well as novel TF potentially important to gene regulation in ruminant embryo development. The scripts of the workflow are written in Perl and available on demand. They require input data from various databases and can be applied to any kind of biological issue once the data have been prepared according to keywords for the studied topic and species; we can provide a data sample to illustrate the use and functionality of the workflow.
To do so, we created a workflow that allowed the pipeline processing of literature data and biological data, extracted from Web of Science (WoS) or PubMed but also from Gene Expression Omnibus (GEO), Gene Ontology (GO), Uniprot, HomoloGene, TcoF-DB and TFe (TF encyclopedia). First, the human and mouse homologs of the bovine proteins were selected, filtered by text corpora and arranged by score functions. The score functions were based on the gene name frequencies in corpora. Then, transcription factors were identified using TcoF-DB and double-checked using TFe to characterise TF groups and families. Thus, among a search space of 18,670 bovine homologs, 489 were identified as transcription factors. Among them, 243 were absent from the high-throughput data available at the time of the study. They thus remain, so far, putative TF acting during bovine embryo elongation, but might be retrieved from a recent RNA sequencing dataset (Mamo et al., 2012). Beyond the 246 TF that appeared to be expressed in bovine elongating tissues, we restricted our interpretation to those occurring within a list of 50 top-ranked genes. Among the transcription factors identified therein, half belonged to the gold standard (ASCL2, c-FOS, ETS2, GATA3, HAND1) and half did not (ESR1, HES1, ID2, NANOG, PHB2, TP53, STAT3).
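The ranking step described here (score functions based on gene name frequencies in corpora, then a transcription-factor filter) can be sketched as follows. The authors' scripts are in Perl and their exact score functions are not given in this abstract, so the raw-count score, the function names `rank_genes` and `keep_transcription_factors`, and the toy corpus are assumptions for illustration only.

```python
import re
from collections import Counter


def rank_genes(corpus, gene_names, top_k=50):
    """Rank gene names by how often they occur in a list of documents.
    Assumed score: raw, case-insensitive occurrence count."""
    wanted = {g.upper() for g in gene_names}
    counts = Counter()
    for doc in corpus:
        for token in re.findall(r"[A-Za-z0-9-]+", doc):
            if token.upper() in wanted:
                counts[token.upper()] += 1
    return counts.most_common(top_k)


def keep_transcription_factors(ranked, known_tfs):
    """Keep only ranked genes listed in a TF reference set
    (standing in for a lookup in a resource such as TcoF-DB)."""
    tfs = {g.upper() for g in known_tfs}
    return [(gene, score) for gene, score in ranked if gene in tfs]


if __name__ == "__main__":
    # Placeholder documents; real input would be literature abstracts.
    corpus = ["text mentioning GATA3 and HAND1 then GATA3", "text mentioning ETS2 and gata3"]
    ranked = rank_genes(corpus, ["GATA3", "HAND1", "ETS2", "STAT3"])
    print(keep_transcription_factors(ranked, ["GATA3", "ETS2"]))
```

A frequency count like this favours heavily studied genes, which is one reason a ranked list still needs expert review.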
A workflow to search for transcription factors acting in bovine elongation was developed. The model assumed that proteins sharing the same protein domains in closely related species had the same protein functionalities, even if they were differently regulated among species or involved in somewhat different pathways. Under this assumption, we merged the information on different mammalian species from different databases (literature and biology) and proposed 489 TF as potential participants in embryo proliferation and differentiation, with (i) a recall of 95% with regard to a biological gold standard defined in 2011 and (ii) an extension of more than 3 times the gold standard of TF detected so far in elongating tissues. The working capacity of the workflow was supported by the manual expertise of the biologists on the results. The workflow can serve as a new kind of bioinformatics tool to work on fused data sources and can thus be useful in studies of a wide range of biological processes.
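Recall against a gold standard, as reported in this abstract, is the fraction of gold-standard items that the workflow recovers. The sketch below uses placeholder identifiers rather than the actual gene lists; the function name `recall` is an assumption.

```python
def recall(predicted, gold):
    """Fraction of gold-standard items that appear among the predictions."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(set(predicted) & gold) / len(gold)


if __name__ == "__main__":
    # Placeholder identifiers, not real transcription factors.
    predicted = {"TF_A", "TF_B", "TF_C", "TF_X"}
    gold = {"TF_A", "TF_B", "TF_C", "TF_D"}
    print(recall(predicted, gold))
```

Recall says nothing about how many predictions fall outside the gold standard, which is why the abstract reports the extension beyond the gold standard separately.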