Data quality is the prerequisite of big data research and the basis of all data analysis, mining, and decision support. Therefore, a comprehensive fuzzy evaluation method for big data quality is proposed. Through an analysis of big data quality characteristics, a quality evaluation system covering the whole data-processing pipeline is constructed. The subjective and objective weights of each indicator are calculated with the analytic hierarchy process (AHP) and the entropy method, respectively. To overcome the subjectivity and one-sidedness of any single weighting method, the subjective and objective weights are integrated through the distance function method to determine the combined weight of each indicator. A quantified result for big data quality is then obtained through fuzzy calculation of membership degrees. Finally, the ranking results of the proposed method are compared with those of several existing multi-attribute decision-making (MADM) methods. The results indicate that the proposed method handles MADM problems reasonably and efficiently: it comprehensively measures the level of big data quality and provides users with accurate and efficient quality evaluation results.
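The weighting steps described above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the entropy-weight computation follows the standard formulation, while `combine_weights` uses one common distance-function blending scheme assumed here (the paper's exact formula may differ), and the AHP weights `ws` and the toy decision matrix are made up.

```python
import math

def entropy_weights(X):
    """Objective weights via the entropy method.
    X: m alternatives x n criteria, benefit-type values > 0."""
    m, n = len(X), len(X[0])
    w = []
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        s = sum(col)
        p = [v / s for v in col]                      # normalize the column
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(m)
        w.append(1.0 - e)                             # divergence degree
    total = sum(w)
    return [v / total for v in w]

def combine_weights(ws, wo):
    """Blend subjective (AHP) and objective (entropy) weights.
    Assumed scheme: the blending coefficient alpha grows with the
    distance between the two weight vectors."""
    d = math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(ws, wo)))
    alpha = (1.0 + d) / 2.0
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(ws, wo)]

# toy decision matrix: 3 data sets scored on 4 quality indicators
X = [[0.8, 0.6, 0.9, 0.7],
     [0.5, 0.9, 0.6, 0.8],
     [0.7, 0.7, 0.8, 0.6]]
ws = [0.4, 0.3, 0.2, 0.1]        # subjective AHP weights (assumed)
wo = entropy_weights(X)          # objective weights from the data
wc = combine_weights(ws, wo)
print([round(v, 3) for v in wc])
```

Because both input weight vectors sum to one, any convex combination of them also sums to one, so `wc` remains a valid weight vector.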
Background
Cutaneous squamous cell carcinoma (CSCC) is a severe malignancy arising from the skin. Dysregulated circular RNAs (circRNAs) may play vital roles in tumor development.
Objective
Here, we aimed to explore the function of a novel circRNA, circ_0067772, in CSCC.
Methods
Quantitative real-time PCR (qRT-PCR) or Western blot assay was performed to determine the expression of circ_0067772, microRNA (miR)-1238-3p and forkhead box protein G1 (FOXG1). Cell proliferation was assessed by Cell Counting Kit-8 (CCK-8) assay and colony formation assay. Transwell assay and wound healing assay were employed to examine cell metastasis. Flow cytometry was employed to monitor cell cycle and apoptosis. The target association between miR-1238-3p and circ_0067772 or FOXG1 was validated by dual-luciferase reporter assay. Moreover, the role of circ_0067772 in vivo was investigated via a xenograft model in nude mice.
Results
Circ_0067772 and FOXG1 were upregulated, while miR-1238-3p was downregulated, in CSCC tissues and cells. Circ_0067772 knockdown inhibited the proliferation, migration and invasion of CSCC cells. MiR-1238-3p was identified as a target of circ_0067772, and silencing miR-1238-3p reversed the inhibitory impact of circ_0067772 knockdown on these malignant cellular behaviors. Circ_0067772 positively regulated FOXG1 expression by antagonizing miR-1238-3p. Additionally, miR-1238-3p repressed CSCC cell proliferation, migration and invasion by targeting FOXG1. Also, circ_0067772 knockdown hindered CSCC tumor growth in vivo.
Conclusion
Our study identified a novel oncogenic circRNA and revealed the involvement of the circ_0067772/miR-1238-3p/FOXG1 axis in CSCC development, providing a potential target for CSCC therapy.
A simple multigranulation rough set approach approximates the target through a family of binary relations. Optimistic and pessimistic multigranulation rough sets are two typical examples of this approach. However, neither takes the frequencies of occurrence of containments or intersections into account. To solve this problem, motivated by the multiset, the model of the multiple multigranulation rough set is proposed, in which both the lower and upper approximations are multisets. These two multisets are useful for counting how often objects belong to the lower or upper approximations under a family of binary relations. Furthermore, the concept of the approximate distribution reduct is introduced into the multiple multigranulation rough set, and a heuristic algorithm is presented for computing reducts. Finally, the multiple multigranulation rough set approach is tested on eight UCI (University of California, Irvine) data sets. The experimental results show that: (1) the approximate quality based on the multiple multigranulation rough set lies between the approximate qualities based on the optimistic and pessimistic multigranulation rough sets; (2) compared with the optimistic and pessimistic multigranulation rough sets, the multiple multigranulation rough set needs more attributes to form a reduct.
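The counting idea behind the multiset approximations can be sketched as follows, assuming each binary relation is an equivalence relation given as a partition of the universe; the data, function names, and representation are illustrative, not the paper's.

```python
from collections import Counter

def block_of(x, partition):
    """Equivalence class of x under one granulation (a partition)."""
    for blk in partition:
        if x in blk:
            return blk

def multiple_mgrs(universe, target, partitions):
    """Multiset lower/upper approximations over a family of
    equivalence relations. An object's multiplicity is the number
    of relations under which its class is contained in the target
    (lower) or intersects the target (upper)."""
    lower, upper = Counter(), Counter()
    for part in partitions:
        for x in universe:
            blk = block_of(x, part)
            if blk <= target:          # containment -> lower approximation
                lower[x] += 1
            if blk & target:           # non-empty intersection -> upper
                upper[x] += 1
    return lower, upper

U = set(range(6))
X = {0, 1, 2}                          # target concept
P1 = [{0, 1}, {2, 3}, {4, 5}]          # granulation 1
P2 = [{0}, {1, 2}, {3, 4, 5}]          # granulation 2
low, up = multiple_mgrs(U, X, [P1, P2])
print(dict(low))                       # multiplicity per object
print(dict(up))
```

The optimistic lower approximation is then recoverable as the objects with multiplicity at least 1, and the pessimistic one as the objects whose multiplicity equals the number of relations, which is how the multiset view generalizes both.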
In recent years, granular computing has been developed as a unified data description paradigm. As a popular soft-computing supervised learning model, the rough set theory-based data description approach has been intensively investigated in data mining research. Feasible information granulation and approximation approaches are recognized as two key features of data descriptors in rough sets. In this study, we propose a Dempster–Shafer theory-based rough granular description model built on the principle of justifiable granularity. First, we use evidence information to characterize the performance of information granules generated from regions of varying data density, and we define lower and upper approximation sets in terms of data credibility and plausibility, respectively. Furthermore, we propose a robust rough description model that identifies extreme instances such as outliers and noise. Moreover, a set of pseudo-labels is introduced to enhance the robustness of the proposed model. Finally, to search for an optimal granularity, justifiable granularity is quantified from the perspectives of legitimacy and interpretability and then optimized by a particle swarm optimization algorithm. Extensive comparative experiments against several representative rough granular description models show that the proposed model achieves the best approximation quality, number difference, and neighborhood credibility values in almost all cases. These results demonstrate that the proposed approach is reasonable, effective, and robust, and is a promising rough granular description model for complex real-world data.
• The evidence information is introduced into the modeling of rough sets theory.
• Local strategy is adopted to enhance the robustness of rough sets model.
• Outlier and noise instances with respect to various density regions are identified.
• A feasible supervised optimal granularity for data description is proposed.
• Attribute-driven image captioning joining visual positioning and attribute selection.
• Pointing mechanism to merge the attribute detection result into caption generation.
• Approach that can well associate the attentional regions with visual attributes.
• Experimental results outperform some attribute-based state-of-the-art methods.
Visual attribute detection provides rich semantic concepts for image captioning. Some previous methods attempt to directly encode the attributes into vectors and generate the corresponding captions, ignoring the correlations between image regions and attributes. In this paper, we bridge the gap between visual features and detected attributes: first, look at a specific region of the image; second, decide which attribute to attend to. We propose an attribute-driven image captioning approach consisting of two parts: a visual positioning part and an attribute selection part. Specifically, we introduce a pointer-generator network into the second part of our model as a soft switch, which determines at each decoding step whether to generate a word from the hidden state or to point to a detected attribute. Qualitative and quantitative experiments show that our model improves the coverage of key visual attributes and significantly boosts overall performance.
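The soft-switch idea can be illustrated numerically. This is an assumed simplification of the pointer-generator blend, not the paper's model: with probability `p_gen` the decoder emits from its vocabulary distribution, otherwise it points to a detected attribute via the attention weights; all numbers and the attribute-to-vocabulary mapping are invented for the example.

```python
import numpy as np

def soft_switch(p_vocab, attn, attr_to_vocab, p_gen):
    """Blend a vocabulary distribution with pointing probabilities.
    p_vocab: decoder's distribution over the vocabulary.
    attn: attention weights over the detected attributes.
    attr_to_vocab: vocabulary id of each attribute word."""
    out = p_gen * p_vocab
    for a, w in zip(attr_to_vocab, attn):
        out[a] += (1.0 - p_gen) * w   # copy attention mass onto attribute words
    return out

p_vocab = np.array([0.5, 0.2, 0.2, 0.1])   # distribution over a 4-word vocab
attn = np.array([0.7, 0.3])                # attention over 2 detected attributes
attr_ids = [2, 3]                          # vocab ids of the attributes (assumed)
p = soft_switch(p_vocab, attn, attr_ids, p_gen=0.6)
print(p, p.sum())
```

Since both input distributions sum to one, the convex blend is itself a valid probability distribution, and attribute words receive extra mass in proportion to their attention weights.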
Multi-label learning is an emerging paradigm for exploiting samples with rich semantics. As an effective solution to multi-label learning, the strategy of label-specific features (LIFT) has been widely applied. Technically, this strategy feeds tailored features to the learning model instead of the original ones. However, tailoring features for each label may introduce redundancy or irrelevance into the feature space, thereby deteriorating learning performance. To alleviate this problem, a novel multi-label classification method named Relief-LIFT is proposed in this study. Relief-LIFT first leverages LIFT to generate the tailored features and then adapts Relief to select informative features from them for the classification model. Experimental results on 12 real-world multi-label data sets demonstrate that the proposed Relief-LIFT achieves better performance than other well-established multi-label classification methods.
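For context, the classic binary Relief scheme that Relief-LIFT adapts can be sketched as below; the toy data, sampling budget, and distance choice are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Minimal sketch of binary Relief feature scoring: reward
    features that separate a sample from its nearest miss and
    penalize those that separate it from its nearest hit."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)   # Manhattan distances to sample i
        dist[i] = np.inf                      # exclude the sample itself
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))   # nearest same-class
        miss = np.argmin(np.where(diff, dist, np.inf))  # nearest other-class
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# toy data: feature 0 is informative, feature 1 is noise
X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
w = relief_weights(X, y)
print(w)  # feature 0 should score higher than feature 1
```

Features with large positive scores discriminate between classes and would be kept; near-zero or negative scores mark candidates for removal from the tailored feature space.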
The final 3'-terminal residue of the telomeric DNA G-overhang is inherently imprecise. Here, we describe how alteration of the last 3'-terminal base affects the mutual recognition between two different G-rich oligomers of human telomeric DNA in the formation of heteromolecular G-quadruplexes (hetero-GQs). Association between three- and single-repeat fragments of human telomeric DNA, the target d(GGGTTAGGGTTAGGG) and the probe d(TAGGGT), in Na+ solution yields two coexisting forms of (3 + 1) hybrid hetero-GQs: the kinetically favourable LLP-form (left loop progression) and the thermodynamically controlled RLP-form (right loop progression). However, only the adoption of a single LLP-form has previously been reported between the same probe d(TAGGGT) and a target variant d(GGGTTAGGGTTAGGGT) carrying one extra 3'-end thymine. Moreover, flanking-base alterations of short G-rich probe variants also significantly affect the loop progressions of hetero-GQs. Although the two forms are seemingly pseudo-mirror counterparts, the RLP-form is preferentially recognized over the LLP-form by a low equivalent of the fluorescent dye thioflavin T (ThT). To a greater extent, ThT binds the RLP hetero-GQ preferentially over the corresponding telomeric DNA duplex context and several other representative unimolecular GQs.
Label distribution learning (LDL) is a generalized machine learning framework for dealing with label ambiguity, as it can explore the relative importance of different labels in the description of each sample. Although several algorithms have been proposed to solve LDL problems, they usually destroy, to some extent, the consistency of the geometric structures between the feature space and the label space, which frequently plays a significant role in learning tasks. Meanwhile, most existing LDL algorithms consider only predictive performance, ignoring computational cost and robustness to noise. To remedy these deficiencies, we propose a novel algorithm, Local Collaborative Representation based Label Distribution Learning (LCR-LDL). In LCR-LDL, an unlabeled sample is treated as a collaborative representation over a local dictionary constructed from the sample's neighborhood, and the discriminative information in the representation coefficients is used to reconstruct the sample's label distribution. Experimental results on 20 real-world LDL data sets, compared against those of 11 state-of-the-art algorithms, show that LCR-LDL not only effectively improves predictive performance for LDL tasks but also exhibits higher robustness and a lightweight computational overhead. This study suggests new directions for considering computational cost and robustness in the LDL community.
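The local collaborative representation step can be sketched as follows; this is a generic ridge-regularized formulation under assumptions of my own (Euclidean nearest neighbours, absolute-value coefficient blending), and all names, hyperparameters, and data are illustrative rather than the paper's.

```python
import numpy as np

def lcr_ldl_predict(X_train, D_train, x, k=5, lam=0.1):
    """Represent a query as a regularized linear combination of its
    k nearest training samples (the local dictionary), then reuse
    the coefficients to blend the neighbours' label distributions."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]            # k nearest neighbours
    A = X_train[idx].T                     # d x k local dictionary
    # collaborative coding: min ||x - A c||^2 + lam ||c||^2
    c = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ x)
    # blend neighbours' label distributions by coefficient magnitude
    d = np.abs(c) @ D_train[idx]
    return d / d.sum()                     # renormalize to a distribution

rng = np.random.default_rng(0)
X_train = rng.random((50, 4))              # 50 samples, 4 features
D_train = rng.random((50, 3))              # label distributions over 3 labels
D_train /= D_train.sum(axis=1, keepdims=True)
pred = lcr_ldl_predict(X_train, D_train, rng.random(4))
print(pred, pred.sum())
```

Because only a small local dictionary is solved per query, the per-sample cost is a k-by-k linear system rather than a global optimization, which matches the lightweight-overhead claim in spirit.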
• We propose two multi-label learning approaches with LIFT reduction.
• The idea of fuzzy rough set attribute reduction is adopted in our approaches.
• Sample selection improves the efficiency in feature dimension reduction.
In multi-label learning, different labels may have distinct characteristics of their own, so the multi-label learning approach with label-specific features, named LIFT, has been proposed. However, constructing label-specific features may increase feature dimensionality and introduce a large amount of redundant information into the feature space. To alleviate this problem, a multi-label learning approach, FRS-LIFT, is proposed, which implements label-specific feature reduction with fuzzy rough sets. Furthermore, using the idea of sample selection, another multi-label learning approach, FRS-SS-LIFT, is also presented, which effectively reduces the computational complexity of label-specific feature reduction. Experimental results on 10 real-world multi-label data sets show that our methods not only reduce the dimensionality of label-specific features compared with LIFT, but also achieve satisfactory performance among popular multi-label learning approaches.
Abstract
Vast numbers of G-quadruplexes (GQs) are primarily folded by one, two, or four G-rich oligomers, with rare exceptions. Here, we present the first NMR solution structure of a trimolecular GQ (tri-GQ) assembled solely by the self-trimerization of d(GTTAGG), preferentially in Na+ solution and tolerant of an equal amount of K+ cation. Eight guanines from three asymmetrically folded strands of d(GTTAGG) are organized into a two-tetrad core featuring a broken G-column and two grooves of irregular width. Fast strand exchanges on a timescale of seconds at 17°C occur spontaneously between the folded tri-GQ and the unfolded single strand of d(GTTAGG), such that both species coexist in dynamic equilibrium. Thus, this tri-GQ is not simply a static assembly but rather a dynamic one. Moreover, another minor tetra-GQ with a putatively tetrameric (2+2) antiparallel topology becomes noticeable only at extremely high strand concentrations above 18 mM. The major tri-GQ and minor tetra-GQ are considered mutually related, and their reversible interconversion pathways are proposed accordingly. The sequence d(GTTAGG) can be regarded as either a reading-frame-shifted single repeat of human telomeric DNA or a 1.5 repeat of Bombyx mori telomeric DNA. Overall, our findings provide new insight into GQs and suggest further functional applications.