This study addressed the challenge of training generative adversarial networks (GANs) for data augmentation on small tabular clinical trial datasets, which are difficult to train on because of their limited sample sizes. To overcome this obstacle, a hybrid approach is proposed: the synthetic minority oversampling technique (SMOTE) first augments the original data to a more substantial size, and the augmented data are then used to train a Wasserstein conditional generative adversarial network with gradient penalty (WCGAN-GP), chosen for its state-of-the-art performance and enhanced stability. The ultimate objective of this research was to demonstrate that synthetic tabular data generated by the final WCGAN-GP model under this hybrid approach maintain the structural integrity and statistical representation of the original small dataset. This focus is particularly relevant for clinical trials, where limited data availability due to privacy concerns and restricted access to subject enrollment is a common challenge. Despite the limited data, the findings demonstrate that the hybrid approach successfully generates synthetic data that closely preserve the characteristics of the original small dataset. The ability of this hybrid approach to generate faithful synthetic data makes evident its potential for enhancing data-driven research in drug clinical trials. This includes enabling robust analysis of small datasets, supplementing scarce clinical trial data, facilitating their use in machine learning tasks, and even using the model for anomaly detection to improve quality control during clinical trial data collection, all while prioritizing data privacy and implementing strict data protection measures.
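The SMOTE pre-augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_like_oversample` and its parameters are hypothetical, and the sketch assumes purely numeric columns and ignores class labels, whereas standard SMOTE interpolates within the minority class only. It shows only the core idea, which is that each synthetic row lies on the line segment between a real row and one of its nearest neighbours.

```python
import math
import random


def smote_like_oversample(rows, n_new, k=5, seed=0):
    """Return n_new synthetic rows built by SMOTE-style interpolation.

    rows  : list of equal-length lists of floats (numeric columns only)
    n_new : number of synthetic rows to create
    k     : number of nearest neighbours considered for each base row
    """
    rng = random.Random(seed)
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows to interpolate")
    k = min(k, n - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)
        base = rows[i]
        # Indices of the k rows closest to the base row (Euclidean distance),
        # excluding the base row itself.
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(base, rows[j]),
        )[:k]
        neighbour = rows[rng.choice(nearest)]
        # Random point on the segment between the base row and the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (v - b) for b, v in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: four rows with two numeric features.
toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
augmented = toy + smote_like_oversample(toy, n_new=20, k=2)
```

In a pipeline of the kind described, the enlarged table (`augmented` here) would be what the GAN is trained on; categorical clinical variables would need separate handling, which this sketch does not attempt.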
Immune and inflammatory responses are known to be major causes of preterm birth (PTB). The maternal genetic background plays an important role in the development of PTB. Interferon-stimulated gene 15 (ISG15) is an interferon-induced protein that can modulate immune cell activation and function. We aimed to study whether polymorphisms in the ISG15 gene are associated with spontaneous PTB (sPTB) risk in Taiwanese women.
ISG15 rs4615788 C/G, rs1921 G/A, and rs8997 A/G polymorphisms were genotyped in a hospital-based study of 112 women with sPTB and 1120 term controls. The plasma concentrations of ISG15 were determined by enzyme-linked immunosorbent assay.
We found that the ISG15 rs1921 G-rs8997 A haplotype was associated with decreased risk of PTB (χ² = 6.26, p = .01, corrected p = .04). The A/G genotype of the ISG15 rs8997 polymorphism might confer reduced risk of PTB (χ² = 4.09, p = .04, corrected p = .08). Women with spontaneous PTB displayed higher plasma ISG15 levels than term controls (p < .001). Plasma ISG15 levels among pregnant women with the rs8997 A/G genotype were significantly lower than among those with the G/G genotype (p = .03).
The ISG15 rs1921 G-rs8997 A haplotype may be associated with spontaneous PTB risk in women. These findings provide new insights into the etiology of preterm birth.
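As a consistency check on the association statistics reported above, the uncorrected p-values can be recovered from the test statistics 6.26 and 4.09 if one assumes a chi-square test with one degree of freedom. The abstract does not state the degrees of freedom, so this is an assumption, and the helper name `chi2_pvalue_1df` is hypothetical. For one degree of freedom the upper-tail probability has the closed form erfc(sqrt(x / 2)).

```python
import math


def chi2_pvalue_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    The 1-df assumption is ours, not stated in the abstract.
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics taken from the abstract; both p-values round to the reported
# uncorrected values (.01 and .04) under the 1-df assumption.
p_haplotype = chi2_pvalue_1df(6.26)
p_genotype = chi2_pvalue_1df(4.09)
print(f"{p_haplotype:.2f} {p_genotype:.2f}")
```

The second p-value reported for each test is larger than the first and is not reproduced here, since the correction procedure is not described in the abstract.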
A correctly established barcode classification system for fish can help biotaxonomists distinguish fish species, and it can help governments verify the authenticity of the ingredients of fish products or identify unknown fish-related samples. The cytochrome c oxidase I (COI) gene sequence in the mitochondria of each species possesses unique characteristics and has been widely used as a barcode for identifying species in recent years. Instead of using COI gene sequences for primer design, flanking tRNA segments of COI genes from 2618 complete fish mitochondrial genomes were analyzed to discover suitable primers for fish classification at the taxonomic family level. A minimal number of primer sets was designed to effectively distinguish various clustered groups of fish species for identification applications. Sequence alignment analysis and cross-tRNA segment comparisons were applied to check and ensure that the primers for each cluster group are exclusive.
Two approaches were applied to improve primer design and re-cluster fish species. The results show that exclusive primers for 2618 fish species were successfully discovered through in silico analysis. In addition, we applied sequence alignment analysis to confirm that each pair of primers can successfully identify all collected fish species at the taxonomic family level.
This study provides a practical strategy for discovering unique primers for each fishery species and a comprehensive list of exclusive primers for extracting COI barcode sequences of all known fishery species. These primers can support applications such as verifying fish products and identifying unknown fish species.
A conformational epitope (CE) is composed of neighboring amino acid residues located on an antigenic protein surface structure. CEs bind their complementary paratopes in B-cell receptors and/or antibodies. An effective and efficient prediction tool for CE analysis is critical for the development of immunology-related applications, such as vaccine design and disease diagnosis. We propose a novel method consisting of two sequential modules: matching and prediction. The matching module includes two main approaches. The first approach is a complete sequence search (CSS) that applies BLAST to align the sequence with all known antigen sequences. Fragments with high epitope sequence identities are identified and the predicted residues are annotated on the query structure. The second approach is a spiral vector search (SVS) that adopts a novel surface spiral feature vector for large-scale surface patch detection when queried against a comprehensive epitope database. The prediction module also contains two proposed subsystems. The first system is based on knowledge-based energy and geometrical neighboring residue contents, and the second system adopts combinatorial features, including amino acid contents and physicochemical characteristics, to formulate corresponding geometric spiral vectors and compare them with all spiral vectors from known CEs. An integrated testing dataset was generated for method evaluation, and our two searching methods effectively identified all epitope regions. The prediction results show that our proposed method outperforms previously published systems in terms of sensitivity, specificity, positive predictive value, and accuracy. The proposed method significantly improves the performance of traditional epitope prediction. Matching followed by prediction is an efficient and effective approach compared to predicting directly on specific surfaces containing antigenic characteristics.
Pollution in human-made fishing ports caused by petroleum from boats, dead fish, toxic chemicals, and effluent poses a challenge to the organisms in seawater. To decipher the impact of pollution on the microbiome, we collected surface water from a fishing port and a nearby offshore island in northern Taiwan facing the Northwestern Pacific Ocean. By employing 16S rRNA gene amplicon sequencing and whole-genome shotgun sequencing, we discovered that Rhodobacteraceae, Vibrionaceae, and Oceanospirillaceae emerged as the dominant families in the fishing port, where we found many genes with functions in antibiotic resistance (ansamycin, nitroimidazole, and aminocoumarin), metal tolerance (copper, chromium, iron and multimetal), virulence factors (chemotaxis, flagella, T3SS1), carbohydrate metabolism (biofilm formation and remodeling of bacterial cell walls), nitrogen metabolism (denitrification, N2 fixation, and ammonium assimilation), and ABC transporters (phosphate, lipopolysaccharide, and branched-chain amino acids). The dominant bacteria at the nearby offshore island (Alteromonadaceae, Cryomorphaceae, Flavobacteriaceae, Litoricolaceae, and Rhodobacteraceae) were partly similar to those in the South China Sea and the East China Sea. Furthermore, we inferred that the co-occurrence network of dominant bacteria on the offshore island was connected to the dominant bacteria in the fishing port by mutual exclusion. By examining the assembled microbial genomes collected from the coastal seawater of the fishing port, we revealed four genomic islands containing large gene-containing sequences, including phage integrase, DNA invertase, restriction enzyme, DNA gyrase inhibitor, and antitoxin HigA-1. This study provides clues that genomic islands may act as units of horizontal transfer and as tools that help microbes adapt to a human-made port environment.
Myocardial infarction (MI) is one of the significant cardiovascular diseases (CVDs). According to Taiwanese health record analysis, the hazard rate peaks in the first year after diagnosis of MI, drops to a relatively low value, and remains stable in the following years. Therefore, identifying suspicious comorbidity patterns that precede short-term death before the diagnosis may help prolong survival for MI patients.
Interval sequential pattern mining was applied with odds ratio to the hospitalization records from the Taiwan National Health Insurance Research Database to evaluate the disease progression and identify potential subjects at the earliest possible stage.
Our analysis resulted in five disease pathways, including "diabetes mellitus," "other disorders of the urethra and urinary tract," "essential hypertension," "hypertensive heart disease," and "other forms of chronic ischemic heart disease" that led to short-term death after MI diagnosis, and these pathways covered half of the cohort.
We explored the possibility of establishing trajectory patterns to identify the high-risk population of early mortality after MI.
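The odds ratio used above to score candidate comorbidity patterns can be sketched as follows. This is a generic 2x2 odds ratio with a Woolf (log-based) 95% confidence interval, not the authors' pipeline; the function name `odds_ratio_with_ci` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(pattern_deaths, pattern_survivors,
                       other_deaths, other_survivors, z=1.96):
    """Odds ratio of short-term death for patients with a given pattern
    versus patients without it, with a Woolf confidence interval.

    All four counts must be positive; z = 1.96 gives an approximate 95% CI.
    """
    a, b, c, d = pattern_deaths, pattern_survivors, other_deaths, other_survivors
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts for one candidate pattern (illustration only).
ratio, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(ratio)  # 2.25
```

A pattern would typically be of interest only when the whole interval lies above 1; with the toy counts here the interval includes 1, so the association would not be considered reliable.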
Multiple speech source separation plays an important role in many applications such as automatic speech recognition, acoustical surveillance, and teleconferencing. In this study, we propose a method for the separation of multiple speech sources in a reverberant environment based on sparse component enhancement. In a recorded signal (i.e., a mixed signal of multiple speech sources), there are always time–frequency points where only one source is active or dominant. This property is the sparsity of speech signals, and such time–frequency points are called sparse component points. However, in a reverberant environment, the sparsity of the speech signal is degraded, which reduces the number of sparse component points in the recorded signal and lowers the quality of the separated source signals. In this study, for mixture signals recorded by a soundfield microphone (a microphone array), we first experimentally analyze the negative impact of reverberation on sparse components and then develop a sparse component enhancement method to increase the number of these points. Then, the sparse components are identified and classified according to the direction-of-arrival estimates of the sources. Next, the sparse components are used to guide the recovery of the non-sparse components. Finally, multiple source separation is achieved by the joint restoration of the sparse and non-sparse components of each source. The proposed method has low computational complexity and applies to underdetermined scenarios. The effectiveness of the method is verified through a series of subjective and objective evaluation experiments.
Speech emotion recognition (SER) is a hot topic in speech signal processing. When the training data and the test data come from different corpora, their feature distributions differ, which degrades recognition performance. To solve this problem, this paper proposes a cross-corpus speech emotion recognition method based on subspace learning and domain adaptation. Specifically, the training set data and the test set data are used to form the source domain and target domain, respectively. Then, the Hessian matrix is introduced to obtain the subspace for the extracted features in both source and target domains. In addition, an information entropy-based domain adaptation method is introduced to construct the common space. In the common space, the difference between the feature distributions in the source domain and target domain is reduced as much as possible. To evaluate the performance of the proposed method, extensive experiments are conducted on cross-corpus speech emotion recognition. Experimental results show that the proposed method achieves better performance than some existing subspace learning and domain adaptation methods.
The chloroplast genome of Gracilaria firma was sequenced in view of its role as an economically important marine crop with wide industrial applications. To date, only 15 chloroplast genomes have been published for the Florideophyceae. Apart from presenting the complete chloroplast genome of G. firma, this study also assessed the utility of genome-scale data to address the phylogenetic relationships within the subclass Rhodymeniophycidae. The synteny and genome structure of the chloroplast genomes across the taxa of Eurhodophytina were also examined.
The chloroplast genome of Gracilaria firma maps as a circular molecule of 187,001 bp and contains 252 genes, which are distributed on both strands and consist of 35 RNA genes (3 rRNAs, 30 tRNAs, tmRNA and a ribonuclease P RNA component) and 217 protein-coding genes, including the unidentified open reading frames. The chloroplast genome of G. firma is by far the largest reported for Gracilariaceae, featuring a unique intergenic region of about 7000 bp with discontinuous vestiges of red algal plasmid DNA sequences interspersed between the nblA and cpeB genes. This chloroplast genome shows similar gene content and order to other Florideophycean taxa. Phylogenomic analyses based on the concatenated amino acid sequences of 146 protein-coding genes confirmed the monophyly of the classes Bangiophyceae and Florideophyceae with full nodal support. Relationships within the subclass Rhodymeniophycidae in Florideophyceae received moderate to strong nodal support, and the monotypic family of Gracilariales was resolved with maximum support.
Chloroplast genomes hold substantial information that can be tapped for resolving the phylogenetic relationships of difficult regions in the Rhodymeniophycidae, which are perceived to have experienced rapid radiation and thus received low nodal support, as exemplified in this study. The present study shows that the chloroplast genome of G. firma could serve as a key link to the full resolution of the Gracilaria sensu lato complex and the recognition of Hydropuntia as a genus distinct from Gracilaria sensu stricto.