We introduce physics-informed neural networks – neural networks that are trained to solve supervised learning tasks while respecting any given laws of physics described by general nonlinear partial differential equations. In this work, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct types of algorithms, namely continuous time and discrete time models. The former type forms a new family of data-efficient spatio-temporal function approximators, while the latter allows the use of arbitrarily accurate implicit Runge–Kutta time stepping schemes with an unlimited number of stages. The effectiveness of the proposed framework is demonstrated through a collection of classical problems in fluids, quantum mechanics, reaction–diffusion systems, and the propagation of nonlinear shallow-water waves.
•We put forth a deep learning framework that enables the synergistic combination of mathematical models and data.
•We introduce an effective mechanism for regularizing the training of deep neural networks in small data regimes.
•The proposed methods enable scientific prediction and discovery from incomplete models and incomplete data.
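As a hedged illustration of the continuous-time models described above (not the authors' code), the sketch below fits a small network to scattered measurements of u(t, x) while penalizing the residual of Burgers' equation, u_t + u·u_x − (0.01/π)·u_xx = 0; the network size, optimizer settings, and synthetic data points are all illustrative assumptions.

    import math
    import torch

    torch.manual_seed(0)
    # u(t, x) is represented by a small fully connected network.
    net = torch.nn.Sequential(
        torch.nn.Linear(2, 20), torch.nn.Tanh(),
        torch.nn.Linear(20, 20), torch.nn.Tanh(),
        torch.nn.Linear(20, 1),
    )

    def pde_residual(tx):
        # Automatic differentiation gives u_t, u_x, u_xx at collocation points.
        tx = tx.requires_grad_(True)
        u = net(tx)
        du = torch.autograd.grad(u, tx, torch.ones_like(u), create_graph=True)[0]
        u_t, u_x = du[:, :1], du[:, 1:]
        u_xx = torch.autograd.grad(u_x, tx, torch.ones_like(u_x),
                                   create_graph=True)[0][:, 1:]
        return u_t + u * u_x - (0.01 / math.pi) * u_xx

    # Hypothetical supervised points and random collocation points in [0, 1]^2.
    tx_data = torch.rand(100, 2)
    u_data = torch.sin(math.pi * tx_data[:, 1:])       # stand-in measurements
    tx_col = torch.rand(1000, 2)

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(5000):
        opt.zero_grad()
        loss = ((net(tx_data) - u_data) ** 2).mean() \
             + (pde_residual(tx_col) ** 2).mean()      # data loss + physics loss
        loss.backward()
        opt.step()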
Defining a Cancer Dependency Map. Tsherniak, Aviad; Vazquez, Francisca; Montgomery, Phil G., et al.
Cell, July 2017, Volume 170, Issue 3. Journal Article; Peer reviewed; Open access.
Most human epithelial tumors harbor numerous alterations, making it difficult to predict which genes are required for tumor survival. To systematically identify cancer dependencies, we analyzed 501 genome-scale loss-of-function screens performed in diverse human cancer cell lines. We developed DEMETER, an analytical framework that segregates on- from off-target effects of RNAi. 769 genes were differentially required in subsets of these cell lines at a threshold of six SDs from the mean. We found predictive models for 426 dependencies (55%) by nonlinear regression modeling considering 66,646 molecular features. Many dependencies fall into a limited number of classes, and unexpectedly, in 82% of models, the top biomarkers were expression based. We demonstrated the basis behind one such predictive model linking hypermethylation of the UBB ubiquitin gene to a dependency on UBC. Together, these observations provide a foundation for a cancer dependency map that facilitates the prioritization of therapeutic targets.
•The DEMETER computational model segregates on- from off-target effects of RNAi
•769 strong differential dependencies were identified in 501 cancer cell lines
•Predictive models for 426 dependencies were found using 66,646 molecular features
•This cancer dependency map facilitates the prioritization of therapeutic targets
A large-scale analysis of 501 cancer cell lines reveals new vulnerabilities that will help prioritize therapeutic targets
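As a hedged sketch of the six-SD differential-dependency criterion described above (not the DEMETER code, which additionally models off-target seed effects), the snippet below flags genes whose dependency score in some subset of cell lines departs by at least six standard deviations from that gene's mean; the matrix is a synthetic stand-in.

    import numpy as np

    rng = np.random.default_rng(0)
    scores = rng.normal(size=(1000, 501))      # genes x cell lines, synthetic
    scores[5, :5] -= 10.0                      # plant one differential dependency

    # z-score each gene across cell lines and flag >= 6 SD departures.
    z = (scores - scores.mean(axis=1, keepdims=True)) \
        / scores.std(axis=1, keepdims=True)
    differential = np.flatnonzero((np.abs(z) >= 6).any(axis=1))
    print("differentially required genes:", differential)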
The use of statistical and machine learning approaches to predict the compressive strength of concrete based on mixture proportions, on account of its industrial importance, has received significant attention. However, previous studies have been limited to small, laboratory-produced data sets. This study presents the first analysis of a large data set (>10,000 observations) of measured compressive strengths from actual (job-site) mixtures and their corresponding actual mixture proportions. Predictive models are applied to examine relationships between the mixture design variables and strength, and to thereby develop an estimate of the (28-day) strength. These models are also applied to a laboratory-based data set of strength measurements published by Yeh et al. (1998) and the performance of the models across both data sets is compared. Furthermore, to illustrate the value of such models beyond simply strength prediction, they are used to design optimal concrete mixtures that minimize cost and embodied CO2 impact while satisfying imposed target strengths.
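A minimal sketch of the kind of nonlinear strength predictor the study describes, assuming made-up mixture data and the feature set of the Yeh laboratory data set; this is not the paper's pipeline.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    # Columns: cement, slag, fly ash, water, superplasticizer,
    # coarse aggregate, fine aggregate, age (days) -- all stand-in values.
    X = rng.uniform(0.0, 1.0, size=(500, 8))
    y = 30 + 25 * X[:, 0] - 15 * X[:, 3] + rng.normal(0, 3, 500)  # toy 28-day MPa

    model = GradientBoostingRegressor(random_state=0)
    print("cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())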
The availability of data in massive collections in the recent past has not only enabled data-driven decision-making, but also created new questions that cannot be addressed effectively with traditional statistical analysis methods. Traditional scientific research has not only prevented business scholars from working on emerging problems with big, rich data sets, but has also resulted in irrelevant theory and questionable conclusions, largely because the traditional method focuses more on modeling and analysis/explanation than on the real, practical problem and the data. We believe the lack of due attention to the analytics paradigm can to some extent be attributed to business scholars' unfamiliarity with analytics methods and methodologies and the types of questions they can answer. Therefore, our purpose in this paper is to illustrate how analytics, as a complement to, rather than a successor of, the traditional research paradigm, can be used to address interesting emerging business research questions.
Predictive modeling is becoming an essential tool for clinical decision support, but health systems with smaller sample sizes may construct suboptimal or overly specific models. Models become over-specific when, besides true physiological effects, they also incorporate potentially volatile site-specific artifacts. These artifacts can change suddenly and can render the model unsafe. To obtain safer models, health systems with inadequate sample sizes may adopt one of the following options. First, they can use a generic model, such as one purchased from a vendor, but often such a model is not sufficiently specific to the patient population and is thus suboptimal. Second, they can participate in a research network. Paradoxically though, sites with smaller datasets contribute correspondingly less to the joint model, again rendering the final model suboptimal. Lastly, they can use transfer learning, starting from a model trained on a large data set and updating this model to the local population. This strategy can also result in a model that is over-specific. In this paper we present the consensus modeling paradigm, which uses the help of a large site (source) to reach a consensus model at the small site (target). We evaluate the approach on predicting postoperative complications at two health systems with 9,044 and 38,045 patients (rare outcomes at about 1% positive rate), and conduct a simulation study to understand the performance of consensus modeling relative to the other three approaches as a function of the available training sample size at the target site. We found that consensus modeling exhibited the least over-specificity at either the source or target site and achieved the highest combined predictive performance.
•Health systems with inadequate data may build suboptimal or overly specific models.
•Over-specialized models are not robust to institutional changes and can cause harm.
•Consensus modeling reduces over-specialization for small health systems.
•Consensus modeling improves equity in AI capability and can reduce health disparity.
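The abstract does not spell out the consensus algorithm, so the sketch below shows only one generic way a small target site might blend its own model with a large source site's model (sample-size-weighted coefficient averaging); it is a stand-in for, not a statement of, the paper's method, and all data are simulated.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    w_true = np.array([1.5, -2.0, 0.5])

    def simulate(n, artifact=0.0):
        # 'artifact' mimics a volatile site-specific effect on one feature.
        X = rng.normal(size=(n, 3))
        p = 1 / (1 + np.exp(-(X @ w_true + artifact * X[:, 0])))
        return X, (rng.uniform(size=n) < p).astype(int)

    X_src, y_src = simulate(38_045)            # large source site
    X_tgt, y_tgt = simulate(900, artifact=1.0) # small target site

    src = LogisticRegression().fit(X_src, y_src)
    tgt = LogisticRegression().fit(X_tgt, y_tgt)

    lam = len(y_tgt) / (len(y_tgt) + 5_000)    # ad hoc blending weight
    blended_coef = lam * tgt.coef_ + (1 - lam) * src.coef_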
Normalization transformations have recently experienced a resurgence in popularity in the era of machine learning, particularly in data preprocessing. However, the classical methods that can be adapted to cross-validation are not always effective. We introduce Ordered Quantile (ORQ) normalization, a one-to-one transformation that is designed to consistently and effectively transform a vector of arbitrary distribution into a vector that follows a normal (Gaussian) distribution. In the absence of ties, ORQ normalization is guaranteed to produce normally distributed transformed data. Once trained, an ORQ transformation can be readily and effectively applied to new data. We compare the effectiveness of the ORQ technique with other popular normalization methods in a simulation study where the true data generating distributions are known. We find that ORQ normalization is the only method that works consistently and effectively, regardless of the underlying distribution. We also explore the use of repeated cross-validation to identify the best normalizing transformation when the true underlying distribution is unknown. We apply our technique and other normalization methods via the bestNormalize R package on a car pricing data set. We built bestNormalize to evaluate the normalization efficacy of many candidate transformations; the package is freely available via the Comprehensive R Archive Network.
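The reference implementation is the bestNormalize R package; the Python sketch below reproduces only the core ORQ idea (ranks mapped to standard normal quantiles, with interpolation for new data) under simplified out-of-sample handling.

    import numpy as np
    from scipy import stats

    def orq_fit_transform(x):
        # Map average ranks (which handle ties) to standard normal quantiles;
        # dividing by n + 1 avoids infinite quantiles at the extremes.
        return stats.norm.ppf(stats.rankdata(x) / (len(x) + 1))

    def orq_apply(x_new, x_train):
        # Simplified out-of-sample rule: interpolate within the trained mapping.
        n = len(x_train)
        grid = stats.norm.ppf(np.arange(1, n + 1) / (n + 1))
        return np.interp(x_new, np.sort(x_train), grid)

    x = np.random.default_rng(0).exponential(size=1000)   # heavily skewed input
    z = orq_fit_transform(x)
    print("Shapiro-Wilk p-value:", stats.shapiro(z).pvalue)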
Grain boundary properties of elemental metals. Zheng, Hui; Li, Xiang-Guo; Tran, Richard, et al.
Acta Materialia, March 2020, Volume 186, Issue C. Journal Article; Peer reviewed; Open access.
The structure and energy of grain boundaries (GBs) are essential for predicting the properties of polycrystalline materials. In this work, we use a high-throughput density functional theory (DFT) calculation workflow to construct the Grain Boundary Database (GBDB), the largest database of DFT-computed grain boundary properties to date. The database currently encompasses 327 GBs of 58 elemental metals, including 10 common twist or symmetric tilt GBs for body-centered cubic (bcc) and face-centered cubic (fcc) systems and the Σ7 [0001] twist GB for hexagonal close-packed (hcp) systems. In particular, we demonstrate a novel scaled-structural template approach for HT GB calculations, which reduces the computational cost of converging GB structures by a factor of ~3–6. The grain boundary energies and work of separation are rigorously validated against previous experimental and computational data. Using this large GB dataset, we develop an improved predictive model for the GB energy of different elements based on the cohesive energy and shear modulus. The open GBDB represents a significant step forward in the availability of first principles GB properties, which we believe will help guide the future design of polycrystalline materials.
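For context, the standard cell-energetics conventions behind such GB databases can be sketched as below; the energies and geometry are hypothetical stand-ins, and this is not the GBDB workflow itself.

    # GB energy, for a periodic supercell containing two equivalent boundaries:
    #   gamma_GB = (E_GB - n * E_bulk) / (2 * A)
    # Work of separation, cleaving the GB cell into two free-surface slabs:
    #   W_sep = (E_slab_a + E_slab_b - E_GB) / A
    E_gb = -417.30        # eV, GB supercell total energy (hypothetical)
    E_bulk = -4.20        # eV/atom, bulk reference energy (hypothetical)
    n_atoms, area = 100, 25.0          # atoms; boundary area in angstrom^2

    gamma_gb = (E_gb - n_atoms * E_bulk) / (2 * area)
    print(f"gamma_GB = {gamma_gb:.3f} eV/A^2")  # ~0.054 eV/A^2 ~= 0.87 J/m^2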
The effectiveness of slow low-dose oral immunotherapy (SLOIT) for cow's milk (CM) allergy has been reported. Most OIT studies have targeted populations over 4 years of age. Furthermore, no predictive modeling has been reported for CM allergy remission by CM-SLOIT under 4 years of age.
We sought to develop a model predicting CM allergy remission 3 years after SLOIT in young children who started CM-SLOIT under 4 years of age.
We included young children with cow's milk allergy or cow's milk sensitization (a development set of 120 children and a validation set of 71 children). We performed logistic regression analysis to develop the models. We calculated the areas under the receiver operating characteristic curves (ROC-AUCs) to evaluate predictive performance.
The model (CM-sIgE before SLOIT + age at the start of SLOIT + serum TARC before starting SLOIT + CM-sIgE titer one year after OIT) showed good discrimination, with a ROC-AUC of 0.83 (95% CI: 0.76–0.91) on internal validation. Applying the model to the validation set gave good discrimination (ROC-AUC = 0.89, 95% CI: 0.80–0.97) and reasonable calibration (intraclass correlation coefficient = 0.88, 95% CI: 0.62–0.97).
We developed and validated a predictive model for the remission rate of CM allergy 3 years after SLOIT in children who started CM-SLOIT under 4 years of age. This predictive model is highly accurate and can support CM allergy management.
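A hedged sketch of the model class described above: a four-predictor logistic regression evaluated by ROC-AUC. The patient data are not public, so the values below are synthetic stand-ins and the fitted coefficients mean nothing clinically.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 120                                    # development-set size
    X = np.column_stack([
        rng.lognormal(2.0, 1.0, n),            # CM-sIgE before SLOIT (kUA/L)
        rng.uniform(0.5, 4.0, n),              # age at start of SLOIT (years)
        rng.lognormal(6.0, 0.5, n),            # serum TARC before SLOIT (pg/mL)
        rng.lognormal(1.5, 1.0, n),            # CM-sIgE one year after OIT
    ])
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(0.05 * X[:, 0] - 0.5))).astype(int)

    model = LogisticRegression(max_iter=1000).fit(np.log1p(X), y)
    auc = roc_auc_score(y, model.predict_proba(np.log1p(X))[:, 1])
    print("apparent ROC-AUC:", round(auc, 2))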
Chemotherapy in brain tumors is tailored based on tumor type, grade, and molecular markers, which are crucial for predicting responses and survival outcomes. This review summarizes the role of chemotherapy in gliomas, glioneuronal and neuronal tumors, ependymomas, choroid plexus tumors, medulloblastomas, and meningiomas, discussing standard treatment protocols and recent developments in targeted therapies. Furthermore, studies reporting the integration of MRI-based radiomics and deep learning models for predicting treatment outcomes are reviewed.
Advances in MRI-based radiomics and deep learning models have significantly enhanced the prediction of chemotherapeutic benefits, survival prediction following chemotherapy, and the differentiation of tumor progression from pseudoprogression. These non-invasive techniques offer valuable insights into tumor characteristics and treatment responses, facilitating personalized therapeutic strategies. Further research is warranted to refine these models and expand their applicability across different brain tumor types.
•Advanced MRI techniques and deep learning models improve non-invasive prognostication and treatment planning.
•MRI-based radiomics can predict chemotherapeutic benefits, molecular markers, grading, and prognosis in brain tumor patients.
•Radiomics signatures often outperform clinicopathological models in predicting treatment outcomes.
•Combining clinical, genetic, and radiomic features enhances survival prediction accuracy for glioma patients.
•Further standardization and validation of radiomics techniques are necessary for clinical application.
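For illustration only (not taken from the reviewed studies), the snippet below computes a few first-order radiomic features inside a tumor mask, the kind of inputs from which radiomics signatures are typically built; the image and mask are synthetic.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.normal(100.0, 20.0, size=(64, 64, 32))   # stand-in MRI volume
    mask = np.zeros(image.shape, dtype=bool)
    mask[20:40, 20:40, 10:20] = True                     # stand-in tumor ROI

    voxels = image[mask]
    counts, _ = np.histogram(voxels, bins=32)
    p = counts / counts.sum()
    features = {
        "mean": voxels.mean(),
        "std": voxels.std(),
        "skewness": ((voxels - voxels.mean()) ** 3).mean() / voxels.std() ** 3,
        "entropy": -(p[p > 0] * np.log2(p[p > 0])).sum(),
    }
    print(features)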
The need to ground construction safety-related decisions under uncertainty in knowledge extracted from objective, empirical data is pressing. Although construction research has considered machine learning (ML) for more than two decades, it had yet to be applied to safety concerns. We applied two state-of-the-art ML models, Random Forest (RF) and Stochastic Gradient Tree Boosting (SGTB), to a data set of carefully featured attributes and categorical safety outcomes, extracted from a large pool of textual construction injury reports via a highly accurate Natural Language Processing (NLP) tool developed in past research. The models can predict injury type, energy type, and body part with high skill (0.236 < RPSS < 0.436), outperforming the parametric models found in the literature. The high predictive skill reached suggests that injuries do not occur at random, and that construction safety should therefore be studied empirically and quantitatively rather than strictly through the analysis of subjective data and expert opinion, or from a purely regulatory and managerial perspective. This opens the gate to a new research field in which construction safety is treated as an empirically grounded quantitative science. Finally, the absence of predictive skill for the output variable injury severity suggests that, unlike the other safety outcomes, injury severity is mainly random, or that extra layers of predictive information, such as the energy level in the environment, should be used in making predictions. In the context of construction safety analysis, this study makes important strides in that the results provide reliable probabilistic forecasts of likely outcomes should an accident occur, and show great potential for integration with building information modeling and work packaging due to the binary and physical nature of the input variables. Such data-driven predictions had been absent from the field since its inception.
•Machine learning algorithms were trained on attribute and outcome data.
•The high predictive skill reached shows that injuries do not occur at random.
•Construction safety should thus be studied empirically and quantitatively.
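As a hedged re-creation of the evaluation setup (the injury reports and the NLP tool are not public, so the attributes and labels below are synthetic stand-ins), the sketch fits a Random Forest to binary precursor attributes and scores its categorical forecasts with the ranked probability skill score (RPSS) against a climatological baseline; sklearn's GradientBoostingClassifier with subsample < 1 would play the SGTB role.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(2000, 30))    # binary attributes (stand-ins)
    y = rng.integers(0, 5, size=2000)          # e.g., five injury-type classes

    rf = RandomForestClassifier(random_state=0).fit(X[:1500], y[:1500])
    proba = rf.predict_proba(X[1500:])

    def rps(forecast, y_true, k):
        # Mean squared distance between cumulative forecast and observation.
        F = np.cumsum(forecast, axis=1)
        O = np.cumsum(np.eye(k)[y_true], axis=1)
        return ((F - O) ** 2).sum(axis=1).mean()

    # RPSS = 1 - RPS(model) / RPS(baseline); baseline = training class rates.
    base = np.tile(np.bincount(y[:1500], minlength=5) / 1500, (500, 1))
    print("RPSS:", 1 - rps(proba, y[1500:], 5) / rps(base, y[1500:], 5))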