Accurately predicting copolymer properties plays a pivotal role in polymer informatics. It requires a thorough understanding of polymer structures, careful feature engineering, and proficient application of machine learning algorithms. Traditional methods generate features for each monomer structure independently, so the features of the individual monomers remain segregated. This yields a less informative representation with limited applicability. To address these challenges, we introduce a machine learning framework named weighted-chained-SMILES. By constructing a representative SMILES notation, more intricate information can be encapsulated within the generated features. Our experiments on predicting thermal properties demonstrate that the approach not only delivers competitive predictive performance but also adapts well across a diverse range of molecular representations. This versatility suggests promising potential for tackling more complex copolymer systems and for extending the approach to other polymer properties.
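The abstract does not spell out how the representative SMILES is built, so the following is only a minimal sketch of one plausible reading: repeat units are repeated in proportion to their composition weights and chained into one string. The function names (`strip_attachment_points`, `weighted_chain`), the `n_units` parameter, the block-wise ordering, and the requirement that each repeat unit is written linearly as `*...*` are all assumptions made for illustration, not the authors' method.

```python
from typing import Sequence


def strip_attachment_points(monomer: str) -> str:
    """Remove the leading and trailing '*' of a linear repeat-unit SMILES."""
    if len(monomer) < 3 or not (monomer.startswith("*") and monomer.endswith("*")):
        raise ValueError(f"expected a repeat unit written as '*...*', got {monomer!r}")
    return monomer[1:-1]


def weighted_chain(monomers: Sequence[str],
                   fractions: Sequence[float],
                   n_units: int = 10) -> str:
    """Chain repeat units into one SMILES, weighted by composition.

    Hypothetical helper: each monomer appears round(fraction * n_units)
    times (largest-remainder rounding), in block order.
    """
    if not monomers or len(monomers) != len(fractions):
        raise ValueError("monomers and fractions must be non-empty and equal in length")
    total = sum(fractions)
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")

    # Largest-remainder allocation so the counts add up to n_units.
    exact = [f / total * n_units for f in fractions]
    counts = [int(x) for x in exact]
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:max(0, n_units - sum(counts))]:
        counts[i] += 1

    # Assumption: units are laid out block-wise, not randomly or alternating.
    body = "".join(strip_attachment_points(m) * c
                   for m, c in zip(monomers, counts))
    return f"*{body}*"
```

For example, `weighted_chain(["*CC*", "*CC(C)*"], [0.75, 0.25], n_units=4)` chains three ethylene units and one propylene unit. The resulting string can then be passed to any SMILES-based featurizer, which is where a chained representation would differ from featurizing each monomer separately.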
Significant progress over the past decade in virtual representations of molecules and their physicochemical properties has produced new drugs from virtual screening of the structures of single protein molecules by conventional modeling methods. The development of clinical antiviral drugs from structural data for HIV protease has been a major success in structure-based drug design. Techniques for virtual screening rank the affinity of potential ligands for the target site on a protein. Two main alternatives have been developed: modeling the target protein with a series of related ligand molecules, and docking molecules from a database to the target protein site. Computational speed and prediction accuracy depend on the representation of the molecular structure and chemistry, the search or simulation algorithm, and the scoring function used to rank the ligands. Moreover, the general challenges in modern computational drug design arise from the profusion of data, including whole genomes, protein structures, chemical libraries, and affinity and pharmacological data. Software tools are therefore being developed to manage and integrate diverse data and to extract and visualize meaningful relationships. Current areas of research include the development of searchable chemical databases, which requires new algorithms to represent molecules and to search for structurally or chemically similar molecules, and the incorporation of machine learning techniques for data mining to improve the accuracy of predictions. Examples will be presented for the virtual screening of drugs that target HIV protease.
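The search for structurally similar molecules mentioned above is often illustrated with a fingerprint comparison. The sketch below assumes each molecule has already been reduced to a set of "on" fingerprint bit indices and ranks a library against a query by the Tanimoto coefficient. The names `tanimoto` and `rank_by_similarity` are hypothetical, and the abstract does not say which similarity measure or scoring function the authors use.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| of two bit-index sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: FrozenSet[int],
                       library: Dict[str, FrozenSet[int]]
                       ) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints, invented for illustration only.
    query = frozenset({1, 2, 3, 4})
    library = {
        "ligand_a": frozenset({1, 2, 3, 4}),
        "ligand_b": frozenset({1, 2}),
        "ligand_c": frozenset({5}),
    }
    for name, score in rank_by_similarity(query, library):
        print(f"{name}: {score:.2f}")
```

A docking workflow would replace `tanimoto` with a physics-based or empirical scoring function, but the ranking step has the same shape: score every candidate, then sort.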
A program for presenting quantitative structure-activity relationships (QSAR) was developed and named PAS (Platform for Assessment from Structure). Although development is still ongoing, the efficiency of PAS has been confirmed. I herein provide an outline of PAS and the know-how acquired. Disclosing the details of PAS may help those who are interested in QSAR presentation. Part 1 discusses the method for extracting components from the SMILES notation.
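The abstract does not describe how PAS extracts components, so the following is only a generic sketch of a first step that any such extraction needs: splitting a SMILES string into atom, bond, branch, and ring-closure tokens. The token pattern covers the common organic subset and bracket atoms; the name `tokenize` and the exact pattern are assumptions for illustration and are not taken from PAS.

```python
import re
from typing import List

# One alternative per token kind; two-letter symbols come before one-letter ones.
_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms such as [C@H] or [NH4+]
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|[BCNOPSFI]"         # one-letter aliphatic atoms
    r"|[bcnops]"           # aromatic atoms
    r"|%\d{2}|\d"          # ring-closure labels
    r"|[()=#$:/\\.\-~*]"   # branches, bonds, disconnection dot, wildcard
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens: List[str] = []
    pos = 0
    while pos < len(smiles):
        match = _TOKEN.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at index {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("C[C@H](N)C(=O)O"))
```

Matching from the current position, instead of using `re.findall`, means an unsupported character raises an error and is not silently dropped, which matters when the tokens feed a descriptor calculation.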