In the Ashkenazi Jewish (AJ) population of Israel, 11% of breast cancer and 40% of ovarian cancer are due to three inherited founder mutations in the cancer predisposition genes BRCA1 and BRCA2. For carriers of these mutations, risk-reducing salpingo-oophorectomy significantly reduces morbidity and mortality. Population screening for these mutations among AJ women may be justifiable if accurate estimates of cancer risk for mutation carriers can be obtained. We therefore undertook to determine risks of breast and ovarian cancer for BRCA1 and BRCA2 mutation carriers ascertained irrespective of personal or family history of cancer. Families harboring mutations in BRCA1 or BRCA2 were ascertained by identifying mutation carriers among healthy AJ males recruited from health screening centers and outpatient clinics. Female relatives of the carriers were then enrolled and genotyped. Among the female relatives with BRCA1 or BRCA2 mutations, the cumulative risks of developing either breast or ovarian cancer by ages 60 and 80 were, respectively, 0.60 (± 0.07) and 0.83 (± 0.07) for BRCA1 carriers and 0.33 (± 0.09) and 0.76 (± 0.13) for BRCA2 carriers. Risks were higher in recent than in earlier birth cohorts (P = 0.006). High cancer risks in BRCA1 or BRCA2 mutation carriers identified through healthy males provide an evidence base for initiating a general screening program in the AJ population. General screening would identify many carriers who are not evaluated by genetic testing based on family history criteria. Such a program could serve as a model to investigate implementation and outcomes of population screening for genetic predisposition to cancer in other populations.
Significance
Inherited mutations in the tumor suppressor genes BRCA1 and BRCA2 predispose to very high risks of breast and ovarian cancer. For carriers of these mutations, risk-reducing surgery significantly reduces morbidity and mortality. General population screening for BRCA1 and BRCA2 mutations in young adult women could be feasible if accurate estimates of cancer risk for mutation carriers could be obtained. We determined that risks of breast and ovarian cancer for BRCA1 and BRCA2 mutation carriers ascertained from the general population are as high as for mutation carriers ascertained through personal or family history of cancer. General screening for BRCA1 and BRCA2 mutations would identify many carriers who are currently not evaluated and could serve as a model for population screening for genetic predisposition to cancer.
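Cumulative risks "by age 60" and "by age 80", as reported above, are age-specific quantities that must account for women who are still cancer-free when last observed. The abstract does not name the estimator used, so the following is only a minimal sketch of one standard approach, the Kaplan–Meier product-limit estimator, reporting 1 − S(age). The function name `km_cumulative_risk` and the toy records are hypothetical and are not data from the study; the standard errors (the ± values) are not computed here.

```python
from typing import List, Tuple


def km_cumulative_risk(records: List[Tuple[float, bool]], age: float) -> float:
    """Kaplan-Meier estimate of cumulative risk, 1 - S(age).

    Each record is (age_at_diagnosis_or_censoring, diagnosed): diagnosed=True
    means cancer was diagnosed at that age; False means the woman was last
    observed cancer-free at that age (right-censored).
    """
    # Distinct ages at which at least one diagnosis occurred, up to `age`.
    event_ages = sorted({t for t, diagnosed in records if diagnosed and t <= age})
    survival = 1.0
    for t in event_ages:
        # Women still under observation (not yet diagnosed or censored) at age t.
        at_risk = sum(1 for s, _ in records if s >= t)
        diagnoses = sum(1 for s, d in records if d and s == t)
        survival *= 1.0 - diagnoses / at_risk
    return 1.0 - survival


# Hypothetical toy records, not from the study.
carriers = [(40, True), (50, False), (55, True), (70, True), (80, False)]
print(round(km_cumulative_risk(carriers, 60), 3))  # 0.467
print(round(km_cumulative_risk(carriers, 80), 3))  # 0.733
```

Censored women contribute to the at-risk denominator only up to their last observed age, which is why the estimate differs from a naive proportion of diagnosed carriers.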
Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloger's knowledge of the specific topics contained in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt, a prototype automatic subject indexing tool. Kratt can subject index a book, regardless of its length and genre, with a set of keywords from the Estonian Subject Thesaurus. Kratt takes approximately one minute to subject index a book, making it 10-15 times faster than human catalogers. Although the catalogers did not consider the resulting keywords satisfactory, ratings from a small sample of regular library users were more promising. We also argue that the results can be improved by training the model on a larger corpus and applying more careful preprocessing techniques.
In the face of ever-increasing document volumes, libraries around the globe are increasingly exploring (semi-)automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that work in operational systems are lacking. This article provides an overview of the basic principles of automated subject indexing, the major approaches and their possible application in actual library systems, existing working examples, and related challenges that call for further research.
Classifying academic papers in a fine-grained manner is important for uncovering deeper implicit themes and semantics, which supports better semantic retrieval, paper recommendation, research trend prediction, topic analysis, and a series of other functions. Based on an ontology of the climate change domain, this study used an unsupervised approach combining two methods, syntactic structure analysis and semantic modeling, to build a subject-indexing framework for academic papers in the climate change domain. Taking the titles, abstracts, and keywords of papers as input, the framework automatically assigns a set of conceptual terms from the domain ontology as research topics, using natural language processing techniques such as syntactic dependencies, text similarity calculation, pre-trained language models, and semantic similarity calculation, together with weighting factors such as word frequency statistics and graph path calculation. Finally, we evaluated the proposed method against a gold standard of manually annotated articles and demonstrated significant improvements over five alternative methods in terms of precision, recall, and F1-score. Overall, the proposed method identifies the research topics of academic papers more accurately and also provides useful references for the application of domain ontologies and unsupervised data annotation.
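The evaluation above compares the topics assigned to each paper against a manually annotated gold standard. As a minimal sketch, assuming each paper's predicted and gold topics are treated as sets of ontology terms (the abstract does not say whether scores were micro- or macro-averaged), precision, recall, and F1 can be computed per paper and micro-averaged over a corpus as follows. The function names and example terms are hypothetical.

```python
from typing import Iterable, Set, Tuple


def prf1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of one paper's predicted topics vs. its gold topics."""
    tp = len(predicted & gold)  # topics both predicted and in the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_prf1(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged scores: pool true/false positives and false negatives over all papers."""
    tp = fp = fn = 0
    for predicted, gold in pairs:
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical topics for one paper.
predicted = {"sea level rise", "drought", "carbon tax"}
gold = {"sea level rise", "drought", "adaptation", "mitigation"}
print(prf1(predicted, gold))  # approximately (0.667, 0.5, 0.571)
```

Micro-averaging weights every assigned term equally, whereas macro-averaging (the mean of per-paper scores) weights every paper equally; the two can rank methods differently when papers carry very different numbers of topics.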
Subject indexing, i.e., the enrichment of metadata records for textual resources with descriptors from a controlled vocabulary, is one of the core activities of libraries. Due to the proliferation of digital documents, it is no longer possible to annotate every single document intellectually, which is why we need to explore the potential of automation at every level.
At ZBW, efforts to partially or completely automate the subject indexing process started as early as 2000 with experiments involving external partners and commercial software. The conclusion of that first exploratory period was that commercial, supposedly shelf-ready solutions would not meet the requirements of the library. In 2014 the decision was made to conduct the necessary applied research in-house, which was successfully implemented by establishing a PhD position. However, the prototypical machine learning solutions developed over the following years had yet to be integrated into the library's productive operations. Therefore, in 2020 an additional software engineer position was established and a pilot phase was initiated (planned to last until 2024) with the goal of completing the transfer of our solutions into practice by building a suitable software architecture that allows for real-time subject indexing with our trained models and for its integration into the other metadata workflows at ZBW.
In this paper we address the question of how to transfer results from applied research into a productive service, and we report on the milestones we have reached so far on an operational level and on those that are yet to be reached. We also discuss the challenges we faced on a strategic level, the measures and resources (computing power, software, personnel) that were needed to effect the transfer, and those that will be necessary to ensure the continued availability of the architecture and to enable continuous development during running operations.
We conclude that there are still no shelf-ready open source systems for the automation of subject indexing: existing software has to be adapted and maintained continuously, which requires various forms of expertise. However, the task of automation is here to stay, and librarians are witnessing the dawn of a new era in which subject indexing is done at least in part by machines, and the respective roles of machines and human experts may shift even further and more rapidly in the not-so-distant future. We argue that, in general, the format of a “project” and the mindset that goes with it may not suffice to secure the commitment that an institution, its decision-makers, and the library community as a whole will have to bring to the table in order to face the monumental task of digital transformation and automation in the long run. We also highlight the importance of all parties (applied researchers, software engineers, stakeholders) staying involved and continuously communicating requirements and issues back and forth in order to successfully create and establish a productive service that is suitable and equipped for operation.
This study aims to describe the process of subject indexing of musical resources and to illustrate the problems that semantic indexing must solve in the music domain. It presents the case of the Nuovo soggettario in Italy and the procedure used to enrich the Thesaurus with musical terminology. The example of the term Sinfonie illustrates the solutions adopted to solve the problems highlighted. In conclusion, the study describes how a general thesaurus such as the Nuovo soggettario can be used in the semantic indexing of musical resources and how the results could improve information retrieval for users.
In light of AI (artificial intelligence) and NLP (natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection by suggesting Library of Congress Subject Headings filtered by certain Library of Congress Classification subclass labels. The findings of this study inform further research on using BERT models to assist with automatic subject indexing of digital library collections.
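The abstract describes suggesting Library of Congress Subject Headings and filtering them by Library of Congress Classification subclass. The study's actual pipeline is not given here, so the following is only a minimal sketch of the filtering step, assuming a model has already produced scored heading suggestions and that a mapping from headings to LCC subclasses is available. The function `filter_by_subclass`, the scores, and the mapping are all hypothetical and invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def filter_by_subclass(
    suggestions: List[Tuple[str, float]],
    heading_to_subclass: Dict[str, str],
    allowed_subclasses: Set[str],
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Keep scored heading suggestions whose LCC subclass is allowed, best first."""
    kept = [
        (heading, score)
        for heading, score in suggestions
        # Headings with no known subclass map to None and are dropped.
        if heading_to_subclass.get(heading) in allowed_subclasses
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Hypothetical model scores and heading-to-subclass mapping, for illustration only.
suggestions = [("Whales", 0.84), ("Sea stories", 0.80), ("Whaling--Fiction", 0.91), ("Cooking", 0.10)]
mapping = {"Whaling--Fiction": "PS", "Sea stories": "PS", "Whales": "QL", "Cooking": "TX"}
print(filter_by_subclass(suggestions, mapping, {"PR", "PS"}))
# [('Whaling--Fiction', 0.91), ('Sea stories', 0.8)]
```

Restricting suggestions to a subclass narrows the candidate vocabulary, which trades some recall for headings that are more likely to fit the collection being indexed.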
Manually indexing documents for subject-based access is a labour-intensive process. We propose using metadata gathered from bibliographic databases to train algorithms that assist librarians in that work. We have developed Annif, an open source tool and microservice for automated subject indexing. After training it with a subject vocabulary and existing metadata, Annif can be used to assign subject headings for new documents. We have tested Annif with different document collections including scientific papers, old scanned books and contemporary e-books, Q&A pairs from an “ask a librarian” service, Finnish Wikipedia, and the archives of a local newspaper. The results of analysing scientific papers and current books have been reassuring, while other types of documents have proved to be more challenging. The current version is based on a combination of existing natural language processing and machine learning tools. By combining multiple approaches and existing open source algorithms, Annif can build on the strengths of individual algorithms and adapt to different settings. With Annif, we expect to improve subject indexing and classification processes especially for electronic documents as well as collections that otherwise would not be indexed at all.
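The abstract notes that Annif combines multiple algorithms so it can build on the strengths of each. One simple way to combine several algorithms' outputs is a weighted average of their per-subject scores. The sketch below illustrates that idea under stated assumptions; it is not Annif's code or API. The function `ensemble`, the backend names `lexical` and `associative`, the weights, and the scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ensemble(
    backend_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Combine per-backend subject scores into one ranked list by weighted average.

    A subject that a backend does not suggest counts as score 0 for that backend.
    """
    total_weight = sum(weights[name] for name in backend_scores)
    combined: Dict[str, float] = defaultdict(float)
    for name, scores in backend_scores.items():
        share = weights[name] / total_weight  # normalised weight of this backend
        for subject, score in scores.items():
            combined[subject] += share * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical backend outputs for one document.
results = {
    "lexical": {"economics": 0.8, "finance": 0.4},
    "associative": {"economics": 0.6, "banking": 0.9},
}
for subject, score in ensemble(results, {"lexical": 1.0, "associative": 1.0}):
    print(subject, round(score, 2))
# economics 0.7
# banking 0.45
# finance 0.2
```

Here a subject proposed confidently by only one backend ("banking") is outranked by one that both backends agree on ("economics"), which is the kind of complementarity that motivates combining algorithms.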