The author describes methods for finding and predicting the correlates of a compound abbreviated word in its equivalence nest on the basis of paradigmatic relations between the abbreviation and its decoding stimuli. The relevance of the study lies in the fact that it established the principles for forming dictionary entries in the Exploratory Dictionary of Russian Word Abbreviations, compiled by the Exploratory Laboratory of Abbreviation Trends Research at the Russian Language Department of Donetsk National University. The paper gives a general idea of the multiple motivation possibilities of abbreviated words in the modern language. The aim of the study is to give a general account of the dictionary methods of finding and predicting the content of compound abbreviation equivalence nests. The author considers pure search models in existing dictionaries of abbreviations and in texts, where abbreviations and their equivalents are presented as absolute synonyms, as well as extrapolation prediction methods based on decoding-matrix models of the equivalence nests of words connected with the described abbreviations and their decoding stimuli through paradigmatic relations. The latter led to a typology of paradigmatic associations of compound abbreviations, comprising an abbreviation nest, an abbreviation-onomasiological field, an abbreviation group, and an abbreviation paradigm. The novelty of the study lies in the fact that, for the first time, it provides a set of methods for finding and predicting the equivalence nest of abbreviated words; the methodology for verifying the obtained results is also described. The results make possible a full-fledged, comprehensive dictionary description of compound abbreviated appellatives. In the future, the author plans to develop methods of searching for equivalents of other formal onomasiological abbreviation types: initial abbreviations and onym compound abbreviations.
Zipf’s Law of Abbreviation – the idea that more frequent symbols in a code are simpler than less frequent ones – has been shown to hold at the level of words in many languages. We tested whether it holds at the level of individual written characters. Character complexity is similar to word length in that more complex symbols require more cognitive and motor effort to produce and process. We built a dataset of character complexity and frequency measures covering 27 different writing systems. According to our data, Zipf’s Law of Abbreviation holds for every writing system in our dataset: the more frequent characters have lower degrees of complexity, and vice versa. This result provides further evidence of optimization mechanisms shaping communication systems.
• Zipf’s Law of Abbreviation (ZLA): more frequent words tend to be shorter.
• His hypothesis: ZLA due to speakers maximising accuracy while minimising effort.
• We tested this hypothesis using an artificial language learning task.
• We manipulated pressures for communicative accuracy and efficiency.
• ZLA-like lexicons only arose when both pressures were at play.
The linguist George Kingsley Zipf made a now classic observation about the relationship between a word’s length and its frequency; the more frequent a word is, the shorter it tends to be. He claimed that this “Law of Abbreviation” is a universal structural property of language. The Law of Abbreviation has since been documented in a wide range of human languages, and extended to animal communication systems and even computer programming languages. Zipf hypothesised that this universal design feature arises as a result of individuals optimising form-meaning mappings under competing pressures to communicate accurately but also efficiently—his famous Principle of Least Effort. In this study, we use a miniature artificial language learning paradigm to provide direct experimental evidence for this explanatory hypothesis. We show that language users optimise form-meaning mappings only when pressures for accuracy and efficiency both operate during a communicative task, supporting Zipf’s conjecture that the Principle of Least Effort can explain this universal feature of word length distributions.
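The relationship described above, that more frequent words tend to be shorter, can be checked on any tokenized text with a rank correlation between word frequency and word length. The following is a minimal sketch, not the analysis used in the studies above: the function names and the toy corpus are illustrative assumptions, and a Spearman correlation is only one of several ways such a relationship might be quantified. A negative value is consistent with the Law of Abbreviation.

```python
from collections import Counter


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def length_frequency_correlation(tokens):
    """Spearman correlation between word frequency and word length."""
    counts = Counter(tokens)
    words = list(counts)
    freqs = [counts[w] for w in words]
    lengths = [len(w) for w in words]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(freqs), ranks(lengths))


# Toy corpus, invented purely for illustration.
toy_corpus = (
    "the cat saw the dog and the dog saw a caterpillar near the garden"
).split()
rho = length_frequency_correlation(toy_corpus)
print(f"Spearman rho between frequency and length: {rho:.2f}")
```

On a corpus this small the value is only suggestive; a real test would need a large corpus and a significance estimate.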
This study aims to assess the suitability of the abbreviations used on the social media platform YouTube. It describes the problems raised by the emergence of various forms of abbreviation and analyzes the meaning of the abbreviations in the YouTube account header. It employs a qualitative descriptive approach, in the form of content analysis applied to data consisting of a collection of attested words. Data were collected using the free-of-conversation listening technique. The data were analyzed qualitatively using Miles and Huberman's flow model of analysis. The data source was the titles on Najwa Shihab's YouTube channel. The study analyzed twenty-two data items that contain abbreviations. Eight items fell in the abbreviation domain, seven in the fragment domain, and eight in the acronym domain; contractions and letter symbols had one item each. By frequency, the abbreviation types in the header of Najwa Shihab's YouTube account were 32% abbreviations, 28% fragments, 32% acronyms, 4% contractions, and 4% letter symbols.
Although the negative impact of abbreviations in source code is well recognized, abbreviations remain common for various reasons. A number of approaches have therefore been proposed to expand abbreviations in identifiers. However, such approaches are either inaccurate or confined to specific identifiers. To this end, in this paper, we propose a generic and accurate approach to expand identifier abbreviations by leveraging both semantic relation and transfer expansion. One of the key insights of the approach is that abbreviations in the name of software entity $e$ have a great chance to find their full terms in names of software entities that are semantically related to $e$. Consequently, the proposed approach builds a knowledge graph to represent such entities and their relationships with $e$ and searches the graph for full terms. Another key insight is that literally identical abbreviations within the same application are likely (but not necessarily) to have identical expansions, and thus the semantics-based expansion in one place may be transferred to other places. To investigate when abbreviation expansion could be transferred safely, we conduct a case study on three open-source applications. The results suggest that a significant part (75 percent) of expansions could be transferred among lexically identical abbreviations within the same application.
However, the risk of transfer varies according to various factors, e.g., length of abbreviations, the physical distance between abbreviations, and semantic relations between abbreviations. Based on these findings, we design nine heuristics for transfer expansion and propose a learning-based approach to prioritize both transfer heuristics and semantic-based expansion heuristics. Evaluation results on nine open-source applications suggest that the proposed approach significantly improves the state of the art, improving recall from 29 to 89 percent and precision from 39 to 92 percent.
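The transfer idea described above can be illustrated with a small sketch: an already resolved expansion is reused for a lexically identical abbreviation elsewhere in the same application, with simple guards for abbreviation length and physical distance. This is not the paper's nine heuristics or its learning-based prioritization; the `Resolved` record, the `transfer_expansion` function, the `min_length` threshold, and the same-file-then-nearest-line preference are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Resolved:
    """An abbreviation occurrence whose expansion is already known (hypothetical record)."""
    abbreviation: str
    expansion: str
    file: str
    line: int


def transfer_expansion(
    abbreviation: str,
    file: str,
    line: int,
    resolved: List[Resolved],
    min_length: int = 2,  # assumed threshold: very short abbreviations are too risky
) -> Optional[str]:
    """Reuse the expansion of the closest lexically identical abbreviation."""
    if len(abbreviation) < min_length:
        return None
    candidates = [r for r in resolved if r.abbreviation == abbreviation]
    if not candidates:
        return None

    def distance(r: Resolved):
        # Prefer occurrences in the same file (False sorts before True),
        # then the smallest line gap; across files the gap is only a tiebreaker.
        return (r.file != file, abs(r.line - line))

    return min(candidates, key=distance).expansion


# Hypothetical resolved occurrences, invented for illustration.
known = [
    Resolved("ctx", "context", "a.py", 10),
    Resolved("ctx", "contextual", "b.py", 5),
    Resolved("i", "index", "a.py", 3),
]
print(transfer_expansion("ctx", "a.py", 40, known))
print(transfer_expansion("i", "a.py", 4, known))
```

A fuller version would also weigh semantic relations between the two occurrences, which the abstract names as a third risk factor.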
The article sheds light on the theoretical understanding of the role and place of abbreviations and describes the structural and semantic features of the abbreviation vocabulary. It aims to develop a systematic approach to the study of this phenomenon, using an English literary text as material, by defining the types of acronyms in English on the basis of conventional, common usage; by analyzing the systemic, structural, and semantic characteristics of the abbreviated vocabulary of English; and by trying to establish the basic structural-semantic characteristics of such vocabulary.
The article aims to present a linguistic, pragmatic, and sociocultural description of English abbreviations associated with the British monarchy. An original thematic classification of the analyzed abbreviations is developed and presented, and attention is paid to the perception and interpretation of abbreviations in connection with homonymy and other linguistic and pragmatic factors. As nationally marked units of the lexical system of British English, the analyzed abbreviations carry background (linguistic-cultural and sociocultural) information and have particular cultural-historical and other associations. They occupy a definite place in the background knowledge of the British as bearers of the language and culture. Their value for the cultural-studies and translation aspects of teaching English, and the importance of studying them from a comparative perspective, are evident.
Objective: The goal of this study was to develop a practical framework for recognizing and disambiguating clinical abbreviations, thereby improving current clinical natural language processing (NLP) systems’ capability to handle abbreviations in clinical narratives.
Methods: We developed an open-source framework for clinical abbreviation recognition and disambiguation (CARD) that leverages our previously developed methods, including: (1) machine learning based approaches to recognize abbreviations from a clinical corpus, (2) clustering-based semiautomated methods to generate possible senses of abbreviations, and (3) profile-based word sense disambiguation methods for clinical abbreviations. We applied CARD to clinical corpora from Vanderbilt University Medical Center (VUMC) and generated 2 comprehensive sense inventories for abbreviations in discharge summaries and clinic visit notes. Furthermore, we developed a wrapper that integrates CARD with MetaMap, a widely used general clinical NLP system.
Results and Conclusion: CARD detected 27 317 and 107 303 distinct abbreviations from discharge summaries and clinic visit notes, respectively. Two sense inventories were constructed for the 1000 most frequent abbreviations in these 2 corpora. Using the sense inventories created from discharge summaries, CARD achieved an F1 score of 0.755 for identifying and disambiguating all abbreviations in a corpus from the VUMC discharge summaries, which is superior to MetaMap and Apache’s clinical Text Analysis Knowledge Extraction System (cTAKES). Using additional external corpora, we also demonstrated that the MetaMap-CARD wrapper improved MetaMap’s performance in recognizing disorder entities in clinical notes. The CARD framework, 2 sense inventories, and the wrapper for MetaMap are publicly available at https://sbmi.uth.edu/ccb/resources/abbreviation.htm. We believe the CARD framework can be a valuable resource for improving abbreviation identification in clinical NLP systems.
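The profile-based word sense disambiguation step mentioned in the Methods can be pictured with a generic bag-of-words sketch: each candidate sense gets a profile built from example contexts, and a new context is assigned to the sense whose profile is most similar. This is not CARD's code or API. The function names, the cosine-similarity scoring, and the toy sense examples for the abbreviation "RA" are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def build_profiles(examples):
    """Build one bag-of-words profile per sense from (sense, context) pairs."""
    profiles = {}
    for sense, context in examples:
        profiles.setdefault(sense, Counter()).update(context.lower().split())
    return profiles


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(context, profiles):
    """Return the sense whose profile is most similar to the context."""
    vector = Counter(context.lower().split())
    return max(profiles, key=lambda sense: cosine(vector, profiles[sense]))


# Toy sense examples for the abbreviation "RA", invented for illustration.
toy_examples = [
    ("rheumatoid arthritis", "joint pain and swelling treated with methotrexate"),
    ("right atrium", "echo shows dilated chamber and elevated pressure"),
]
profiles = build_profiles(toy_examples)
print(disambiguate("patient reports joint swelling", profiles))
```

In practice a sense inventory such as the ones described above would supply the candidate senses, and the profiles would be built from many contexts per sense rather than one.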
There is a need for a standardized, practical annotation for structures of lipid species derived from mass spectrometric approaches; i.e., for high-throughput data obtained from instruments operating in either high- or low-resolution modes. This proposal is based on common, officially accepted terms and builds upon the LIPID MAPS terminology. It aims to add defined levels of information below the LIPID MAPS nomenclature, as detailed chemical structures, including stereochemistry, are usually not automatically provided by mass spectrometric analysis. To this end, rules for lipid species annotation were developed that reflect the structural information derived from the analysis. For example, commonly used head group-specific analysis of glycerophospholipids (GP) by low-resolution instruments is neither capable of differentiating the fatty acids linked to the glycerol backbone nor able to define their bond type (ester, alkyl-, or alk-1-enyl-ether). This and other missing structural information is covered by the proposed shorthand notation presented here. Beyond GPs, we provide shorthand notation for fatty acids/acyls (FA), glycerolipids (GL), sphingolipids (SP), and sterols (ST). In summary, this defined shorthand nomenclature provides a standard methodology for reporting lipid species from mass spectrometric analysis and for constructing databases.
Abbreviations are widely used in identifiers. However, they have a severe negative impact on program comprehension and on IR-based software maintenance activities, e.g., concept location, software clustering, and recovery of traceability links. Consequently, a number of efficient approaches have been proposed to expand abbreviations in identifiers. Most such approaches rely heavily on dictionaries and rarely exploit the specific and fine-grained context of identifiers. As a result, they are less accurate in expanding abbreviations (especially short ones) that may match multiple dictionary words. To this end, in this paper we propose an automatic approach to improve the accuracy of abbreviation expansion by exploiting the specific and fine-grained context. It focuses on a special but common category of abbreviations (abbreviations in parameter names), and thus it can exploit the specific and fine-grained context, i.e., the type of the enclosing parameter as well as the corresponding formal (or actual) parameter name. A recent empirical study on parameters suggests that actual parameters are often lexically similar to their corresponding formal parameters. Consequently, it is likely that an abbreviation in a formal parameter can find its full terms in the corresponding actual parameter, and vice versa. Based on this assumption, a series of heuristics are proposed to look for full terms in the corresponding actual (or formal) parameter names. To the best of our knowledge, we are the first to expand abbreviations by exploiting the lexical similarity between actual and formal parameters. We also search for full terms in the data type of the enclosing parameter. Only if all such heuristics fail does the approach turn to traditional abbreviation dictionaries. We evaluate the proposed approach on seven well-known open-source projects.
Evaluation results suggest that when only parameter abbreviations are involved, the proposed approach can improve the precision from 26 to 95 percent and recall from 26 to 65 percent compared against the state-of-the-art general purpose approach. Consequently, the proposed approach could be employed as a useful supplement to existing approaches to expand parameter abbreviations.
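The search order described above (corresponding parameter name first, then the data type of the enclosing parameter, then a dictionary as a last resort) can be sketched as follows. This is a simplified illustration rather than the paper's heuristics: the helper names, the identifier-splitting regular expression, and the rule that an abbreviation must share its first letter with a term and be a subsequence of it are all assumptions.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def is_abbreviation_of(abbr, term):
    """Assumed rule: same first letter, strictly shorter, letters in order."""
    if not abbr or len(abbr) >= len(term) or abbr[0] != term[0]:
        return False
    remaining = iter(term)
    # Each `ch in remaining` consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in abbr)


def expand(abbr, corresponding_name, type_name, dictionary):
    """Look for a full term in the corresponding parameter name, then the
    parameter's type, and only then fall back to the dictionary."""
    for source in (corresponding_name, type_name):
        for term in split_identifier(source):
            if is_abbreviation_of(abbr, term):
                return term
    return dictionary.get(abbr)


# Hypothetical call sites: formal parameter `buf` receives `inputBuffer`,
# and formal parameter `str` has the declared type `OutputStream`.
print(expand("buf", "inputBuffer", "ByteArray", {}))
print(expand("str", "out", "OutputStream", {}))
print(expand("cnt", "x", "int", {"cnt": "count"}))
```

Checking the context before the dictionary is what lets a short abbreviation such as `str` resolve to the term actually present nearby instead of whichever dictionary word happens to match.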