The theoretical part of the paper presents the project Slovenscina na dlani (Slovene in the palm of your hand), which aims to establish an interactive learning environment for the Slovene language (as mother tongue) in elementary and secondary schools. In this paper we limit the research to elementary school. In the empirical section, we focus on freely available e-learning materials for the Slovene language from the 6th to the 9th grade of elementary school and present the results of an analysis of 10,118 tasks according to their types. The findings indicate an unbalanced representation of task types: short-answer tasks, the type least popular among students, are in the majority. By introducing the most advanced language technology into learning processes, we try to overcome the limitations of existing e-resources for learning the Slovene language.
The slWaC Corpus of the Slovene Web. Erjavec, Tomaz; Ljubesic, Nikola; Logar, Natasa
Informatica (Ljubljana), 03/2015, Volume 39, Issue 1
Journal article. Peer reviewed. Open access.
The availability of large collections of text (language corpora) is crucial for empirically supported linguistic investigations of various languages; however, such corpora are complicated and expensive to collect. In recent years, corpora made from texts on the World Wide Web have become an attractive alternative to traditional corpora, as they can be made automatically, contain varied text types of contemporary language, and are quite large. This article describes version 2 of slWaC, a Web corpus of Slovene containing 1.2 billion tokens. It describes the process of corpus compilation with a focus on near-duplicate removal, and presents the linguistic annotation and format of the corpus and its accessibility via Web concordancers. It then investigates the content of the corpus using the method of frequency profiling, comparing its lemma and part-of-speech annotations with three corpora: the first version of slWaC; Gigafida, the one-billion-word reference corpus of Slovene; and KRES, the hundred-million-word balanced reference corpus of Slovene.
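The frequency-profiling comparison described in this abstract can be sketched as follows. This is a minimal illustration, assuming the log-likelihood keyness statistic that is commonly used for frequency profiling; the abstract does not say which statistic the authors applied. The function names and the toy lemma lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one item.

    a, b: frequency of the item in corpus 1 and corpus 2
    c, d: total number of tokens in corpus 1 and corpus 2
    """
    # Expected frequencies if the item were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def frequency_profile(lemmas_1, lemmas_2, top=10):
    """Rank lemmas by how strongly their frequency differs between two corpora."""
    f1, f2 = Counter(lemmas_1), Counter(lemmas_2)
    n1, n2 = len(lemmas_1), len(lemmas_2)
    scores = {
        lemma: log_likelihood(f1[lemma], f2[lemma], n1, n2)
        for lemma in set(f1) | set(f2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy data only (assumption): real input would be the lemma or
# part-of-speech annotations of the corpora being compared.
web = ["biti", "biti", "splet", "splet", "splet"]
reference = ["biti", "biti", "knjiga", "knjiga", "knjiga"]
for lemma, score in frequency_profile(web, reference):
    print(f"{lemma}\t{score:.2f}")
```

A lemma that is equally frequent in both lists scores zero, while lemmas concentrated in one list score higher; the same function can be applied to part-of-speech tags instead of lemmas.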
The article is devoted to selected titles and subtitles in Slovene-language popular science texts about Slovene history. The data comprise 110 examples taken from three Slovene popular science books. The analysed phrases have different forms (nouns, adjective-modified nouns, nominalizations). The study is methodologically based on semantic and syntactic analysis (an analysis of how the predicate-argument structure underlying a given initial para-text is realized), as applied in the textual analysis of scientific, and especially popular scientific, discourse. The author investigates the relation between a given title and the content of the corresponding portion of the macro-text. Particularly interesting are titles formed by means of nominalization, that is, by means of a transformation of the predicate-argument structure. When used as titles, nominalizations confirm their expected semantic and syntactic features, such as the capacity to omit selected elements, increased generality and, at the same time, compactness of the message. They also prove to be fruitful elements of strategies of macro-text construction based on the use of appropriate titles.
The paper describes the combined results of several projects which constitute a basic language resource infrastructure for printed historical Slovene. The IMP language resources consist of a digital library, an annotated corpus and a lexicon, which are interlinked and uniformly encoded following the Text Encoding Initiative Guidelines. The library holds about 650 units (mostly complete books) consisting of facsimiles with 45,000 pages as well as hand-corrected and structured transcriptions. The hand-annotated corpus has 300,000 tokens, where each word is tagged with its modernised word form, lemma, part-of-speech and, in cases of archaic words, its nearest contemporary equivalents. This information was extracted into the lexicon, which also covers an extended target-annotated corpus, resulting in 20,000 lemmas (of these 4,000 archaic) with 50,000 modern word forms and 70,000 attested forms. We have also developed a program to modernise, tag and lemmatise historical Slovene, and annotated the digital library with it, producing an automatically annotated corpus of 15 million words. To serve the humanities, the digital library and lexicon are available for reading and browsing on the web and the corpora via a concordancer. For language technology research and development the resources are available in source TEI XML under the Creative Commons Attribution licence. The paper presents the IMP resources, available from http://nl.ijs.si/imp/, the process of their compilation, encoding and dissemination, and concludes with directions for future research.
The Dictionary of Legal Terminology presents the conceptual system of modern Slovenian law. In more than 10,000 dictionary entries, we present the terminology of contemporary Slovenian legal science, legal practice and legislation after 1991. The dictionary is primarily intended for legal experts who practise various professions (including judges, lawyers, notaries, prosecutors, and heads of HR and legal departments in companies). It is also intended for students of law and related sciences, to familiarize them with the conceptual system of Slovenian law in a professionally relevant manner. Efficient use of the dictionary requires at least a basic knowledge of law. This does not mean that journalists, translators, proofreaders and others who encounter legal terminology cannot use it. However, it should be borne in mind that the definitions in a terminological dictionary are very concise and sufficiently informative only for an expert who is familiar with the conceptual system of a specific field of study; other users are given a suitable starting point from which they can find additional information if necessary.
The present monograph, entitled A Study of Slovenian Multi-Word Lexemes from a Lexicographical Perspective, provides a typology of newer nominal multi-word lexemes with potential terminological meaning, based on the theory of the Russian linguist N. M. Shanskiy, which posits four precisely defined levels of semantic merger of their components. The analysis of individual components of multi-word lexemes from a morphological, syntactic and semantic perspective reveals typological tendencies of nominal multi-word lexemes that have emerged especially in the last twenty years. The typology takes into account the causal relationship between the level of semantic transfer (of the components) of a multi-word lexeme on the one hand and, on the other, the level of semantic merger of its components and the related lexicalization of the multi-word lexeme, which is also reflected in the restricted collocability of the components. Lexicalized or non-lexicalized metaphorical and metonymic semantic transfers contribute to new semantic and syntactic combinations of individual words in the phrase, to the consequent integration of their meanings into a single meaning of the multi-word lexeme, and thus to a higher level of lexicalization and semantic firmness of the multi-word lexeme.
2.2.3.1 Glede izvora vprasalne prislova kod »ubi, sklon *k (*ide. *k-i-s) (Bezlaj 1982: 27)) ter navaja razlicne razlage jezikoslovcev. kopecný (1980: 371-76) za vseslovanski zaimenski prislov iz ...oblik k?de (slovensko koder < kodè-ze), k?du, k?dy (slovensko narecn kodi), k?da (slovensko koda in ko- daj) podaja slovensko obliko kod, sh. kud in kasubsk k?d (kqdka, k?dka (prav tam: 371), skrajsan iz katerekoli od navedenih oblik.10 Omenja, da so pri tem prislovu izkazani vsi trije pomeni in laliko zamenjuje tudi kje in kam, za 'kam' ne v zahodno- slovanskih jezikih in slovenscini, za 'kje' je razsirjen v poljscini, makedonscini in bolgarscini (prav tam: 373-74). Snoj ima psl. oblike *kçdÿ, *kodä, *kçdë, *kçdê, ki vsebujejo ide. vprasalni zaimek *k"o- ali *k"u-. in pripone *-ndhe (ki je izvomo razlicica pripone *-dhe, znane v ide. *k"u-d* llc. psl. ki>de 'kje'), kar se je laliko razvilo iz ide. *k"u-ndllc ali *k "u-ndc 'kam, kod' (2003: 288). Vanda BABic navaja znacilne prislovne pripone, ki so se v stari cerkveni slovanscini dodajale zaimenskim korenom za razlic- no dolocitev osnovnih zaimenskih pomenov, npr. za mesto pripone -de, ki »zaznamuje mesto glagolskega dejanja ali stanja«, -amo, tudi -emo, -odu, tudi -?de s pomenom smeri premikanja, prva prvotno za priblizevanje (h komu ali cemu draga za odda- ljevanje (od koga ali cesa kasneje obe za priblizevanje, dmga s predlogom ot? tudi za oddaljevanje (kamo 'kam', k?du, k?de 'kod', ot? k?du 'od kod' (Babic 2003: 223). mAtAsovic izvaja psl. *kundä, kar je okamneli orodnik ednine osnove *kunda-, iz ide. kwu-ndh-oh1 (2008: 249). 2.2.3.3 V16. stol. je kod tako vprasalni kot nedolocni, oziralni prislov in se pojavlja v 17, 18 in 12 delih, obliko koda pa poznata le Krelj in Juricic {Besedje 2011: 185). Ablativne zveze od kod so izpricane,11 do kod pa ne. Svetokriski ima glasovni dvojnici kod/kot (ob glagolu hoditi), pozna pa tudi predlozno rabo od kot kod x stavkih z vzroc- nimpomenom (Snoj 2006: 403). 
Pohlin does not list the form kod; his prepositional forms odklej, daklej 'from where, how far', homonymous with the temporal adverbs, are artificial coinages, which he drops in the second edition. Gutsman lists only the form od kod in his grammar, in the dictionary 10 »Likewise the interrogative kod, od kod and od kei 'from where'; Kopitar and Vodnik have all three: kod, od/do kod; Dajnko has kodi, odkod; and Küzmic odkud/otkut (Irena Orel 2001: 44); the dictionary has no headwords for '(from) where', only the relative kod in Ko sien (1848) (Novak 2006: 184). Pletersnik's dictionary also has for kod the phonetic variant ked and forms with added formants or the deictic particle -/: koda (Krelj), kodaj (Kastelec), kodi and kodik (East Styrian),12 only dokod; alongside odkod also odkodaj (Kastelec) and kec (= odkod, Miklosic (from ked-ci)). Ramovs derives the form odkec differently: *ot-ked-si (Ramovs 1935: 192).
7.2 In the spoken language, the textual examples, compared with the material for the SLA, show a livelier interchange of all three PVPZ. It is above all the use of the PVPZ kod that is declining; kod shows the greatest number and interweaving of spatial semantic components, yet it is unknown in the Prekmurje dialect (except in the form odkec 'from where'). It is being replaced by the locative kje and the directional kam (which also occur as doublets in the same speakers), or else its use is limited to expressing the starting point of an action (od kod), mainly in the sense of the origin/causality of the verbal action. Semantically it is linked to kje by its (admittedly rarely expressed) locativity (stativity) and to kam by directionality (dynamicity), while its specialisation for distribution in space and directed movement through space (perlativity) is being lost. With an added ablative or adlative component (od kje, do kje) the meaning is already expressed explicitly by the two prepositions, so a formal distinction is unnecessary (kod -> kje, similarly tod -> tu, ondi/onod -> tam; od kod -> od/iz kje, do kod -> do kje, and in parallel od/do tod/ondod -> od/do tam).
By contrast, in some areas (the Rovte dialect group and, sporadically as a doublet, also the Lower Carniolan, Littoral and Styrian groups) kod is generalised to express position in space and replaces the PVPZ kje. The directional kam is preserved and rarely takes over the role of the pronouns kje and kod, except for the final destination (kod -> kam, do kod -> do kam).
Most notable in the modern spoken language are the decline or abandonment of the semantically heterogeneous and specific pronoun kod and the generalisation of the locative pronoun kje for expressing spatial distribution (kod (biti, se nahajati) -> kje), which is, among other things, also characteristic of the literary language of the 20th century, as well as the perlative (kod (se premikati) 'where (to move)' -> kje), the ablative (od kod 'from where' -> od/iz kje) and the rarely expressed adlative with kod 'where to' (do kod 'how far' -> do kje/kam), in which the meaning is already explicitly expressed by the two prepositions, making a formal distinction unnecessary. The pronoun kod 'where' is also being replaced by the directive kam 'where (to)' (even within one speaker we can find a doublet, i.e. the use of both forms), or else its use is limited to the expression of a starting point (odkod 'from where'), mainly in the sense of the origin/causality of the verbal act. By contrast, in some areas (the Rovte dialect group and, rarely, as a doublet in other dialects as well, except the Prekmurje dialect in the east, where the pronoun kod is not known, except for the phrase od kejc 'where from'), kod 'where' has been generalised to express positioning in space and replaces the pronoun kje 'where'. The directive kam 'where (to)' has been preserved and rarely assumes the role of the pronouns kje and kod, except for the final destination (kod -> kam, do kod -> do kam).