Normal (N), habituated nonorganogenic (HNO), and tumour (T) sugar beet cell lines were analysed in order to establish specific patterns of extracellular proteins and identify protein markers that might explain the distinct phenotypical characteristics. Electron microscopy showed that the ultrastructure of N cells corresponds to that of parenchyma cells, and that these cells contain plastids with large starch grains. HNO and T cells had enlarged, lobed nuclei with an increased number of nucleoli; the number of nuclei in HNO cells was greater than in T cells. The T plastids were elongated, with reduced thylakoids and abundant phytoferritin deposits, while HNO plastids were small and vacuolated, with an irregular, underdeveloped thylakoid system. The extracellular proteome of the cells was separated by sodium dodecyl sulphate polyacrylamide gel electrophoresis. Greater differences in protein expression were observed between the HNO and N lines than between the T and N lines. Sixteen of the most prominent bands differentially expressed among the cell lines were cut out from the gel and analysed by mass spectrometry. Cell wall-modifying enzymes were identified, including a peroxidase whose expression was twofold higher in N and T tissue than in HNO tissue; pectinesterase, which was expressed at a level threefold lower in the T line than in the other cell lines; and xyloglucan endotransglucosylase, which was expressed at a level sixfold higher in HNO and T tissue. Three proteins belonged to the chitinase gene family and their expression was higher in HNO and T tissue than in N tissue. The differential expression of these proteins suggests that they play a role in cell line-specific cell wall composition and cell-to-cell adhesion.
The paper presents the development of freely available models for named entity recognition and classification for Croatian and Slovene. The experiments focus on the most informative linguistic features, taking into account the availability of language tools for both languages. In addition to standard linguistic features, distributional features computed from large unannotated monolingual corpora are also used. Using distributional features improves the results by 7-8 points in F1, and using morphological information improves them by a further 3-4 points, for both languages. The best trained model, together with the test set for comparison with existing and future systems, and a model for morphosyntactic tagging of Croatian with the HunPos tagger are available for download for research and commercial use.
The digital age has opened new possibilities for compiling corpora of social discourse, which has brought corpus-linguistic methods closer to other methods of discourse analysis and to the humanities. Even when no specific corpus-linguistics techniques are used, empirically grounded social-science analysis today increasingly relies on some kind of corpus ('corpus-assisted discourse analysis' or 'critical corpus analysis', Hardt-Mautner 1995; Baker 2016). In the post-Yugoslav space, recent developments in corpus linguistics have brought benefits to many areas of research. Still, for linguists and discourse analysts who set out to collect specialised corpora for their own research purposes, many questions remain open, partly because the background of corpus linguistics is changing rapidly, but also because there is still a gap in knowledge of corpus methods and of corpus compilation methodology outside the Anglophone context. With this paper we try to narrow that gap by presenting a step-by-step account of the corpus building procedure for Croatian, Serbian and Slovene, through the example of compiling a thematic corpus from digital media (news articles and reader comments). After reviewing corpus types, their uses and their advantages in the social sciences and digital humanities, we present the possibilities for compiling corpora in South Slavic language contexts, including options for retrieving data from the web, permissions and ethical issues, factors that facilitate or hinder automated corpus collection and annotation, and processing options. The study reveals growing possibilities for working with the given languages, but also some persistently grey areas in which researchers need to make decisions based on research expectations.
Overall, the paper aims to recapitulate our own experience of compiling a corpus within the broader context of South Slavic corpus linguistics and of corpus-linguistic approaches in the humanities in general.
In this paper we present an approach to bootstrap a Croatian-Slovene bilingual lexicon from comparable news corpora from scratch, without relying on any external bilingual knowledge resource. Instead of using a dictionary to translate context vectors, we build a seed lexicon from identical words in both languages and extend it with context-based cognates and translation candidates of the most frequent words. By enlarging the seed dictionary by only 7% we were able to improve the baseline precision from 0.597 to 0.731 on the mean reciprocal rank for the ten top-ranking translation candidates with a 50.4% recall on the gold standard of 500 entries.
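The mean reciprocal rank over the ten top-ranking candidates mentioned above can be illustrated with a minimal sketch. The function name, the data layout, and the toy Croatian-Slovene word pairs are assumptions made for illustration only; they are not the authors' evaluation code or data.

```python
def mean_reciprocal_rank(candidates, gold, k=10):
    """Mean reciprocal rank over the top-k translation candidates.

    candidates: dict mapping a source word to its ranked candidate list
    gold:       dict mapping a source word to the set of correct translations
    A source word with no correct candidate in the top k contributes 0.
    """
    if not candidates:
        return 0.0
    total = 0.0
    for word, ranked in candidates.items():
        correct = gold.get(word, set())
        for rank, candidate in enumerate(ranked[:k], start=1):
            if candidate in correct:
                total += 1.0 / rank  # only the first correct hit counts
                break
    return total / len(candidates)


# Hypothetical toy data, not taken from the paper.
toy_candidates = {
    "kuća": ["hiša", "dom"],   # correct at rank 1
    "pas": ["mačka", "pes"],   # correct at rank 2
    "voda": ["mleko"],         # no correct candidate
}
toy_gold = {"kuća": {"hiša"}, "pas": {"pes"}, "voda": {"voda"}}

print(mean_reciprocal_rank(toy_candidates, toy_gold))
```

In this sketch the score is averaged over all evaluated source words, so a word without a correct candidate lowers the mean; whether the paper averages in exactly this way is not stated in the abstract.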
This research is the first step towards developing a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus consists of a one-year sample of the weather forecasts for the Adriatic, consisting of 7,893 sentence pairs. Evaluation is performed by the automatic evaluation measures BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We have shown that with a small-sized training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Additional improvement is expected by increasing the training set size. Finally, the correlation of the recorded evaluation measures is explored.
This paper provides an overview of the research and development activities carried out to alleviate the language resources' bottleneck in machine translation within the Abu-MaTran project. We have developed a range of tools for the acquisition of the main resources required by the two most popular approaches to machine translation, i.e. statistical (corpora) and rule-based models (dictionaries and rules). All these tools have been released under open-source licenses and have been developed with the aim of being useful for industrial exploitation.
The paper presents the Parlameter corpus of contemporary Slovene parliamentary proceedings, which covers the VIIth mandate of the Slovene Parliament (2014–2018). The Parlameter corpus offers rich speaker metadata (gender, age, education, party affiliation) and is linguistically annotated (lemmatization, tagging), both of which support research in several digital humanities and social sciences disciplines. We demonstrate the potential of corpus analysis techniques for investigating political debates. The corpus architecture allows for regular extensions of the corpus with additional Slovene data, as well as data from other parliaments, starting with Croatian and Bosnian.
Comparing Measures of Semantic Similarity
Ljubesic, N.; Boras, D.; Bakaric, N. ...
ITI 2008 - 30th International Conference on Information Technology Interfaces,
2008-June, 2008
Conference Proceeding
Open access
The aim of this paper is to compare different methods for automatic extraction of semantic similarity measures from corpora. Semantic similarity measures have proven very useful for many tasks in natural language processing, such as information retrieval, information extraction and machine translation. Additionally, one of the main problems in natural language processing is data sparseness, since no language sample is large enough to capture all possible language combinations. In our research we experiment with four different measures of association with context and eight different measures of vector similarity. The results show that the Jensen-Shannon divergence and the L1 and L2 norms outperform other measures of vector similarity regardless of the measure of association with context used. Maximum likelihood estimate and t-test show better results than other measures of association with context.
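As an illustration of the three vector similarity measures singled out above, the following is a minimal sketch. It assumes that each context vector has already been normalised into a probability distribution of equal length, and it uses base-2 logarithms; the function names and the toy vectors are hypothetical, and the abstract does not say how the authors normalised or weighted their vectors.

```python
import math


def l1_distance(p, q):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical context distributions for two words (assumed already normalised).
word_a = [0.5, 0.3, 0.2, 0.0]
word_b = [0.4, 0.3, 0.2, 0.1]

print(l1_distance(word_a, word_b))
print(l2_distance(word_a, word_b))
print(js_divergence(word_a, word_b))
```

All three are distances, so smaller values mean more similar contexts. Because the midpoint distribution is non-zero wherever either input is non-zero, the Jensen-Shannon divergence stays finite even when one word never occurs in a context that the other does, which is a practical advantage with sparse data.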