Piroska Lendvai


Ph.D., research fellow in the Department of Language Technology


Fields of research

  • Text mining

  • Semantic technologies

  • Machine learning

  • Digital humanities







PhD, Computational Linguistics, Tilburg University, The Netherlands

Dissertation: Extracting Information from Spoken User Input. A Machine Learning Approach



Janus Pannonius University, Pécs, Hungary, Faculty of Arts: MA in Philology

English Language and Literature

Russian Language and Literature

Thesis: Disambiguation of Noun Sequences in Brief News Items and Its Use in Machine Translation




2010-               research fellow, Research Institute for Linguistics, Budapest, Hungary

2005-2009        postdoc, Tilburg University, NL

2001-2004        PhD student, Tilburg University, NL


Participation in national and international projects



2011-               CKCC: Circulation of Knowledge and learned practices in the 17th-century Dutch Republic. A Web-based Humanities Collaboratory

Building web-based tools to analyze and visualize the 17th-century epistolary networks and their themes of interest, and to enrich this corpus with annotations.


2011-                INNET: Innovative Networking in Infrastructure for Endangered Languages

Reinforcing and extending an innovative, worldwide and sustainable grid of digital archives, and disseminating state-of-the-art language technology from the CLARIN realm


2011-                CESAR Central and South-East European Language Resources - EU CIP-ICT-PSP.2010.6.1

Contributing to an open linguistic infrastructure by enhancing, upgrading, standardizing and cross-linking national language resources in the South-East European region


2010-2011        CLARIN Common Language Resources and Technology Infrastructure - EU FP7 INFRA-2007-2.2.01

Establishing an integrated and interoperable research infrastructure of language resources and its technology to enable eHumanities


2009-                AMICUS networking project: Automated Motif dIscovery in CUltural heritage and Scientific communication texts

Language technology research on digital cultural heritage and scientific publishing texts


2007-2009         MITCH project: Mining Information from Texts of Cultural Heritage

Text mining on cultural heritage data, such as museum databases


2005-2007         ROLAQUAD project: Robust language understanding in question answering dialogues

Development of semantic resources for medical QA


2001-2004         PhD project: Learning to communicate - Applying machine learning to dialogue strategies

Understanding human-machine dialogues in spoken natural language by machine learning modeling




English, Russian, Dutch – fluent

German, French – reading






  • T. Declerck, P. Lendvai, K. Mörth, G. Budin, T. Váradi (2012). Towards Linked Language Data for Digital Humanities. In: Linked Data in Linguistics Workshop, Frankfurt/Main, Germany, 7-9 March 2012.

  • T. Declerck, P. Lendvai, N. Koleva, T. Váradi (To appear). Integration of Ontological Semantic Resources in NooJ. In: Proceedings of the 2011 International NooJ Conference. Cambridge Scholars Publishing.

  • B. Ehmann, P. Lendvai, T. Pólya, O. Vincze, M. Miháltz, L. Tihanyi, T. Váradi, J. László (To appear). Narrative Psychological Application of Semantic Role Labeling. In: Proceedings of the 2011 International NooJ Conference. Cambridge Scholars Publishing.



  • P. Lendvai (2011). Towards a Discourse-driven Taxonomic Inference Model. In: Bouma, G. and van den Bosch, A. (eds.) Interactive Multi-modal Question Answering, Pages 255-274. Springer, 2011

  • T. Declerck, P. Lendvai (2011). Linguistic and Semantic Representation of the Thompson's Motif-Index of Folk-Literature. In: S. Gradmann, F. Borri, C. Meghini, H. Schuldt (eds.) Research and Advanced Technology for Digital Libraries - International Conference on Theory and Practice of Digital Libraries. Lecture Notes in Computer Science number 6966, Berlin, Germany, Springer, 9/2011

  • T. Declerck, A. Scheidel, P. Lendvai (2011). Proppian Content Descriptors in an Integrated Annotation Schema for Fairy Tales. In: Language Technology for Cultural Heritage. Selected Papers from the LaTeCH Workshop Series. Theory and Applications of Natural Language Processing, Pages 155-169, Springer, Heidelberg, 2011 

  • M. van Erp, A. van den Bosch, S. Hunt, M. van der Meij, R. Dekker, P. Lendvai (2011). Natural Selection - Finding Specimens in a Natural History Collection. In: Changing Diversity in a Changing Environment. ISBN 978-953-307-796-3

  • K. Mörth, T. Declerck, P. Lendvai, T. Váradi (2011). Accessing Multilingual Data on the Web for the Semantic Annotation of Cultural Heritage Texts. In: E. Montiel-Ponsoda, J. McCrae, P. Buitelaar, P. Cimiano (eds.) Proceedings of the 2nd International Workshop on the Multilingual Semantic Web, Bonn, Germany, Springer, 23 Oct. 2011

  • T. Declerck, P. Lendvai, T. Wunner (2011). Linguistic and Semantic Features of Textual Labels in Knowledge Representation Systems. In: Harry Bunt (ed.) Proceedings of the Sixth Joint ISO - ACL/SIGSEM Workshop on Interoperable Semantic Annotation, Oxford, United Kingdom, ACL-SIGSEM, 11-12 Jan. 2011

  • P. Lendvai, T. Váradi, S. Darányi, T. Declerck (2011). Assignment of Character and Action Types in Folk Tales. In: Proc. of the 2010 NooJ conference, Komotini, Greece, 27-29 May 2010.

  • B. Ehmann, P. Lendvai, A. Fritz, M. Miháltz, L. Tihanyi (2011). Szemantikus szerepek vizsgálata magyar nyelvű szövegek narratív pszichológiai elemzésében [Investigation of thematic roles in Hungarian narrative psychological analysis]. In: Proc. of Hungarian Computational Linguistics Conference / MSZNY, Szeged, 1-2 Dec 2011.

  • E. Simon, P. Lendvai, G. Németh, G. Olaszy, K. Vicsi (2011). Languages in the European Information Society: Hungarian. META-NET White Paper Series.  (Available both in English and Hungarian)

  • K. Zervanou and P. Lendvai (Eds.) (2011). Proceedings of the Fifth ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH-2011), Portland, OR, USA.



  • Declerck, T. and Lendvai, P. (2010) A Three-Level Representation Model for supporting the Interoperability of Conceptual, Terminological and Linguistic Resources. In: Proceedings of ICGL2010: The Second International Conference on Global Interoperability for Language Resources. 18-20 January

  • Lendvai, P., T. Váradi, M. Wynne, Y. Berglund. (2010) Humanities and Social Sciences Organizations, Initiatives and Projects Report. CLARIN Deliverable: D3C-3.2.

  • Lendvai, P., T. Declerck, S. Darányi, S. Malec. (2010) Propp Revisited: Integration of Linguistic Markup into Structured Content Descriptors of Tales. Digital Humanities 2010, London, United Kingdom, Oxford University Press, 7/2010.

  • Declerck, T., Lendvai, P. (2010) Towards a standardized linguistic annotation of the textual content of labels in Knowledge Representation Systems. LREC 2010. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valletta, Malta, ELRA, 2010.

  • Declerck, T., K. Eckart, P. Lendvai, L. Romary, T. Zastrow (2010). Towards a Standardised Linguistic Annotation of Fairy Tales. In: Proc. of the LRT standards workshop at LREC-2010.

  • Lendvai, P., Declerck, T., S. Darányi, P. Gervás, R. Hervás, S. Malec, F. Peinado. (2010) Integration of Linguistic Markup into Semantic Models of Folk Narratives: The Fairy Tale Use Case. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, Pages 1996-2001, Valletta, Malta, European Language Resources Association (ELRA).

  • Declerck, T.,  Scheidel, A., Lendvai, P. (2010)  Proppian Content Descriptors in an Augmented Annotation Schema for Fairy Tales. In: C. Sporleder, K. Zervanou (eds). Proceedings of the ECAI 2010 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, Lisbon, Portugal, IOS Press, European Coordinating Committee for Artificial Intelligence -- ECAI, 8/2010.

  • Darányi, S., Lendvai, P. (eds.). (2010) Proceedings of the First International AMICUS Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts. Vienna, Austria, University of Szeged, Hungary.

  • Lendvai, P. (2010) Granularity Perspectives on Modeling Humanities Concepts.  In: S. Darányi, P. Lendvai, (eds.). First International AMICUS Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts, Vienna, Austria. University of Szeged, Hungary.



  • P. Lendvai (2009). Towards Acquisition of Taxonomic Inference. In: Proc. of Eighth International Conference on Computational Semantics (IWCS-09).

  • M. van Erp, P. Lendvai, A. van den Bosch (2009). Comparing Alternative Data-Driven Ontological Vistas of Natural History. In: Proc. of Eighth International Conference on Computational Semantics (IWCS-09).

  • P. Lendvai and L. Borin (Eds.) (2009). Proceedings of the EACL Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education. Athens.

  • A. van den Bosch, P. Lendvai, M. van Erp, S. Hunt, M. van der Meij, R. Dekker (2009). Weaving a new fabric of natural history. Interdisciplinary Science Reviews, 34:2.

  • T. Declerck and P. Lendvai (2009). Extraction de concepts et relations sémantiques à partir des labels d'ontologies. In: S. Despres, N. Grabar (eds.). Acquisition et modélisation de relations sémantiques, Toulouse, France, IRIT, 11/2009.



  • P. Lendvai (2008). Alignment-based Expansion of Textual Database Fields. In: A. Gelbukh (Ed.), Proceedings of the Computational Linguistics and Intelligent Text Processing 9th International Conference, CICLing 2008. Lecture Notes in Computer Science, Vol. 4919/2008, Berlin / Heidelberg: Springer.

  • P. Lendvai and S. Hunt (2008). From Field Notes towards a Knowledge Base. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC-08). Marrakech, Morocco.

  • M. van Erp, P. Lendvai, A. van den Bosch, S. Hunt (2008). Automatic Ontology Construction for Improved Access to Taxonomic Databases. At: International Workshop on Distributed Sensing and Collective Intelligence in Biodiversity Monitoring. Amsterdam.



  • P. Lendvai and J. Geertzen (2007). Token-based chunking of turn-internal dialogue acts. SigDial-07, Antwerp, Belgium.



  • P. Lendvai, A. van den Bosch (2005). Robust ASR lattice representation types in pragma-semantic processing of spoken input. In Proc. of the AAAI Spoken Language Understanding Workshop, SLU-05 (pp. 15-22), Pittsburgh, PA.

  • P. Lendvai (2005). Conceptual taxonomy identification in medical documents. In Proc. of The Second International Workshop on Knowledge Discovery and Ontologies (KDO-05), held within ECML/PKDD (pp. 31-38), Porto, Portugal.

  • W. Daelemans, A. van den Bosch, S. Canisius, P. Lendvai (2005). Robust Language Understanding in Dutch Question Answering Dialogues. In Proc. of SIREN-05, the Scientific ICT Research Event Netherlands, Technische Universiteit Eindhoven.

  • P. Lendvai (2005). Taxonómia felismerése dokumentumszerkezetből [Taxonomy detection from document structure]. In Proc. of Computational Linguistics in Hungary Conference (Magyar Számítógépes Nyelvészeti Konferencia, MSZNY-2005, pp. 88-95), Szeged, Hungary.



  • P. Lendvai, A. van den Bosch, E. Krahmer, S. Canisius (2004). Memory-based robust interpretation of recognised speech. In Proc. of SPECOM-04, 9th International Conference "Speech and Computer" (pp. 415-422), St. Petersburg, Russia.

  • P. Lendvai (2004). Extracting information from spoken user input. A machine learning approach. Ph.D. thesis, Tilburg University, Netherlands.



  • P. Lendvai, A. van den Bosch, E. Krahmer (2003). Machine learning for shallow interpretation of user utterances in spoken dialogue systems. In Proc. of EACL-03 Workshop on Dialogue Systems: interaction, adaptation and styles of management (pp. 69-78), Budapest.

  • P. Lendvai (2003). Learning to identify fragmented words in spoken discourse. In Proceedings of EACL-03 Student Research Workshop (pp. 25-32), Budapest.

  • P. Lendvai and L. Maruster (2003). Process discovery for evaluating dialogue strategies. In Proc. of ISCA Workshop on Error Handling in Spoken Dialogue Systems (pp. 119-122), Chateau d'Oex-Vaud, Switzerland.

  • P. Lendvai, A. van den Bosch, E. Krahmer (2003). Memory-based disfluency chunking. In Proc. of DISS-03, Disfluency in Spontaneous Speech Workshop, Göteborg, Sweden.



  • P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts (2002). Improving machine-learned detection of miscommunications in human-machine dialogues through informed data splitting. In Kuebler, S. & Hinrichs, E. (Eds.), Machine Learning Approaches in Computational Linguistics. Trento, Italy: ESSLLI. 

  • P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts (2002). Multi-feature error detection. In Theune, M., Nijholt, A. & Hondorp, H. (Eds.), Language and Computers: Studies in Practical Linguistics (pp. 163-178). Amsterdam: Rodopi.



