data_parlspeech_magyarlanc.Rd
The dataset contains the results of a part-of-speech analysis conducted with the magyarlanc tool on a sample of 25 Hungarian parliamentary speeches. It is used in Chapter 11 of the textbook (https://tankonyv.poltextlab.com/nlp-ch.html).
data_parlspeech_magyarlanc
A data.frame with 17,870 observations of 4 variables:
The token created by magyarlanc.
The lemma derived from the token by magyarlanc.
The part-of-speech tag indicating the grammatical category (word class) of the token.
The morphological features of the token.
https://cap.tk.hu/en/dataoverview
Zsibrita, János, Veronika Vincze, and Richárd Farkas (2013). Magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP, 2013: 763–71.