Part of Speech data created with Magyarlanc — data_parlspeech

This dataset contains the output of a part-of-speech analysis run with the Magyarlanc tool on a sample of 25 Hungarian parliamentary speeches. It is used in Chapter 11 of the textbook (https://tankonyv.poltextlab.com/nlp-ch.html).

data_parlspeech_magyarlanc

Format

A data.frame with 17,870 observations and 4 variables:

token: The token produced by magyarlanc.
lemma: The lemma (dictionary form) that magyarlanc derived from the token.
POS_tag: The part-of-speech tag assigned to the token, indicating its grammatical category (for example noun or verb).
morfologic_features: The morphological features of the token.
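
As a rough illustration of how the four variables fit together, the sketch below builds a tiny stand-in data.frame with the same column names and then counts tags and extracts lemmas. The object name pos_sample, its rows, and the tag and feature values are invented for this example and are not taken from the dataset; the actual tag set used by magyarlanc may differ.

```r
# Toy stand-in with the same four columns as data_parlspeech_magyarlanc.
# The rows, tags and feature strings are hypothetical, for illustration only.
pos_sample <- data.frame(
  token = c("A", "parlament", "szavaz", "."),
  lemma = c("a", "parlament", "szavaz", "."),
  POS_tag = c("DET", "NOUN", "VERB", "PUNCT"),
  morfologic_features = c(
    "Definite=Def|PronType=Art",
    "Case=Nom|Number=Sing",
    "Mood=Ind|Number=Sing|Person=3|Tense=Pres",
    "_"
  ),
  stringsAsFactors = FALSE
)

# Frequency of each part-of-speech tag, most common first.
pos_counts <- sort(table(pos_sample$POS_tag), decreasing = TRUE)

# Keep only nouns and verbs and return their lemmas.
content_lemmas <- pos_sample$lemma[pos_sample$POS_tag %in% c("NOUN", "VERB")]

print(pos_counts)
print(content_lemmas)
```

With the package that ships the dataset loaded, the same two steps should work on data_parlspeech_magyarlanc in place of pos_sample, provided the tag labels in the %in% filter are adjusted to those that actually occur in POS_tag.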

Source

https://cap.tk.hu/en/dataoverview

References

Zsibrita, János, Veronika Vincze, and Richárd Farkas (2013). Magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2013), 763–771.