The dataset is the result of a named entity recognition (NER) analysis conducted with the Szeged NER tool on a sample of 25 Hungarian parliamentary speeches. It is used in Chapter 11 of the textbook (https://tankonyv.poltextlab.com/nlp-ch.html).

data_parlspeech_szner

Format

A data.frame with 17,874 observations and 4 variables:

token

The token created by magyarlanc.

ner

The named entity tag assigned to the token.

Source

https://cap.tk.hu/en/dataoverview

References

Szarvas, György, Richárd Farkas, and András Kocsor (2006). A Multilingual Named Entity Recognition System Using Boosting and C4.5 Decision Tree Learning Algorithms. In International Conference on Discovery Science, Springer, 267–78.