The dataset is the result of a part-of-speech analysis conducted with the Szeged NER tool on a sample of 25 Hungarian parliamentary speeches. It is used in the 11th chapter of the textbook (
A data frame with 17,874 observations and 4 variables:
The token, as produced by magyarlanc.
The part-of-speech tag indicating the position of the token in the text.
Szarvas, György, Richárd Farkas, and András Kocsor (2006). A Multilingual Named Entity Recognition System Using Boosting and C4.5 Decision Tree Learning Algorithms. In International Conference on Discovery Science, Springer, 267–78.