The dataset contains 35,021 front-page articles from the Hungarian print daily Magyar Nemzet. It is used in Chapter 8 of the textbook (https://tankonyv.poltextlab.com/embedding.html).

data_magyar_nemzet_large

Format

A data.frame with 35,021 observations and 2 variables:

doc_id

A unique document ID; in this dataset it is the source file name. It follows the pattern dailyname_year_month_day_nr.txt
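
The file-name pattern above means the publication date can be recovered from doc_id. The following is a minimal base R sketch, assuming the year has four digits and the month, day and nr parts are purely numeric. The two IDs, including the "mn" prefix, are made up for illustration and are not taken from the dataset.

```r
# Hypothetical doc_id values that follow dailyname_year_month_day_nr.txt
doc_id <- c("mn_2005_03_17_2.txt", "mn_2010_11_02_15.txt")

# Capture groups: daily name, year, month, day, nr (assumed layout)
pattern <- "^(.*)_([0-9]{4})_([0-9]{1,2})_([0-9]{1,2})_([0-9]+)\\.txt$"

# regexec() returns the full match plus one entry per capture group
parts <- do.call(rbind, regmatches(doc_id, regexec(pattern, doc_id)))

meta <- data.frame(
  doc_id = parts[, 1],
  daily  = parts[, 2],
  date   = as.Date(paste(parts[, 3], parts[, 4], parts[, 5], sep = "-"),
                   format = "%Y-%m-%d"),
  nr     = as.integer(parts[, 6]),
  stringsAsFactors = FALSE
)

print(meta)
```

Anchoring the pattern at the end of the string keeps the sketch working even if the daily name itself contains an underscore. It is worth checking the pattern against a few real doc_id values before applying it to the full data.frame.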

text

The raw, unprocessed text of the article

Source

https://cap.tk.hu/en/dataoverview

References

Sebők, Miklós, and Zoltán Kacsuk (2021). The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach. Political Analysis, 29(2): 236-249.