The dataset contains 71 875 front-page articles from two Hungarian print dailies, Magyar Nemzet and Népszabadság. It is used in the 12th chapter of the textbook (
It is a data.frame with 71 875 observations and 5 variables:
A unique document id
The source file name, following the pattern daily_year_month_day_nr.txt
The Comparative Agendas Project majortopic coding for the article
The pre-processed article texts
The article source: either "NOL" for Népszabadság or "MNO" for Magyar Nemzet.
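The file name pattern described above (daily_year_month_day_nr.txt) can be split into its components with base R. The following is a minimal sketch, not part of the dataset itself: the example file names, the exact form of the daily prefix, the zero-padding, and the column names `daily`, `year`, `month`, `day`, `nr` and `date` are all assumptions made for illustration.

```r
# Hypothetical file names following daily_year_month_day_nr.txt;
# the prefixes and zero-padding shown here are assumptions.
filenames <- c("MNO_2005_03_15_1.txt", "NOL_2010_11_02_12.txt")

# Drop the .txt extension, then split each name on underscores.
parts <- strsplit(sub("\\.txt$", "", filenames), "_", fixed = TRUE)

# Bind the pieces into a data.frame with one column per component.
parsed <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(parsed) <- c("daily", "year", "month", "day", "nr")

# Combine year, month and day into a Date; store the article number as integer.
parsed$date <- as.Date(paste(parsed$year, parsed$month, parsed$day, sep = "-"))
parsed$nr <- as.integer(parsed$nr)
```

If the real file names use a different prefix or separator, only the `strsplit()` call and the column names would need adjusting.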
Sebők, Miklós, and Zoltán Kacsuk (2021). The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach. Political Analysis, 29(2): 236-249.