data_nol_mno_clean.Rd
The dataset contains 71 875 front-page articles from two Hungarian print dailies, Magyar Nemzet and Népszabadság. It is used in Chapter 12 of the textbook (https://tankonyv.poltextlab.com/oszt%C3%A1lyoz%C3%A1s-%C3%A9s-fel%C3%BCgyelt-tanul%C3%A1s.html).
data_nol_mno_clean
A data.frame with 71 875 observations and 5 variables:
A unique document id
The source file name, following the pattern daily_year_month_day_nr.txt
The Comparative Agendas Project major topic code of the article
The pre-processed article texts
Indicates the article source: either "NOL" for Népszabadság or "MNO" for Magyar Nemzet.
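The file name pattern daily_year_month_day_nr.txt described above can be split into its components with base R. The following is a minimal sketch: the example file names (including the "nol" and "mno" prefixes) and the object names `file_names` and `parsed` are hypothetical and are not taken from the dataset itself.

```r
# Hypothetical file names that follow the documented pattern
# daily_year_month_day_nr.txt; the concrete values are assumptions.
file_names <- c("nol_2005_03_17_2.txt", "mno_2010_11_02_1.txt")

# Drop the ".txt" extension, then split each name on underscores.
parts <- strsplit(sub("\\.txt$", "", file_names), "_", fixed = TRUE)

# Assemble the components into a data.frame:
# daily (character), date (Date), nr (integer).
parsed <- data.frame(
  daily = vapply(parts, `[`, character(1), 1),
  date  = as.Date(vapply(parts, function(p) paste(p[2:4], collapse = "-"),
                         character(1))),
  nr    = as.integer(vapply(parts, `[`, character(1), 5)),
  stringsAsFactors = FALSE
)

print(parsed)
```

The same steps could be applied to the file name column of the data.frame, provided its values follow the documented pattern without exception.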
https://cap.tk.hu/en/dataoverview
Sebők, Miklós, and Zoltán Kacsuk (2021). The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach. Political Analysis, 29(2): 236-249.