The dataset contains 71 875 front-page articles from two Hungarian print dailies, Magyar Nemzet and Népszabadság. It is used in Chapter 12 of the textbook (https://tankonyv.poltextlab.com/oszt%C3%A1lyoz%C3%A1s-%C3%A9s-fel%C3%BCgyelt-tanul%C3%A1s.html).

data_nol_mno_clean

Format

A data.frame with 71 875 observations and 5 variables:

row_number

A unique document id

filename

The name of the source file, following the pattern daily_year_month_day_nr.txt

majortopic_code

The Comparative Agendas Project major topic code assigned to the article

text

The pre-processed text of the article

corpus

Indicates the article source: either "NOL" for Népszabadság or "MNO" for Magyar Nemzet.
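
The filename pattern daily_year_month_day_nr.txt can be split back into its parts, for instance to recover the publication date of an article. The following is a minimal sketch in Python, not part of the package: the helper name parse_filename, the ArticleFile type, and the sample file name are hypothetical, and it assumes that the year, month, day and nr fields are plain integers.

```python
from typing import NamedTuple


class ArticleFile(NamedTuple):
    """Parts of a file name of the form daily_year_month_day_nr.txt."""
    daily: str
    year: int
    month: int
    day: int
    nr: int


def parse_filename(filename: str) -> ArticleFile:
    """Split a file name into its five underscore-separated fields.

    Assumption: the last four fields are integers. A name with fewer
    than five fields raises ValueError.
    """
    # Drop the .txt extension if it is present.
    stem = filename[:-4] if filename.endswith(".txt") else filename
    # Split from the right so that only the last four underscores count.
    daily, year, month, day, nr = stem.rsplit("_", 4)
    return ArticleFile(daily, int(year), int(month), int(day), int(nr))


# Hypothetical file name, used only to illustrate the pattern.
example = parse_filename("nol_2005_03_17_2.txt")
print(example.daily, example.year, example.month, example.day, example.nr)
```

Splitting from the right keeps the sketch working even if the daily prefix itself were to contain an underscore; whether that ever happens in the actual data is not stated here.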

Source

https://cap.tk.hu/en/dataoverview

References

Sebők, Miklós, and Zoltán Kacsuk (2021). The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach. Political Analysis, 29(2): 236-249.