Advancements in News Article Classification: Approaches in Machine Learning and Deep Learning across Sports, Entertainment, Politics, Business, and Weather Domains

DOI:

https://doi.org/10.21015/vtcs.v11i2.1654

Abstract

News article classification is a key technology for processing and organizing news information. The task is challenging because new articles emerge continuously and must be processed as they arrive. Advances in information technology have reshaped traditional lifestyles in many domains, and the media for publishing news and events have grown rapidly as a result. In this research, we organize news article classification into five selected domains: sports, entertainment, politics, business, and weather. We cover both common and less common classification approaches based on Machine Learning and Deep Learning techniques, together with the datasets they use. We then compare these approaches across the five domains and their datasets using evaluation metrics such as precision, recall, and accuracy. Limiting the categorization to these five domains narrows the focus and makes a large body of data easier to understand through concise content. We recommend this work to readers interested in extending and building upon our research.
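The metrics named in the abstract can be illustrated with a minimal sketch. It assumes single-label predictions over the five domains; the function names `per_class_metrics` and `accuracy` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The five news domains named in the abstract.
DOMAINS = ["sports", "entertainment", "politics", "business", "weather"]


def per_class_metrics(y_true, y_pred, labels=DOMAINS):
    """Return precision and recall for each label (0.0 when undefined)."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    metrics = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = {"precision": precision, "recall": recall}
    return metrics


def accuracy(y_true, y_pred):
    """Return the fraction of articles whose predicted domain is correct."""
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration only.
y_true = ["sports", "sports", "politics", "business", "weather", "entertainment"]
y_pred = ["sports", "politics", "politics", "business", "weather", "business"]

scores = per_class_metrics(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Reporting precision and recall per domain, rather than accuracy alone, shows which categories a classifier confuses, which matters when the domains are unevenly represented in a dataset.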

Published

2023-12-28

How to Cite

Ramzan, S., -, F. J., -, Z., & -, S. (2023). Advancements in News Article Classification: Approaches in Machine Learning and Deep Learning across Sports, Entertainment, Politics, Business, and Weather Domains. VAWKUM Transactions on Computer Sciences, 11(2), 83–97. https://doi.org/10.21015/vtcs.v11i2.1654