BERTopic and LDA Methods for Analyzing Research Trends in Computer Science

Nursyahrina; Sarjon Defit; Rini Sovia

doi:10.35134/komtekinfo.v11i4.580

Authors

Nursyahrina Universitas Putra Indonesia YPTK Padang https://orcid.org/0000-0003-1513-8785
Sarjon Defit Universitas Putra Indonesia YPTK Padang
Rini Sovia Universitas Putra Indonesia YPTK Padang


Keywords:

Computer Science, Research Trends, Topic Modeling, LDA, BERTopic

Abstract

Computer Science is a rapidly developing discipline, and its volume of research publications has increased significantly over the past five years. However, analysis of research trends in the field remains limited, so it is important to identify the dominant research topics and understand how they evolve. This study analyzes research topics and trends in Computer Science using two topic modeling methods, Latent Dirichlet Allocation (LDA) and BERTopic. The data consist of research article metadata obtained from the Emerald Insight site, totaling 4,892 records published in 2019-2023. LDA and BERTopic are applied to identify and group research topics based on title and abstract text. The embedding-based BERTopic method achieved its highest coherence score, 0.49, with the model combining TruncatedSVD and KMeans, which identified 13 topics, while LDA achieved its highest coherence score, 0.42, with the model using Bag-of-Words (BoW) feature extraction and 11 topics. These results indicate that BERTopic produces more coherent and relevant topics than LDA, owing to its ability to preserve the semantic context between words in a document. Trend analysis with the BERTopic model reveals the dynamics of Computer Science research over the past five years: research on business analytics and marketing and on blockchain technology grew consistently, with an average increase of 20% per year. In contrast, topics such as VR and prediction techniques fluctuated significantly. Overall, the research focus is moving toward business analytics, blockchain, IoT, and prediction techniques such as deep learning, while traditional topics such as project management declined or grew more slowly. This study contributes to understanding how research trends in Computer Science develop and can serve as a reference for planning future research.
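The abstract reports an average increase of 20% per year for some topics but does not state how that average was computed. The following is a minimal sketch of one plausible reading, the arithmetic mean of year-over-year growth rates in per-topic document counts. The function names and the counts are hypothetical placeholders for illustration and are not taken from the study.

```python
from statistics import mean


def yearly_growth_rates(counts_by_year):
    """Return {year: growth rate vs. the previous year} as fractions."""
    years = sorted(counts_by_year)
    rates = {}
    for prev, curr in zip(years, years[1:]):
        prev_count = counts_by_year[prev]
        if prev_count == 0:
            continue  # growth is undefined when the previous count is zero
        rates[curr] = (counts_by_year[curr] - prev_count) / prev_count
    return rates


def average_growth(counts_by_year):
    """Arithmetic mean of the year-over-year rates (an assumed definition)."""
    rates = yearly_growth_rates(counts_by_year)
    return mean(rates.values()) if rates else 0.0


# Hypothetical document counts for a single topic, NOT data from the paper.
example_counts = {2019: 50, 2020: 60, 2021: 72, 2022: 90, 2023: 108}

example_rates = yearly_growth_rates(example_counts)
example_average = average_growth(example_counts)

for year, rate in example_rates.items():
    print(f"{year}: {rate:+.1%}")
print(f"average: {example_average:.2%}")
```

A compound annual growth rate would be an equally reasonable definition and can give a different figure, so the choice of averaging method should be stated alongside any reported percentage.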
