BERTopic and LDA Methods for Analyzing Research Trends in Computer Science
DOI: https://doi.org/10.35134/komtekinfo.v11i4.580
Keywords: Computer Science, Research Trends, Topic Modeling, LDA, BERTopic
Abstract
Computer Science is a rapidly growing discipline, and the number of research publications in the field has increased significantly over the past five years. However, analyses of research trends in this field remain limited, which makes it important to identify the dominant research topics and understand how they evolve. This study analyzes research topics and trends in Computer Science using two topic modeling methods, Latent Dirichlet Allocation (LDA) and BERTopic. The data consist of metadata for 4,892 research articles published from 2019 to 2023, collected from the Emerald Insight website. LDA and BERTopic are applied to identify and group research topics from the text of article titles and abstracts. The embedding-based BERTopic method achieved the highest coherence score, 0.49, with a model combining TruncatedSVD and KMeans that identified 13 topics, whereas the best LDA model reached a coherence score of 0.42 using Bag-of-Words (BoW) feature extraction with 11 topics. These results indicate that BERTopic produces more coherent and relevant topics than LDA, owing to its ability to preserve the semantic context between words in a document. A trend analysis using the BERTopic model shows how Computer Science research shifted over the past five years: research on business analytics and marketing, and on blockchain technology, grew consistently, with an average increase of 20% per year. In contrast, topics such as VR and prediction techniques fluctuated significantly.
Overall, the research focus is moving toward business analytics, blockchain, IoT, and prediction techniques such as deep learning, while traditional topics such as project management have declined or grown more slowly. This study contributes to understanding how research trends in Computer Science develop and can serve as a reference for planning future research.
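The abstract reports that the best LDA model used Bag-of-Words (BoW) features. As an illustration of how LDA infers topics from BoW token counts, here is a from-scratch collapsed Gibbs sampler in plain Python. It is a sketch under assumptions, not the authors' implementation: the abstract does not name the library or settings used, and the toy documents, the function name `lda_gibbs`, and the hyperparameters `alpha`, `beta`, and `iterations` are hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Each document is a list of tokens;
    only word identities and counts matter (a Bag-of-Words view)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(num_topics)
    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * V for _ in topics]
    n_k = [0] * num_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1
    # Three highest-count words per topic.
    top_words = [
        [vocab[j] for j in sorted(range(V), key=lambda v: -n_kw[k][v])[:3]]
        for k in topics
    ]
    return top_words, n_dk

# Hypothetical toy corpus (tokenized titles), not the study's data.
docs = [
    "blockchain ledger smart contract".split(),
    "blockchain ledger consensus".split(),
    "marketing analytics customer".split(),
    "business analytics marketing customer".split(),
]
top_words, doc_topic_counts = lda_gibbs(docs, num_topics=2)
print(top_words)
```

With only four documents the sampled topics are noisy; on a real corpus the number of topics would be tuned (the study's best LDA model used 11) and candidate models compared by coherence.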
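BERTopic embeds documents, reduces the embeddings to fewer dimensions, and clusters the reduced vectors into topics; the best model here swapped in TruncatedSVD for reduction and KMeans for clustering. The plain-Python sketch below illustrates only the clustering step, using Lloyd's k-means on hypothetical 2-D vectors that stand in for TruncatedSVD output. The points, the function name `kmeans`, and its parameters are assumptions for illustration; the abstract gives no settings beyond the 13 topics.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on low-dimensional vectors (e.g. reduced embeddings)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical 2-D vectors standing in for TruncatedSVD-reduced document embeddings.
reduced = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(reduced, k=2)
print(labels)  # each label is a topic (cluster) id
```

In practice this step would not be hand-rolled: BERTopic's documentation shows passing a scikit-learn dimensionality reduction model such as `TruncatedSVD` through `umap_model` and a `KMeans` model through `hdbscan_model`. Unlike HDBSCAN, k-means assigns every document to some topic and needs the number of clusters fixed in advance.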
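Both model families are compared by coherence score, which rates a topic highly when its top words tend to appear in the same documents. The abstract does not state which coherence measure produced 0.49 and 0.42 (common measures such as C_v, UMass, and NPMI use different scales), so the sketch below shows only one variant, an NPMI-based coherence from document-level co-occurrence. The function names `npmi` and `topic_coherence` and the toy documents are hypothetical.

```python
import math
from itertools import combinations

def npmi(w1, w2, doc_sets):
    """Normalized PMI of two words, using document-level co-occurrence."""
    n = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n
    p2 = sum(w2 in d for d in doc_sets) / n
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n
    if p12 == 0:
        return -1.0  # the words never co-occur: minimum NPMI
    if p12 == 1:
        return 1.0  # they co-occur in every document: maximum by convention
    return math.log(p12 / (p1 * p2)) / -math.log(p12)

def topic_coherence(top_words, docs):
    """Average NPMI over all pairs of a topic's top words (needs at least two words)."""
    doc_sets = [set(d) for d in docs]
    pairs = list(combinations(top_words, 2))
    return sum(npmi(a, b, doc_sets) for a, b in pairs) / len(pairs)

# Hypothetical tokenized documents, not the study's data.
docs = [
    ["blockchain", "ledger"],
    ["blockchain", "ledger", "smart"],
    ["marketing", "analytics"],
    ["marketing", "analytics", "customer"],
]
print(topic_coherence(["blockchain", "ledger"], docs))     # words always co-occur
print(topic_coherence(["blockchain", "marketing"], docs))  # words never co-occur
```

A model-level score is then typically the mean coherence over its topics, which is how a configuration with 13 topics can be compared against one with 11.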
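The trend findings come down to counting, per topic, how many documents fall in each publication year and comparing the year-over-year changes. The sketch below assumes that "an average increase of 20% per year" means the arithmetic mean of year-over-year percentage changes (the abstract does not define it; compound annual growth is another common reading), and it uses hypothetical per-year counts rather than the study's data. The spread of the yearly changes offers one simple way to separate steady growth from fluctuation; the threshold value is an arbitrary assumption.

```python
from statistics import mean, pstdev

def yoy_growth(counts_by_year):
    """Year-over-year growth rates (as fractions); assumes no year has a zero count."""
    years = sorted(counts_by_year)
    return [
        (counts_by_year[b] - counts_by_year[a]) / counts_by_year[a]
        for a, b in zip(years, years[1:])
    ]

def describe_trend(counts_by_year, fluctuation_threshold=0.25):
    """Label a topic's trend; the threshold is an illustrative, hypothetical choice."""
    rates = yoy_growth(counts_by_year)
    avg, spread = mean(rates), pstdev(rates)
    if spread > fluctuation_threshold:
        label = "fluctuating"
    elif avg > 0:
        label = "growing"
    else:
        label = "declining or flat"
    return avg, spread, label

# Hypothetical documents per year for three topics (not the study's data).
steady = {2019: 50, 2020: 60, 2021: 72, 2022: 86, 2023: 103}
volatile = {2019: 40, 2020: 70, 2021: 35, 2022: 80, 2023: 45}
shrinking = {2019: 80, 2020: 72, 2021: 65, 2022: 60, 2023: 55}

for name, counts in [("steady", steady), ("volatile", volatile), ("shrinking", shrinking)]:
    avg, spread, label = describe_trend(counts)
    print(f"{name}: mean growth {avg:.1%}, spread {spread:.2f}, {label}")
```

Since topic sizes differ, comparing growth rates rather than raw counts keeps small but fast-rising topics visible next to large, stable ones.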
License
Copyright (c) 2024 Jurnal Komtekinfo

This work is licensed under a Creative Commons Attribution 4.0 International License.