Large Language Model Method as a Translator Indonesian Into SQL Language

Authors

  • Candra Putra, Universitas Islam Negeri Syekh Ali Hasan Ahmad Addary
  • Syafri Arlis
  • Gunadi Widi Nurcahyo, Universitas Putra Indonesia YPTK

DOI:

https://doi.org/10.35134/komtekinfo.v12i3.658

Keywords:

NLP, LLM, Indonesian, Information System, Python

Abstract

The development of information technology has driven the widespread adoption of information systems and web-based applications across many sectors, including academia. However, a persistent challenge is extracting information from databases flexibly without building additional report modules or writing SQL by hand. This is an obstacle for non-technical users, such as administrative staff and lecturers, who need specific data from academic information systems quickly. This paper aims to convert Indonesian commands into SQL queries automatically, without adding further programming code. Advances in Natural Language Processing (NLP) and Machine Learning, in particular the Large Language Model (LLM) method, now make it possible for users to interact with databases through natural-language commands alone. The case study was conducted on the Academic Information System of UIN Padangsidimpuan using a dataset of 1,500 student records. The research focuses on Data Query Language (DQL) queries phrased in Indonesian, which the model translates into SQL commands that retrieve the requested data. The results show that this approach achieved a ROUGE-1 precision of 0.03 to 0.89. This indicates that integrating LLM technology into academic information systems has strong potential to improve data accessibility and operational efficiency and to support faster, more intuitive data-driven decision-making, especially for users without a technical background.
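The abstract reports translation quality as ROUGE-1 precision. As a rough illustration only, the sketch below computes ROUGE-1 precision between a model-generated SQL query and a reference query: the number of clipped overlapping unigrams divided by the number of unigrams in the generated query. The tokenizer (lowercase, split on non-alphanumeric characters) is an assumption, since the abstract does not state how queries were tokenized, and the Indonesian command, the table `mahasiswa` and the columns `nama`, `nim` and `angkatan` are hypothetical, not the study's actual schema.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, then keep runs of letters, digits and underscores,
    # so SQL punctuation such as ',', '=' and ';' is ignored.
    return re.findall(r"[a-z0-9_]+", text.lower())


def rouge1_precision(candidate: str, reference: str) -> float:
    """ROUGE-1 precision: clipped unigram overlap / unigrams in the candidate."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate unigram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / total


# Hypothetical example pair; names are illustrative only.
command = "Tampilkan nama mahasiswa angkatan 2021"  # "Show the names of students in the 2021 cohort"
reference_sql = "SELECT nama FROM mahasiswa WHERE angkatan = 2021"
generated_sql = "SELECT nama, nim FROM mahasiswa WHERE angkatan = 2021"

print(f"{rouge1_precision(generated_sql, reference_sql):.3f}")  # prints 0.875
```

Here the generated query has 8 unigrams, 7 of which appear in the reference, giving 7/8 = 0.875. Note that ROUGE-1 rewards token overlap rather than query correctness: an extra selected column lowers precision only slightly even though the result set differs.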

Published

2025-09-30

How to Cite

Putra, C., Arlis, S., & Nurcahyo, G. W. (2025). Large Language Model Method as a Translator Indonesian Into SQL Language. Jurnal KomtekInfo, 12(3), 124–130. https://doi.org/10.35134/komtekinfo.v12i3.658

Section

Articles