Result: Clonación de código en proyectos Go open source: un análisis de prevalencia y estructura en GitHub

Title:
Clonación de código en proyectos Go open source: un análisis de prevalencia y estructura en GitHub
Contributors:
Cadavid Rengifo, Héctor Fabio, Garzón Alfonso, Wilmer Edicson, Vega Bernal, Heimar Yadir, Toquica Barrera, Javier Ivan
Publisher Information:
Universidad Escuela Colombiana de Ingeniería
Bogotá
Maestría en Informática
Publication Year:
2025
Document Type:
Dissertation master thesis
File Description:
110 pages; application/pdf
Language:
Spanish; Castilian
Relation:
https://repositorio.escuelaing.edu.co/handle/001/3664; Universidad Escuela Colombiana de Ingeniería; Repositorio Digital; https://repositorio.escuelaing.edu.co/
Rights:
Attribution 4.0 International ; http://creativecommons.org/licenses/by/4.0/ ; http://purl.org/coar/access_right/c_abf2 ; info:eu-repo/semantics/openAccess
Accession Number:
edsbas.9DFC1CB1
Database:
BASE

Further information

This thesis examines the prevalence of code cloning in open source Go repositories on GitHub. To detect clones in the projects, the pre-trained UniXcoder model was fine-tuned on the PoolC dataset. The repositories were classified by application domain using topic modeling with the Latent Dirichlet Allocation algorithm. In addition, the clone groups formed by similar code fragments were identified within each application domain.

Table of contents
Author's declaration
Resumen (abstract in Spanish)
Abstract
Acknowledgments and dedication
List of figures
List of tables
1. Introduction
1.1. Research questions
1.2. Contributions
1.3. Objectives
1.3.1. General objective
1.3.2. Specific objectives
1.4. Methodology
1.4.1. Repository mining
1.4.2. Development practices
1.5. Summary of contents
2. Theoretical framework
2.1. Code cloning: fundamentals
2.1.1. What is a code clone? ...
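
Illustrative note: the abstract mentions clone groups formed by similar code fragments. One common way to picture such groups is as connected components over pairwise clone predictions. The sketch below assumes (this is not stated in the record) that a detector has already produced a list of clone pairs; the function name `clone_groups` and the fragment identifiers are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def clone_groups(pairs):
    """Merge pairwise clone predictions into groups using union-find.

    `pairs` is an iterable of (fragment_id, fragment_id) tuples; the
    identifiers are hypothetical labels for code fragments.
    """
    parent = {}

    def find(x):
        # Register unseen fragments as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for fragment in list(parent):
        groups[find(fragment)].add(fragment)

    # Sort members and groups so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical detector output: each tuple is one predicted clone pair.
pairs = [
    ("a.go:Parse", "b.go:Parse"),
    ("b.go:Parse", "c.go:Decode"),
    ("d.go:Sum", "e.go:Total"),
]
groups = clone_groups(pairs)
```

Treating clone relations as transitive is a design choice made for this sketch: it keeps grouping simple, but it can chain loosely similar fragments into a single group, so a stricter criterion may be preferable in practice.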