A structural and statistical approach to code similarity detection

Austeri, Alessandro Maria (Academic Year 2024/2025) A structural and statistical approach to code similarity detection. Bachelor's thesis in Introduction to computer programming, Luiss Guido Carli, supervisor Alessio Martino, 44 pp. [Bachelor's Degree Thesis]

Abstract/Index

Background and motivation. Problem statement. Research questions and objectives. Proposed approach and contributions. Overview of code similarity and clone detection. Structural approaches. Statistical and NLP-inspired methods. Hybrid methods. Evaluation metrics and benchmarks. Structural-statistical pipeline. Motivations and high-level design. File gathering and preprocessing. Parsing to abstract syntax trees. Identifier anonymization. Extracting structural features: root-to-leaf paths. Constructing the bag-of-paths. TF–IDF vectorization. Cosine similarity computation. Downstream analyses: clustering, thresholding and visualization. Parameter sensitivity and tuning. Performance.
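
The pipeline stages named in the index above (parsing to abstract syntax trees, identifier anonymization, root-to-leaf paths, bag-of-paths, TF–IDF, cosine similarity) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes Python source files, the standard-library `ast` module, a smoothed IDF formula, and hypothetical helper names (`root_to_leaf_paths`, `tfidf_vectors`, `cosine`). Identifier anonymization is approximated by keeping only AST node type names, so variable names and literal values never enter a path.

```python
import ast
import math
from collections import Counter


def root_to_leaf_paths(source: str) -> list[str]:
    """Return one string per root-to-leaf path of the AST.

    Only node type names are recorded, so identifiers and literal
    values are dropped (a simple stand-in for anonymization).
    """
    paths: list[str] = []

    def walk(node: ast.AST, prefix: list[str]) -> None:
        prefix = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append("/".join(prefix))
        for child in children:
            walk(child, prefix)

    walk(ast.parse(source), [])
    return paths


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each bag-of-paths with TF-IDF (smoothed IDF; an assumed variant)."""
    n = len(docs)
    # Document frequency: in how many files does each path occur?
    df = Counter(path for doc in docs for path in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {p: c * (math.log((1 + n) / (1 + df[p])) + 1.0) for p, c in tf.items()}
        )
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative inputs (not from the thesis): a and b differ only in names.
a = "def add(x, y):\n    return x + y\n"
b = "def plus(first, second):\n    return first + second\n"
c = "for i in range(3):\n    print(i)\n"

vec_a, vec_b, vec_c = tfidf_vectors([root_to_leaf_paths(s) for s in (a, b, c)])
sim_ab = cosine(vec_a, vec_b)
sim_ac = cosine(vec_a, vec_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Because `a` and `b` have the same tree shape and differ only in identifiers, they should score at or near 1, while the structurally different `c` should score much lower. A threshold or clustering step, as listed under downstream analyses, would then operate on such pairwise scores.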

References

Bibliography: pp. 36-38.

Thesis Type: Bachelor's Degree Thesis
Institution: Luiss Guido Carli
Degree Program: Bachelor's Degree Programs > Bachelor's Degree Program in Management and Computer Science, English language (L-18)
Chair: Introduction to computer programming
Thesis Supervisor: Martino, Alessio
Academic Year: 2024/2025
Session: Summer
Deposited by: Alessandro Perfetti
Date Deposited: 13 Nov 2025 14:32
Last Modified: 13 Nov 2025 14:32
URI: https://tesi.luiss.it/id/eprint/43834
