

A Novel Approach for University Admission Prediction using Feature Selection and Data Upgrading Strategies
Abstract
The uncertainty surrounding university admissions has become a significant concern for students aspiring to secure positions in reputable academic institutions. With increasing competition and complex admission criteria, applicants often struggle to evaluate their chances accurately. This paper presents a robust machine learning-based approach to predicting the probability of university admission by leveraging student academic profiles and entrance exam scores. The core objective is to provide a data-driven framework that can assist students in making informed decisions and institutions in streamlining the selection process. The study begins by collecting and preparing a diverse dataset that includes students' GRE, TOEFL, EAMCET, NEET, and PGCET exam scores, undergraduate GPA, university ratings, and categorical demographic attributes such as gender and caste, which are crucial in regional contexts where reservation policies are in place. Multiple supervised machine learning models were developed and compared, including Linear Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and LASSO regression. Feature selection was performed using best subset selection and decision tree feature importance, while hyperparameter tuning was conducted using Grid Search to improve model accuracy. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were used to measure and compare the performance of the models. Among all the approaches, the Random Forest algorithm demonstrated superior accuracy and robustness in predicting admission chances.
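The pipeline summarized above — a Random Forest regressor tuned with Grid Search and evaluated with MAE and RMSE — can be sketched as follows. This is an illustrative sketch only, not the authors' code: the feature set (GRE, TOEFL, GPA, university rating) follows the abstract, but the synthetic data, value ranges, and parameter grid are assumptions made for demonstration.

```python
# Sketch of the abstract's pipeline: Random Forest regression with
# Grid Search hyperparameter tuning, scored by MAE and RMSE.
# All data below is synthetic; the real dataset is not reproduced here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic stand-ins for four of the features named in the abstract:
# GRE score, TOEFL score, undergraduate GPA, university rating.
X = np.column_stack([
    rng.uniform(290, 340, n),   # GRE
    rng.uniform(90, 120, n),    # TOEFL
    rng.uniform(6.0, 10.0, n),  # GPA (10-point scale assumed)
    rng.integers(1, 6, n),      # university rating (1-5)
])
# Admission chance as a noisy blend of the features, clipped to [0, 1].
y = (0.3 * (X[:, 0] - 290) / 50 + 0.3 * (X[:, 1] - 90) / 30
     + 0.4 * (X[:, 2] - 6.0) / 4.0 + rng.normal(0, 0.05, n)).clip(0, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid Search over a small (assumed) hyperparameter grid, minimizing MAE.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 5]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X_train, y_train)
pred = grid.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
```

In the same way, any of the other compared models (KNN, SVM, LASSO) could be dropped into the `GridSearchCV` call with an appropriate parameter grid, which is what makes MAE/RMSE a convenient common yardstick across all six approaches.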