
Future Proofing Real Estate: Machine Learning for Price Predictions

Hafsa Fathima, Soha Juveria, Syeda Hifsa Naaz, I.V. Shashikala, K. Dora Babu

Abstract


Accurate house price prediction is a real-world problem with significant implications for real estate planning, investment decisions, and policy-making. This research builds a regression model with the XGBoost algorithm to estimate house prices on the California Housing dataset. The dataset includes key features such as median income, average number of rooms, and population density, all of which influence housing prices. Through data preprocessing, visualization, and model training, we explored the relationship between these features and house prices. The model was evaluated using R-squared and Mean Absolute Error, and it showed strong performance and reliable predictions on both metrics. Overall, this study highlights the effectiveness of gradient boosting models in real estate price forecasting and points to future applications in automated valuation systems and smart housing analytics.
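The workflow summarized above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the scikit-learn copy of the California Housing dataset, an 80/20 train/test split, and hypothetical hyperparameter values. The helper names `mean_absolute_error`, `r_squared`, and `train_and_evaluate` are chosen here for illustration. The two metrics are written out in plain Python so that their definitions are explicit.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    y_true must be a sequence (it is traversed more than once).
    """
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train_and_evaluate(seed=42):
    """Fit an XGBoost regressor on California Housing and score it.

    Third-party imports are local so the metric helpers above remain
    usable without xgboost or scikit-learn installed. Calling this
    function downloads the dataset on first use.
    """
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    # Features include median income, average rooms, and population;
    # the target is the median house value per block group.
    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # Hyperparameters are illustrative assumptions, not the paper's settings.
    model = XGBRegressor(
        n_estimators=300, learning_rate=0.1, max_depth=6, random_state=seed
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Convert NumPy arrays to plain lists for the pure-Python metrics.
    y_true_list = y_test.tolist()
    y_pred_list = y_pred.tolist()
    return {
        "r2": r_squared(y_true_list, y_pred_list),
        "mae": mean_absolute_error(y_true_list, y_pred_list),
    }
```

As a quick check of the metric definitions, targets `[1.0, 2.0, 3.0, 4.0]` with predictions `[1.5, 2.0, 2.5, 4.0]` give a Mean Absolute Error of 0.25 and an R-squared of 0.9. Calling `train_and_evaluate()` returns both scores for the held-out split. In the scikit-learn version of the dataset the target is expressed in units of $100,000, so the Mean Absolute Error is in those units as well.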




