

Future Proofing Real Estate: Machine Learning for Price Predictions
Abstract
Accurate house price prediction matters for real estate planning, investment decisions, and policy-making. This study builds a regression model with the XGBoost algorithm to estimate house prices from the California Housing dataset. The dataset includes features such as median income, average number of rooms, and population, all of which influence housing prices. Through data preprocessing, visualization, and model training, we examine the relationship between these features and house prices. The model is evaluated with R-squared and Mean Absolute Error, and it performs strongly on both metrics. The results indicate that gradient boosting models are effective for real estate price forecasting and point toward future applications in automated valuation systems and smart housing analytics.
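The two evaluation metrics named above can be illustrated with a minimal, dependency-free sketch. The function names and the sample values are hypothetical and chosen only for illustration; they are not results from this study, and a library routine would normally be used in practice.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up house values and predictions, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE = {mae:.2f}, R-squared = {r2:.2f}")
```

A lower MAE means predictions are closer to the actual prices on average, while an R-squared nearer to 1 means the model explains more of the variance in prices.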
References
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html
Chen, T., & Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). https://doi.org/10.1145/2939672.2939785
California Housing Dataset. (n.d.). Scikit-learn documentation. Retrieved from https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
Waskom, M. (2021). Seaborn: Statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., ... Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2
McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 51–56). https://doi.org/10.25080/Majora-92bf1922-00a
Python Software Foundation. (2023). Python Language Reference, version 3.x. Available at https://www.python.org/