
From Raw to Refined: Python's Touch on Data Cleaning

M. Bharathi, D. Abhiram, I.V. Dwaraka Srihith

Abstract


This paper uses Python, and the pandas library in particular, to clean and transform data. It begins with dataset loading and exploratory analysis, then covers the core techniques: handling missing values, selecting relevant columns, and converting categorical variables. The workflow includes imputation of missing values, adjustments to numeric data, and one-hot encoding of categorical data. The guide ends with a validated, cleaned dataset, showing how Python can raise data quality and why clean data matters for every later stage of the analytical pipeline.
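The steps named in the abstract can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the paper's own code: the sample data, the column names (`age`, `income`, `city`, `notes`), and the choice of median and mode imputation are all hypothetical, since the abstract does not specify a dataset or an imputation method.

```python
import io

import pandas as pd

# Hypothetical sample data; the abstract does not show the paper's dataset.
RAW_CSV = """id,age,income,city,notes
1,34,52000,Pune,ok
2,,61000,Delhi,
3,29,,Pune,check
4,41,58000,,ok
"""


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps listed in the abstract to a small frame."""
    # Select the pertinent columns (assumed here to be age, income, city).
    df = df[["age", "income", "city"]].copy()

    # Impute missing numeric values with the column median (an assumption).
    for col in ["age", "income"]:
        df[col] = df[col].fillna(df[col].median())

    # Impute missing categorical values with the most frequent value.
    df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

    # Numeric adjustment: ages are whole numbers once the gaps are filled.
    df["age"] = df["age"].astype(int)

    # One-hot encode the categorical column into 0/1 indicator columns.
    return pd.get_dummies(df, columns=["city"], dtype=int)


# Load the dataset and take a first exploratory look at missing values.
raw = pd.read_csv(io.StringIO(RAW_CSV))
print(raw.isna().sum())

cleaned = clean(raw)

# Validate the result: no missing values should remain.
assert not cleaned.isna().any().any()
print(cleaned)
```

Median imputation is used here because it is less sensitive to outliers than the mean; a real project would choose the strategy per column after exploring the data.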




