Algorithms for Data Cleaning in Knowledge Bases

Adeel Ashraf, Sarah Ilyas, Khawaja Ubaid ur Rehman, Shakeel Ahmad


Data cleaning is the process of identifying and correcting inconsistencies and errors in a data warehouse. The reviewed papers use different terms for it; data cleaning is also called data scrubbing. Data scrubbing is used to obtain high-quality data and is one of the ETL (extraction, transformation, and loading) tools. Nowadays there is a need for authentic information to support better decision-making. We therefore present a review of six papers related to data cleaning. These papers discuss different algorithms, methods, problems, solutions, and approaches. Each paper has its own method for solving a problem efficiently, but all of them address the common problem of data cleaning and inconsistencies. They discuss in detail problems such as data inconsistencies, error identification, conflicting records, and duplicate records, and they provide solutions. These algorithms increase the quality of data. At the ETL stage, there are almost thirty-five different sources and causes of poor data quality.






