3. Data cleansing
This operation rests on a principle often summed up as "garbage in, garbage out": if the input data are dirty, the output can only be dirty too. Cleaning is therefore essential, and it can represent a substantial amount of work. Part of it can be delegated to "mills", automated processing routines that correct data as soon as they are collected, processed and integrated. This is where human skill and, more recently, artificial intelligence come into play, along with the author's own capacity to carry out this work.
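To illustrate what such a "mill" might do, here is a minimal Python sketch. It assumes that records arrive as dictionaries of strings. The function names `clean_value` and `clean_records`, the `key_fields` parameter and the cleaning rules themselves (Unicode NFC normalization, whitespace collapsing, empty strings treated as missing, case-insensitive deduplication on key fields) are hypothetical choices made for this example. They are not the author's method.

```python
import unicodedata


def clean_value(value):
    """Normalize one raw field: Unicode NFC form, trimmed, internal whitespace collapsed."""
    if value is None:
        return None
    text = unicodedata.normalize("NFC", str(value))
    text = " ".join(text.split())  # collapses runs of spaces, tabs and newlines
    return text or None  # an empty string is treated as a missing value


def clean_records(records, key_fields):
    """Clean every field, drop records missing a key field, then drop duplicates.

    key_fields: names of the fields that identify a record (hypothetical schema).
    """
    seen = set()
    cleaned = []
    for record in records:
        row = {name: clean_value(val) for name, val in record.items()}
        if any(row.get(field) is None for field in key_fields):
            continue  # unusable record: identifying data is missing
        key = tuple(row[field].casefold() for field in key_fields)
        if key in seen:
            continue  # case-insensitive duplicate of a record already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

For example, with `raw = [{"name": "  Ada   Lovelace ", "city": "London\n"}, {"name": "ada lovelace", "city": "London"}, {"name": "", "city": "Paris"}]`, `clean_records(raw, ["name"])` keeps only `{"name": "Ada Lovelace", "city": "London"}`. The second record differs from the first only in case, and the third has no name.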
For the most part, cleaning methods are closely tied to the technologies available and to how they have evolved. Here is a brief overview.
The antediluvian, very "Web 1.0" method available to Internet users, based mainly on the sources of existing sites, consisted of ad hoc retrieval of data from the Web (copy-paste), taken either from the page source code or from the text rendered by the browser,...
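Retrieving "the text rendered by the browser" from a page's source code can be approximated programmatically. The sketch below uses only Python's standard `html.parser` module. It keeps the text nodes of a page and skips the content of `<script>` and `<style>` elements. The names `VisibleTextExtractor` and `visible_text` are hypothetical. Unlike a real browser, this sketch does not execute JavaScript or apply CSS, so it would still collect text that a stylesheet hides.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects displayable text chunks, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:  # tag names are already lowercased by HTMLParser
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore text inside skipped elements and whitespace-only text between tags
        if not self._skip_depth and data.strip():
            self.chunks.append(" ".join(data.split()))


def visible_text(page_source):
    """Return the list of visible text chunks found in an HTML page source."""
    parser = VisibleTextExtractor()
    parser.feed(page_source)
    parser.close()  # flushes any buffered trailing text
    return parser.chunks
```

For example, `visible_text("<h1>Prices</h1><p>Tea &amp; coffee: 2.50 EUR</p><script>var x = 1;</script>")` returns `["Prices", "Tea & coffee: 2.50 EUR"]`. The script content is dropped, and the character reference is decoded the same way a browser would display it.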