Data cleansing
Scraping, methods and tools for business intelligence
Practical sheet REF: FIC1275 V1

Author: David COMMARMOND

Publication date: August 10, 2024


3. Data cleansing

This operation rests on a postulate often summed up as "garbage in, garbage out": if the data are dirty on input, the result can only be dirty on output. Cleaning operations are therefore essential and can be substantial. They can be carried out by automated processing routines ("mills") that correct newly collected, processed and integrated data. This is also where human talent and, more recently, artificial intelligence come into play, along with the analyst's ability to carry out this work.
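The kind of automated "mill" described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's actual pipeline: the field names and the cleaning rules (whitespace collapsing, empty-string handling, deduplication) are assumptions chosen to show the pattern.

```python
import re

def clean_record(record: dict) -> dict:
    """Apply simple automatic corrections to one scraped record.

    Illustrative rules: normalize keys, collapse runs of whitespace,
    and treat empty strings as missing values.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = re.sub(r"\s+", " ", value).strip()  # collapse whitespace
            if value == "":
                value = None  # empty string means the field is missing
        cleaned[key.strip().lower()] = value
    return cleaned

def deduplicate(records: list) -> list:
    """Drop exact duplicate records while preserving input order."""
    seen, unique = set(), []
    for rec in records:
        # A sorted tuple of items gives a hashable fingerprint of the record.
        fingerprint = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    return unique
```

In practice such rules are chained: every freshly scraped batch passes through `clean_record` and then `deduplicate` before integration, and anything the automated pass cannot resolve is flagged for a human (or, increasingly, an AI-assisted) review.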

For the most part, cleaning methods are strongly correlated with technologies and technical developments. Here's a brief overview.

The antediluvian, very "Web 1.0" method, based mainly on existing site sources, consisted of ad hoc retrieval of data from the Web: copy-paste, the page's source code, or the text as interpreted by the browser,...
