Practical sheet | REF: FIC1275 V1

Scraping, methods and tools for business intelligence

Author: David COMMARMOND

Publication date: August 10, 2024



3. Data cleansing

Data cleansing rests on a principle often summed up as "garbage in, garbage out": if the input data is dirty, the output can only be dirty too. Cleaning operations are therefore essential, and they can account for a significant share of the work. They can be carried out by "mills", processing routines that automatically correct data as it is collected, processed and integrated. This is where human skill and, more recently, artificial intelligence come into play, along with the ability of whoever is in charge of the work to carry it out.
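As an illustration of what such a "mill" might look like, here is a minimal Python sketch using only the standard library. It is not the author's method: the function names `clean_value` and `clean_records`, the field names, and the choice of cleaning rules (decoding HTML entities, Unicode normalization, whitespace collapsing, case-insensitive deduplication) are all assumptions made for the example.

```python
import html
import re
import unicodedata


def clean_value(raw: str) -> str:
    """Normalize one scraped text value (hypothetical helper)."""
    text = html.unescape(raw)                   # decode entities such as "&amp;"
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters (e.g. non-breaking spaces)
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace into one space
    return text.strip()


def clean_records(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Clean every field, then drop empty and duplicate records (hypothetical helper)."""
    seen = set()
    cleaned = []
    for record in records:
        row = {key: clean_value(value) for key, value in record.items()}
        if not any(row.values()):
            continue  # every field is empty after cleaning: nothing usable
        # Compare records case-insensitively so "ACME" and "Acme" count as one.
        fingerprint = tuple(sorted((k, v.casefold()) for k, v in row.items()))
        if fingerprint in seen:
            continue  # duplicate of a record already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


scraped = [
    {"name": "  Acme&nbsp;&amp; Co ", "city": "Lyon\n"},
    {"name": "ACME & Co", "city": "lyon"},
    {"name": " ", "city": ""},
]
result = clean_records(scraped)
```

With these sample rows, only the first record should survive: the second is a case variant of it and the third is empty once cleaned. Real pipelines usually need domain-specific rules (dates, currencies, addresses) on top of this kind of generic pass.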

Cleaning methods are, for the most part, closely tied to the available technologies and to how they have evolved. Here is a brief overview.

The oldest method, very much of the "Web 1.0" era, relied mainly on existing site sources: Internet users retrieved data from the Web by hand (copy-paste), from the page source code or from the text rendered by the browser,...




This article is included in

Management and innovation engineering
