ABSTRACT
Artificial intelligence (AI) is developing rapidly, raising questions for all audiences: individuals, professionals, and academics. Rational, shared principles and practices for measuring the performance and limits of intelligent systems must be established.
A methodical approach that complies with the rules of metrology allows us to draw the broad outlines: metrics for carrying out quantitative and repeatable performance measurements; physical and virtual test environments for performing reproducible experiments that are representative of the real operating conditions of the AI under evaluation; and organizational tools (benchmarking, challenges, competitions) that meet the needs of the entire ecosystem.
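As a purely illustrative sketch (not taken from the article), the kind of quantitative, repeatable measurement the abstract describes can be made concrete for a simple classification task: a point metric plus a bootstrap confidence interval, with a fixed random seed so the experiment is exactly reproducible. All names and data below are hypothetical.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    assert len(predictions) == len(labels)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def bootstrap_ci(predictions, labels, n_resamples=1000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for accuracy.

    A fixed seed makes the measurement repeatable: re-running the
    evaluation yields the same interval, as metrology requires.
    """
    rng = random.Random(seed)
    n = len(labels)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(accuracy([predictions[i] for i in idx],
                               [labels[i] for i in idx]))
    scores.sort()
    low = scores[int(alpha / 2 * n_resamples)]
    high = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical classifier outputs on a held-out test set.
preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
truth = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(preds, truth))      # point estimate: 0.8
print(bootstrap_ci(preds, truth))  # reproducible interval, given the seed
```

Reporting an interval rather than a bare score is one way to make comparisons between systems "objective and unambiguous", since it exposes the measurement uncertainty alongside the result.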
AUTHOR
- Guillaume AVRIN: Head of AI Evaluation Department, Laboratoire national de métrologie et d'essais, Paris, France
INTRODUCTION
Since 2017, artificial intelligence (AI) has seen major developments in many professional sectors (diagnostic assistance, biometric identification, chatbots, detection of vulnerabilities and cybersecurity threats, collaborative industrial robots, inspection and maintenance robots, autonomous mobility systems, etc.) and in the home (personal assistance robots, medical devices, personal assistants, etc.). It is therefore one of the top European and international priorities for technological and industrial development, and the 2020 health crisis has contributed to this transformation towards a more "virtualized" society, one less exposed to biological vulnerabilities.
To ensure that the market is not driven solely by supply, and that the conditions are in place for matching supply with demand, scientific and technical methods are needed to evaluate AI. These methods promise reliable, quantitative results on the levels of performance, robustness and explainability achieved by different AI systems, giving end-users the guarantees that determine the acceptability of these technologies. Users will be able to choose between existing solutions thanks to objective and unambiguous common references. Developers, for their part, will benefit from benchmarks to guide their R&D and quality-control efforts, as well as tools to demonstrate their lead and stand out from the competition. Evaluation will therefore build the confidence needed for the transition from developing AI to marketable AI.
Standardization work is underway to adapt existing software development standards (IEC 62304 for medical devices, ISO 26262 for road vehicles, etc.) to the specificities of AI (notably within CEN-CENELEC JTC 21 and ISO/IEC JTC 1/SC 42).
This work will focus in particular on assessment tools and methods, of which two generic approaches can be distinguished (cf. ISO/IEC 17011): auditing and testing. Audits consist of analyzing...
KEYWORDS
performance | metrology | artificial intelligence | metrics | experiment | test | AI
Evaluating artificial intelligence