
Harm, Safety, Effective AI Risk Management, and Regulation

AEIdeas

July 23, 2024

On October 30, 2023, the White House announced its Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. The order responded, in an industry-led US context, to calls for governments to become actively engaged in regulating artificial intelligence. The European Union Artificial Intelligence Act, passed in March 2024, takes a much more prescriptive approach. However, both are based on precautionary principles, requiring AI developers to actively manage the risks of harm their applications pose to end users and society. In the EU case, developers could be held strictly liable when harm arises if they are found to have breached any regulatory obligations.

Risk management lies at the core of both sets of governance arrangements. If a risk is to be managed, it must first be defined. Mathematically (in the Knightian sense), risk is the product of an outcome’s probability of occurring and the (financial) magnitude of the outcome’s effects. A broader definition from the International Organization for Standardization (ISO) risk management standard, ISO 31000, is the “effect of uncertainty on objectives.” While ISO 31000 acknowledges that the effect may be either positive or negative, in practice, risk management activities almost always focus on negative consequences—that is, on minimizing the likelihood or mitigating the consequences of falling short of the espoused objective (harms to it).
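
To make the narrow, Knightian-style definition concrete, the sketch below expresses risk as an expected loss: the probability of an adverse outcome multiplied by the financial magnitude of its effect. It is purely illustrative; the function name and figures are assumptions, not drawn from either standard.

    # Illustrative only: risk in the narrow, expected-loss sense described above.
    def expected_loss(probability: float, magnitude: float) -> float:
        """Probability of an adverse outcome times the financial magnitude of its effect."""
        return probability * magnitude

    # Example: a 2% chance of a $500,000 loss carries an expected loss of $10,000.
    print(expected_loss(0.02, 500_000))  # 10000.0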

However, to manage the effects of a harm, it must first be possible to:

  • Define it (i.e., have a theory of harm and how it arises),
  • Detect it, and
  • Design an instrument to prevent it from occurring (or to mitigate its effects).

The instrument needs to be:

  • Effective (i.e., actually reduce or avert harm and its consequences),
  • Reliable (with low rates of false positive and false negative outcomes), and
  • Cost-effective. (Detecting and managing the harm should cost less than the expected diminution of the objective if the harm materializes; see the sketch below.)
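
As a rough illustration of that cost-effectiveness criterion, the hypothetical sketch below compares the cost of an instrument (a control) with the expected harm it averts. All names and figures are assumptions introduced for the example, not part of any regulatory text.

    # Hypothetical sketch of the cost-effectiveness test: an instrument is worth
    # deploying only if it costs less than the expected harm it averts.
    def control_is_cost_effective(
        p_harm: float,               # probability the harm occurs without the control
        harm_magnitude: float,       # financial magnitude of the harm if it occurs
        p_harm_with_control: float,  # residual probability with the control in place
        control_cost: float,         # cost of detecting and managing the harm
    ) -> bool:
        expected_harm_averted = (p_harm - p_harm_with_control) * harm_magnitude
        return control_cost < expected_harm_averted

    # Example: a $50,000 control that cuts a 10% chance of a $1,000,000 harm to 2%
    # averts roughly $80,000 of expected harm, so it passes the test.
    print(control_is_cost_effective(0.10, 1_000_000, 0.02, 50_000))  # True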

Both the EU Artificial Intelligence Act and the US National Institute of Standards and Technology (NIST) AI Risk Management Framework follow ISO 31000’s inclusive processes for:

  • Risk identification throughout an AI’s lifecycle,
  • Risk analysis and evaluation (including prioritizing multiple risks),
  • Risk treatment,
  • Monitoring and review, and
  • Communication and consultation (including transparency, documentation, and stakeholder engagement).

So how well do these instruments actually handle harms associated with AIs, in particular generative pretrained transformers such as large language models (LLMs)?

The first obstacle comes with definition. LLMs are designed by their developers to be used (and even modified) by end users in myriad applications. The nature of these models means that even their developers do not know how or why they produce specific outputs. It is therefore impossible ex ante to identify all the potential harms from these applications and their probabilities. At best, the risks identified and managed will be those already encountered—for example, bias in outputs and accuracy in specific tasks (e.g., those measured in Stanford’s Institute for Human-Centered Artificial Intelligence AI Index). By definition, unknown risks cannot be identified in the first place, let alone analyzed, evaluated, treated, or even detected ex post.

The EU (medium-risk applications) and NIST processes at best oblige application developers to identify, document, and manage a number of already known or anticipatable harm scenarios in which some conception of the probabilities of occurrence and magnitudes of harm already exists. Reliability and cost-effectiveness are judged by the effects on the application developer’s objectives, not society’s. The EU act does not manage the boundaries between high-risk and banned applications by identifying true positive or true negative interventions or by a principled assessment of the costs and benefits to society of releasing an application. Rather, it relies on identifying a set of situations deemed so important that the use of AI tools in them will not be permitted regardless of the outcome of any assessment of trade-offs (e.g., “infer[ring] emotions of a natural person in the areas of workplace and education institutions”).

Dealing with true Knightian uncertainty requires a much more complex decision-making process about application use than an operational risk management process can provide. It requires careful evaluation of the worst-case scenarios both of releasing an application (catastrophic harm) and of not releasing it (including foregone benefits that will never accrue) when neither outcomes nor probabilities are well understood. These are not easy decisions; ultimately, some sort of judgment must be made. In the face of unknown probabilities of genuine catastrophe, the threshold for intervening to prevent a technology’s release, rather than relying on a risk management approach, is the point at which “waiting and learning” is no longer prudent.

It is not clear that current AI risk management processes cope with uncertainty or handle harm effectively.

Learn more: Competition in AI Regulation: Essential for an Emerging Industry | AI Regulation Increases Certainty—but for Whom? | Transparency—Like Charity—Begins at Home? | Who Should Be Responsible for Election Content Authentication?