“AI” and accessible front-end components: is the nuance generatable?

Companies are rushing to add generative AI capabilities to their products. Some promise to produce front-end components for you. Is that even possible, given the nature of accessibility and the nature of generative AI? And is it desirable?

The short answer is no, to both questions. The risk: that our rush to technological solutions comes at the expense of users.

To find out why, let's consider: how is the process of building an accessible component different between humans and machines? And what are the ethics of our tendency to reach for technological solutions?

The human approach

Let's look at the differences in process first. A human who writes accessible front-end code writes (mostly) HTML elements and attributes based on:

their understanding of specs and how they work together (including HTML and WAI-ARIA)
what they intend to convey
what they know about how assistive technologies interpret the code they write
knowledge of browser and assistive technology support
looking up the syntax and applying it correctly

(Leaving aside all the useful templating languages and orchestration libraries)

So they translate what they or their designer counterparts want to exist into something that works according to those intentions in a browser. Intentions are a key word here. Conveying author intentions accurately and understanding user needs is essential to accessibility.

They are likely also involved in writing CSS for things like colours, typography and spacing, which can all affect whether websites have barriers for users. And add JS for interactive stuff, managing state(s) and more.

The machine approach

A tool that generates code using language models basically predicts lines of code based on statistical likelihood, a bit like an autocomplete. If the output happens to be high quality, that's, in principle, coincidental. A system's success rate can be (and usually is) increased by training models specifically with very good examples. In some cases, systems get very close to high quality, because they have enormous amounts of training data. For accessibility, this data is hard to come by: most of the web has accessibility problems, and what we can see in the automated tests of the WebAIM Million is just the tip of the proverbial iceberg.
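To illustrate the "autocomplete" comparison, here is a deliberately tiny TypeScript sketch. It is a toy under loud assumptions: real language models do far more than count, and both the `mostLikely` function and the five "observed" snippets are made up for this example. It only shows what happens when the most common pattern in the data is not the accessible one.

```typescript
// Toy illustration only: real language models are far more sophisticated than
// counting, but "the most likely continuation wins" is the shared idea.

/** Return the most frequent entry in a list of observed completions. */
function mostLikely(observed: string[]): string | undefined {
  const counts: Record<string, number> = {};
  for (const item of observed) {
    counts[item] = (counts[item] || 0) + 1;
  }

  let best: string | undefined;
  let bestCount = 0;
  for (const item of Object.keys(counts)) {
    if (counts[item] > bestCount) {
      best = item;
      bestCount = counts[item];
    }
  }
  return best;
}

// Made-up "training data": how a save control was marked up on five hypothetical pages.
const observedMarkup: string[] = [
  '<div class="btn" onclick="save()">Save</div>',
  '<div class="btn" onclick="save()">Save</div>',
  '<a href="#" onclick="save()">Save</a>',
  '<div class="btn" onclick="save()">Save</div>',
  '<button type="button" onclick="save()">Save</button>',
];

// The div variant appears three times out of five, so it is what gets suggested.
const suggestion = mostLikely(observedMarkup);
console.log(suggestion);
```

The one snippet that uses a real `<button>` loses simply because it is outnumbered. Better examples in the data would change the count, but nothing in this process knows why the button is the better choice.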

While humans map intentions to interactive content and apply their understanding in the process, LLMs don't have intentions or understanding. They just output blobs of text that match some input best. I think this is fascinating, impressive and often akin to magic. And the output can look (and sometimes be) production ready and high quality. But it's unsurprising that the output can also contain problems. And as reasonable web developers, we've got to look at the problems we create.

To make this more concrete, let's look at v0, Vercel's LLM-based code generator product that the Vercel CEO announced as:

v0.dev produces the kind of production-grade code that we'd want to ship in our own @vercel products.

(From: tweet by Guillermo Rauch, 12:15 AM · Sep 15, 2023)

I mention this specifically, because I think claims like “production-ready” are an overestimation of the technology and an undervaluation of the need for humans. Which has real-world effects on people.

When I read “production-grade”, I read “accessible”. I had a brief look at the first six components in the “featured” section of v0, and I found WCAG violations and accessibility barriers in each.

Examples of barriers in each

in the math learning app example: buttons marked up as links, progress indication that was only visual with no text alternative, heading marked up as div
in the kanban board example: list of items not marked up as list, column headers with low contrast, overlapping text on zoom
in the accessibility helper example: overrides existing shortcuts, icons not marked as decorative
in the terminal UI example: buttons not marked up as buttons
in the pricing table example: icons not marked up as decorative, button with insufficient contrast
in the music player example: various buttons not marked up as buttons, some buttons not available with just keyboard, buttons without accessible names
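To make three of these barriers concrete (actions not marked up as buttons, icons not marked as decorative, progress without a text alternative), here is a minimal TypeScript sketch of markup that avoids them. The helper names (`escapeHtml`, `decorativeIcon`, `actionButton`, `progressIndicator`) are hypothetical and made up for this example. This is not v0's output and not a recommended library, just one way the corrected markup could look.

```typescript
// Hypothetical helpers, for illustration only. They return HTML strings.

/** Escape text so it is safe inside HTML content and attribute values. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** An icon that adds no information is hidden from assistive technologies. */
function decorativeIcon(svgPath: string): string {
  return (
    '<svg aria-hidden="true" focusable="false" viewBox="0 0 24 24">' +
    `<path d="${escapeHtml(svgPath)}"/></svg>`
  );
}

/** An action is a <button>: role, focusability and keyboard activation are built in. */
function actionButton(label: string, iconPath?: string): string {
  const icon = iconPath ? decorativeIcon(iconPath) : "";
  // The visible text label doubles as the accessible name.
  return `<button type="button">${icon}${escapeHtml(label)}</button>`;
}

/** Progress that is not only visual: a labelled native <progress> with text fallback. */
function progressIndicator(label: string, value: number, max: number): string {
  return (
    `<label>${escapeHtml(label)} ` +
    `<progress value="${value}" max="${max}">${value} of ${max}</progress></label>`
  );
}

console.log(actionButton("Play", "M8 5v14l11-7z"));
console.log(progressIndicator("Lesson progress", 3, 10));
```

Whether this markup is right still depends on intent. If the control navigates somewhere, a link is the correct element, and if an icon is the only content of a button, it is not decorative and the button needs an accessible name from somewhere else.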

This isn't a full conformance audit, I just listed the first few things that stood out. I don't mean this as an attack, I just want to show exactly how common accessibility issues in LLM output are.

You might say it's not all terrible, and that's true. I also found lots of markup that makes things accessible, for instance headings that are useful to navigate in various tools, good contrasts and useful + valid ARIA. But that same level of accessibility often exists on websites that didn't involve LLMs. Lots of websites have fairly useful headings, good contrast on many elements and valid ARIA. It's the bits where those things aren't in place where web UIs create barriers for people with disabilities. It's the nuance that matters.
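Contrast is one of the few items in the list that can be checked mechanically, which is part of why it is often in place. As a sketch, this TypeScript follows the WCAG 2 definitions of relative luminance and contrast ratio. The type and function names and the example colours are my own assumptions for illustration. The thresholds in the comments are the WCAG AA values for text.

```typescript
// Sketch of the WCAG 2 contrast ratio calculation. Names are illustrative.

type Rgb = [number, number, number]; // 8-bit sRGB channels, 0 to 255

/** Convert one sRGB channel to its linear value, per the WCAG 2 definition. */
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance: 0 for black, 1 for white. */
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

/** Contrast ratio between two colours, from 1 (identical) to 21 (black on white). */
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

const white: Rgb = [255, 255, 255];
const black: Rgb = [0, 0, 0];
const lightGrey: Rgb = [170, 170, 170]; // #aaa, a made-up example colour

// WCAG AA asks for at least 4.5:1 for normal-size text and 3:1 for large text.
console.log(contrastRatio(black, white).toFixed(2)); // approximately 21
console.log(contrastRatio(lightGrey, white) >= 4.5); // false: too little contrast for body text
```

A check like this says nothing about whether a heading is really a heading or whether a button's name makes sense, which is where the barriers tend to be.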

The self-confidence issue

In Can generative AI help write accessible code?, Léonie Watson looks at the output of three other generative AI tools (ChatGPT, Bard and Fix My Code). Like me, she found things that weren't terrible, things that were actually helpful and things that constituted accessibility issues. But Léonie points out a different problem: these tools tend to present themselves as authoritative, regardless of whether they are. She explains:

Other than the generic statements about the need to check its responses, none of these generative AI tools gives any hint that their answers may not be correct or provides any recommended resources for checking.

In contrast, most good blog posts and resources about accessible coding have a lot of nuance in them. They usually can't recommend one authoritative solution that is guaranteed to work at all times (what definition of “work” would they use?). And that reflects making accessible interfaces in general. It involves rabbit holes. There are generally multiple ways and multiple least bad outcomes to balance between.

Ok, but can LLMs at least be partially useful?

Maybe the problem of authoritativeness could be solved. We could tune these tools to output responses that don't present as mansplainy know-alls. But that still leaves us with other problems: inaccessible suggestions, lack of intention and understanding and lack of innovation.

Falsehoods and hallucinations

LLMs give inaccessible suggestions, as demonstrated in the examples I shared above and in the examples in Léonie's post. If these falsehoods are a consequence of training data, that could, in theory, be improved with different training data (emphasis on “in theory”). But it's also due to “hallucinations”, a problem inherent to the tech that research shows is inevitable. They make wrong stuff up. Output may be nonsensical. At the expense of users. That can't possibly be an improvement to the status quo: even without “AI” there are plenty of accessibility tips on the web with specific bugs or issues, so automating the addition of falsehoods and hallucinations to the mix seems absurd.

Lack of intentions

LLM tools don't have intentions, and intentions are necessary for (most) “accessible coding”. In his post Why doesn't AI work for producing accessible code, Alastair Campbell explains that accessibility is not an average. That makes it incompatible with statistical methods for making suggestions.

Lack of innovation

While there are lots of open source component libraries, many UI patterns and their implications haven't been invented yet. Their assumptions dearly need testing. Relying on LLMs for suggestions means relying on (remixed) existing knowledge, so it's unsuitable for making new patterns accessible.

These three reasons make me wonder: are LLMs useful at all in assisting us in building accessible front-end components? If there is a use, it's probably in helping developers discover resources that do contain nuance, not in code suggestions. Maybe there are also uses outside of component code, but that's for another post (see also Aaron Gustafson's Opportunities for AI in Accessibility).

The focus

Probably something for a post on its own, but I feel I should mention here: a focus on trying to find a “fix” or “solution” for accessibility constitutes a misunderstanding of what accessibility is about. When we make websites, the onus is on us to make them accessible. If we want to try and outsource that work to a tool (that we can't trust), we put the onus on disabled users (see also: disability dongles).

As Adrian Roselli wrote in AI will not fix accessibility, accessibility is about outcomes, not outputs:

Accessibility is about people. (…) When we target output versus outcomes, we are failing our friends, our family, our community, and our future selves. We are excluding fellow humans while we try to absolve ourselves of responsibility. (…)

Eric Bailey posted:

Thinking AI will "solve" accessibility is a bad frame stemming from a technoableist mindset.

The industry seems to me hoping for a magic, binary solution (…) Personally, I'd look to the social model of disability for guidance here: what exactly are we looking to "fix" and why?

In summary

Is the nuance that accessibility usually needs generatable? I think not. Not reliably, anyway. If you take away one thing from this post, I hope it's this warning: LLM-based tools can't be the magic bullet for writing accessible component code that they promise to be. Because nuance, understanding and conveying intent are inherent to accessibility, LLMs cannot be of great help with the accessibility of component code. In addition, they hallucinate inevitably and tend to pose as authoritative while outputting (occasional, but real) falsehoods. The latter can be dangerous and is likely to come at the expense of users.

My suggestion to developers who want help building accessible components? Use a design system that's well tested with people, that is well documented, and that (at least) attempts to capture the nuance. Or get involved in building one. Not everyone wants to do this nuanced and precise work, not every organisation has the budget. That's fine, but let's not suggest it can be automated away, magically. Let's value the human effort that can make web products actually great.

Thanks to Baldur Bjarnason for advice when I was working on an earlier draft of this post, and Eric Eggert for review of an earlier draft.