“I’m ready to throw my goddamn phone out the window. I spent an hour setting up a reservation & then the ‘server’ had an ‘error’ & I had to do it again.” I’ve been carefully watching non-geeks use computers & we are failing our users.
LLMs could do something genuinely useful for all of us. First, though, some background.
Windows/Panes/Menus/Items—Actions
Back at the dawn of our current user interface paradigm (early 70s Smalltalk), the interface had a four-level hierarchy:
We had windows of various types, each of which was split into panes, also of various types, each of which had a menu which had several items. That was it. Want to know what you can do with the interface? Draw this tree. Rake up the leaves. Those are your available actions.
(Now, in Smalltalk this is a vast understatement of the capabilities of the interface because one of those menu items was “Do It”, which executed arbitrary code. But for a non-programming user, these were the available actions.) (All users were expected to learn to program btw.)
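A minimal sketch of "draw this tree, rake up the leaves," assuming a made-up window layout. The `Node` class, the `available_actions` function, and every window, pane, and item name below are hypothetical illustrations, not a real Smalltalk API or screen.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the hierarchy: a window, pane, menu, or menu item."""
    name: str
    children: list[Node] = field(default_factory=list)


def available_actions(node: Node, path: tuple[str, ...] = ()) -> list[str]:
    """Rake up the leaves: each leaf (a menu item) is one available action."""
    here = path + (node.name,)
    if not node.children:
        return [" > ".join(here)]
    actions: list[str] = []
    for child in node.children:
        actions.extend(available_actions(child, here))
    return actions


# Hypothetical example tree; the names are illustrative only.
browser = Node("Browser window", [
    Node("Code pane", [
        Node("Menu", [Node("Copy"), Node("Paste"), Node("Do It")]),
    ]),
    Node("List pane", [
        Node("Menu", [Node("Rename"), Node("Remove")]),
    ]),
])

for action in available_actions(browser):
    print(action)
```

With this toy tree the whole interface is five actions, each reachable by one fixed path. That is the property the later sections lose as the tree grows.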
Scale
Folks soon wanted more actions than fit into this paradigm. Well, folks wanted to add more actions if “folks” meant programmers/product folks. Did users want more actions? Maybe.
Anyway, the desire for more actions led to the great Sub Menu War—were menu items that expanded into more menu items a blessing or just the symptom of poor higher-level user interface design? Would different windows or different panes make one-level menus sufficient?
Sub-menus won (sorry about that y’all). Now we could accommodate the next exponent’s worth of actions.
Intentions
People come to computer systems with intentions that are separate from the actions those systems provide. I want to:
Add a layover in Minneapolis to my current trip
Watch The Voice
Switch the clauses in a sentence
I don’t want to select things, push buttons, choose menu items, & fill in fields. There’s a mapping from these higher level “intentions” to actions.
When the mapping is direct, 1-to-1, then using a computer is magic.
We even tell our friends these stories: “…and then the computer did exactly what I wanted!!!”
Mismatch
We tell these stories because they are rare. More often the computer does kinda what we want.
Or we have to synthesize a compound action out of the available actions:
Want to watch The Voice? It’s simple:
Select HDMI 2 on remote 1
Press power on remote 2
Navigate to Peacock
Choose The Voice
Oh, you want a recorded episode? Well, if you want that you need remote 3
OFFS (← acronym)
The mapping from Intention to Actions is hard:
You need to have learned the necessary actions exist
You need to remember the actions
You need to find the actions
You need to execute the actions
You need to be confident that executing the actions that you know about/remember/find in a particular order will result in your intention being satisfied
Any failure of the above will result in failure
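The list above can be sketched as a chain where one weak link sinks the whole intention. This is a rough model under stated assumptions: the `Action` fields, the `PLAN` table, and the choice of which step fails are all hypothetical, and the "confidence" requirement is left out because it concerns the sequence as a whole rather than any single step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Action:
    name: str
    learned: bool = True     # the user knows the action exists
    remembered: bool = True  # the user can recall it right now
    found: bool = True       # the user can locate it in the interface
    executed: bool = True    # the user can carry it out correctly


# Hypothetical mapping from one intention to the compound action it requires.
PLAN = {
    "Watch The Voice": [
        Action("Select HDMI 2 on remote 1"),
        Action("Press power on remote 2"),
        Action("Navigate to Peacock", found=False),  # assumed stumbling point
        Action("Choose The Voice"),
    ],
}


def first_failure(intention: str) -> str | None:
    """Return a description of the first step that fails, or None on success."""
    for action in PLAN[intention]:
        for check in ("learned", "remembered", "found", "executed"):
            if not getattr(action, check):
                return f"{action.name}: not {check}"
    return None


print(first_failure("Watch The Voice"))
```

Every step has four ways to go wrong, so a four-step plan has sixteen chances to fail before the show starts playing.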
Kids
Sidebar—Why do kids navigate the digital world so easily?
Learn—they have a social group that rapidly spreads knowledge of new actions
Remember—this is their native language, of course they remember all the actions
Find—they have steered through the action navigation mechanisms since birth
Execute—again, lots of practice
Confidence—again, lots of practice (where earlier generations, even with advanced CS degrees, are only ever kinda sure they can make the computer do what they want)
LLMs
You know that thing where you type “how do I turn off dark mode on my iPhone?” into Perplexity and it says, “easy peasy, go here do this go there do that”? And then you say, “Yeah please do that,” & Perplexity just stares at you?
An LLM is the always-available kid. It knows all the actions (kinda). It knows how to get to them all (kinda). It knows how to execute the actions (kinda). It knows (kinda) how to map from intentions to sequences of actions.
Just (“just,” he says, as if) remove those “kindas” & we have a compelling use for LLMs.
But in the meantime FFS (← another acronym) adding more & more actions is subtracting value from our computers. And the less the user is like a young computer science savant, the more value is subtracted.
> “I’m ready to throw my goddamn phone out the window. I spent an hour setting up a reservation & then the ‘server’ had an ‘error’ & I had to do it again.” I’ve been carefully watching non-geeks use computers & we are failing our users.
It's maddening. I wrote about the state of software back in 2019, and now it's even worse. GenAI is exacerbating the problem. I suspect high-quality software will become a unicorn in the coming years.
https://jeffbailey.us/blog/2019/11/09/death-by-1000-cuts/
Technology is broken. It's like Idiocracy IRL.
You should be telling your phone to turn off dark mode like Jean-Luc Picard: "Phone (or whatever you told it to respond to), turn off dark mode," and have done with it. Likely, it would turn the lights on in the room you're in instead, provided you had a connected house.
But seriously, as a software engineer I understand how things can get so crazy. We have these general-use frameworks as building blocks. Dependencies that aren't very dependable, just because the whole thing is so complex. Networks of dependencies (on people and on technology). All these cells trying to communicate across subtly different or even wildly different contexts. Efficiency comes at a cost. Dare I say...a tradeoff!?