I’ve been using the Vision Pro now for almost 3 months. It’s a neat machine, crammed with technology and party tricks pulled from the future. Yet, after the initial hype, its software and hardware limitations have become apparent over time. It still hints at what AR/VR could become with a couple of years’ worth of iteration and focus, though it has run into the familiar problems of cold-starting an ecosystem and limited utility.
Right now, the comparisons to the iPad are valid and largely fair. With the tablet’s initial launch, Apple highlighted its productivity capabilities alongside its consumption and entertainment value, and put in the effort to port over and release its iWork office suite on day one. In the subsequent 1½ decades, while the hardware evolved to become even more minimalist—now just a glass screen, aluminum frame, and a notch for the camera module—the software steadily advanced. The iPad line forked iOS to create a parallel operating system lineage; multitasking was worked and reworked; the team eventually accessorized by adding support for hardware keyboards and trackpads.
The Vision Pro may well follow the same playbook. The same iWork office suite apps were ported over and available at launch; basic windowing is supported, but multitasking and window management controls are lacking and expected to come with future versions of the OS; keyboards and trackpads¹ connect via Bluetooth and work natively within VisionOS. Right now, though, its claim as a productivity device is just as weak as its older iPad cousin’s. Sure, people have found ways to make their iPads operate as work-capable devices; the instructions and descriptions inevitably come with a laundry list of caveats and hyperbolic enthusiasm. Most of the users who are finding their Vision Pros productive are wholesale projecting their Macs into their displays and leveraging VR as a giant 2D screen streaming macOS.
That people are streaming their personal computers is telling: the PC remains the gold standard in productive computing, and that positively ancient, legacy platform running on laptops has the advantage of half a century of refinement for the workplace. It’s instructive, then, to dig into what makes computers so damned useful, and how a VR system can emulate that utility, if not copy the interface outright. I’ve broken it down into outputs and inputs.
On the outputs side, the Vision Pro makes its strongest case with its implementation of spatial computing—what VisionOS enables by projecting camera imagery into the headset and placing virtual elements in the space around its user. This allows for arbitrarily sized floating windows, as many as you can fit and manage within your simulated physical space. When the Vision Pro was initially announced, people dreamt of carrying it into hotel rooms and surrounding themselves with screens, Evil Overlord-style.
The software is hampered by the lack of utilities to manage, arrange, and handle all this virtual real estate. This is a pretty obvious omission, though, and should be addressed with iterations of the OS. Some of my favorite workspace management systems over the years—macOS’s Dock; Windows’s window snapping and auto-arrangement; Android’s notifications system; iOS’s widgets implementation—don’t yet exist in VisionOS, and few first- or third-party alternatives exist². An infinite desktop is only as useful as the mechanisms available to organize it.
There’s also the issue of flexibility and interoperability across apps. Think of tasks like attaching photos to emails, or embedding a spreadsheet in a presentation; apps specialize in their data types, and most productive work requires the commingling of data to create the final output. Here, VisionOS builds off the interface norms developed for iOS, which feels limiting given that we’re no longer constrained by a small phone screen; in fact, we have lots and lots of screen.
The scoreboard gets worse on the inputs front.
To be clear, it’s a monumental technical achievement to control a full computing interface with your eyes and a small handful of hand gestures. But it’s a bit like the iPad insisting on only its touchscreen at launch, stubbornly clinging to its minimalism despite the need for more precise and tactile input. All the added keys and buttons on modern keyboards and mice add different mechanisms to communicate to the device what we want to do; it’s a big reason why all the fancy mechanical keyboards, meant to enhance the feeling of tactility, are also highly programmable and customizable.
For the Vision Pro today, the lack of input precision makes it unusable for work, at least with its native control method. The eye-tracking is the best commercial implementation available, but in accuracy it pales in comparison to a high-DPI mouse or even a serviceable trackpad. In practice, I often miss UI targets within the OS with my eyes, even after I pull the window real close so there’s a larger area to aim for. Pinching fingers to click works reasonably well, but as an analog to swiping on touchscreens, the pinch-and-drag motion is awkward and error-prone. Text input is probably the worst of all worlds: the onscreen virtual keyboard lacks any tactile confirmation, so registering key presses is slow by design, while the lack of spatial consistency makes it impossible to build up the muscle memory needed for faster typing.
But at least VisionOS supports external keyboards and mice to alleviate some of these issues. Using external accessories, though, highlights a more fundamental shift in interface paradigm: it’s slower to use your eyes as a focusing device. After 40+ years of computing with multiple windows and programs, it turns out that decoupling looking at something from controlling something is quite natural, and it allows for fluid multitasking, or at least for quickly controlling multiple apps at once. Using optical focus as computing focus makes the interface akin to a series of singular, full-screen apps: possible, and perhaps even encouraged, for novice users, but awkwardly slow.
All that said, the Vision Pro represents a big technological leap in hardware, which is what makes this line of thinking possible in the first place. Most VR headsets have their hands full just rendering low-resolution screens and apps; the Vision Pro has enough screen resolution and compute to spare to present a full-fledged, albeit nascent, operating system. I wouldn’t be surprised if it eventually takes multiple tries to discover the right interface to make the most of this hardware, but whether Apple even gets there is…well, a bit up in the air.
1. Only Apple’s trackpad is supported for now, though.↩
2. Well, someone did build a <a href="https://github.com/kjwamlex/SpatialDock">Dock app</a> that you can sideload yourself via developer mode.↩