I’ve been noodling for quite some time on what it would actually be like to wear an AR / MR / XR headset on a daily basis, and what kinds of experiences it would enable. After Apple announced the Vision Pro a few weeks back, I finally decided to spend some time arranging my thoughts (hopes?) about what kinds of experiences we might eventually see if XR headsets see the sort of adoption over the next decade that smartphones have seen over the last two.
I’ve arranged my thoughts into 4 'layers' with the first layer being directly in front of the face and the last layer being ‘mapped onto physical reality’.
Layer 1
The first layer is what I would call ‘personal’. What some would refer to as the ‘heads-up display’ or HUD, this is the layer of data that’s always a fixed distance from your eyes and a fixed angle away from your center of vision. Think of it like wearing a space helmet where you can pin screens to the surface of the helmet. These displays move with your head and, unless they’re really important, always land in your peripheral vision. In most HUD applications, these display elements have some transparency and persist in your visual field all the time.
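To make that concrete: a layer-one element is essentially a pose stored relative to your head and recomputed every frame, rather than anchored to the world. Here’s a minimal sketch of that idea in Swift, using plain matrix math rather than any particular headset API, with made-up offset values:

```swift
import simd

// A minimal sketch (not any real headset API) of what a "layer one" / HUD
// element is mathematically: a fixed offset from the head, so its world-space
// pose is recomputed from the head pose every frame. Offsets are illustrative.
struct HUDElement {
    // Pinned ~0.8 m in front of the eyes, nudged down and to the right,
    // so it sits in peripheral vision.
    var offsetFromHead = simd_float4x4(translation: SIMD3<Float>(0.3, -0.3, -0.8))

    // The element follows the head; it never "stays behind" when you turn.
    func worldTransform(headTransform: simd_float4x4) -> simd_float4x4 {
        headTransform * offsetFromHead
    }
}

extension simd_float4x4 {
    init(translation t: SIMD3<Float>) {
        self = matrix_identity_float4x4
        self.columns.3 = SIMD4<Float>(t.x, t.y, t.z, 1)
    }
}
```

A layer-two window (below) is the opposite: its world transform stays put and your head moves around it.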
Layer 1 experiences
To be honest, anyone who really and truly needs information in a layer-one display probably already has a HoloLens and is using it in some kind of high-risk industrial manufacturing context. However, Apple Vision Pro is about to enable a whole range of “could do” scenarios that will prompt future generations to ask whether they were “should do” scenarios.
- Notifications from VIPs: If my wife sends me a text, I want it scrolling across the bottom of my field of view until I dismiss it.
- Security cameras: When an unfamiliar face is detected on a security camera in my home, I’d want the feed to slide into my peripheral vision immediately.
- Grocery list: Although I’ll pick this scenario back up in another layer, I can imagine that in certain contexts, like grocery shopping, I’d pin a translucent grocery list in the top-right corner of my view.
The data that lives here is somewhere between what you’d want on your smartwatch and what you’d want on your smartphone.
Layer 2
The second layer can be thought of as your “workspace”. This layer doesn’t move with your head; it’s one or more configurations of display windows that are movable and adjustable, but essentially get instantiated based on your body position. This is everything from a 5-inch screen at half an arm’s length, to a 13-inch screen on your lap, to a 100” screen across the room. In a non-XR context, these are all functionally the same experience because they cover roughly the same amount of your field of view. Try sitting on the couch and holding your phone up at arm’s length between your eyes and the TV. It probably either covers or nearly covers your TV. The Apple Vision Pro theoretically enables an infinite number of screens at any point between those two extremes. The question becomes: how many of these screens do you want (need?) floating in space at once, and what will you put on them, given infinite screens?
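That “covers the same slice of your vision” intuition comes down to angular size: a screen’s apparent size is 2·atan(size / (2·distance)). A quick sketch, where the screen sizes come from the paragraph above and the viewing distances are my own rough guesses:

```swift
import Foundation

// Angular size of a screen: theta = 2 * atan(size / (2 * distance)).
// Sizes are diagonals; distances are rough assumptions, not measurements.
func angularSizeDegrees(screenInches: Double, distanceInches: Double) -> Double {
    2 * atan(screenInches / (2 * distanceInches)) * 180 / .pi
}

print(angularSizeDegrees(screenInches: 5, distanceInches: 14))    // phone at half an arm's length ≈ 20°
print(angularSizeDegrees(screenInches: 13, distanceInches: 22))   // laptop on your lap ≈ 33°
print(angularSizeDegrees(screenInches: 100, distanceInches: 120)) // 100" screen across the room ≈ 45°
```

Swap in your own distances; the point is that physical screen size only matters relative to how far away it sits.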
Layer 2 experiences
In many ways, this is the easiest (and most boring) layer to imagine because it’s capable of replacing most of the screens we’re already familiar with (with some massive caveats of course).
- Outlook, Excel, Word, web apps: All your favorites will be there floating in front of you, only you’re no longer limited to looking at one at a time. This is the moment you went from 1 screen to 2 screens (or from 2 to 4), taken to the extreme of “sitting inside a sphere of TVs.”
- Livestreams: because everyone’s hyped to watch bears or kittens or something cute out of the corner of their eye.
- Chats: so many chat windows everywhere…
The downside here is that someone in the room with you, but not wearing a headset, can’t see anything you’re doing or working on… the trade-off is that anyone who does have a headset can be anywhere in the world and share a workspace with you.
Layer 3
The third layer is what I would call the ‘environment’ layer. This is where it gets exciting, and it’s the part that Apple didn’t even hint at in what I’ve seen from WWDC, but it seems well within reach based on what their hardware appears capable of.
In Layer 3, the headset is capturing a digital twin of the space you’re occupying, rendering it, and positioning you within it. This is where you start to get the sense that your physical home is overlaid with digital assets.
Examples of Layer 3 experiences:
- Hang a virtual copy of The Great Wave on the wall in my dining room.
- Replace the floors in my house with marble.
- In the hallway where my thermostat is, put up a dashboard that shows my home’s energy usage and smart home controls.
- Instead of this window that looks out on my neighbor’s backyard, I want to see an immersive 3D livestream of Huntington Beach.
Because this layer is tied to physical places, we can also think of it as persisting even when we’re not present in the space, or present in the space but not wearing a headset. Because of this dynamic, this layer will also have to introduce the prospect of audiences and privacy. If a family member virtually visits your home, they might see your latest artwork on the wall or a bulletin board. However, if a colleague visits that same space, you might choose to curate their experience so they see your shared whiteboard on the wall and the latest iteration of a 3D model for a project you’re working on. And maybe when a stranger jogs by during their morning workout (from their treadmill at home, or physically on your sidewalk, viewing your home in AR via their iPhone), they might see your recently caught Pokemon frolicking in your front yard, if you’ve chosen to publish that data. In this layer, things like your Instagram feed, your personal blog, or newsletters can become 3D assets that are physically embodied in spaces you curate for specific audiences.
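That per-audience curation is essentially an access-control list attached to a place. Here’s a toy sketch of the idea in Swift; the types and anchor names are invented for illustration, not any real spatial framework:

```swift
// Layer-three content is anchored to a place and curated per audience:
// the same room serves different assets to family, colleagues, and strangers.
enum Audience { case owner, family, colleague, stranger }

struct SpatialAsset {
    let name: String
    let roomAnchor: String          // e.g. "dining-room-wall-north" (invented ID)
    let visibleTo: Set<Audience>
}

let home: [SpatialAsset] = [
    SpatialAsset(name: "Great Wave print", roomAnchor: "dining-room-wall-north",
                 visibleTo: [.owner, .family]),
    SpatialAsset(name: "Shared project whiteboard", roomAnchor: "dining-room-wall-north",
                 visibleTo: [.owner, .colleague]),
    SpatialAsset(name: "Front-yard Pokemon", roomAnchor: "front-yard",
                 visibleTo: [.owner, .family, .colleague, .stranger]),
]

// What a given visitor actually sees when their headset resolves this space.
func assets(for viewer: Audience, in space: [SpatialAsset]) -> [SpatialAsset] {
    space.filter { $0.visibleTo.contains(viewer) }
}
```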
Layer 3.5 experience (FaceTime to the max)
Another experience that, based on what we’ve seen of Apple Vision Pro so far, seems like it should be well within reach is a true telepresence experience. The camera and sensor array on the Apple Vision Pro is already capturing the space around you and creating a 3D model of it so that it can appropriately render AR effects. That realtime capture can be used to create a static model of the space, possibly even identifying individual furniture pieces and replacing them in the model with 3D assets from the manufacturers. This space model can then be shared with the person you’re FaceTiming with, enabling them to move their avatar within the digital twin of your environment and occupy that space in a natural way. Instead of talking to a Memoji avatar floating in a window, that avatar could be sitting in the chair across the room from you. And on the other end of the call, the person you’re speaking to might have ‘full VR mode’ turned on so that they only see your environment, with your avatar positioned within it. This could be the killer app for the Apple Vision Pro.
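As a rough sketch of that flow (invented types, not any real FaceTime or capture API): the host shares a static model of their room with named seat poses, and the remote participant’s avatar gets pinned to one of them.

```swift
import Foundation
import simd

// Hypothetical data the host would share after capturing their room.
struct SharedSpaceModel {
    var roomMesh: Data                     // captured geometry, serialized
    var seats: [String: simd_float4x4]     // e.g. "armchair" -> world pose
}

// The remote participant, rendered inside the host's digital twin.
struct RemoteAvatar {
    var displayName: String
    var pose: simd_float4x4
}

// Place the guest at a named seat so their avatar sits across the room
// instead of floating in a window. Returns nil if the seat doesn't exist.
func placeGuest(_ name: String, at seat: String, in space: SharedSpaceModel) -> RemoteAvatar? {
    guard let pose = space.seats[seat] else { return nil }
    return RemoteAvatar(displayName: name, pose: pose)
}
```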
Layer 4
The fourth layer is ‘dynamic, ambient and realtime’. This layer has the potential to rewrite our experience of the physical world as it happens (for better or for worse), and I don’t think we’ll truly see it come to fruition until headsets or glasses go from an ‘occasional, experiential, use it in a specific setting’ type of device to a ‘more often than not, I have this on my face’ type of device.
Layer 4 experiences:
- When I see a person whose face is associated with someone in my contacts list, overwrite their real face with their Memoji and float their name and pronouns above their head.
- When I see a billboard for a scary movie, replace it with a cute picture of a cat.
- Make all the city buses into Totoro style cat buses.
- Let nearby Pokemon pop out of the environment naturally and have Team Rocket Grunts show up in public.
- If one of my friends has put digital graffiti within a quarter mile of my current location, show a pin on the map in my Layer 1 heads-up display (see the sketch after this list).
- Put my grocery list items on a heads-up map of the store I’m shopping in.
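The graffiti scenario is basically a geofence query. Here’s a back-of-the-napkin sketch in Swift using CoreLocation; the types are invented for illustration, and a quarter mile is roughly 402 meters:

```swift
import CoreLocation

// A hypothetical piece of location-anchored content published by a friend.
struct DigitalGraffiti {
    let author: String
    let location: CLLocation
}

// Surface anything tagged within a quarter mile (~402 m) of where I'm standing.
func nearbyGraffiti(from friends: [DigitalGraffiti],
                    around me: CLLocation,
                    radiusMeters: CLLocationDistance = 402) -> [DigitalGraffiti] {
    friends.filter { $0.location.distance(from: me) <= radiusMeters }
}

// Anything returned here would get a pin in the Layer 1 heads-up display.
```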
This layer becomes an opportunity for creative expression and sharing, with people producing new filters and layers for reality, or creating experiences that are tied to specific locations, etc. (I’m not oblivious to the reality that all of these layers are ripe for privacy abuses, data leaks, and all manner of trolling, bullying and other bad behavior, but I’ll leave that thought experiment to someone else out there.)
I wrote this post to collect my thoughts and also to avoid blathering on to my friends, family, coworkers, pets and random people in public about what I think the future of computing could look like. If you’re into this kind of stuff and want to talk more, hit me up in the comments below.