Augmented Reality Archives — Jason Michael Perry
  1. Welcome to the metaverse

    When Mindgrub announced our move to the metaverse, we wanted to explore the many sprouting virtual worlds and determine where to plant our virtual roots.

    The way people talk about it, the idea of a metaverse sounds like one world or one place you can enter to access a broad land of virtual content. The truth is: there is no singular virtual world (yet). Futurists imagine that instead of one world, the metaverse may parallel the internet. It might reflect a network of worlds that allows all of us to enter and leave at a whim. Facebook, now Meta, believes it has what it takes to help create that future. I’m not sure if their vision will win, but it’s helping us all see what the end could be.

    Suppose you find yourself, like me, office hunting in the virtual world. In that case, you quickly learn that what exists now is a patchwork of siloed communities, each with different levels of immersion, rules, and financial expectations. In many ways, these worlds’ quirky ideas and explorative nature give the feel of a new frontier – reminiscent of the early days of the internet. At the heart of all this lies one hugely important question.

    What is the Metaverse?

    Let’s start with the Wikipedia definition:

    “In futurism and science fiction, the metaverse is a hypothetical iteration of the internet as a single, universal and immersive virtual world facilitated by the use of virtual reality (VR) and augmented reality (AR) headsets.[2][3] In colloquial use, a metaverse is a network of 3D virtual worlds focused on social connection.”

    This definition limits what many may see as the actual metaverse. An immersive virtual world does not require virtual reality headsets – levels of immersion can happen on a computer in a networked 3D world. The foundation of the metaverse was built by communities like Second Life. Second Life is primarily known as one of the first virtual worlds with an expansive economy and a vast set of communities. In 2003, it was a pioneer, giving anyone connected to the internet a place to create a new life to live and explore. Many have spent thousands of hours immersed in this world. The key to that experience is being immersed, which defines a metaverse. I describe the metaverse as:

    An immersive network of interconnected worlds or communities commonly accessed through devices such as a phone, computer, and a virtual or augmented headset. These worlds can be used for dating, fun, social connection, work, or recreation.

    This definition better encapsulates what currently exists and what is possible. The key to this definition is immersion. Imagine measuring immersion on a scale similar to the six levels of vehicle autonomy:

    • Level 0 (No Driving Automation)
    • Level 1 (Driver Assistance)
    • Level 2 (Partial Driving Automation)
    • Level 3 (Conditional Driving Automation)
    • Level 4 (High Driving Automation)
    • Level 5 (Full Driving Automation)

    Vehicle autonomy is a scale that differentiates cars by their autonomous driving abilities. Having such a scale allows the US Department of Transportation to define better rules and regulations for a car based on how autonomous a person should expect a vehicle to be. On this scale, a level 5 vehicle would no longer need a steering wheel – it is so autonomous we can depend on it to handle all driving conditions and spend our time watching a movie or relaxing.

    These rules allow us to acknowledge the foundation of autonomous driving and see what the future will bring us. Many of today’s US cars, including a Tesla, come standard with technologies such as adaptive cruise control, parallel parking, blind-spot monitoring, and lane assistance, all of which rate as level 2 features. A scale like this also lets us pause and see how much technology has evolved in a few quick years while realizing the massive chasm of technical intelligence we still need to cross to move from a level 2 vehicle to a level 4.

    The six levels of human immersion

    If we keep those same six levels of vehicle autonomy in mind and use them as a template for the software and hardware that enables the metaverse, we get the scale of immersion:

    • Level 0 (No Augmentation)
    • Level 1 (Device Augmentation)
    • Level 2 (Augmented/Mixed Reality)
    • Level 3 (Virtual Reality)
    • Level 4 (Physically Immersive Virtual Reality)
    • Level 5 (Full Mental Reality)
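
    As a rough illustration, the scale above can be written down as an ordered enumeration, so that two experiences can be compared by how immersive they are. This is only a sketch: the Python names (`ImmersionLevel`, `is_more_immersive`) are illustrative labels, not part of any standard or product.

```python
from enum import IntEnum


class ImmersionLevel(IntEnum):
    """The six levels of immersion, ordered from least to most immersive."""

    NO_AUGMENTATION = 0
    DEVICE_AUGMENTATION = 1
    AUGMENTED_MIXED_REALITY = 2
    VIRTUAL_REALITY = 3
    PHYSICALLY_IMMERSIVE_VR = 4
    FULL_MENTAL_REALITY = 5


def is_more_immersive(a: ImmersionLevel, b: ImmersionLevel) -> bool:
    """Return True when level `a` sits higher on the scale than level `b`."""
    return a > b
```

    Because the members are integers, `ImmersionLevel(3)` looks up a level by its number, and ordinary comparisons follow the order of the scale.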

    Level 1

    At level 0, we as humans exist with no augmentation or connection to any reality but the one we can see or imagine. We step into level 1 immersion with the assistance of a device: think game consoles, laptops, or phones. Each of these transports you into an immersive land. Get lost in Second Life, World of Warcraft, Minecraft, or Roblox? Lost in the scrolling feeds of TikTok, Instagram, or Snap? These worlds exist now and function with whole economies, social interactions, rules, and regulations. Level 1 immersion tends to focus more heavily on sight, with the option for advanced audio. On a scale of immersion, it requires concentration and, sometimes, our imagination to tune out our existing reality and truly feel enveloped.

    Level 2

    Level 2 devices connect us with a virtual community as an overlay of the real world. They can overlay virtual or contextual information on the real world. The first notable example is Google Glass, which allowed you to overlay directions or store reviews over the real world while looking around. It also expanded our idea of sharing by imagining the ability to let someone truly see your viewpoint. Other older successes include games like Pokemon Go that use a phone’s camera to meld the Pokemon world with our own. Additional credit goes to products like Microsoft’s HoloLens, Nreal’s AR glasses, and Snap’s glasses.

    Other fringe devices in this space include AR Drones and game consoles that require physical toys to interact. These devices are less about the immediate plane but still invite a user to connect to reality in a different and more immersive way.

    The rumor mill continues to circle on an Apple device targeted to this level of immersion. We can only speculate what Apple may bring to the table, but the idea of contextual visual interfaces that build on Google Glass seems probable. In recent years Apple and Google have incorporated LIDAR and other stereographic sensors into their devices, mixed with developer-friendly tools such as ARKit, making level 2 devices easier to bring to the masses.

    Level 3

    Level 3 requires a virtual reality headset that masks a person’s vision and, optionally, hearing, immersing them in a new world. A clear sign of level 3 is a device that attempts to remove you from your current reality as much as possible while offering an interactive and immersive experience. This means a device should allow interaction through head tracking, hand tracking, and an external gamepad. At level 3, a person should feel as if their senses of sight and hearing have been transported into a different world. Popular devices in this category include the Meta Oculus, PlayStation VR, and HTC Vive. Many of these devices can quickly move between an augmented (level 2) and a level 3 world. Until recently, level 3 VR was mainly a space for impressive demos and immersive games like Half-Life: Alyx. The pandemic sped up the development of social spaces, games, and work environments for VR, though many of these are still new and early. Notable names include Meta Horizons for social, Spatial IO for work, and Roblox for games.

    Virtual reality is still limited to a tiny segment of the population. Few have regular access to it, so the possibilities and impacts remain largely unexplored. Our content consumption is essentially 2D; for all the visual advances in movies and television, we still look at a 2D plane and primarily rely on audio to create the feeling of 3D immersion. VR changes that idea and opens the world of storytelling up into a different and much more immersive experience. A horror movie no longer directs you; instead, your experience and fear may change based on how you orient yourself to that world.

    Level 4

    Levels 4 and 5 often feel like a dream but are much closer than you realize. Level 4 devices must trick three senses. These often focus on sight, hearing, and touch, immersing your body in a different world. Level 4 devices commonly track movement to allow users to move around an environment or feel vibrations and feedback. The Meta Oculus Quest and Quest Pro are notable for allowing you to define a boundary and physically walk within those confines while masking this virtually to give users an infinite playground. CES is always a great place to see the many level 4 devices that take this idea further. Devices like an immersive body suit or glove transmit the feelings of touch or impact; walking devices allow a person to move, walk, or run in place; and some rigs even simulate a rollercoaster.

    Those examples give a good taste of what is possible in level 4 devices and the amount of equipment needed, which puts it out of reach for many homes. At the same time, the technology is getting more portable, and arcades, art exhibits, and other experiences are opening with immersive level 4 options. A new chain with locations in many major cities opened with arcades that offer real-life arena games, including virtual laser tag. The difference is that in these worlds you play with real people and feel the impact of others shooting at you. One experience I hope to try combines a satellite with sensory deprivation tanks to simulate floating in space.

    Level 5

    At the peak of our imagination, and in scores of anime like Sword Art Online, is level 5 immersion. Level 5 requires an immersion that tricks every one of our senses: sight, hearing, touch, taste, and smell. Imagine the ability to travel to a distant country and smell the countryside while tasting the food. That is the true pinnacle of an immersive world – a place nearly indistinguishable from our reality. Much research and development is needed for level 5 immersion, but a surprising amount is coming from technologies focused on accessibility that have recently begun to converge with big tech. This technology is also further along than many realize.

    Researchers have worked on robotic implants, hearing technologies, assistive sight devices, and brain control for decades. Some of this has begun to merge into products for the everyday consumer. Apple, for example, gives AirPods the ability to alter or magnify external audio, similar to hearing aids. Its watch uses sensors to detect the movements of our fingers (or the muscles attached to them) to allow assistive touch control options. Elon Musk’s company Neuralink has enabled primates to play Pong with their minds using brain implants.

    How immersed are we?

    Using our definition of the Metaverse, Mindgrub wanted an approachable environment that allows anyone to interact without the need to invest in a headset or other hardware. For Mindgrub, the environment we invite users to should embrace the best immersion possible without requiring more than a laptop or a phone. Accessibility on the run or while traveling feels like something that should be essential in an office environment. I believe that any true metaverse needs to be hospitable to varied ways of connecting. An individual should be able to cross many, if not all, levels of immersion. Think of our current world: I may invite you to a Zoom, Amazon Chime, or Microsoft Teams meeting, but does that exclude you from connecting with a phone call? You may have a diminished experience, but as a tool, it includes folks and allows them to connect how they can rather than excluding them.

    True magic happens when different levels of immersion mix. Each device offers a different perspective on the world and allows users to interact differently. Gamers may think of this as an MMO that allows PC users to play with console users (PlayStation, Xbox, Nintendo Switch, etc.) – a keyboard and mouse can be very precise. Still, a joystick or gamepad can sometimes allow for faster movement. In the end, we have the same meta realm but different ways of interacting through the view of other devices. Of course, this mixing can be contentious. It is widely believed that the precision I mentioned from a mouse and keyboard can offer an unfair competitive advantage over users on a gamepad.

    Regardless of strengths and different levels of immersion, we, as users, always judge our environments to determine the best option for our needs. A TV is far superior for watching a movie or a quick how-to video, but a phone’s ability to watch a video anywhere often supersedes the best experience.

    In our vision of the metaverse, this is crucial. It also leads us to three rules for how we expect to use the metaverse:

    • It should target the level of immersion that best delivers the message
    • It needs to support devices from many levels of immersion
    • It should feel simple (and ideally effortless)
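
    One way to picture the second rule is a world that lists the immersion levels it supports and lets each device join at the richest level it can handle. The sketch below is an assumption-laden illustration, not how Hubs or any other platform works; `World`, `join_level`, and the example `office` world are hypothetical names, and the numbers are the immersion levels 0 through 5 from the scale above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class World:
    """A hypothetical virtual space and the immersion levels (0-5) it supports."""

    name: str
    supported_levels: FrozenSet[int]


def join_level(world: World, device_max_level: int) -> Optional[int]:
    """Pick the richest supported level the device can handle, or None."""
    usable = [lvl for lvl in world.supported_levels if lvl <= device_max_level]
    return max(usable) if usable else None


# Assumed example: an office that works on a laptop (1) or a VR headset (3).
office = World("office", frozenset({1, 3}))
laptop_level = join_level(office, 1)   # laptop or phone joins at level 1
headset_level = join_level(office, 3)  # VR headset joins at level 3
```

    In this sketch, a visitor wearing AR glasses (level 2) would still get in at level 1, much like dialing into a video meeting by phone: a diminished experience, but an included one.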

    The most important of these rules is #1. Much of the angst and unhappiness with some of Meta’s ideas around the metaverse comes from the way they ignore finding the right technology for the right message. When I call my parents, it is not to interact with my mom and dad’s virtual avatars but to connect with them. For those who choose FaceTime or video chat over a voice call, it is to create the intimacy you otherwise only get in person. We want to see each other and know that the other person looks and feels well.

    Google has a fascinating research project that attempts to recreate a 3D physical visual representation of a person. If given the option, that would be my favorite way to converse without having that person with me.

    Virtual reality allows a whimsical possibility that would not replace a video chat but will enable me to show ideas in a way I could not before. Some of the Disney Lucas art ideas of creating a world in a world are some of the most amazing things I have ever seen. Why would we imagine a world of 3D CAD printouts in 2D when a much better medium now exists to create this?

    Mindgrub’s new office

    Ultimately, we learned a lot and had to reimagine what the metaverse meant to us as a company. We did that by coupling our reimagining of the metaverse with our understanding of the vast landscape of existing technologies and tools.

    The number one rule we came back to is to target the best immersion level for the message we need to deliver. In many, many cases, some of the technologies we currently have – while aged – do that very well. We use Slack, Zoom, and Google Workspace and find these to be ok – not always great – but ok tools for a lot of our work. These tools are not going away, and until something better comes along, that will not change.

    We quickly fell in love with Mozilla Hubs, an open-source platform that lets you freely create a world on top of a well-backed and robust framework. Its structure balances openness by allowing you to produce a world and host it independently. This openness allowed us to let people experience one world using different devices of varying levels of immersion. It also allowed us to test the boundaries by pulling Zoom, Slack, and other go-to tools of ours into a place we could control and further iterate on.

    For Mindgrub and me, the internet and the interconnected virtual realm are essential. While many technologies make up the tools we use as the foundation of the internet, we can move from website to website regardless of what was used to construct them. Hubs aligns more with web domains, with each web domain or URL representing an independent world.

    I find Hubs akin to modern development frameworks (like Drupal, WordPress, or Node) – a solid bedrock with low-cost or free hosted tools to create an environment, but with the ability to venture out and design and develop something different while riffing on the ideas of others. With Hubs, we can build a module or plug-in and decide to make it freely available to the wider world.

    I’m sure we will all have much more to say as we begin our adventure, but in the meantime, feel free to visit and check out our “lobby.” If you like it, stay a while – VR headset optional.