Shun-Yun Hu
8 min read · Jul 11, 2023


OpenVerseLab: a Testbed towards an Open Metaverse

Neal Stephenson coined the term Metaverse to describe a virtual reality world where participants, represented by avatars, lead lives parallel to those of their physical bodies. The idea has since captured the imaginations of many.

The term was known mostly to the sci-fi and virtual world communities until recent years, when affordable VR headsets and the emergence of the blockchain economy suddenly made the prospect seem real. Facebook purchased the leading VR headset company Oculus and even renamed itself Meta.

Many people are well aware of, and concerned about, a Metaverse developed and controlled by a single company, which also apparently goes against the spirit and architecture of the Web and even the Internet itself. Hence many efforts exist to build more open, decentralized, and interoperable "Web3D" or virtual world systems.

At the same time, researchers in multimedia and virtual world / virtual reality (VR) systems, not necessarily working towards a shared Metaverse goal, have often come up with innovative works and concepts that seem to be good components for the Metaverse. Yet due to the sheer effort of getting any practical system to work, they often cannot get beyond the research prototype stage. Without major corporate sponsorship, progress towards broader adoption is often slow.

We observe that while incremental advances in various related subdomains have been interesting, reaching the next major breakthrough appears to require a more concerted and intentional effort.

While such effort can be, and has been, put into motion through major corporate investment and/or government projects, major long-term investment by either the public or private sector has become less likely due to both the general economic downturn and the great uncertainty of the payoff.

Without major intentional backing or investment, are we necessarily stuck with slow progress towards a shared and open Metaverse? Are we waiting for the hardware to become more mature and accessible, or are we simply waiting for a big Metaverse owned by a private company to emerge?

We would like to offer an alternative possibility: an open Metaverse run and operated by independent parties with different concerns, who may not necessarily share the same goals or values, yet can collaborate seamlessly thanks to a common set of rules and tools.

Such models have already been running successfully in a few domains, including the PlanetLab environment for network researchers, the Bitcoin network, the servers of the World Wide Web, and even the Internet itself via its routers.

What this tells us is that as long as a common set of protocols is adopted for a specific purpose, scalable and decentralized systems can be built relatively easily and quickly. Demand for the network's utility alone is enough to drive impressive progress and growth.

Here we present the design and rationale of OpenVerseLab, a framework and protocol that allows multimedia and virtual reality systems to become "interconnected and interoperable" by abstracting and supporting the "control of interactions" within a Metaverse context.

We believe that once people realize the power and positive possibilities of collaborating with others' unique abilities on an open platform, progress will start to grow exponentially, as has been demonstrated by the network research and open source communities.

Below we will describe the design and implications of such a system, as well as its future outlook and roadmap.

Design

  • fully peer-to-peer
    PlanetLab, the WWW, and the Internet itself all share an important characteristic: nodes responsible for computing or storage can be added to or removed from the network at any time, without hurting the functionality or service of the overall system.

There is also technically no centralized authority that grants permission to join or leave the network; for administrative purposes, local authorities handle the joining and leaving of nodes. This is known as a peer-to-peer, or fully distributed, architecture.

Peer-to-peer systems are typically more scalable, fault-tolerant, and also more affordable, as no particularly powerful or specialized hardware is required to run them.

We would like our design to be fully distributed and peer-to-peer from an architectural point of view.

  • separation of control & content
    In multimedia systems such as video streaming, and AR/VR systems such as games or virtual tours, the data can be roughly divided into two types: control data and content data.

Control data is often smaller and represents metadata related to the actual content, for example the resolution or bit rate of a video stream. This is data that typically has to be delivered correctly and reliably, or the resulting rendering/presentation of the media content will not be correct.

Content data, on the other hand, is typically larger in volume (such as textures, or a 3D point cloud representation of a person or object), but may also be more tolerant of data loss.

Many diverse data formats and structures have been developed over the years, and for good reasons: multimedia is a field of diverse representations, and researchers' ingenuity partly lies in designing ever more efficient representations for both control and content. It would be both unrealistic and limiting to think that we can design a single format to rule them all.

That being said, without a common agreement on data structures or protocols, we cannot create a system that allows interoperable media formats to exist on a common testbed, and each system will continue to be its own little island of innovation.

We propose to address this challenge by agreeing, at least partially, on two things (a small sketch after this list illustrates both):

1) separate the transmission of control data from that of content data, so that we have a common communication channel for all the critical metadata that (for the most part) has to be transmitted reliably.

2) require that control data be communicated through a publish/subscribe (pub/sub) layer, so that at least the discovery of, and interactions among, various media entities/objects can be standardized and made interoperable. We still leave decisions about how content is actually delivered to the designers of the various media systems.
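As a rough illustration of what these two rules imply in practice, consider the sketch below. The field names, the endpoint, and the payload sizes are illustrative assumptions on our part, not a proposed OpenVerseLab schema.

```python
import json

# Control data (rules 1 and 2): small metadata that must arrive reliably and is
# sent over the common pub/sub layer so that any participant can discover it.
control_msg = {
    "entity_id": "cam-01",
    "media_type": "video",
    "codec": "h264",
    "resolution": [1920, 1080],
    "content_endpoint": "udp://203.0.113.7:9000",  # where the bulk content will flow
}
control_payload = json.dumps(control_msg).encode("utf-8")  # a few hundred bytes

# Content data (rule 1): large and loss-tolerant; delivered over whatever
# transport the media designer prefers (RTP, WebRTC, HTTP, ...), outside the
# pub/sub layer, at the endpoint advertised in the control message above.
content_chunk = bytes(64 * 1024)  # e.g. one texture tile or point-cloud slice
```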

  • spatial pub/sub (standards-based)

Pub/sub is a simple communication concept that sits behind many messaging systems. From the BBSes of the early days of the Internet to tech giants such as Facebook and Twitter, the behind-the-scenes mechanism is some flavor of pub/sub.

Pub/sub asks the sender of data to send to an abstract channel, often represented by a name, and requires receivers to subscribe to the channels they are interested in. Publishers and subscribers thus do not have to know or be aware of each other, and no one has to record the relations between senders and receivers.

This reduces complexity when designing communication systems, and keeps the messages sent limited to the relevant parties only.
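To make the decoupling concrete, here is a minimal in-process sketch of channel-based pub/sub; the class and channel names are purely illustrative, and a real deployment would of course use an actual broker rather than a local object.

```python
from collections import defaultdict

class PubSub:
    """Toy channel-based pub/sub: channels are just names."""

    def __init__(self):
        self._subs = defaultdict(list)  # channel name -> list of handler callbacks

    def subscribe(self, channel, handler):
        # Subscribers only name the channel; they never learn who publishes.
        self._subs[channel].append(handler)

    def publish(self, channel, message):
        # Publishers only name the channel; they never track receivers.
        for handler in self._subs[channel]:
            handler(message)

bus = PubSub()
bus.subscribe("match/score-updates", lambda m: print("viewer A saw:", m))
bus.subscribe("match/score-updates", lambda m: print("viewer B saw:", m))
bus.publish("match/score-updates", {"minute": 52, "score": "1-0"})
```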

While channel-based pub/sub is sufficient for video streaming scenarios such as many people watching a sports event, for AR/VR systems where users interact within a map or spatial domain, channel-based pub/sub alone is insufficient.

Much research has gone into how to most efficiently partition the space/map so that channel assignment is optimal, but there is in general no single generic solution, as many different use cases exist.

We acknowledge this requirement for spatial interactions and propose to utilize a "spatial publish/subscribe" (SPS) layer instead.

SPS extends the concept of pub/sub to areas: receivers subscribe to areas, and senders publish to areas. When the two overlap, the message is delivered.

This maintains the simplicity and power of pub/sub systems, while performing effective message filtering for applications where a space or map is involved (games, AR/VR apps, and location-based services such as Uber or Google Maps).
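The sketch below shows the SPS matching rule in its simplest form. We model areas as circles and deliver on overlap purely for illustration; the actual area model and API are design choices left open here.

```python
import math
from dataclasses import dataclass

@dataclass
class Area:
    x: float
    y: float
    radius: float

    def overlaps(self, other: "Area") -> bool:
        # Two circular areas overlap when the distance between their
        # centers is no larger than the sum of their radii.
        return math.hypot(self.x - other.x, self.y - other.y) <= self.radius + other.radius

class SpatialPubSub:
    def __init__(self):
        self._subs = []  # list of (subscribed Area, handler callback)

    def subscribe(self, area: Area, handler):
        self._subs.append((area, handler))

    def publish(self, area: Area, message: dict):
        # Deliver only where the publication area overlaps a subscribed area.
        for sub_area, handler in self._subs:
            if area.overlaps(sub_area):
                handler(message)

sps = SpatialPubSub()
sps.subscribe(Area(0, 0, 50), lambda m: print("nearby avatar sees:", m))
sps.subscribe(Area(500, 500, 50), lambda m: print("far avatar sees:", m))
sps.publish(Area(10, 10, 5), {"event": "door_opened"})  # only the nearby avatar is notified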

For the purpose of having an interoperable, standards-based system, we propose to leverage existing pub/sub standards such as MQTT and extend their functions to include spatial pub/sub.
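One conceivable way to layer SPS on top of a standard broker such as MQTT, offered here only as an assumption for illustration rather than the finished protocol, is to tile the map into fixed grid cells and dedicate one topic per cell (e.g. "sps/<cell_x>/<cell_y>"). Publishers and subscribers then translate their areas into the set of covered cell topics and let the broker's ordinary topic matching handle delivery.

```python
CELL_SIZE = 100.0  # metres per grid cell; an arbitrary example value

def cell_topics(x: float, y: float, radius: float) -> set[str]:
    """Topic names for every grid cell touched by a circular area."""
    topics = set()
    min_cx, max_cx = int((x - radius) // CELL_SIZE), int((x + radius) // CELL_SIZE)
    min_cy, max_cy = int((y - radius) // CELL_SIZE), int((y + radius) // CELL_SIZE)
    for cx in range(min_cx, max_cx + 1):
        for cy in range(min_cy, max_cy + 1):
            topics.add(f"sps/{cx}/{cy}")
    return topics

# A subscriber at (120, 80) with a 150 m interest radius subscribes to the
# 16 cell topics from "sps/-1/-1" to "sps/2/2"; a publisher computes the
# same mapping for its publication area.
print(sorted(cell_topics(120, 80, 150)))
```

The coarse grid naturally over-delivers a little at cell boundaries, so a receiver could still apply the exact overlap test from the previous sketch as a final filter.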

Implications

Comparisons
There have been various attempts at constructing interoperable multimedia and/or AR/VR systems, and we discuss a few of the better-known cases to highlight the differences between our proposal and existing work that also aims at openness and interoperability.

Web3D
As its name implies, Web3D aims to support 3D content and is based on web concepts and standards; there have also been well-known counterparts to the data/protocol duo of HTML+HTTP, in the form of VRML+VRTP. However, due to the inefficiency of both the data design and the protocol, industrial use has been limited, and even academics would rather develop new approaches based on more widely adopted video streaming standards.

The main difference between OpenVerseLab and Web3D is that we do not impose limits on the data format or even the protocol, except for a narrow protocol for node discovery, thus allowing more diverse types of applications to be integrated on top of the framework.

OpenSim
Second Life was one of the first widely known virtual worlds to support a content economy. It gained traction by allowing players to create sellable 3D items, and has even created real-world millionaires.

The company open sourced its client program in the hope of gaining community improvements, and the community built a reverse-engineered server called OpenSim.

Given the mature and sophisticated client functions, OpenSim allowed more complex virtual worlds to be constructed in an open environment.

As the system aims to support specific scenarios, the protocol is restricted to virtual world / Second Life specific interactions, and other types of multimedia systems may not be easily integrated, as rendering is assumed to be done by a Second Life compatible client.

OpenVerseLab's main difference from OpenSim is that it is designed for interoperability beyond Second Life's specific 256x256 meter grid-based virtual worlds, aiming instead to accommodate a wide range of multimedia or virtual world systems. It thus aims to impose as few restrictions as possible, supporting only node discovery as the absolute minimum that all rendering systems must adhere to.

HLA
High-Level Architecture (also known as IEEE 1516) is a standards-based approach to ensure that simulation models can interact with each other. It was originally designed and developed by the US military for simulator interoperability, to reduce waste and maximize the usage of its various simulators.

It defines the interaction part rather than presentation and rendering, and leaves those open for individual simulators to decide.

Our proposal thus most closely matches HLA in design philosophy, with the main difference being that we aim to make adoption easier and simpler than HLA does.

As HLA was designed as a standard to be complied with and adopted by defense contractors, it has many aspects and details that impose a steeper learning curve on a typical researcher or grad student.

The aspect of HLA that deals with interactions does include spatial publish/subscribe concepts, so our proposal can be seen as a subset of the HLA framework, one that aims to be more easily adoptable by the wider research community for building prototypes.

Rendering clients
Arguably one of the most important components of a multimedia/AR/VR system is the rendering client, where the rich experience of multiple media allows a user to engage in deep interaction and immersion.

Clients may consist of different hardware/software combinations, with different capabilities in terms of rendering effects.

Due to various network or computing constraints, a good multimedia system is often designed with this in mind, and offers smooth degradation of rendering results given the device/client capabilities.
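As a rough sketch of what such capability-aware degradation might look like, a client could advertise its bandwidth and rendering capability and the sender could pick the richest variant that fits. The variant names, thresholds, and capability fields below are illustrative assumptions, not part of any specific system.

```python
# Ordered from richest to simplest rendering variant:
# (label, required bandwidth in kbps, required GPU tier)
VARIANTS = [
    ("point_cloud_full", 25_000, 3),
    ("point_cloud_decimated", 8_000, 2),
    ("textured_mesh_low", 3_000, 1),
    ("video_billboard", 800, 0),  # flat video stand-in as the last resort
]

def pick_variant(bandwidth_kbps: int, gpu_tier: int) -> str:
    """Return the richest variant the client can handle."""
    for label, need_bw, need_gpu in VARIANTS:
        if bandwidth_kbps >= need_bw and gpu_tier >= need_gpu:
            return label
    return VARIANTS[-1][0]  # degrade to the simplest rendering rather than fail

print(pick_variant(bandwidth_kbps=5_000, gpu_tier=2))  # -> "textured_mesh_low"
```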

We recognize that there is a wide range of clients and rendering options, yet a full treatment would be beyond our current scope.

We do want to point out that, much like the design of the WWW, we want content and presentation to be handled separately. This ensures that a client is free to choose whether and how the data it receives will be rendered (or not displayed at all).

