Ways of expressing the placement of objects in Augmented Reality

This is more a question than an essay — and the question is “what are good ways to express the placement of virtual objects in VR and AR worlds?”


Imagine you have a virtual puppy in a mixed reality world. The puppy wants to follow you around, explore, bury bones, interact with other puppies. It wants to be intrusive but not so intrusive that it causes you harm (such as while driving). We all have intuition about what we imagine that puppy doing, and how it interacts with the rest of the world. But how do we express those intuitions? Should that puppy float in the air or should it stay on the ground? The answer seems intuitive, but unless a designer can clearly express their intention, the outcomes may not be what users expect.

Game Grammars

Video games often tackle these kinds of problems. A programmer defines a game grammar and then a designer designs a game experience with that grammar. A typical game grammar includes all of the game artifacts (monsters, treasure chests, traps and suchlike) and all their relationships. For example, a typical game scenario can include rollover trigger regions which are wired up to bad monster beasties such that when a player rolls over the trigger the monsters are unleashed. On top of this are lightweight scripting rules that procedurally evaluate what to show, how often to show it, and how busy to keep the game player. The audio engine does the same kind of thing: it performs significant arbitration so that the game player hears the most important facts first.
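
To make the trigger-region idea concrete, here is a minimal sketch in Python. Everything in it (TriggerRegion, Monster, demo_trigger) is a hypothetical name invented for illustration and is not taken from any particular game engine; it only shows one way a designer-facing grammar could wire a region to the things it unleashes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]  # position on the ground plane


@dataclass
class Monster:
    name: str
    awake: bool = False

    def unleash(self) -> None:
        self.awake = True


@dataclass
class TriggerRegion:
    """Hypothetical axis-aligned rectangle that fires once when entered."""
    min_corner: Point
    max_corner: Point
    on_enter: List[Callable[[], None]] = field(default_factory=list)
    fired: bool = False

    def contains(self, p: Point) -> bool:
        return (self.min_corner[0] <= p[0] <= self.max_corner[0]
                and self.min_corner[1] <= p[1] <= self.max_corner[1])

    def update(self, player_pos: Point) -> None:
        # Run every wired-up callback the first time the player steps inside.
        if not self.fired and self.contains(player_pos):
            self.fired = True
            for callback in self.on_enter:
                callback()


def demo_trigger():
    monsters = [Monster("ogre"), Monster("bat")]
    region = TriggerRegion((0.0, 0.0), (2.0, 2.0),
                           on_enter=[m.unleash for m in monsters])
    region.update((5.0, 5.0))            # player is outside the region
    before = [m.awake for m in monsters]
    region.update((1.0, 1.0))            # player rolls over the trigger
    after = [m.awake for m in monsters]
    return before, after
```

The designer only declares the region and what it is wired to; the engine decides when to call update.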

A key element here is that multiple different independent entities or actors all have a shared understanding of the roles and rules, so that they can all play together well. They all have the same idea of priority, and the same idea of space and occlusion. There’s no one actor that suddenly occupies the entire field of view for the rest of the game duration. Any system we devise has to have reasonable collaboration by all actors in the system.


Interior designers, of course, face similar issues. Some of their rules are softer: design patterns, feng shui. Others are formal, such as the rules for laying out signage in airports. I can even see capturing some higher-level design intentions. Of course when one intelligent party controls the entire architecture this is easier, and controlled spaces are not exactly a good analogue. We’re looking for more of an ecosystem model where a reasonable balance emerges from well struck rules rather than from intensive cogitation ahead of time by thoughtful people.


In some ways what I want to do is have a model similar to HTML. The document object model (perhaps not entirely successfully) attempts to describe relationships between objects such that layout can succeed in a variety of target rendering environments. There’s also a focus on accessibility — enough intentionality is expressed that users who have limited vision can still navigate a web page.

Designing Rules for Dynamic Environments

One trivial way of expressing an object’s position is to simply say “it is at these specified XYZ coordinates”. But that doesn’t really get to the heart of the matter. Although an object ultimately ends up at a position in space, what are the rules that guide that decision? Does it always require a person to hand-place every object, or are there “intentions” that can be used to automatically place objects?

The problem is that virtual and real worlds are dynamic, and it’s hard even for a 3D reconstruction algorithm to always “get it right”. Sometimes algorithms get confused about exactly where the floor is or where a wall is (that they work at all is in fact slightly miraculous). Real world floors and walls can move, and do effectively wobble as far as the algorithms are concerned. And many different objects from many different authors are potentially competing for your attention at the same time, each clamoring with what it thinks is the highest priority. Of course that virtual puppy is the most important thing in the universe. What else could be more important?


There are a variety of use cases that I’d like to have solutions for. Here are a few of them:

  1. Priority. You’re in one experience (say a game) but other information still needs to interrupt you. For example you’re walking around San Francisco with a Skyrim overlay — so everything looks a bit more medieval than usual. However you may still want to see alerts telling you if friends are nearby, or if there are potholes to avoid. What’s the right way for information objects to be able to take precedence? Should there be some kind of importance value, or some kind of relevance value?
  2. Placement. You want to place signage on a wall of your cafe to tell people the prices of your products. You want the signage to be pinned to a wall, to hang upright and to be at eye-level for every customer. How do you express this? In HTML there’s an idea of a DOM (a document object model) that expresses how objects relate to each other. Something similar may be needed.
  3. Relationships. You want to decorate an existing virtual object with another virtual object — for example to put a mustache on a virtual statue. In this case it’s not so important that the object be pinned to a place in the real world, but more important that it stay pinned relative to some other virtual object.
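
The relationship case is the easiest of the three to pin down, because it is ordinary parent-relative positioning. Here is a small sketch, assuming a right-handed coordinate system with Y up and a parent that only rotates about the vertical axis; Pose and child_world_position are hypothetical names made up for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    position: Vec3
    yaw_degrees: float = 0.0  # rotation about the vertical (Y) axis


def child_world_position(parent: Pose, local_offset: Vec3) -> Vec3:
    """Rotate the child's local offset by the parent's yaw, then translate."""
    yaw = math.radians(parent.yaw_degrees)
    x, y, z = local_offset
    # Standard right-handed rotation about the Y axis.
    world_x = x * math.cos(yaw) + z * math.sin(yaw)
    world_z = -x * math.sin(yaw) + z * math.cos(yaw)
    px, py, pz = parent.position
    return (px + world_x, py + y, pz + world_z)


# The mustache sits 1.6 m up and 10 cm in front of the statue's local origin.
MUSTACHE_OFFSET = (0.0, 1.6, 0.1)

statue = Pose(position=(2.0, 0.0, -3.0), yaw_degrees=90.0)
mustache_before = child_world_position(statue, MUSTACHE_OFFSET)

# Move the statue: the mustache follows without being re-placed by hand.
statue.position = (10.0, 0.0, 0.0)
mustache_after = child_world_position(statue, MUSTACHE_OFFSET)
```

The mustache never stores a world position of its own. It only stores the statue it belongs to and an offset, which is the kind of intention a placement grammar would need to capture.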


I imagine something like this emerging as a virtual and augmented reality grammar for describing object placement:

  1. Anchors. Today mapping solutions such as ARKit and ARCore already return semantic hints: they can tell you that surfaces exist, and also whether a surface is a floor or a wall. They can also tell you exactly what the viewer’s head pose is (which is useful for billboards, for example). Anchors, which connect the real and virtual worlds, are a key concept, and one already supported by ARKit and ARCore. Here I imagine taking the idea slightly further and making it a first class object in a grammar for describing a virtual space. These objects could be created dynamically whenever a given image is detected, or specified statically at a geographic location in space or at a ray intersection with a surface from a starting point in space. They would act simply as place-holders and could be shared with any virtual world’s data set.
  2. Declarative Intentions. This could be quite a long laundry list and should be separated into declarative intentions versus those that require custom logic.
    -Declaratively it seems reasonable to specify if an object should be bound to an anchor, a floor, a wall, a ceiling.
    -Declaring if it should be a billboard (face you) makes sense too.
    -Declaring if an object is relative to another object or to a feature in the real world.
    -Declaring how much you care about a specific location in space, or if it is simply visibility that is important.
    -Declaring if an object is “related” to the real world or just intended to provide heads up context. (For example in a heads up display you may wish to have a radar map or a todo list in the corner of your field of view).
  3. Procedural Intentions.
    -Procedurally you should be able to do some work to further qualify an object’s placement or visibility. For example an Open/Closed sign may only be visible at certain times.
    -Procedurally objects presumably can interrogate the world and adjust their placement as needed — in the case of a virtual puppy it will want to navigate flat surfaces, and perhaps choose to stay on what is considered the “floor” or low lying tables or couches.
  4. Prioritization. Each object is associated with an “author” or an “emitter” application. All of those authors are scored based on your interest level. For example if you like the Huffington Post then objects created by the Huffington Post get scored higher. Scoring can be a weighted contextual graph, so that objects created by subsidiaries of the Huffington Post also have a higher chance of being visible. If you’re playing a game (are focused on that context) then those virtual objects have a high priority. Overall there’s an undefined black box algorithm that scores objects; the results are sorted and the top-ranked ones are shown to you. It could be thought of as a series of dials, where one can control focus or interests. This could be one of the hardest parts to get right. You may even want a serendipity dial in order to avoid filter bubbles.
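
As a rough sketch of how the pieces above might fit together, here is one possible shape for a declarative record plus a very naive scoring pass. None of this is an existing API: PlacementIntent, rank, the field names and the 0.1 default interest are all assumptions made up for illustration, and the “black box” is reduced to a single multiplication.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Dict, List, Optional


@dataclass
class PlacementIntent:
    """Hypothetical declarative record for one virtual object."""
    name: str
    author: str
    bind_to: str = "anchor"            # "anchor", "floor", "wall", "ceiling" or "hud"
    billboard: bool = False            # always turn to face the viewer
    relative_to: Optional[str] = None  # another object or real-world feature
    base_priority: float = 1.0
    # Procedural hook: given the time of day, should the object be shown?
    visible_when: Optional[Callable[[time], bool]] = None


def rank(intents: List[PlacementIntent], author_interest: Dict[str, float],
         now: time, limit: int) -> List[str]:
    """Return the names of the top `limit` visible objects, best first."""
    scored = []
    for intent in intents:
        if intent.visible_when is not None and not intent.visible_when(now):
            continue  # the procedural rule hides this object right now
        # Unknown authors get a small default interest (an arbitrary choice).
        interest = author_interest.get(intent.author, 0.1)
        scored.append((intent.base_priority * interest, intent.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


def opening_hours(now: time) -> bool:
    return time(8, 0) <= now < time(18, 0)


SCENE = [
    PlacementIntent("menu", "cafe", bind_to="wall"),
    PlacementIntent("open_sign", "cafe", bind_to="wall", billboard=True,
                    base_priority=1.5, visible_when=opening_hours),
    PlacementIntent("puppy", "puppy_game", bind_to="floor", base_priority=2.0),
    PlacementIntent("todo_list", "news_ticker", bind_to="hud"),
]
INTEREST = {"cafe": 0.5, "puppy_game": 1.0, "news_ticker": 0.8}

morning = rank(SCENE, INTEREST, time(9, 0), limit=4)
evening = rank(SCENE, INTEREST, time(21, 0), limit=2)
```

In the morning all four objects are candidates and the puppy wins; in the evening the Open sign drops out before scoring even starts. A real system would presumably replace the flat INTEREST table with the weighted contextual graph described above, and add the dials (focus, serendipity) as extra terms in the score.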

What am I missing?

SFO Hacker Dad Artist Canuck @mozilla formerly at @parcinc @meedan @makerlab