Future Tools for Story Telling in XR

Carl Sagan’s Apple Pie Recipe: “If you wish to make an apple pie from scratch, you must first invent the universe.”

The Web and Mixed Reality go together like chocolate and peanut butter. They’re a tasty combination: you get instant worldwide audiences, censorship resistance, and access to a fascinating and delicious new medium.

But tools are sorely lacking. And that needs to change.

In this article I’m going to explore, partly for myself and partly for you, dear reader, what stories in 3d could be like with the right tools.

What is a story?

Is it spring again already!? Hmmm it’s still slightly overcast; lush with dew. The air is silvery. It feels like winter. Yet, despite this, tiny pink petals mass in our hair — a coronet — and laughter as we shake them loose for the best possible Instagram photos. Crooked cherry trees in fragrant rows before us. To the right you listen and hear a drawn out squeal of train tracks where metal beasts promise to carry you away.

It does seem like each spring each of us becomes a poet again; we crawl out of our caves and emerge blinking into the light, as it were. Spring seems to bring out that very human cacophony of trying to reproduce what we’re feeling, almost as if it isn’t real until we make excited sounds about it.

Above we have both a photo, and a piece of text relating my own experience walking through a cherry orchard. The experience itself is impossible to capture. But the representation, the drawing, or sketch is an artistic attempt to capture at least some essence.

The story is a relationship between a storyteller and an audience. It’s something we’ve done since time immemorial, and something we continue to do with new technology.

By way of comparison here are some other brief poignant stories created with the ancient technology of Haiku:

crooked and thin
our cherry tree bunches
exotic blossoms

[Richard Pettit — Nordborg, Denmark]

Pacific winds —
a cherry petal blizzard
turns the park pink

[Ian Storr — Sheffield, UK]

picnic for two —
a fallen cherry blossom
in both cups

[Graham High — Blackheath, UK]

And here is a visual riff that somebody was moved to create using newer mediums:

Here is yet another riff of the same idea — a wonderful blend of Augmented Reality and a real cherry blossom orchard:

If we look at Instagram or other social networks we’ll see many other attempts to capture and share this experience. In fact we will tweet and trill absolute torrents of images and short videos. One can imagine a bemused planetary AI watching us create Instagram photos, enough to drown the whole world a thousand times, beyond any human capacity to consume.

We also use longer form video, and even 360 video which gets even closer to providing a sense of presence. These are all ways that we are trying to communicate something.

But actual immersive and interactive sharing of an experience is still rare. Right now it’s reserved for domain experts. How do we as ordinary users get past photos, text and canned media and play with richer media ourselves? With media that is interactive, not just passive?


For me, as a reformed video game developer, a story is at its best when it is “interactive”.

It’s something that responds to the user: it acknowledges user input and changes in response.

The question becomes: what tools do users need, not just to arrange static objects, but also to specify how those objects behave and respond to user input? And how can we put these tools in the hands of users so that they can tell their own stories?

We can consider classical storytelling tools:

  1. Choreography. A choreographer blocks out scenes, seeking a mass effect, like an artist masses pencil lines; creating “volume” where they want, conveying and strengthening the emotions they want to produce. They direct the dancers, the actors, the performers, to act out scenes in ways that their audience can understand, ways that are best said “just so”: not explicitly with words, but with gestures, with bodies, with motion. Choreography can lean towards either interactive or static output, but it often ends up static.
  2. Film. A good film director does the same; they rip away anything not essential to get to the bottom of a shot. The actors will move their eyes a certain way, look that way, hold a specific expression, and the director will pan and cut the shot. Concatenate these shots and a director paints very quickly in a palette of human emotions, human expressions. There’s very little patience there; the clock is ticking. Like acrylic, movies are often built on hastily applied primaries.
  3. Video Games or Interactive Media. This area has become the catch-all for interactivity or dynamic interaction. It is surprisingly ill-defined in some senses (and I do sometimes think we’re kidding ourselves when we use any term other than “video games” to talk about interactive media). But overall it’s the place right now where we can easily teleport somebody into an experience that they can control. It’s the right “conceptual framework” for thinking about interactive authoring.

Static content is like being on a train. A passenger on a train is carried along by that train. We are passive observers, we can turn our heads but we can’t make the train go somewhere else. That isn’t to say that there isn’t a delight in being carried along. Movies, film, books, plays, theater — these are all very good at asking us to give up control, and go along for the ride. There’s an understanding that the journey is a transitional space, that connects two points together. And there is change in a sense — there is the version of you before the story, and the version of you after the story.

Yet interactivity seems to offer new possibilities. Notably, interactivity can enhance the sense of presence. Teachers know that knowledge gained through experience is more deeply ingrained. There’s something about how we consume experience that is different. Perhaps part of the reason for this is that interactivity has risk. A hike with friends, contact improv or ecstatic dance: each embodies an unknown or unpredictable outcome that requires more attention.

The best interactive experiences reach across the void between us, they can put us into a flow state that we call ‘fun’, they are filled with visceral teachable moments that change our points of view.

From the perspective of a video game creator it’s an honor to be able to create a space where we can guide the user through an experience. In a sense an immersive experience can be like creating a world.

From the perspective of the person being told a story, we want to be teleported somewhere else, to actually be there, to be able to exercise our compassion, our empathy, our sense of understanding. There’s something about interactivity that forces a sense of presence. Somehow you seem more implicated when the world responds to you. From that perspective it is a gift to be able to dip into another person’s life for a moment, try out their choices, and try to exercise your own degrees of freedom within their rules.

What do Storytellers want?

In my experience so far I’ve had the luck (or misfortune) to have been yelled at by movie directors, by master chefs in their kitchens (while building software to organize recipes or mix drinks), by interior designers (who want 3d models of their interiors), often by video game designers, and even more often by funders and executives desperate to recoup costs while trying hard to understand what is actually going on behind the scenes.

One way to imagine what it is like being a tools programmer in these industries is to imagine yourself in a roomful of people running around screaming with their hair on fire, for months at a time, while you try to douse them with buckets of what you hope is water. Everybody needs code to manage their products, and often they need it yesterday.

However, on the plus side this has often meant working in a cross-disciplinary fashion with outcome-driven creatives. Game designers, hockey players, pianists, cartoonists, board game creators: anything you can think of. People with amazing skills. Often it’s not just one person but a large team, all relying on that one code base or pipeline. God forbid you break the pipeline and slip the ship date with some bad code!

Many creative industries are surprisingly similar: creatives are driving a project on hard deadlines and need tools to orchestrate the work. The video games industry is probably the most hermetic (game developers are in too much of a rush to share or document their techniques) but it follows the same pattern.

Creatives are hungry for anything that lets them work together to deliver something on budget and on time that even vaguely resembles the original contractual agreements. Teams often go bankrupt if they miss their launch window, so in many of these industries slipping is not an option. Shipping is even more important than quality.

From the perspective of the conductor, all the art, the artist tools and even the artists are just colors in a larger palette, parts of an orchestra. In making something new the team is exploring, doing something that cannot yet be entirely said and can only be approached; by definition it is new and doesn’t yet have a perfect label. There’s a larger symphony or composition being worked on, often the work of many human minds together, often working out some kind of exegesis: some story, narrative or urgent insight that they want to share, that they feel needs to be told, or that they are telling even just for money.


Artists do have many tools at their disposal. Ableton Live allows for the orchestration and deformation of percussive instruments over time to produce rhythm and music. Max/MSP and Pure Data explore lower level sound and noise generation and design. A modern iPad with a pen and Procreate or Paper or any of hundreds of other apps does a not terrible job of recreating traditional media. Newer tools like uMake start to let creatives model in 3d. Immersive 3d tools like Quill and Tilt Brush for the Rift and Vive VR headsets start to really let us feel comfortable in a 3d native creative environment.

Of course the non-digital physical and analog arts are also co-evolving. Experimental sculpture, noise shows, dance, performance art, place based experiences and big art all come to mind. Traditional ‘crafts’ such as painting and acoustic music play a role even if they’re not necessarily a critical art inquiry. These non-digital mediums can often produce impressive results: a Balinese Gamelan gong luang performance with a dozen musicians explores overlapping waves of interference patterns with the same acoustic fervor as an Ableton-backed noise performance by a solo artist in a late night warehouse grunge show. Wandering around Burning Man can be much more intense, even just purely visually, than any VR experience.

That said, we’re still in the dark ages of tools. They are frustratingly limited and two dimensional, and the expertise required to use them can take a lifetime to acquire. They often speak to one media type, or are expensive, or don’t work with each other, and this newest form of media, interactive 3d storytelling, is desperately under-represented.

Storytelling Tools

We don’t quite yet have the equivalent of a visual typewriter to arrange objects and actions into visual stories. We appreciate that improvisational storytelling should be easy, we understand the power of stories, there have been many early efforts, but tools here are still only beginning to be accessible.

When you’re picking a 3d storytelling tool today it’s like being at a train station with four trains going in four different directions. Each train is going to take you very far away from where you started very quickly, and it will be difficult to go back and try again. You can pick one of these four gates:

  1. An imperfect engine, tool or framework that already exists.
  2. The fantasy of a perfect engine that does not yet exist.
  3. The conceit of writing your own solution.
  4. The illusion of not having any core engine or technology at all.

Designer expectations in a 3D storytelling tool

From a non-technical storyteller or designer perspective there are some reasonable expectations:

1) Simple. That it can be used without any coding expertise. You can create in a high level way; outlining, describing, placing, ordering — like a director.

2) Playful. That the line between creation and play is dissolved or somewhat arbitrary, a mode at best. That you can change things, that you can undo and are not stuck with consequences. That you can get fast results in a few minutes and stay in a flow state, with work that is continuously rewarding and granular. That you can do this with friends if you want, not only alone.

3) Durable. That you can pull in all kinds of nice pre-built art assets from a variety of places and that it can handle these assets. That it will be more than a toy, that it will scale with you and that you won’t outgrow it tomorrow.

Programmer expectations in a 3D storytelling tool

But behind the scenes, the programmers also have needs. And so many of them. We as programmers hate getting painted into corners. We want to help designers be creative, but we also want the foundations to be solid. In essence we’re building a bridge between designer language and programmer language:

1) Appropriate. A game designer or storyteller works at a high level, not micro-managing boring details. They want to be able to tell a story in terms that make sense to them. It’s very important that from an experience designer perspective you can say “place a puppy on the ground in front of the player” and have that just happen. There’s an appropriate game grammar that has an appropriate set of verbs and nouns, and assets themselves are the nouns. So we want to expose the “right grammar”. We don’t want designers to be endlessly bogged down in saying “I want a puppy geometry, and a wagging tail animation, and I want to decorate it with an attach to ground physics rule”… We want designers to speak at an intuitive storytelling human level — making broad gestures — like a film director.

2) Precise. Interactive experiences are somewhat different than say making music or drawings because usually what you’re trying to do is arrange a bunch of virtual objects such that the product of their interactions produces the experience. Behavior is emergent not explicit. It’s more like the movie Groundhog Day. You’re not literally “telling a story over time” but rather “telling the same story over and over”. Often you’re going to play the story hundreds, thousands of times over and over, exploring all of the variations of outcomes that you want to produce, and tuning and tweaking the story relentlessly until it is stable and produces the outcomes you want. In a sense it is like you have a control panel with hundreds of thousands of variables and knobs that you can twist and turn, to try and affect the outcome. It’s a bit more like making a ship in a bottle, or arranging a long line of dominos — you can’t directly “hammer” on the outcome you want, rather you have to “set it up”. So you need a lot of fidelity and control in those initial conditions.

3) Collaborative. Not only are you twisting the knobs and placing art elements, but often so are many other people on your team. A multi-player authoring environment can be a huge help to a collaborative design process.

4) Formal. You need some way to organize even just the files: there can be a surprisingly large volume of self-similar art assets, with various revisions and associated code, often shared between many artists, designers and creatives. Even if it is a solo project, you can quickly find yourself dealing with thousands of files. You all have to manage commits to your master branch, have development branches, merge and mark milestones and suchlike. Even a small team of three to four people needs a formal process for managing everybody at once. Often in a larger team a design bible specifies exactly where to put assets, how revisions and builds are released and so on. When hiring new people there’s often a learning period of two weeks or more as they get up to speed on the asset pipelines and practices of that team.

5) Playback. In general there’s a three part pattern in the creative process. It consists of a) composition, b) storage and c) playback. You arrange your characters, avatars, events, scenarios as suits your intent. You save that to disk in some kind of file format. And then you play it back for the audience who experience your story. This is the same pattern as most art. A musician for example uses a musical notation editor, or even just paper, or a tape recorder, to scribe music; that music itself is stored in some durable file format, and then the music is played back, say, on a Korg keyboard, by a singer, an orchestra or a gamelan ensemble. Or in a similar fashion, a web developer uses a text editor to describe a page layout, the document is saved as HTML, and then played back with a browser. A writer writes a script, it is shot on film, and played back for a theater crowd. Don’t be held hostage to a player or a market that is not open for you to re-use or retarget as you wish.

6) Document Based. You will want a durable file format on disk for describing your high level game. Ideally text based, ideally transparent, free, easy to edit, unencumbered by licensing, laws or any obligations, limits or third party policies. Your description of your game or experience is your gold, and you want to protect it. From a framework what you’re looking for is something that lets you describe a scenario in a formal way. At the heart of all this is a document that describes your story. That document format must be transparent to you; you cannot be at the mercy of a third party closed editor.

7) A Framework. It turns out your tool becomes your framework: it organizes your work, sets the boundaries of what is easy or hard to do and sets limits. If you don’t know this then you end up arriving at some framework “by accident” simply by hacking and thrashing about; throwing things willy nilly into various folders. It’s therefore best to approach this with a conscious plan. You want to focus on making your experience, not on “wheelcraft”. Almost everything you want to do has been done before by somebody else, better than you can do it. Any idea around player capabilities, features, general “physics” or foundational aspects of the interactive experience you are building has been done already. Any way of organizing ideas, assets, behaviors has been done already.

This sounds bad but is actually good. Rules limit your options, but they increase your speed in one direction. That said, it does mean you have to do some research and decide what the right limits are. For example Unity3D disallows using their framework to build world-builders (since the easiest thing to make in Unity3D would be Unity3D itself and this would compete against their own core business model). So if you want to make an open-world world-builder game, don’t use Unity3D, as Improbable found out [ https://www.gamesindustry.biz/articles/2019-01-10-unity-explains-improbable-license-revocation-calls-spatialos-creators-post-incorrect ].
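To make the composition, storage and playback pattern from the list above a little more concrete, here is a minimal sketch of what a text based story document could look like. Everything in it is an assumption made for illustration: the `StoryDocument` shape, the verbs `place`, `say` and `wait`, and the functions `saveStory`, `loadStory` and `playStory` are hypothetical and do not come from any existing tool. The "player" here only narrates each step as a string; a real one would hand the steps to a 3d engine.

```typescript
// Hypothetical document format; every name here is illustrative only.
type StoryPlacement = "ground-in-front-of-player" | "at-origin";

interface StoryStep {
  verb: "place" | "say" | "wait";
  noun?: string;          // asset package name, used by "place"
  where?: StoryPlacement; // used by "place"
  text?: string;          // used by "say"
  seconds?: number;       // used by "wait"
}

interface StoryDocument {
  title: string;
  steps: StoryStep[];
}

// a) Composition: the designer arranges the story with high level verbs and nouns.
const story: StoryDocument = {
  title: "Cherry orchard",
  steps: [
    { verb: "place", noun: "cherry-tree", where: "at-origin" },
    { verb: "place", noun: "puppy", where: "ground-in-front-of-player" },
    { verb: "wait", seconds: 2 },
    { verb: "say", text: "Is it spring again already?" },
  ],
};

// b) Storage: plain JSON text that any editor or version control system can handle.
function saveStory(doc: StoryDocument): string {
  return JSON.stringify(doc, null, 2);
}

function loadStory(text: string): StoryDocument {
  const parsed = JSON.parse(text) as StoryDocument;
  if (typeof parsed.title !== "string" || !Array.isArray(parsed.steps)) {
    throw new Error("not a story document");
  }
  return parsed;
}

// c) Playback: a player interprets the document, one step at a time.
function playStory(doc: StoryDocument): string[] {
  return doc.steps.map((step) => {
    if (step.verb === "place") {
      return "place " + (step.noun || "?") + " " + (step.where || "at-origin");
    }
    if (step.verb === "say") {
      return 'say "' + (step.text || "") + '"';
    }
    return "wait " + (step.seconds || 0) + "s";
  });
}

const saved = saveStory(story);
const transcript = playStory(loadStory(saved));
```

Because the saved form is ordinary text, the same document could in principle be replayed by a different player on a different target, which is the point of keeping the format open.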

Components a framework should have

From a programmer perspective this is a laundry list of the kinds of features one would want for building 3d stories:

1) Packages. There are various levels of assets. Most of us tend to focus on 3d art and animations, but that’s just one part of the behavior of a complex entity. Entire entities need to be packaged up as single blobs that can be reused over and over easily.

2) Scene Graph. A heavily enforced standard on-stage “live” representation of all state: basically a place to put everything that is alive in a given scenario. You can think of an interactive vignette as a subset of the larger off-stage database of all assets. A live story might consist of some set of actors in a room, with lighting, with behaviors. The industry has a very well established pattern today of calling this a “scene” and representing all of the actors and behaviors in the scene as a “scene graph”. A scene owns and contains lights, cameras, puppets. And those puppets contain various kinds of behaviors and responses to each other and to you. The player or players themselves are also represented as puppets. When you step into a game experience you’re stepping into an avatar that is just one more character.

3) Scene Queries. An interactive story has many internal queries. “Where is the player?” “Did the player stand near the magical book yet?” “Did the book fall in the water?” A good framework has a robust internal query model that lets a designer ask complicated questions and program the game to do useful things when those conditions occur. There’s a deeper issue here of designing the “physics” of your game world: the rules of magic or arcana or whatever it is that is important to you.

4) Events. A critical and under-valued core design aspect of a good framework is the ability for entities to broadcast and receive and respond to events easily. Most of what you’re doing as an experience designer is setting up pre-conditions. “If the tree is impacted by the player then let the cherry blossoms fall”. The common practice today is “inversion of control”. The core engine should call your code when interesting events occur, rather than you having to call the system. For example when your bird is first added to a scene it can be notified with a “just added to scene” event.

5) Libraries. There are a lot of boring details in making an experience. Loading a sound, placing it in 3d, drawing a circle, writing text. At this point most libraries do this fairly well. But there are other sets of utilities and behaviors that are not done very well, such as tweening or VR hands or multiplayer networking. Different frameworks have different powers. For example BabylonJS is a browser based 3d library that supports collision models and physics, which are normally a hassle to set up otherwise, and for some game developers it may be a better choice than ThreeJS, the other dominant framework for web based 3d experiences.

6) Debuggable. Interactive emergent behaviors are fantastically hard to debug. Arguably most of what a team is doing for most of the product development cycle is just looking for islands of stability that are the product of the intersection of millions of variables in an n dimensional space. A good framework lets you freeze, rewind, inspect and change anything at any time.

7) Compilation. Often an experience is produced out to multiple targets. It may run on a desktop screen in 3d. It may run inside an HMD. Each target has a different set of constraints and often some kind of compilation is needed. I tend to lean away from compilation (I like a raw debug mode that is very easy to trace) but ultimately retargeting means some kind of compilation.
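As a rough illustration of the scene graph and scene query items in the list above, here is a small sketch of a tree of nodes with a predicate based query. It is not the API of ThreeJS, BabylonJS or any other engine; `SceneNode`, `query` and `distanceBetween` are hypothetical names, and positions are treated as plain world coordinates rather than transforms inherited from a parent, which a real engine would handle.

```typescript
// Hypothetical scene graph; names are illustrative, not any engine's API.
interface Vec3 { x: number; y: number; z: number; }

class SceneNode {
  readonly children: SceneNode[] = [];

  constructor(
    readonly id: string,
    readonly kind: "scene" | "light" | "camera" | "puppet",
    public position: Vec3 = { x: 0, y: 0, z: 0 }
  ) {}

  // Attach a child and return it, so callers can keep a reference.
  add(child: SceneNode): SceneNode {
    this.children.push(child);
    return child;
  }

  // Depth-first walk over this node and everything below it.
  walk(visit: (node: SceneNode) => void): void {
    visit(this);
    this.children.forEach((child) => child.walk(visit));
  }

  // Scene query: collect every node that satisfies the predicate.
  query(predicate: (node: SceneNode) => boolean): SceneNode[] {
    const found: SceneNode[] = [];
    this.walk((node) => {
      if (predicate(node)) found.push(node);
    });
    return found;
  }
}

function distanceBetween(a: Vec3, b: Vec3): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// The scene owns lights and puppets; the player is just one more puppet.
const orchard = new SceneNode("orchard", "scene");
orchard.add(new SceneNode("sun", "light", { x: 0, y: 10, z: 0 }));
const player = orchard.add(new SceneNode("player", "puppet"));
orchard.add(new SceneNode("magical-book", "puppet", { x: 1, y: 0, z: 1 }));
orchard.add(new SceneNode("far-tree", "puppet", { x: 20, y: 0, z: 5 }));

// "Did the player stand near the magical book yet?"
const nearPlayer = orchard.query(
  (node) =>
    node.kind === "puppet" &&
    node !== player &&
    distanceBetween(node.position, player.position) < 2
);
const nearIds = nearPlayer.map((node) => node.id);
```

With the positions used here, only the book is within two units of the player, so `nearIds` contains just `"magical-book"`.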

From a programmer perspective there are three fairly critical and somewhat orthogonal pieces that help their engine fly. There’s the definition of “what is a component or object?”. There’s the definition of “how are objects arranged in relationship to each other to form the visible user experience?” and there’s the definition of “how do heterogeneous objects message each other to handle events like collisions?”.

A framework primarily seeks to marshal and serialize these concepts in a standards-based way that everybody can agree on, so that the team can iterate faster with it than without it.
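The third piece, how objects message each other, is the one most easily shown in code. The sketch below is one possible shape for the "inversion of control" idea described earlier: entities register handlers, and the engine calls them when something interesting happens. The names `EventBus`, `StoryEvent`, `on` and `emit` are assumptions for this example rather than the API of any particular framework, and the "engine" is simulated by calling `emit` by hand.

```typescript
// Hypothetical event bus; names are illustrative only.
interface StoryEvent {
  name: string;     // e.g. "collision" or "added-to-scene"
  source: string;   // id of the entity that caused the event
  target?: string;  // id of the entity it happened to, if any
}

type StoryHandler = (event: StoryEvent) => void;

class EventBus {
  private handlers: Record<string, StoryHandler[]> = {};

  // Entities register interest up front; they never poll the engine.
  on(name: string, handler: StoryHandler): void {
    if (!this.handlers[name]) this.handlers[name] = [];
    this.handlers[name].push(handler);
  }

  // The engine calls this when something happens; returns how many handlers ran.
  emit(event: StoryEvent): number {
    const list = this.handlers[event.name] || [];
    list.forEach((handler) => handler(event));
    return list.length;
  }
}

const bus = new EventBus();
const eventLog: string[] = [];

// "If the tree is impacted by the player then let the cherry blossoms fall."
bus.on("collision", (event) => {
  if (event.source === "player" && event.target === "cherry-tree") {
    eventLog.push("blossoms fall");
  }
});

// A bird reacting to its "just added to scene" notification.
bus.on("added-to-scene", (event) => {
  if (event.source === "bird") eventLog.push("bird starts singing");
});

// Engine side (simulated here): events are raised as the world changes.
bus.emit({ name: "added-to-scene", source: "bird" });
bus.emit({ name: "collision", source: "player", target: "cherry-tree" });
bus.emit({ name: "collision", source: "player", target: "bench" });
```

The designer's code only states pre-conditions and responses; when and how often the handlers run is decided by whoever emits the events.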


Dedicated 3d Game Authoring engines. We are all familiar with Unity3D, and Unreal Engine. On the web we have tools like Sumerian, PlayCanvas [ https://playcanvas.com ] and arguably BabylonJS [ https://www.babylonjs.com ] and arguably AFrame [ https://aframe.io ]. These tools are good for experts but simply not accessible to complete novices.

With these engines you can describe a scene, place art assets, set properties and behaviors and then save that entire scenario and play it back for a player. They have mature scene graph management, they are fairly easy to poke at, they come with packages and art assets, and they are focused on performance.

But the commercial engines are NOT FREE. Also the products of Unity3D are binary blobs that are run by a Unity Player. They’re not ‘transparent’ in the sense that you can edit, alter, change, remix at a granular level. And there are some legal constraints you have to follow.

AFrame. This is a good tool for moderately technical beginners but to get beyond that requires a steep hill climb. One drawback is that it is built on top of the DOM so it requires some understanding of HTML. It still needs high level verbs to express high level actions: while you can use it to quickly assemble 3d scenes, it is harder to turn those scenes into interactive stories. With some work, however, you could build your own abstractions on top of AFrame.

BabylonJS or ThreeJS. These are good options for expert programmers. Many 3d game developers on the web build directly on top of one of these low level frameworks. You can use a lower level framework like this directly, and produce great results, but you still have to organize and manage assets, and you have to invent some system for sequencing or choreographing events over time. You’d basically have to write something similar to what I wrote below.
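To give a feel for what "sequencing or choreographing events over time" can mean on top of a low level library, here is a deliberately tiny timeline. It is a hypothetical illustration only, not the wrapper referred to above and not part of BabylonJS or ThreeJS; `StoryCue`, `StoryTimeline`, `tick` and `rewind` are invented names. The idea is that a render loop would call `tick` once per frame with the elapsed seconds, and act on whatever cues became due.

```typescript
// Hypothetical timeline sequencer; names are illustrative only.
interface StoryCue {
  at: number;      // seconds from the start of the story
  action: string;  // what should happen at that moment
}

class StoryTimeline {
  private cues: StoryCue[];
  private cursor = 0; // index of the next cue that has not fired yet
  private clock = 0;  // elapsed seconds

  constructor(cues: StoryCue[]) {
    // Copy and sort so authors can list cues in any order.
    this.cues = cues.slice().sort((a, b) => a.at - b.at);
  }

  // Advance by dt seconds and return the actions that became due.
  tick(dt: number): string[] {
    this.clock += dt;
    const due: string[] = [];
    while (
      this.cursor < this.cues.length &&
      this.cues[this.cursor].at <= this.clock
    ) {
      due.push(this.cues[this.cursor].action);
      this.cursor += 1;
    }
    return due;
  }

  // Rewind so the same story can be played again from the start.
  rewind(): void {
    this.cursor = 0;
    this.clock = 0;
  }
}

const timeline = new StoryTimeline([
  { at: 3, action: "petals fall" },
  { at: 0, action: "fade in orchard" },
  { at: 1.5, action: "train squeals in the distance" },
]);

const firstBeat = timeline.tick(1);  // clock is now 1 second
const secondBeat = timeline.tick(2); // clock is now 3 seconds
```

After these two ticks, `firstBeat` holds the fade-in and `secondBeat` holds the train squeal followed by the falling petals. Calling `rewind` resets the timeline, which suits the "play the same story over and over" style of tuning described earlier.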

Spoke. This produces static scenes only. It is a graphical user interface focused on arranging scenes for import into Hubs, so for now it is about pure layout. See https://hubs.mozilla.com/spoke .

MrEd. This is a new 3d authoring tool from Mozilla. [ TBD ].

Rec Room Circuits (see https://rec-room.fandom.com/wiki/Circuits ). This does expose some procedural and programmatic behaviors to novices, but it’s not at the right level for storytellers. However, with some work this could be a good solution.

Roblox has some visual programming capabilities, but again it’s not exactly at a storytelling level. See https://developer.roblox.com/ .

Blox. This is a wrapper I built for ThreeJS that uses text files to describe 3d scenes. See http://github.com/anselm/blox .


There will be a rapid explosion of creativity in terms of interactive storytelling on the web — and where 2d was the norm, now we should start to see 3d also rapidly grow and become a dominant art form.

When we talk about telling stories, about ideas, about inspiration, we want that to be our focus. We should be able to walk in parks in the morning and make a beautiful interactive experience later that same day that explores the feelings we had.

To get there we will need good tools. Because 3d is so complicated these tools are often embedded in the final product (unlike a movie or a song where the tool is separate from the product).

But tools are often undervalued: we tend to hack things out quickly, and then paint ourselves into corners.

Good choices are creative choices. We must apply our creativity and intellect to our processes, not just our products. The creative process of working with many other people simultaneously, with artists, with designers, with programmers, can be sped up if the right choices are made. That itself is a choice.

Ultimately you want to be able to settle into a build cadence, with everybody easily able to contribute, with regular releases so that the team can fall into an expected code/test/debug cycle. That formalism, or heartbeat, is deeply gratifying to all parties, especially funders! As a practice you first want to get to the stage where there’s an absolute minimum viable spark. Then you want to keep that spark alive, never letting the project submarine or fail to build and run, and keep breathing on it until it becomes a flame. The day to day work itself should be fairly simple, indeed even blue collar and repetitive, as simple as applying brush strokes to paper. You want to be free. You want to be able to be creative. Not get painted into corners by poor framework choices.

The stories we share can be beautiful if we can express them easily. And to do that it helps if our tools are pretty on the inside, not just the outside.

SFO Hacker Dad Artist Canuck @mozilla formerly at @parcinc @meedan @makerlab