WTF is a DOF?

TexasGreenTea
11 min read · Aug 20, 2021


Want to start Twitter drama among the hardcore nerds of the spatial computing industry in zero seconds flat?

Let it never be said that we don’t feel things about our jargon.

BUT… we do take terms for granted sometimes.

For instance, do you remember the first time you chatted up a fellow nerd at a meetup, and someone said “6DOF”? And you had never heard that term before?

If you’re like me, you once struggled as an industry noob to find the right balance of telling the truth (admitting your noobishness) versus just keeping your mouth shut and hoping no one noticed. On the surface, you may have been motivated to let your face imply…

Yup, I definitely know what that means…

But just below the surface, you were thinking…

What does it mean???!!!

I wrote this article for two reasons:

  1. Impostor syndrome is universal. We’ve all been the noob, even with terms that are so common they feel like second nature now.
  2. I just happen to have OCD about deep-diving into amounts of detail no one requested, so prepare yourselves…

It is time… for a deep-dive into the meaning of 6DOF.

But why? EVERYONE knows that one, right?

Yeah, not really. Ask anyone outside our little XR bubble. A majority of the population will give you that very same “sure I know what that means” face. But more importantly, do WE know what it means? Do we really?

Is there such a thing as a 1DOF interface? How about 0DOF?

If you know the answer, the bus didn’t explode, AND you probably have a pretty good idea how to get from 0DOF to 6DOF. But let’s try a few more as a brain teaser before we dig in.

How many DOFs does a QWERTY keyboard have? HINT: 0DOF x crap-ton = ?
How about an NES controller? Now it’s getting a bit interesting. Is a D-Pad 2DOF?
I count seven 0DOF, two 0.5DOF, two 1DOF, and three 2DOF interfaces on a PS5 controller.

Did you get the same count as me? I think even the most seasoned VR veterans are looking at my count and thinking…

WTF is 0.5DOF?

Even with “industry-standard” terms, we don’t always agree when we get into the weeds. That’s why we get so triggered when all these darn noobs come in, hailing a freshly minted pot of gold called the Metaverse. What a brilliant idea. Why didn’t we think of it before?

So as I break down my idea of what a DOF is, I want you to know we can still be friends if you don’t agree.

A Degree Of Freedom, at its most basic level… is a floating-point input.

Oh God, more jargon.

Don’t worry. This is not a coding 101 lecture. Except for THIS paragraph. In code, we have to distinguish between numbers used just for counting versus numbers used for precise measurement, i.e. integers versus floats (A.K.A. decimal values). When we convert the body’s muscle action to a digital instruction, we have to throw out any data we don’t need.

If it doesn’t matter how hard you pressed the button, then the button can send a crap-ton less data into the computer.

That’s why we build discrete versus continuous inputs, and that’s often why videogame controllers have a combination of both. If the button can send a quick signal when it toggles down, and then one more when it toggles back up, it probably doesn’t need to send anything WHILE it’s down.
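To make that concrete, here’s a minimal sketch (TypeScript, with names I’m inventing for illustration) of a discrete button next to a continuous trigger:

```typescript
// Discrete: the button only speaks when its state flips.
type ButtonEvent = { kind: "down" | "up"; timestamp: number };

function onButtonEvent(e: ButtonEvent): void {
  // Two tiny packets per press: one on toggle-down, one on toggle-up.
  console.log(`button ${e.kind} at ${e.timestamp}`);
}

// Continuous: the trigger reports a float every frame, moved or not.
interface TriggerSample {
  value: number; // 0.0 (released) through 1.0 (fully pressed)
  timestamp: number;
}

function onFrame(trigger: TriggerSample): void {
  // Sampled ~60 times per second: a crap-ton more data than the button.
  console.log(`trigger at ${trigger.value.toFixed(2)}`);
}
```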

Zero-DOF interfaces are everywhere.

That’s about as discrete as it gets. It’s a 0DOF interface because your body is not free to vary the value continuously. It’s a binary switch. We care about WHETHER or HOW MANY TIMES the switch flipped, but not BY HOW MUCH it changed in the past 1/60th of a second.

Compare that to a 1DOF interface, for which your body DOES have the option to adjust a variation of values for a single input channel in real time.

A volume knob is the perfect example. EQ Faders on a mixing board are another. Guitarists, audio engineers, and lighting technicians the world over are biding their time, waiting to corner you at a dinner party so they can unload their passionate ideas about discrete versus continuous inputs.

Yes, I’m just as bad as them, but you’re still here too, so cheers. It’s wall-to-wall nerds in here.

To see what I mean by 0.5DOF, let’s get back to gaming:

A D-Pad is not 2DOF?

The difference between a D-Pad and a joystick clarifies the grey area between 0DOF and 1DOF.

The A Button is obviously 0DOF. The left/right arrows on a D-Pad would comprise a 1DOF interface, if the input could change continuously. But it can’t… sort of. Each of the arrows is no different than the A Button. 0DOF + 0DOF does not = 1DOF. But by holding a binary switch down, you can execute a single, repetitive instruction across time. Continuous?

Nah. From one frame of measurement to the next, you can’t vary the value itself, but you at least have the FREEDOM to choose a duration. So it’s a hybrid form of input. It happens over time; that’s not exactly discrete. But you can’t change its value. It can be programmed to change at a certain rate; that’s not exactly continuous. The next update to the value isn’t one you picked. The system picked it for you automatically. It’s kinda both discrete and continuous, and neither, all at once. That’s why I call it 0.5DOF. It’s not 1DOF, but it’s more than 0DOF.
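Here’s roughly what that hybrid looks like in code, again a sketch with invented names. Notice that the only thing the player truly controls is how long the arrow stays held:

```typescript
const SCROLL_SPEED = 200; // units per second, predetermined by the code

let cursorX = 0;

// Called once per frame; dt is the frame duration in seconds.
function updateDPad(rightArrowHeld: boolean, dt: number): void {
  if (rightArrowHeld) {
    // The value changes over time, but not by an amount you chose.
    cursorX += SCROLL_SPEED * dt;
  }
}
```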

And 0.5DOF + 0.5DOF does not = 1DOF either, just as 0DOF + 0DOF does not = 1DOF. That’s confusing. We’d expect 0.5 + 0.5 = 1, so maybe 0.5DOF is a crappy name for this hybrid input. But whaddaya want? I’m spit-balling here. If you can think of a better name, have at it and let me know on Twitter.

You might conclude that the A button is 0.5DOF, or SHABLOOPY-DOF if you like that name better. It depends upon what action binds to the button. If you use it to click a GO button in a menu, it’s 0DOF. If you use it to hold down an accelerator pedal in a racing game, and the B Button is the brake, then the A and B buttons together are indeed a 0.5DOF interface.

You need the freedom to change the value in both directions; otherwise you’re not really “free” to access all available values on that channel. That’s why two binary buttons that each provide opposite directional change on a single floating-point value (e.g. speed) become a single interface. Also, time defines the difference between 0DOF and anything beyond. Control across time is key. Without time, there’s no F in your DOF.
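Sticking with the racing game, here’s a sketch (invented names, invented rates) of two opposing 0DOF buttons fusing into one 0.5DOF interface on a single float channel:

```typescript
const ACCEL_RATE = 5; // units/s² while A is held, fixed by the code
const BRAKE_RATE = 8; // units/s² while B is held, likewise fixed

let speed = 0; // the single floating-point channel both buttons share

function updateSpeed(aHeld: boolean, bHeld: boolean, dt: number): void {
  if (aHeld) speed += ACCEL_RATE * dt; // one direction of change...
  if (bHeld) speed -= BRAKE_RATE * dt; // ...and its opposite
  speed = Math.max(0, speed); // no driving in reverse in this sketch
}
```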

Who cares? Why is this relevant to us in spatial computing?

If we become hyperaware that the F in DOF means “freedom to choose the value of an input channel continuously across time,” then we naturally start asking ourselves questions like…

  • How many DOFs am I putting to use at the same time?
  • Am I using all the DOFs at my disposal to their fullest potential?

These questions have big implications in spatial computing. If we fail to ask them, we may leave a lot of interactive capability on the table.

To answer them, we have to be vigilant about inventorying the DOFs we have on hand for any given platform. Most of us know 6DOF is X, Y, and Z translation plus pitch, yaw, and roll rotation.
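If it helps to see those six channels laid out, here’s a minimal, engine-agnostic sketch (most engines would store the rotation as a quaternion under the hood, but the freedoms being measured are still these six):

```typescript
interface Pose6DOF {
  x: number;     // translation, channel 1
  y: number;     // translation, channel 2
  z: number;     // translation, channel 3
  pitch: number; // rotation, channel 4
  yaw: number;   // rotation, channel 5
  roll: number;  // rotation, channel 6
}
```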

But what is translating? What is rotating? Is it your hand? What does that even mean? The center of your palm? Your wrist joint? The base knuckle of your index finger? A conceptual node at the center-of-mass of your controller? How many different things are doing these translations and rotations? And isn’t it true that all these parts must each have unique DOFs if they’re kinematically connected to each other? Also, how do the other inputs like triggers and buttons supplement that whole thing from a throughput perspective?

So many questions.

But wait, there’s more.

6DOF is 3 + 3… doesn’t that imply we can add DOFs to DOFs? If the trigger button on Quest is 1DOF… doesn’t that mean we actually have 7DOF in each hand, or 14DOF in total?

I’ve been building some prototypes that need the highest possible muscle-to-data throughput, i.e. maximum DOFs utilized simultaneously. I’ve become obsessed with input limits, as you can see.

I believe 14 is an important number. You might say Quest or a few other platforms have more than 14 because of the grip button, thumbstick, etc. That’s perhaps 20 input channels or more.

But then I could say, “Ha ha, fooled you. With optical hand tracking, you actually have 6 DOF channels per tracked joint: one joint for the wrist, one for each pinky knuckle, etc. That’s 16 joints per hand x 2 hands x 6DOFs = 192 channels! Wow!”

And that’s when you say, “this game has become stupid and pointless.”

And I agree. Those channels are all crucial for the smarty-pants devs who trained the hand-tracking algo, but they get to go nuts in those particular weeds so that the rest of us don’t have to.

And yet, somewhere in there, we did arrive at a practical maximum throughput that all of us can keep in mind.

I believe that important number is 14 because that’s the number that can be utilized across all platforms, including both VR and AR. Whether you use VR controllers or optical hand tracking, you have 6DOFs per hand at the wrist, plus a single maybe-continuous trigger input at the fingers. Any gesture or trigger is the 7th input channel.

Hey Platform X, you like a bird-beak gesture more than finger-to-thumb tap? Cool. As long as you map it to the 7th channel like everyone else, we’re good.

I believe this concept of seven channels per hand will become a standard throughput scheme. Some platforms may add more channels, but the industry should pressure all platforms to support at least these 14 as an open standard ASAP. We saw this starting to happen with even more than 14 channels via Valve’s work on OpenVR, then OpenXR, and now a similar scheme is in the works with WebXR. But we need the 14 channels not just identified, but labeled with standardized names. The channels are identical on all platforms, yet they’re labeled differently in each provider’s API today.
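To be painfully concrete, here’s a sketch of what that standardized 14-channel scheme could look like. Every label below is invented for illustration; no platform’s API uses these names today, which is exactly the problem:

```typescript
interface HandChannels {
  // Channels 1-6: the wrist pose, whether it comes from a tracked
  // controller or from optical hand tracking.
  x: number;
  y: number;
  z: number;
  pitch: number;
  yaw: number;
  roll: number;
  // Channel 7: the maybe-continuous trigger. Map your platform's trigger
  // pull, pinch strength, or bird-beak gesture here, 0.0 to 1.0.
  trigger: number;
}

interface StandardXRInput {
  left: HandChannels;  // 7 channels
  right: HandChannels; // + 7 channels = 14 total
}
```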

We devs want our projects to work on all platforms, and we also want the highest possible throughput under that constraint. If I build a system that can’t run without 20 channels, it won’t thrive away from Quest. But if I target exactly 14, I can redeploy easily to all platforms that have either VR controllers or hand tracking.

If you’re working on an app that doesn’t need all the DOFs simultaneously, you don’t need to obsess over this stuff like I do, yet. But I suspect we’re all going to build a project in the near future in which we sit up and notice we’ve been neglecting the full advantages of the crazy-high throughput capability at our fingertips.

Round 1: FIGHT!

Compared to a 2DOF PC mouse, we should have friggin’ superpowers in spatial computing, yet we often still assume tasks are more efficient on a PC. Otherwise, wouldn’t we have adapted ALL PC applications to VR by now?

I suspect any assumption that ‘PC is more efficient’ is patently false in most cases. The inputs have simply not yet been mapped to UIs that take full advantage of all the DOFs. I am hacking at this very problem every day as I try to solve rapid text entry for XR, but that’s one of many attack vectors.

Some of our current habits are our Kryptonite. For instance, I despise raycast pointers. They are a 2DOF interface, yet we all favor them over options that would give us more simultaneous DOFs. Your cursor can strafe left/right or up/down. You have continuous freedom to place the cursor anywhere along two degrees only.

What if you want to move the cursor further forward, away from your body? How about bringing it backward, closer to your body? You can’t. Occasionally, platforms attempt to add in extra mappings to bandaid a depth control on top of their raycast pointer, but it never works the same way on depth as it does on width and height. Why not? Because raycast pointers are incapable of 3D navigation. Their core function deletes the 3rd D.

The pointer interface doesn’t let you move through depth. That part is automated. Sounds familiar, right? Like a D-Pad. Something is happening on that input channel, but you’re not controlling it. On a D-Pad, you control the duration, but the velocity and/or acceleration are constant, predetermined by the code. On a raycast pointer, the depth position of the cursor is predetermined by the arbitrary distance of the nearest object you target. It’s arbitrary because you can’t pick a different depth independently of moving the other degrees of freedom.

We don’t choose the depth value over time. The degree is not free. So we’ve been using the VR equivalent of a D-Pad this whole time. Which one of us will be the first to build the VR equivalent of a joystick instead?
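To show the difference in code, here’s one more sketch with invented names. The raycast cursor’s depth is whatever the nearest hit says it is; a joystick-style cursor would let you drive all three translation channels yourself:

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Raycast-style: 2DOF of aim in, 3D position out, but depth is automated.
// hitDistance comes from the scene, not from the user.
function raycastCursor(origin: Vec3, dir: Vec3, hitDistance: number): Vec3 {
  return {
    x: origin.x + dir.x * hitDistance,
    y: origin.y + dir.y * hitDistance,
    z: origin.z + dir.z * hitDistance,
  };
}

// Free-depth: the user's own wrist translation drives all three channels,
// so depth is just as controllable as width and height.
function freeCursor(wrist: Vec3, offset: Vec3): Vec3 {
  return {
    x: wrist.x + offset.x,
    y: wrist.y + offset.y,
    z: wrist.z + offset.z,
  };
}
```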

In the early 1990s, there were plenty of videogame developers who believed that “D-Pad is all we need to navigate.” And then…

Raycast means auto-depth. I don’t want auto all the time. I want to drive sometimes. I’m old enough now.

Raycast pointers prevent us from exploring dozens of new ideas in content layout by deliberately cutting us off from some of the DOFs, which collapses the space we can touch from a volume down to a surface. It’s like someone decided we’re not ready for all these DOFs yet, like we all need training wheels to navigate 3D space.

I have lots of ideas about how to fix that, but I’ll never have time to tackle text AND navigation. I need recruits for the war against the raycast pointer, brave souls who can stand with me against the horde of raycast acolytes by prototyping true full-field 14DOF navigation techniques for both cursors and objects. Winter is coming.

Clouds on the horizon whisper of revolt against the Infamous Tyrant of XR UX, the Dreaded Raycast Pointer. All who bend the knee to the evil king, take heed. A challenger emerges from the darkness, who treats all DOFs equally.

As we get into the guts of our industry-standard terms, we find grey areas within those we assumed were well-codified. It shows us exactly how many rings are in the trunk of the XR tree. We’re not an infant industry anymore, but we’ve barely hit puberty. Time to take off the training wheels.

If you found this article interesting, please pass it on by clicking “Share on Twitter.”

Some other spatial computing articles I’ve written in the past:

Unlocking the 3rd D — A spatial design manifesto

Diegesis in XR — in case you haven’t had enough jargon yet

Written By: AJ Campbell, guy who builds stuff https://twitter.com/texasgreentea

