ATVR

GESTURE RECOGNIZER

Showcase

This is a Unity tool that can be used to bind VR gestures/movements to simple functions in Unity. It uses a Backpropagation Neural Network that recognizes your movements and outputs an event trigger that’ll invoke any given function.

The tool also includes a VR-rigged “Alex the Robot” and a full in-game tutorial that helps designers understand how it works. (I’m planning on making an extra tool that lets designers record their own tutorial for their players.)

This webpage goes in depth into how I made the tool, and especially into how it works. I’ll explain the Neural Network itself, and how I implemented it in my Gesture Recognizer.

How it's made:

STEP 1: The Neural Network

The first step in making the Gesture Recognizer is to understand how a Neural Network works, what kinds of neural networks exist, and which one fits our system best. I found a really interesting video playlist by 3Blue1Brown where he explains, in four steps, what a Neural Network is and how the network we’re going to use in the Gesture Recognizer, the “Backpropagation Neural Network”, learns.

I won’t go too deep into how the neural network works here (I highly recommend watching 3Blue1Brown’s videos for that), but to sum it up for our purposes, let’s look only at what we can give the neural network, and what it gives back:
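To make that concrete, here is a minimal sketch of the contract we care about. The names here are my own stand-ins; the Test function shows up later in the code we’ll use:

// The in/out contract, which is all that matters for now:
// - We GIVE the network a flat array of floats between -1 and 1
//   (our recorded movement data).
// - It GIVES BACK one float between 0 and 1 per gesture;
//   the highest value is the network's best guess.
float[] inputs = new float[120];        // e.g. 2 trackers x 20 samples x 3 axes
float[] outputs = network.Test(inputs); // e.g. 4 values for 4 taught gestures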

Hopefully you now have a rough idea of how we’re going to use this system to create our gesture recognizer. If it’s still unclear, I’ll explain in the next step.

In a Neural Network, what are neurons?

A. Holders of large amounts of data
B. They are values between 0 and 1
C. They are a visual representation of weights and biases

How many hidden layers do you need in a network with
120 input and 4 output neurons?

A. 120/4 = 30 = 3 layers of 10 neurons
B. Almost always the same as the amount of output neurons.
C. I don't know, but I should just test to find out what works best.

STEP 2: Using the Neural Network

Now that we understand how a backpropagation neural network works, and what kinds of inputs and outputs it expects and produces, we can start using it to create our Gesture Recognizer.

The following video explains how I send gesture data from the VR player to the neural network, and how the system decides which function to fire once the network has calculated its highest output value:

To get a little more specific: I found a video by uNicoDev on YouTube, which is where I found the code for the Backpropagation Neural Network. It had the perfect setup, and the only changes I made were to how the neurons were named. The rest of my time went into experimenting with the system and understanding what each piece of code did.

As you can see, uNicoDev implemented the math we learned from 3Blue1Brown with just a few simple lines of code in the Train function. You can find the code here.
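I won’t copy the code here, but to give you an idea, this is a simplified paraphrase of what the gradient-descent step looks like for the output layer. This is my own sketch, assuming a tanh activation; see the linked code for the real Train function:

// Simplified sketch of the weight update for the output layer only
// (not uNicoDev's actual code; tanh activation assumed):
void TrainOutputLayer(float[] outputs, float[] desired, float[] previousLayer,
                      float[][] weights, float[] biases, float learningRate)
{
    for (int i = 0; i < outputs.Length; i++)
    {
        // how wrong this neuron was, scaled by the slope of the activation
        // (for tanh, the derivative given the output y is 1 - y*y)
        float gradient = (outputs[i] - desired[i]) * (1f - outputs[i] * outputs[i]);
        for (int j = 0; j < previousLayer.Length; j++)
            weights[i][j] -= learningRate * gradient * previousLayer[j]; // [next][previous]
        biases[i] -= learningRate * gradient;
    }
}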

[HARD]
In the weights 3D float array, what do the indices indicate?

A. [structure-1][previous neuron][next neuron]
B. [structure][next neuron][previous neuron]
C. [structure-1][next neuron][previous neuron]

What does the Test function in the Neural Network return?

A. It returns a float array of output values. Highest value = strongest match!
B. It returns a 2D float array of all outputs with their closest match for the set inputs!
C. It returns a float array of input neurons that are the closest match to the given input neurons!

The rest of the code I wrote only builds on this backpropagation network: it creates saveable values for future use, and it makes proper use of the network by ONLY feeding it values between -1 and 1, thanks to the normalized directions between the different tracked points.

(I originally passed the raw points through, and that broke the system: some values came through between -5 and 5, which made the network calculate either values of about 0.999 or enormous values around 1.5E7. That skewed the output values so badly that only one of the many different outputs could ever be chosen.)
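In code, the fix looks roughly like this. This is a simplified sketch with my own names, not the exact PointTracker code:

using UnityEngine;

// Hypothetical helper: converts one tracked sample into three network
// inputs that are guaranteed to stay between -1 and 1.
void AddSample(Vector3 hmdPosition, Vector3 trackerPosition, float[] inputs, ref int index)
{
    // Raw positions can reach -5 to 5 (or worse), which blows up the
    // activations; the normalized direction never leaves [-1, 1].
    Vector3 direction = (trackerPosition - hmdPosition).normalized;
    inputs[index++] = direction.x;
    inputs[index++] = direction.y;
    inputs[index++] = direction.z;
}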

It’s also important to explain how the gesture recognizer teaches the neural network, because it can’t learn from only one input at a time. It basically needs to go over all the possible inputs and outputs every time you teach it one new gesture.

Because of this, it is important to organize all your inputs and outputs in two 2D arrays, where the first index links an input set to its matching output set, and the second index holds the specific values.

With these values, the neural network can calculate the corrections for all the weights and biases across all the different gestures, and apply the changes accordingly.
(And for convenience and speed, you can feed these input/output pairs through multiple times so the neural network has more data to evaluate; see the sketch below.)
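Here’s a rough sketch of that re-teaching loop. The array names savedInputs and savedOutputs are my stand-ins for the two 2D arrays; Train is the network function mentioned above:

// Re-teach the network on EVERY stored gesture, not just the new one.
// savedInputs[g]  = the recorded input values for gesture g
// savedOutputs[g] = the desired outputs for gesture g (1 on its own
//                   output neuron, 0 everywhere else)
for (int iteration = 0; iteration < learnIterations; iteration++)
    for (int g = 0; g < savedInputs.Length; g++)
        network.Train(savedInputs[g], savedOutputs[g]);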

Hopefully you now understand how I used the Backpropagation Neural Network to create my Gesture Recognizer. These were just small and simple code snippets; if you want to inspect the full code, I recommend downloading my tool and reading through the scripts yourself.

STEP 3: Implementing the Gesture Recognizer

Alright, everything works, so now let’s try to make some functions for the player to activate with their gestures, and link them to the Gesture Recognizer. 

So, to sum up how to use the Gesture Recognizer in steps:

Step 1: Set up the scripts:
Apply the Gesture Recognizer script to an empty GameObject child of your HMD, and apply the PointTracker script(s) to empty GameObject child(s) of your chosen tracked point(s) (these could be controllers, or even loose trackers); a possible hierarchy is sketched below.
(The HMD records these trackers relative to its own position and rotation, which means the HMD is in this case also a tracked object.)
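A possible hierarchy could look like this (the object names are just examples):

HMD (your headset camera)
 └ GestureRecognizerObject   ← holds the Gesture Recognizer script
LeftController
 └ LeftPoint                 ← holds a PointTracker script
RightController
 └ RightPoint                ← holds a PointTracker script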

Step 2: Set your Gesture Recognizer values:
Network name: any name you want the network to save as. (Adding a folder name works for organization; for example, NeuralNetwork/MyNetwork saves the file MyNetwork.sav in the folder Assets/NeuralNetwork.)
Iterations per second: how many points per second the PointTrackers record.
Seconds of record: how many seconds each gesture recording lasts.
Hidden Layers: the number of hidden layers the network should use.
(My recommendation: 1 hidden layer for every 10 outputs, and 2-3 neurons per output neuron. And try it out yourself, see what works best for your gestures!)
Output Events: set these events to fire once the network recognizes any of the taught gestures.
Learn iterations: how many times the network learns each gesture per single Teach call.
Min value: the lowest value an output has to reach before it fires. If set to 0, the network always takes the highest output value; if set to 0.9, it only takes the highest value that is also above 0.9, which makes the gesture recognizer stricter and only responsive to very accurate inputs. (See the sketch after this list.)
Check Line: *can be left empty* assign a LineRenderer with the CheckLine script to draw a debug line whenever you teach or test, so you can inspect your gestures.
Point Trackers: drag all the PointTrackers you placed on your tracked objects into this array.
Use Preset Data: prompts for a save data file to load a previously used network with all its learned values.
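As promised, here is a rough sketch of the Min value gate. This is my own paraphrase of the selection logic, not the tool’s exact code; outputs and outputEvents stand in for the network’s output array and the Output Events list:

// Pick the highest output, but only if it clears the Min value.
int bestIndex = -1;
float bestValue = minValue; // outputs below this can never win
for (int i = 0; i < outputs.Length; i++)
{
    if (outputs[i] > bestValue)
    {
        bestValue = outputs[i];
        bestIndex = i;
    }
}
if (bestIndex >= 0)
    outputEvents[bestIndex].Invoke(); // fire the matching gesture's event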

Step 3: Set functions, and fire:
Set the output functions in the Output Events list, and call the following functions to use the Gesture Recognizer from any script or event caller of your choice:
GuestureRecognizer.StartTeach(<index of the output you want the network to learn>), which starts recording your points as soon as you call it, and then teaches the neural network within a few frames.
Or GuestureRecognizer.StartTest(), which also starts recording, then runs your input through the neural network to create your output values, and invokes all the events of the output value that is both the highest and above the Min value you set in Step 2.
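For example, a small test script could wire these calls to keyboard keys. This is a hypothetical helper of mine, not part of the tool, and it assumes you reference the recognizer as a component:

using UnityEngine;

// Hypothetical test script: teaches gesture 0 on T, tests on Space.
public class GestureTester : MonoBehaviour
{
    public GuestureRecognizer recognizer; // drag the recognizer in via the Inspector

    void Update()
    {
        if (Input.GetKeyDown(KeyCode.T))
            recognizer.StartTeach(0); // record, then teach the first output
        if (Input.GetKeyDown(KeyCode.Space))
            recognizer.StartTest();   // record, then fire the best matching event
    }
}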

What does the LearnIterations variable change?

A. This changes the amount of times the Network learns with 1 Teach call
B. This decides the amount of times you have to teach the Network before it can be tested
C. This changes the amount of outputs the network returns on each Teach call

What function do you call to teach the Network a gesture for the 4th output?

A. StartTeach()
B. StartTeach(3)
C. StartTeach(4)

CONCLUSION

Thank you for your patience! The system is now complete and can be downloaded here for you to try yourself.
The .unitypackage includes a tutorial scene (including a 2D shape recognizer) and all the scripts you saw in this tutorial.

Follow my Progress LIVE!