Week 02 – Case Study Semantris – Alison Frank

Link to presentation

For this project, I chose to look at Semantris, a game developed by Google’s research team that makes use of AI technology. 

This game is based on word association and has two different modes which create different challenges. The premise of the game is that you have to guess a word which the game’s AI will relate to the keyword given. Throughout my play of Semantris, I found that the relations were mostly natural, but there were a few unexpected results.  

The word-association training of Semantris was focused on conversational language and relations. Therefore, Google tried to incorporate common questions and answers seen in human conversations. To gain data for Semantris, Google Research also looked back to their project Talk To Books, a project which connects user input to passages from books. 

Seen below is a sample of code from the project, which was used to help the AI understand some common conversational questions and their corresponding answers.

Sample of Google's code highlighting use of conversational vocab
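Since the code sample above only appears as an image, here is a rough, hypothetical sketch of the general idea it illustrates: candidate words can be scored against a conversational prompt by comparing embedding vectors. The embed() function and the vectors below are invented purely for illustration; this is not Google’s actual code.

```python
# Hypothetical sketch: scoring candidate words against a conversational prompt.
# embed() stands in for a real encoder (e.g. one trained on question-answer pairs);
# the vectors here are made up for illustration only.
import numpy as np

def embed(text):
    # Placeholder: map a few known strings to invented vectors.
    fake_vectors = {
        "What do you drink in the morning?": np.array([0.9, 0.2, 0.1]),
        "coffee": np.array([0.85, 0.25, 0.05]),
        "keyboard": np.array([0.1, 0.1, 0.9]),
    }
    return fake_vectors[text]

def score(prompt, candidate):
    """Dot product of embeddings: higher means a stronger perceived relation."""
    return float(np.dot(embed(prompt), embed(candidate)))

prompt = "What do you drink in the morning?"
for word in ["coffee", "keyboard"]:
    print(word, score(prompt, word))  # "coffee" scores higher than "keyboard"
```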

Semantris makes use of TensorFlow’s word2vec model (link here), which is used to graph the semantic similarities between words. I found this to be an interesting way to gain quantitative data from a qualitative thought: when you train an AI model, you need your data to be in a format which can be understood by a computer. Therefore, this model needed to move beyond simply comparing strings by the characters they contain and instead focus on the meaning of the word. Personally, I think that this would be incredibly difficult to graph, but TensorFlow has some examples of how they accomplished this (pictured below).

Graph showing word relations (just one example of how words can be related)

Skip-gram model, used by TensorFlow (another way to highlight how data sets can be formed)
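As a rough illustration of what “graphing” word meaning looks like in practice, the sketch below compares words by the similarity of their embedding vectors rather than by their characters. The vectors are invented for the example; a real word2vec model would learn them from text.

```python
# Minimal sketch: comparing words by embedding similarity rather than by characters.
# Assumes a dictionary of pre-trained word vectors (e.g. from word2vec);
# the vectors below are made up purely for illustration.
import numpy as np

embeddings = {
    "cat":    np.array([0.80, 0.10, 0.30]),
    "kitten": np.array([0.75, 0.15, 0.35]),
    "car":    np.array([0.10, 0.90, 0.20]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = closely related words."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high: related
print(cosine_similarity(embeddings["cat"], embeddings["car"]))     # lower: unrelated
```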

Along with this, the training for this project was semi-supervised, which, according to Google, allowed the word pairs to be more conversational and natural. When you play the game, the AI also understands pop culture references and some word relations which are only understood in conversation. 

Outside of the boundaries of a game, Semantris could have other practical uses, especially for those who are learning to speak English. Along with this, the techniques used to code the game could be implemented in other text-based AI to create more natural results. However, there is still some polishing which could be done.

Week 2 – Transparent Latent GAN – Jarred van de Voort

TL-GAN is an extension of NVIDIA’s pg-GAN, which took the machine learning community by storm with its impressive ability to generate hyperrealistic images of humans.

TL-GAN takes the pg-GAN one step further by deriving feature vectors from the latent space, which enables us to peer into the “black box”. In doing so, we can adjust the weights of features to generate customizable faces. Use cases include stock photo generation, data augmentation, and smart editing, all of which follow from the ability to produce realistic, unique images of faces. 

The image above showcases the matrix of feature combinations. These features range from more general ones such as gender and age to more specific features such as bang length. Using this, we can generate hyperrealistic images that match our intended use case. 
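As a minimal sketch of the idea (the latent size, the feature direction, and the generator call below are placeholders, not the actual TL-GAN code), controlling a feature amounts to moving a latent vector along an axis that has been associated with that feature:

```python
# Sketch of the TL-GAN idea: once a direction in the generator's latent space has
# been associated with a feature (e.g. "age"), moving a latent vector along that
# axis changes the corresponding attribute in the generated face.
import numpy as np

latent_dim = 512
rng = np.random.default_rng(0)

z = rng.standard_normal(latent_dim)          # random starting point in latent space
age_axis = rng.standard_normal(latent_dim)   # stand-in for a learned "age" direction
age_axis /= np.linalg.norm(age_axis)

def shift_feature(z, axis, strength):
    """Move the latent code along a feature axis; strength controls how strongly
    the associated attribute appears in the generated image."""
    return z + strength * axis

older_z = shift_feature(z, age_axis, 3.0)
# image = generator(older_z)  # a pg-GAN style generator would decode this to a face
```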

Link to slides: 

https://docs.google.com/presentation/d/1sq-xTaZPPEkkh1i-434o1nyJQbRxpOPeWMkq3c_AaQc/edit?usp=sharing

Week 02 Research: alterego – Kevin Dai

I came across an especially interesting project being developed at MIT’s Media Lab called “alterego”. It is an AI headset that is able to detect and understand the user’s subvocalization (talking in your head). The headset is equipped with electrodes that receive input from facial and vocal cord muscle movements that appear during internalized vocalizations. These signals are not detectable by the human eye, but the headset can pick up on these subtle movements, and feed the input into a machine learning system that correlates specific signals with words. 

 

The device is worn over the ear and spans the user’s jaw and cheek in order to pick up a variety of facial movement signals. With alterego, the user can effortlessly complete tasks such as interfacing with software, and receives output through bone conduction. 

The lead developer, Arnav Kapur, wanted to create an AI device that felt more “internal”, as if it were an extension of the human body. Kapur sees alterego as a ‘second self’ where the human mind and computer intersect. 

Currently, the prototype displays 90% accuracy on application-specific vocabulary and requires individualized training. However, Kapur foresees alterego being seamlessly integrated into our everyday lives, providing us with a new level of privacy and effortless communication, as well as aiding those with speech impairments. 

Video Demonstration:

Project Link: https://www.media.mit.edu/projects/alterego/overview/

Week 02 Assignment: Case Study Research – Andrew Huang

ResNet vs ODENet

Neural Ordinary Differential Equations

One of the most prestigious AI conferences, NeurIPS, featured several new papers last year detailing the bleeding edge of modern-day machine learning. One of the “top papers” from that research was on neural ordinary differential equations. At first the idea seems hard to understand, but I will try to explain it clearly.

A neural network is a universal function approximator; however, it trains in a discrete manner, so as you add more layers, the amount of computation needed to train it increases linearly. This matters because in memory-constrained environments (phones, IoT devices) it becomes more and more important to keep memory usage and power consumption low. Big models like VGG are not feasible in these constrained environments, and their disadvantages become apparent there. Additionally, bigger models are harder to train because of a problem called “vanishing gradients”: as gradients are propagated back through many layers they shrink toward zero, so the earliest layers learn very slowly. A couple of years ago, however, Microsoft researchers came up with ResNet, where a simple principle was added to feed-forward neural networks to rectify this problem.

Think of a neural network as a stack of discrete matrix operations, one per layer. A forward pass through one layer can be represented as $y_1 = m_1 x + b_1$. ResNet introduces a small change: instead of each layer simply taking the output of the previous layer, it also adds that previous output back onto its own result. So the next layer becomes $y_2 = m_2 y_1 + b_2 + y_1$, a residual-block style network.

This small change has big implications. The residual update has the same form as one step of Euler’s method for solving a differential equation, so in the limit of many layers the network can be treated as a continuous function defined by an ODE and evaluated with an ODE solver. This lets us treat neural networks as continuous functions and vastly improve their approximations. In the future, if this were to be used, neural networks could greatly improve their performance and training time, and also be more efficient in memory-constrained environments.
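To make the ResNet-to-Euler connection concrete, here is a minimal sketch (not the paper’s code; the toy “residual block” f and the step size are made up for illustration) showing that stacking residual layers and taking Euler steps of dy/dt = f(y, t) are the same computation when the step size is 1:

```python
import numpy as np

def f(y, t):
    """Toy 'residual block': in a real ResNet this would be a small neural network."""
    return 0.1 * np.tanh(y)

# ResNet view: a stack of residual layers, y_{k+1} = y_k + f(y_k, k)
def resnet_forward(y0, num_layers):
    y = y0
    for k in range(num_layers):
        y = y + f(y, k)
    return y

# ODE view: the same update is Euler's method for dy/dt = f(y, t) with step size h
def euler_forward(y0, t0, t1, h):
    y, t = y0, t0
    while t < t1:
        y = y + h * f(y, t)
        t += h
    return y

y0 = np.array([1.0, -0.5])
print(resnet_forward(y0, 10))             # ten discrete residual layers
print(euler_forward(y0, 0.0, 10.0, 1.0))  # identical result: Euler steps with h = 1
```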

Sources:

https://arxiv.org/pdf/1806.07366.pdf

https://github.com/llSourcell/Neural_Differential_Equations/blob/master/Neural_Ordinary_Differential_Equations.ipynb