Week 6 VR/AR Assignment by Jonghyun Jee

The most amazing part of Blaise's TED talk is that the video was released more than twelve years ago. Considering that, back then, the iPhone was not even around and many people were still using Yahoo, the technology he showcased is more than impressive. Near the end of the video, host Chris Anderson asked Blaise: "What your software is going to allow is that, at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to link together?" Blaise's answer was a firm yes. And yet, despite its innovativeness, Photosynth never enjoyed much popularity. This is partially due to the lack of a platform for utilizing the captured content, and to Microsoft's prioritization of the hardware side of smartphones.
It was indeed an ambitious project to create a 3-D visual platform by weaving together threads of 2-D images; however, the amount of data and the prevailing technology of the time were not sufficient to make it part of day-to-day life. Thanks to the rapid development of artificial intelligence, 3-D filming techniques, and data processing, we are now able to realize what Blaise envisioned then. We are in a time of transition, learning to interconnect and utilize the richness of semantic information embedded in graphic images. By doing so, we can blur the line between the real and digital worlds and redefine how we look at and interact with the world.

Week 5: Train CIFAR-10 CNN by Jonghyun Jee

Introduction

This week’s assignment is to train a CIFAR-10 CNN on our own, based on what we learned in the last class. By trying different values for the batch size and the number of epochs, I was able to observe some noticeable characteristics of deep learning.

Machine Specs

CPU: Intel® Core™ i5-6200U CPU @ 2.30GHz

GPU: Intel® HD Graphics 520

RAM: 8.00 GB (7.08 GB usable)

Experiments

First, I want to address why terms such as batch size, epochs, and iterations were introduced in the first place. We use these concepts when the data we deal with is too large to pass to the computer all at once. To work around this, we divide the data into smaller chunks and feed them to the computer one at a time, updating the weights of the neural network at the end of each step to fit them to the given data.
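The relationship between these terms can be made concrete with a quick back-of-the-envelope calculation. This is just a sketch; the only assumed figure is CIFAR-10's training set size of 50,000 images:

```python
import math

def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of weight updates (iterations) in one full pass over the data."""
    return math.ceil(num_samples / batch_size)

num_train = 50_000   # CIFAR-10 training set size
batch_size = 64
epochs = 20

per_epoch = iterations_per_epoch(num_train, batch_size)
total_updates = per_epoch * epochs
print(per_epoch)      # 782 iterations per epoch
print(total_updates)  # 15640 weight updates over 20 epochs
```

So one "epoch" is 782 "iterations" at this batch size: the smaller the batch, the more weight updates per pass over the data.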

Batch size and the number of epochs will, of course, depend on a variety of factors such as the size/type of training data, hardware performance, and so on. For this week’s assignment, I tried three experiments playing around with seemingly extreme values.

I began with standard starting points (batch size: 64, epochs: 20).

It took 68 minutes to complete this training and yielded an accuracy of 0.7189. Given more time (or a better computer), I would increase the number of epochs until the accuracy converges. This result was not unfavorable, but it still fell short of expectations, as our in-class Fashion-MNIST example yielded much better accuracy in a shorter amount of time.

Then I tried an extreme in one direction (batch size: 1024, epochs: 1).

Training ended in three minutes but yielded a poor result (0.287); a batch size of 1024 seems to be a bit much for my computer.

I also tried the opposite extreme (batch size: 1, epochs: 10).

It took more than 20 minutes for a single epoch, yielding 0.3589, so I had to cancel right after the first epoch.

After going through a number of training runs, a question came to mind: is there a better way of finding the most efficient batch size and number of epochs, without changing the values and re-training everything over and over?
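One common answer for the epoch count is early stopping: rather than guessing the number of epochs up front, monitor the validation loss after each epoch and stop when it has not improved for a few epochs. A minimal, framework-free sketch of the idea (the `patience` value and the loss numbers below are hypothetical):

```python
def should_stop(val_losses, patience=3):
    """Return True when the validation loss hasn't improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_so_far

# Hypothetical run: loss bottoms out at epoch 3, then creeps back up.
history = [1.10, 0.85, 0.72, 0.74, 0.75, 0.76]
print(should_stop(history))      # True: stop here, keep the epoch-3 weights
print(should_stop(history[:4]))  # False: too early to tell
```

Keras ships this exact idea as the `EarlyStopping` callback, so in practice we can set a generous epoch count and let the callback cut training short.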

Conclusion

Admittedly, it’s hard to generalize from such a small number of results; and yet, combined with a bit of research, I could observe some interesting features:

  1. Larger batch sizes result in faster progress through training, but that doesn’t mean we should maximize the batch size. We have to consider our machine’s performance, and on top of that, we should keep in mind that larger batch sizes don’t always converge as fast; smaller batch sizes train more slowly per epoch but can converge in fewer epochs. So we can probably start with the largest batch size that doesn’t exceed memory, then lower it if training takes longer than a minute per batch.
  2. As the number of epochs increases, the weights are updated more times, and the model moves from under-fitting to optimal to over-fitting. So for epochs, we can start with a number between 5 and 10, and stop training once the loss stops decreasing.
  3. Training took longer than I had expected. If we fail to set good values for the batch size and the number of epochs, training is definitely going to be time-consuming and inefficient.
  4. There are scripts that help users calculate an optimal batch size; but since data sizes, data types, and objectives all vary, we still have to try several values and train multiple times to find the optimal batch size for our own project.
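On point 4: one deliberately crude heuristic of the kind such scripts use is to cap the batch size by how many samples fit in a fraction of free memory. Everything below is an illustrative assumption, not a measurement; in particular, the per-sample footprint must cover activations and gradients, which dwarf the raw input:

```python
def max_batch_size(free_bytes: int, bytes_per_sample: int, safety: float = 0.25) -> int:
    """Largest power-of-two batch that fits in `safety` * free memory.

    Powers of two are conventional; the aggressive safety factor leaves
    room for optimizer state and framework overhead.
    """
    budget = int(free_bytes * safety)
    n = max(1, budget // bytes_per_sample)
    # Round down to a power of two.
    p = 1
    while p * 2 <= n:
        p *= 2
    return p

# Suppose 2 GB of RAM is free for training, and a small CNN needs roughly
# 4 MiB per sample for activations and gradients (both made-up figures).
print(max_batch_size(2 * 1024**3, 4 * 1024**2))  # 128
```

This only bounds the batch size by memory; whether 128 actually converges well still has to be checked by training, which is exactly the caveat in point 4.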

Week 5 VR/AR Assignment by Jonghyun Jee

It has been three years since the Oculus Rift made its first foray into the market. VR has received both huge attention and criticism: some say it’s revolutionary; some say it’s overrated. Recently, the latest advances in the VR/AR field were showcased at Oculus Connect 6. During the two-hour-long conference, three keynotes were particularly interesting to me.

Mark Zuckerberg himself delivered the first keynote. He stressed that virtual reality will be the next generation of computing platforms, noting that profits from the Oculus Store have so far exceeded $100 million. I agree with him, as I’m also convinced that VR/AR has the potential to entirely reshape our lifestyle just as the smartphone did. Facebook definitely seems ahead of the curve at this point.

Zuckerberg then introduced Oculus Link. Earlier this year, Oculus announced the PC-based VR headset “Oculus Rift S” and the all-in-one VR headset “Oculus Quest,” which caused inconvenience for those who owned both devices because their libraries are incompatible; the Oculus Quest, which could not be connected to a PC, seemed stuck in a transitional stage. The new Oculus Link connects the Oculus Quest directly to your PC with a simple USB cable and makes it compatible with your existing Rift library. This is pretty interesting because it sounds like Facebook has basically marked the end of the Rift: if the Quest delivers graphics as good as the Rift’s, there seems to be no reason left to choose the Rift.

Another major announcement was the introduction of hand tracking. Until now, the Oculus headset could only be operated with controllers, which are limited in many ways. The Oculus Quest’s front sensors, according to Zuckerberg, will be updated to track the user’s hand motions directly. He added that until just six months ago, PCs, external sensors, headsets, and controllers were all required for a full VR experience, but soon everything will be possible with a single Oculus Quest. This is definitely amazing technology, since it will allow users to perform more delicate and subtle interactions. I’m excited to see how it will open up possibilities we could hardly imagine before.

He also gave a shout-out to CTRL-Labs, which Facebook recently acquired. Their research interest lies in converting signals from the brain into digital input through simple hardware such as a wristband, so that a user can operate a device by thought alone, without any traditional human-computer interface. I’m particularly interested in this announcement, as such a brain-computer interface, if it functions properly, would dramatically reshape the whole digital environment. Many people will be concerned about its use, though, because it literally reads into your brain and your inner thoughts to make the computer do what you want. It’s still a fairly distant future, but I wonder whether a computer will eventually be able to send signals directly into a human brain. If that happens, we will literally be building the Matrix ourselves.

Andrew Bosworth, Facebook’s vice president of augmented and virtual reality, delivered the second keynote. He shared an anecdote about a father who used VR headsets to watch basketball games with his son, and talked about how virtual reality is actually finding its place in our everyday lives. Facebook Reality Labs announced that it is currently developing an AR device that would create real-time images of characters, which I found similar to the ongoing Telepresence research at NYU Shanghai. Spatial constraints might become a nostalgic idea sooner or later.

Michael Abrash of Facebook Reality Labs, the last keynote speaker, discussed the future of VR/AR from a more realistic point of view. Referring to Hofstadter’s Law (“it always takes longer than you expect, even when you take into account Hofstadter’s Law”), he predicted that virtual reality will be the most common and attractive technology of the next 50 years, but that more time is needed at this point. We might be viewing the world through rose-colored spectacles; and yet, pondering those spectacles is always tantalizing.

Below are the images I found or took in search of a spot in Shanghai free of any human influence.

A 17th-century painting showing the city wall of the Old City of Shanghai and the river port outside the wall; even hundreds of years ago, humans had already visibly altered the city.

A picture I took a while ago on West (or East) Nanjing Road. It’s funny how all of the objects in this picture are completely anthropogenic; nothing here seems natural.

Near our school, things hardly seem natural. I tried to roughly photoshop this image together with another picture I took in Inner Mongolia:

If all signs of humans and human-made artifacts were removed, it might look like this: a number of trees on a plain, with a much bluer sky as a background.

Week 4 Writing Assignment by Jonghyun Jee

A neural network, as its name indicates, is a model inspired by the neural structure of humans, especially the visual and auditory cortices. The underlying mechanism of a neural network is similar to that of human cognition: it creates several layers, puts cells into them, and connects the cells with one another; each cell receives a signal and transmits it to the next neuron. On closer inspection, however, there are a number of characteristics that make an artificial neural network and a biological brain disparate. A human brain consists of more than a hundred billion cells, and current technology cannot simulate that number of neurons. And although the artificial neural network began by simulating the structure of a biological brain, the two systems differ structurally in more ways than just their numbers of neurons.
A biological cell has an all-or-none characteristic: stimulated below its threshold, it shows no response at all, and it reacts only if the stimulation exceeds the threshold. Stronger stimulation does not increase the size of its response but rather the frequency of its responses, which can be considered somewhat similar to a step function (the Heaviside function). One problem is that the unit step function is not differentiable at x = 0, and its derivative is zero everywhere else, so a model based on it cannot learn through gradient-based deep learning. That is why engineers adopted smooth, differentiable activation functions such as the sigmoid and the hyperbolic tangent. The sigmoid is, arguably, similar to a step function; and yet it is very different from a biological brain, which cannot function without the passage of time. In an artificial neural network, the output is determined by the input alone, regardless of the passage of time.
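The differentiability argument can be seen numerically. A small sketch comparing the all-or-none step with the smooth sigmoid, in plain Python:

```python
import math

def step(x):
    """Heaviside step: all-or-none; its derivative is 0 wherever it exists."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """A smooth 'soft step', differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # the derivative, expressed via the function itself

print(sigmoid(0.0))       # 0.5
print(sigmoid_grad(0.0))  # 0.25: the gradient is largest right at the threshold
# The step function gives backpropagation nothing to work with: its
# derivative is 0 away from x = 0 and undefined at x = 0 itself.
```

The nonzero gradient is exactly what gradient descent needs to propagate error signals backward through the layers.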
The question of whether an artificial neural network is similar to the human nervous system, in my opinion, should not be confined to a one-to-one comparison between the two. The reason scientists developed neural networks was to perform tasks that the human brain finds difficult. From an evolutionary point of view, the human brain evolved for survival and preservation of the species, not for memorizing a hundred pages of text at a glance or performing highly complicated math operations. Artificial intelligence, optimized for these quantitative tasks, will help us overcome human limitations and offer a new angle on the issues we are dealing with, even in the realm of art.

Weekly Assignment 3 by Jonghyun Jee

  1. Lucid Space Dreams

“Lucid Space Dreams,” one of the world’s first HD virtual reality music visualizers, brings the audience into an imaginary dreamscape filled with a myriad of celestial bodies, underwater creatures, and surreal scenery. In this two-minute video, a user can experience a stroll along a dreamlike pathway. When I tried it on the Oculus Quest, the first thing I noticed was that the video doesn’t offer any user interaction other than turning one’s head. Since its title promises not just a generic dream but a lucid one, it would be much more interesting if it gave users more interactive options. Another notable feature is that most of the objects visualized here seem quite far away, which minimizes potential parallax. After the pathway ends, I was cast into the middle of space as if floating through it; the mystic background music also intensifies as the video progresses. That was my favorite part of the project, because it gave me an immersive experience I could hardly imagine in reality. Overall, the way the creators wove together various visual and audio elements was fascinating.

  2. In the Eyes of the Animals

“In the Eyes of the Animals,” a sensory visualization of how other species view Grizedale Forest in the north of England, is new and striking in terms of both its graphics and its concept. We can’t see the way frogs, owls, and mosquitoes see the world; it’s impossible even to imagine. Virtual reality, however, allows us to explore what lies beyond the realm of human vision. Since most stereoscopic issues arise because we are trying to model human vision, it might be an interesting approach to deviate from tradition and draw inspiration from non-human vision. The fact that users cannot see what Grizedale Forest actually looks like was a downer for me; it would be more interesting if a user could switch between different perspectives, like changing camera filters. The piece is set to a binaural soundscape, the developers said, but I couldn’t tell a stark difference between this audio and other stereo soundscapes (perhaps I should have turned the volume up).

  3. The Big Picture: News in Virtual Reality

A flat TV or a computer or mobile screen used to be the only way I could catch up on the news. Then comes the project “The Big Picture: News in Virtual Reality,” which presents a vision of what the news of the future will be like. This project brings viewers to the actual scene so that they can see what the news is all about; in this video, the news concerns Puerto Rico and Los Angeles. It has a lot of potential to reshape current journalistic media, as the immersive experience it gives goes far beyond being an idle spectator of events. Since this project is a prototype, there’s still room for improvement; sometimes the commentator’s words passed me by, as I was way too focused on glancing around at what was happening. If there is a way to mitigate this sort of information overload, I’m definitely interested in subscribing to this channel.