Week 06: Mojisaic (Midterm Proposal) – Jinzhong

Background

In today’s Internet environment, privacy is an important topic. On many social media and instant messaging platforms — Twitter, Instagram, Weibo, WeChat — we are often asked, or at least encouraged, to share photos of ourselves. But sharing multimedia is a real threat to our privacy. To hide their personal information from strangers, some users block viewing requests from people they do not know, or put a mosaic over part of their face. A mosaic is not always the best solution, though, since it also blocks the expressions on the user’s face. So, I want to find a way in which we can keep our privacy as well as our expressions.

Project

So, here comes my proposed project, Mojisaic (Emoji Mosaic) — a network that detects the expression or emotion on a face and selects a matching emoji (basically, an image classifier). A library like OpenCV is then used to replace the faces in the image with emojis. The network combines CNNs with facial feature point detection and classifies emotions as happy, sad, angry, crying, etc.
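As a rough sketch of the replacement step — assuming the face boxes come from a detector such as OpenCV’s cv2.CascadeClassifier (the paste_emoji helper and the array shapes here are my own, hypothetical choices) — the overlay itself is just array slicing:

```python
import numpy as np

def paste_emoji(image, box, emoji):
    """Overlay a square emoji bitmap onto a face bounding box.

    image: HxWx3 uint8 frame (e.g. read with cv2.imread)
    box:   (x, y, w, h) face rectangle, e.g. one entry from
           cv2.CascadeClassifier(...).detectMultiScale(...)
    emoji: ExEx3 uint8 emoji bitmap
    """
    x, y, w, h = box
    e = emoji.shape[0]
    # nearest-neighbour resize of the emoji to the box size
    rows = np.arange(h) * e // h
    cols = np.arange(w) * e // w
    resized = emoji[rows][:, cols]
    out = image.copy()
    out[y:y + h, x:x + w] = resized
    return out
```

In the real pipeline the detector would supply the boxes per frame; alpha blending could soften the square edges.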

Reference

  1. https://towardsdatascience.com/face-detection-recognition-and-emotion-detection-in-8-lines-of-code-b2ce32d4d5de
  2. https://github.com/priya-dwivedi/face_and_emotion_detection
  3. https://arxiv.org/abs/1503.03832
  4. https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data

Week 05: Train CIFAR-10 CNN – Jinzhong

INTRODUCTION

This week’s assignment is to train a CIFAR-10 CNN ourselves using TensorFlow (Keras) with its built-in CIFAR-10 dataset downloader. In this experiment, I mainly tested the batch size and the choice of optimizer, to explore how these factors affect the training result.

MACHINE

The machine I used is Azure Cloud Computing Cluster (Ubuntu 18.04):

  • Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz x2
  • 8GB Memory
  • 4.15.0-1055 Kernel 

NETWORK LAYOUTS

The network has the following architecture:

2DConv -> 2DConv -> 2DMaxPooling -> Dropout ->

2DConv -> 2DConv -> 2DMaxPooling -> Dropout ->

Flatten -> FullyConnected -> Dropout -> FullyConnected -> Dropout 

The architecture looks good to me: it has 4 convolutional layers to extract features from the source images, plus dropout to regularize the layers so the network learns the essential features of each class of picture.
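A minimal Keras sketch of this layout — the filter counts (32/64), dense width (512), and dropout rates are my assumptions, patterned on the classic Keras CIFAR-10 example, not necessarily the exact values used:

```python
from tensorflow.keras import layers, models

def build_model(num_classes=10):
    """2DConv -> 2DConv -> 2DMaxPooling -> Dropout (two blocks),
    then Flatten -> FullyConnected -> Dropout -> FullyConnected."""
    return models.Sequential([
        layers.Input(shape=(32, 32, 3)),   # CIFAR-10 image size
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),
        # the trailing Dropout from the diagram is omitted here:
        # dropout after the softmax output would not help predictions
        layers.Dense(num_classes, activation="softmax"),
    ])
```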

EXPERIMENTS

Firstly, I modified the batch size to 64 (the default 1024 is so scary…), and got the following training outcomes:

The loss stayed mostly above 1 and the final accuracy was not good enough, so I narrowed the batch size down further to 32; the result, as follows, was better than the first trial:


Now, the accuracy is 4% better than the previous run. This raises a question: is a batch size of 32 small enough to get a good answer? Next, I halved the batch size again, down to 16, to test whether it is true that a smaller batch size does a better job in this scenario:

The outcome is positive: the accuracy is above 70% this time. So we can assume that, at this point, a smaller batch size improves accuracy accordingly. (But it cannot shrink forever — we can imagine that a batch size of 0 contributes nothing…)
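One reason smaller batches can help here: with CIFAR-10’s 50,000 training images, the batch size directly sets how many gradient updates the network gets per epoch. A quick back-of-the-envelope check:

```python
import math

CIFAR10_TRAIN = 50_000  # size of the CIFAR-10 training set

def updates_per_epoch(batch_size):
    """Gradient updates per epoch = ceil(training set size / batch size)."""
    return math.ceil(CIFAR10_TRAIN / batch_size)

for bs in (1024, 64, 32, 16):
    print(f"batch size {bs:>4}: {updates_per_epoch(bs)} updates/epoch")
    # 1024 -> 49, 64 -> 782, 32 -> 1563, 16 -> 3125
```

So going from 1024 down to 16 gives the network roughly 64 times more weight updates in the same 10 epochs, at the cost of noisier gradients.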

The second experiment explores whether RMSprop is the right optimizer in this scenario. I trained with a batch size of 32 for 10 epochs using two different optimizers: RMSprop and Adam(lr=0.0002, beta_1=0.5).
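The two optimizer settings can be sketched as below; the commented compile/fit lines are the standard Keras calls with this experiment’s batch size and epoch count, and `model`, `x_train`, `y_train` are assumed to exist:

```python
from tensorflow.keras.optimizers import Adam, RMSprop

opt_rmsprop = RMSprop()                             # Keras defaults
opt_adam = Adam(learning_rate=0.0002, beta_1=0.5)   # the Adam settings tested

# model.compile(optimizer=opt_adam,
#               loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=32, epochs=10,
#           validation_data=(x_test, y_test))
```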

During testing, we surprisingly found that the Adam optimizer does much better than RMSprop in this network architecture. It already reaches 70% accuracy by the 5th epoch.

And the overall accuracy after 10 epochs of training reaches 0.7626.

Week 04: Artificial Neural Network vs. Biological Neural Network

INTRO

When searching for “neural network” on the Internet, we get 2 results — ANN (Artificial Neural Network) and BNN (Biological Neural Network). The latter is the natural cells and connections found in many creatures, including humans; they were made, so to speak, by the gods’ hands. In a biological neural network, multiple neurons convey information with bioelectricity, assisted by biochemical media. For hundreds of years, humans have dreamed of becoming the creators of intelligence. They invented machines to free people from labor. And they wanted more — they wanted machines smart enough to take care of almost everything humans need to do. Just like the ancient Chinese story of 女娲抟土造人 (Nüwa molding humans from clay in her own image), humans designed the Artificial Neural Network after the Biological Neural Network — to make machines think.

SIMILARITIES

Since ANN is built on the structure of BNN, it basically looks similar to BNN. Both are compositions of neurons (BNN consists of neurons as cells, while ANN consists of neurons as nodes and edges). Multiple neurons exchange information — electrically, in the biological case — receiving variables from their ancestors and passing results on to their descendants.

ANN-BNN

(cited from: https://blog.knoldus.com/first-interaction-artificial-neural-network/)

In the picture above, the left side shows a biological neuron, and the right side shows the connections of multiple nodes (neurons) in an ANN (in this case, what we call a fully-connected layer). Each node stores a very simple, single-dimensional value xi and outputs the result of an equally simple equation such as yi+1 = wi*xi + bi. Each node (neuron) only does a tiny part of the job, but when we connect billions of nodes together, the result is a powerful self-fitting function — tuned by backpropagation or reward functions — that helps creatures/machines make logical decisions.
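The per-node computation described above can be written out in a few lines of numpy; the names and numbers here are illustrative only:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: a weighted sum of its inputs plus a bias,
    passed through a simple nonlinearity (ReLU here)."""
    return max(0.0, float(np.dot(w, x) + b))

x = np.array([0.5, -1.0, 2.0])   # values received from upstream neurons
w = np.array([0.2,  0.4, 0.1])   # learned weights
b = 0.3                          # learned bias
y = neuron(x, w, b)              # ~0.2, passed on to downstream neurons
```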

DIFFERENCE

The scale of the network is a huge difference. The human nervous system has over 100 billion neurons, with over 7,000 synaptic connections per neuron (cited: https://aiimpacts.org/scale-of-the-human-brain/). Machines, however, have only a few thousand or million computational nodes, let alone the limited connections between them. The storage and computational power of the human neural network is far greater than that of machines. Furthermore, all humans do to create an ANN nowadays is write mathematical formulas that make it act within a narrow scope. In other words, a normal machine learning network always behaves exactly as its formulas dictate — a huge difference from humans. Such networks are taught by humans to be good, but cannot recover from failure on their own. Some papers show networks that are self-recovering and self-extending, but that will take time.

Week 03: Track Your Head – Jinzhong Yu (jy2122)

PROJECT

DEMO: https://nhibiki-nyu.github.io/AIArts/Week3/dist/

SOURCE: https://github.com/nhibiki-nyu/AIArts/tree/master/Week3/

DESCRIPTION

For the week 3 assignment, I made use of BodyPix to detect the position of the player’s face through the webcam. BodyPix is a network that segments a person’s outline from a picture, which can also help locate the person’s position in the picture.

So, in this project, I start the web camera to get a real-time picture and feed it to the network’s input to detect the real-time position of the player’s face. Then, I integrate the position data with a game — Pong. The real-time face position acts as a controller for the paddle: when the player moves in front of the camera, the paddle shifts along with them.

pong die

SHORTCOMINGS

When I was building this project, I faced (mainly) two problems:

  • the weak computational power of my OLD late-2013 13” MacBook Pro;
  • other objects in front of the camera might interfere with the detection.

For the first problem, I read a blog post from Google published earlier this year. It mentions that, with the help of tfjs, a 2018 15” MBP gets a frame rate of about 25 fps. And indeed, when I run the website on another laptop, it performs better.

And for the second one, I want to introduce FaceNet, another network that can distinguish different faces in one picture, so we can track just one face in the camera.

Week 02: ml5js Experiment – Jinzhong Yu (jy2122)

INTRO

For me, ml5js is not completely new. Last year, I started contributing to several open source projects, including ml5js. So, this is my second time ’embracing’ this elegant library.

ML5.JS is a front-end machine learning library that helps beginners get in touch with machine learning quickly. It is based on TensorFlow.js but hides most of the mysterious parts, making it a simple yet useful tool for everyone. With this library, you may not be able to customize your own models (nodes, layers, tensors, etc.), but it is very easy to load pre-trained models and beautifully serve the intelligence of machine learning with little effort.

INTEREST

When it comes to my interests in ml5, I will talk about two of my favourite networks — SketchRNN and GAN (in this case, I mean DCGAN). The former uses the creativity of an RNN to let the machine create a sketch step by step, and the latter uses a deep convolutional GAN to generate pictures on its own.

Although, with the limited computational power of the browser, the output is not always perfect:

cat generated by SketchRNN
picture generated by DCGAN

The first is generated by SketchRNN (what a lovely cat!). As for the second one (emmmm), I do not know what it stands for. Maybe it’s the deeper thought of machines?

CONCLUSION

Introducing machine learning to the web is a significant movement that still needs a lot of effort. The web is the largest entrance to the whole Internet and the widest gate between humans and machines. So, it could be very fancy and meaningful for the web to be equipped with high-quality machine learning features.