Week 02 Assignment: ml5.js Experiment–Yujia Liu

Introduction 

Having looked through the examples on ml5.js, I found BodyPix really interesting. According to the documentation, BodyPix is an open-source machine learning model whose main function is to classify the pixels of an image into two parts: those that represent a person and those that represent the background.

Experience

Using it is really simple. Take my experience as an example: the project used the web camera to record my live image as the input, and the output was the same image with the background turned black. That gave me an idea: if I used a still photo of myself as the input, would BodyPix still work? The answer is yes. I then ran another experiment with an image of a cartoon mouse, and the result is shown below.
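For reference, a minimal sketch of this kind of webcam experiment, modeled on the p5.js BodyPix example in the ml5 documentation, looks roughly like this. Option and property names such as maskBackground follow the ml5 release current at the time of writing and have been renamed in newer versions, so treat it as a sketch rather than a drop-in program.

let bodypix;
let video;

const options = {
  outputStride: 16,           // 8, 16, or 32; smaller = finer mask but slower
  segmentationThreshold: 0.5, // confidence needed to count a pixel as "person"
};

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  // load the model, then start segmenting the webcam feed
  bodypix = ml5.bodyPix(video, () => bodypix.segment(gotResults, options));
}

function gotResults(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  background(0);
  image(video, 0, 0, width, height);
  // paint the background mask over the frame, leaving only the person visible
  image(result.maskBackground, 0, 0, width, height);
  bodypix.segment(gotResults, options); // segment the next frame
}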

Insight 

I found that BodyPix recognized my face and hair well, but it was poor at recognizing my fingers, and there was an obvious gap between the outline of my figure and the background. The output was also not smooth but intermittent, so the segmentation is not fully real-time. The accuracy of the result depends mainly on the output stride and the model size (the multiplier), though the size of the original image also has an effect. As TensorFlow explains on Medium, “The lower the value of the output stride the higher the accuracy but slower the speed”; and of the multiplier, “The larger the value, the larger the size of the layers, and more accurate the model at the cost of speed.”
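To experiment with that trade-off, the model can be loaded with different options. The option names below come from the BodyPix documentation, but the exact set of supported options and their defaults vary across ml5/TensorFlow.js versions, so this is only a sketch:

// contrasting speed vs. accuracy through BodyPix load options
const fastButCoarse = {
  outputStride: 32, // larger stride: coarser segmentation, faster inference
  multiplier: 0.5,  // thinner convolution layers: smaller model, less accurate
};

const slowButFine = {
  outputStride: 8,  // smaller stride: finer segmentation, slower inference
  multiplier: 1.0,  // full-width layers: more accurate, heavier to run
};

const bodypix = ml5.bodyPix(video, slowButFine, modelReady);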

Application 

At first, after my experiment, I had no idea of its significance or how to put it into practice. It only reminded me of a potential automatic way of separating a selected object from the background in Photoshop. So I did some deeper research into the applications of BodyPix and found that it can be applied to augmented reality and to artistic effects on images and videos.

If the issues of accuracy and speed can be solved, I suppose it could be used as a virtual fitting tool. I really hate changing my clothes frequently in a fitting room, and different environmental conditions make the same person and clothes look different. But if we can use this technology to enhance augmented reality, things might change: consumers could see what they look like in new clothes through AR without going to a fitting room or even to the mall. And since BodyPix can change the background of an image, they could also adjust the light, color, and other elements to get a more comprehensive impression.

References:

https://medium.com/tensorflow/introducing-bodypix-real-time-person-segmentation-in-the-browser-with-tensorflow-js-f1948126c2a0

https://ml5js.org/reference/api-BodyPix/

Assignment 01–Crystal

Google Translate with Computer Vision

Background 

When you are in a foreign country, the most difficult problem is usually the language, especially in daily conversation. We therefore need translation tools to help us communicate with others.

The traditional way of translating is to type the content you want translated into translation software; on pressing a button, you get the expected answer. But this approach costs time and causes delays, because you spend several minutes typing, so it is not suitable for everyday communication. If you could immediately and directly translate the words you see, what you say, or even what you want to say in your mind, it would be a lot more convenient.

These features are being implemented step by step through artificial intelligence. One advanced function is using the camera to capture text: the software recognizes the text in the image, and the user can select part of it on the screen and translate it with one tap. The other is instant translation, which only requires pointing the camera at the text to see the translation laid over the original live image.
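Conceptually, the pipeline behind such a feature can be sketched as below. The functions recognizeText and translate are hypothetical stand-ins (stubbed here so the sketch runs), not real Google APIs; in a real app they would call an OCR model and a translation service.

// capture -> recognize -> translate, as described above (illustrative only)
async function recognizeText(frame) {
  // stub: a real implementation would run OCR on the frame and return
  // each piece of text together with the box where it was found
  return [{ text: 'Ausgang', box: { x: 10, y: 20, w: 120, h: 30 } }];
}

async function translate(text, targetLang) {
  // stub: a real implementation would call a translation service
  const dictionary = { Ausgang: 'exit' };
  return dictionary[text] || text;
}

async function instantTranslate(frame, targetLang) {
  const regions = await recognizeText(frame); // step 1: find text and boxes
  // step 2: translate each region, keeping its box so the app can draw
  // the translated string over the original text in the live image
  return Promise.all(
    regions.map(async (r) => ({
      box: r.box,
      text: await translate(r.text, targetLang),
    }))
  );
}

instantTranslate(null, 'en').then(console.log); // [{ box: {...}, text: 'exit' }]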

Progression 

Since the application of computer vision has enhanced computers’ ability to understand and recognize images, Google’s photo translation function has been further improved. It is no longer limited to translating text, but extends to the content of the image itself: the content of a picture can be automatically described or summarized in the target language, including what a person is wearing and their expression.

Comprehension 

How is this effect achieved? From watching a video about computer vision, my understanding of the process is as follows. The program analyzes the image layer by layer, and each layer has a different focus. Differences in image brightness can be used to derive the outlines of objects. Finally, the results of all the layers are combined to give the most probable answer. Of course, the basis of all this analysis is a huge database: without a large data set, there is no foundation or support for the analysis.
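The “brightness differences reveal outlines” step can be made concrete with a classic edge detector. The sketch below is a plain-JavaScript Sobel filter, my own illustration rather than anything specific to Google’s system; it assumes gray is a width × height array of 0–255 brightness values.

function sobelEdges(gray, width, height) {
  const out = new Float32Array(width * height);
  const gxK = [-1, 0, 1, -2, 0, 2, -1, 0, 1]; // horizontal-gradient kernel
  const gyK = [-1, -2, -1, 0, 0, 0, 1, 2, 1]; // vertical-gradient kernel
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      let gx = 0, gy = 0, k = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const v = gray[(y + dy) * width + (x + dx)];
          gx += gxK[k] * v;
          gy += gyK[k] * v;
          k++;
        }
      }
      // a large gradient magnitude means a sharp brightness change,
      // i.e. a likely outline pixel
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}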

Significance

According to the related materials, the goal of computer vision is to enhance the computer’s ability to recognize and understand images until it comes infinitely close to the human visual system. The case of Google Translate shows that this goal is practical, and the improvement and application of computer vision will make multi-language communication considerably more convenient.

Link to my presentation:

Week 01 Assignment: Case Study Presentation–Crystal

The project I found is Watson Beat, from IBM. IBM is an American multinational information technology company that focuses mainly on computer software and hardware. According to IBM’s website, IBM has created Watson, an open platform powered by machine learning; one of its main functions is to let people automate the AI lifecycle. Watson Beat is a version of Watson that focuses on composing melodies.

Watson Beat can create complicated compositions on the basis of simple input notes. All the user needs to do is play a short, simple melody as the input. Watson Beat captures the sound and analyzes it within a short period. The user can then choose a mood for the melody, such as dark, romantic, or amped. Finally, Watson Beat outputs a track based on what it heard from the user.

Here is an example showing how it works:

And the following videos display the compositions created by Watson Beat.

Watson Beat’s development is based on two machine learning methods: Reinforcement Learning and Deep Belief Networks (DBNs). Generally, reinforcement learning is used to find the best possible path in a specific situation by maximizing a reward. A DBN is an unsupervised probabilistic deep learning algorithm that can learn a function to achieve the user’s goal. In Watson Beat, the role of reinforcement learning is to build reward functions using Western music theory, while the DBN trains on a simple input and then creates a complicated track.
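To make the “maximize the reward” idea concrete, here is a toy tabular Q-learning sketch of my own: an agent in a five-cell corridor learns to walk right because only the last cell pays a reward. It is purely illustrative; Watson Beat’s actual reward functions, built from Western music theory, are not public.

// toy Q-learning: learn the best path to the rewarding end of a corridor
const N_STATES = 5;       // cells 0..4; reaching cell 4 gives the reward
const ACTIONS = [-1, +1]; // step left or right
const alpha = 0.1, gamma = 0.9, epsilon = 0.2;

// Q[state][actionIndex], initialized to zero
const Q = Array.from({ length: N_STATES }, () => [0, 0]);

for (let episode = 0; episode < 500; episode++) {
  let s = 0;
  while (s !== N_STATES - 1) {
    // epsilon-greedy: mostly exploit the best-known action, sometimes explore
    const a = Math.random() < epsilon
      ? Math.floor(Math.random() * 2)
      : (Q[s][1] >= Q[s][0] ? 1 : 0);
    const s2 = Math.min(N_STATES - 1, Math.max(0, s + ACTIONS[a]));
    const reward = s2 === N_STATES - 1 ? 1 : 0; // reward only at the goal
    // Q-learning update: move toward reward + discounted best future value
    Q[s][a] += alpha * (reward + gamma * Math.max(...Q[s2]) - Q[s][a]);
    s = s2;
  }
}

// the learned policy: every state should prefer moving right
console.log(Q.map((q) => (q[1] > q[0] ? 'right' : 'left')));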

Even though Watson Beat can create lots of rich, impressive melodies, its main function is still to inspire composers, not to compose on its own and replace them. When musicians or composers are struggling to form the first idea of a song, they can simply play some random notes, and Watson Beat will make the melody richer, which might give them inspiration.

I have gained a deeper understanding of how artificial intelligence and machine learning genuinely benefit people in creative areas, especially through Reinforcement Learning and Deep Belief Networks. In essence, artificial intelligence is more like an assistant for humans, at least in the creative field. The full use of artificial intelligence can improve works of art and make them better, because AI’s analysis is based on professional theories and large data sets, which makes it more reliable. Humans are still in the dominant position, deciding what they want artificial intelligence to do to assist them: AI outputs an example or prototype, and the artist turns that prototype into a complete work. So there is no need to worry about whether the development of artificial intelligence will replace humans in creative activities.

I’d like to end my blog with a quote from Elizabeth Transier, IBM THINKLab Director, since I really agree with her view of how artificial intelligence is benefiting us. As she says, “Watson Beat is a great example of how IBM cognitive technologies are starting to augment human capabilities and help us reach new capabilities.”

Link to my presentation:

https://docs.google.com/presentation/d/1-6eAQqNOhDy0j46je3fcbOvEIIyCvHCV8lTbsZAqhoY/edit#slide=id.p

References:

https://medium.com/@anna_seg/the-watson-beat-d7497406a202

https://www.ibm.com/case-studies/ibm-watson-beat

https://www.geeksforgeeks.org/what-is-reinforcement-learning/

https://medium.com/datadriveninvestor/deep-learning-deep-belief-network-dbn-ab715b5b8afc