MLNI Week 1: Zero UI research – Alex Wang

Reading reflection:

Computer vision is a game-changing technology that has already entered our daily lives: scanning QR codes or recognizing car license plates at a parking lot now replaces human labour. Just recently our campus and dorms also started using facial recognition as an alternative to scanning student ID cards. Aside from these practical impacts, computer vision also has many applications in the creation of art. The power to recognize objects lets the computer perform specific operations based on its understanding of an image, as opposed to traditional image manipulation, where the computer only reads in pixel values without understanding what is being processed. I think the most obvious example is applications that manipulate the human face, after recognizing that it is a face, while leaving the rest of the image alone.

Zero UI project research:

After some research on recent developments in Zero UI projects, I came across a project by Google's Advanced Technology and Projects (ATAP) team named Project Soli. Project Soli is a chip that uses a miniature radar to sense hand gestures, and it is exactly what I would consider the future of Zero UI. Users can control their devices without any physical contact, and their control gestures remain natural, as if they were handling a physical device.

Technology:

I believe the chip collects radar data on hand movements and then uses software to interpret what each gesture means. This could definitely benefit from computer vision and machine learning, since the interpretation component requires the computer to predict which gesture the user is trying to input.
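
As a rough illustration of that interpretation step, the sketch below trains a classifier on feature vectors that stand in for summarized radar frames. The features, gesture labels, and data are all hypothetical placeholders, not Google's actual Soli pipeline.

```python
# Minimal sketch of the "interpretation" step: classifying hand gestures
# from radar-derived feature vectors. The features, labels, and data are
# hypothetical stand-ins, not Google's actual Soli pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend each radar frame has been summarized into 16 numeric features
# (e.g. range, velocity, and energy statistics) for one of three gestures.
GESTURES = ["swipe", "tap", "dial"]
X = rng.normal(size=(300, 16))
y = rng.integers(0, len(GESTURES), size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict the most likely gesture for a new radar frame.
frame = rng.normal(size=(1, 16))
print("predicted gesture:", GESTURES[clf.predict(frame)[0]])
print("held-out accuracy:", clf.score(X_test, y_test))
```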

Current application:

Project Soli started around 2014, but only recently has Google planned to put it in a consumer device: its newest phone, the "Google Pixel 4". This is one of the most anticipated phones of 2019, as it is planned to incorporate the Soli chip. There are many leaks and rumors online building up anticipation before the release, which is expected in September or October 2019, really just around the corner.

Connection to Zero UI and potential future applications:

I think this technology could be very interesting and useful, as it provides a very natural way to interact with machines, exactly in the spirit of Zero UI. It could also be used in creative ways, such as new controls for gaming or new tools for art creation. The new mode of interaction alone opens the door to endless possible applications of this technique.

Sources:

https://atap.google.com/soli/

https://www.techradar.com/news/google-pixel-4

https://www.xda-developers.com/google-pixel-4-motion-sense-gestures-leak/

Assignment 01 Crystal

Google Translate with computer vision

Background 

When you are in a foreign country, the most difficult problem is often the language, especially in daily conversation. We therefore need translation tools to help us communicate with others.

The traditional way of translating is to type what you want to translate into the translation software; on pressing a button, you get the expected answer. But this approach is slow, because you need to spend time typing, so it is not suitable for everyday communication. If you could immediately and directly translate the words you see, what you say, or even what you want to say, it would be a lot more convenient.

These features are being implemented step by step through artificial intelligence. One of the newer functions uses the camera to capture text: the software recognizes the text in the image, and the user can select a region on the screen and translate it in one click. The other is instant translation, which only requires pointing the camera at the text to see the translation overlaid on the live camera image.
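
The camera-translation flow can be thought of as two steps: recognize the text, then translate it. The sketch below is a minimal illustration of that pipeline; pytesseract is a real OCR wrapper, but translate_text() and the image filename are hypothetical placeholders for whatever translation service the app actually calls.

```python
# Rough sketch of the "point the camera at text" flow: OCR the captured
# image, then pass the recognized text to a translation step.
from PIL import Image
import pytesseract


def translate_text(text: str, target_lang: str = "en") -> str:
    # Hypothetical placeholder: a real app would call a translation API here
    # (e.g. a cloud translation service).
    return f"[{target_lang}] {text}"


def translate_photo(path: str, target_lang: str = "en") -> str:
    image = Image.open(path)                          # frame captured by the camera
    recognized = pytesseract.image_to_string(image)   # step 1: recognize the text
    return translate_text(recognized, target_lang)    # step 2: translate it


if __name__ == "__main__":
    # "street_sign.jpg" is a made-up example filename.
    print(translate_photo("street_sign.jpg", target_lang="en"))
```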

Progression 

Since the application of computer vision has enhanced the ability of computers to understand and recognize images, Google's photo translation function has been further improved. It is no longer limited to translating text, but extends to the content of the image itself: the picture can be automatically described or summarized in the target language, including what a person is wearing and their expression.

Comprehension 

How is this effect achieved? After watching the video about computer vision, my understanding of the process is as follows. The program analyzes the image layer by layer, and each layer has a different focus: differences in image brightness, for example, can be used to derive the outline of an object. Finally, the results from all the layers are combined to give the most probable interpretation. Of course, the basis of all this analysis is a huge dataset; without a large amount of data, there is no foundation to support the analysis.
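
As a small, concrete example of the "brightness differences give you outlines" idea, the sketch below computes a simple gradient over a grayscale image and keeps only the pixels where brightness changes sharply. It is only an illustration of one such layer, not Google's actual pipeline, and the input filename is a made-up example.

```python
# Use differences in brightness between neighboring pixels (a simple
# gradient) to pull out the outlines of objects in a grayscale image.
import numpy as np
from PIL import Image

# Load an image and convert it to a grayscale brightness array.
gray = np.asarray(Image.open("example.jpg").convert("L"), dtype=float)

# Horizontal and vertical brightness differences.
dx = np.zeros_like(gray)
dy = np.zeros_like(gray)
dx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]
dy[1:-1, :] = gray[2:, :] - gray[:-2, :]

# Pixels where brightness changes sharply form the outline.
edges = np.hypot(dx, dy)
outline = (edges > edges.mean() + 2 * edges.std()).astype(np.uint8) * 255

Image.fromarray(outline).save("outline.png")
```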

Significance

According to the related materials, the goal of computer vision is to enhance the computer's ability to recognize and understand images until it comes infinitely close to the human visual system. The case of Google Translate shows that this goal is practical, and that the improvement and application of computer vision will make multilingual communication considerably more convenient.

Link to my presentation:

Week #1 Assignment Research Work — Lishan Qin

Presentation link: https://docs.google.com/presentation/d/1_2rswKp1qMv_yTluGUt5Egbz3Z8np1Km0AJxH_vETlU/edit#slide=id.p

Research on the application of computer vision: "Eclipse," a film directed by an AI director

These readings and TED Talks about computer vision reminded me of an animation produced by Netflix that I watched over the summer, called "Carole & Tuesday". It tells a story set on Mars, 100 years from today. Since the story is set in the future, I saw a lot of visionary sci-fi products in it closely related to computer vision. The one I found most bold and fascinating was the AI film director. The idea seemed so bizarre and ambitious that I did some research on it. It turns out that such an application already exists today: AIs have been directing films for several years, and some of the films are not even that bad. In 2016, an AI won an award at the Cannes Lions festival for its own short film, "Eclipse," challenging viewers by asking the question: "Can a film made by a machine move you?"

What is it?

The film put together a different kind of "film crew" comprising AI programs, including IBM's Watson, Microsoft's Ms_Rinna, Affectiva's facial recognition software, custom neural art technology, and EEG data. Together, they produced the film "Eclipse," a striking, ethereal music video that looks like a combination of special effects, photography, and live action. The movie was conceived, directed, and edited entirely by machine. In the behind-the-scenes video, we can hear the team members explain how they taught the machine to tell a story.

How did the AIs do it?

Emotional theme

How can an AI grasp and decide the emotional theme of the video? The answer is through analysis of the music (the BGM). Music is a crucial part of every film, so knowing the emotional intent of each song is of great importance to the AI. By breaking down the song and analyzing each line and the tone of the lyrics with the help of IBM Watson and EEG data, the AI is able to lock onto the emotional theme of the music and use it as the emotional theme of the film.
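
To make "analyzing each line of the lyrics" a bit more concrete, here is a much simpler stand-in that scores each line with NLTK's VADER sentiment lexicon and averages the results into an overall theme. It is only a toy sketch, not the Watson/EEG setup described above, and the lyric lines are invented examples.

```python
# Toy version of scoring each lyric line for emotion and picking a theme.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
scorer = SentimentIntensityAnalyzer()

# Invented example lyrics, one line per entry.
lyrics = [
    "The light is fading but I am not afraid",
    "Shadows fall across the empty road",
    "I will find my way back home to you",
]

# Score each line, then average the compound scores to pick a theme.
scores = [scorer.polarity_scores(line)["compound"] for line in lyrics]
average = sum(scores) / len(scores)
theme = "hopeful" if average > 0 else "melancholy"

for line, score in zip(lyrics, scores):
    print(f"{score:+.2f}  {line}")
print("overall emotional theme:", theme)
```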

Narratives

A different AI program from IBM Watson and EEG was applied when creating the narrative of the film. Since the most important thing a director does is answer questions, the team used Microsoft's AI chatbot Ms_Rinna to write the narrative by asking it questions and recording its answers. Ms_Rinna was used to give narrative direction on everything in the film, from the characters to the settings on set.

Casting

With the lyrics as a guide and the answers provided by Ms_Rinna, the story of the movie was in place. After that came the casting. The AIs were also in charge of casting: all of the actors wore EEG machines while their performances were captured in close-up for facial recognition, and the machine was asked to choose the actor whose performance best aligned with the emotional theme. The AI ended up choosing the exact same people the team chose.

Shooting

The AIs were also capable of shooting mostly on their own. With the help of the MUSE EEG headset, IBM Watson, the Affectiva API, and PRENAV drones, the AI shot the film largely by itself.

Editing

Finally, there is the editing. Equipped with image detection and motion detection, the AI is able to process a huge number of shots and produce dozens of cuts. Using open-source libraries, the AI is also capable of applying different visual filters to interpret the director's vision.
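
The motion-detection part of that editing step can be illustrated with simple frame differencing: measure how much each frame changes from the previous one and flag the busiest moments as candidate cuts. The sketch below uses OpenCV for this; it is only a rough illustration, not Eclipse's actual toolchain, and the input filename is a made-up example.

```python
# Score each video frame by how much it differs from the previous frame,
# then report the most "active" frames as candidate cut points.
import cv2
import numpy as np

cap = cv2.VideoCapture("footage.mp4")   # hypothetical input clip
prev = None
motion = []                             # per-frame motion scores

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev is not None:
        diff = cv2.absdiff(gray, prev)  # pixel-wise change since the last frame
        motion.append(float(np.mean(diff)))
    prev = gray
cap.release()

# Frames with the largest change are the busiest candidates for cuts.
top = sorted(range(len(motion)), key=lambda i: motion[i], reverse=True)[:5]
print("most active frame indices:", sorted(top))
```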

Thoughts

The reason I find this project so amazing is that it is an ambitious program that combines computer vision, big data, machine learning, and many other advanced technologies to create an actual piece of artwork that is eye-opening and revolutionary. It shows the great potential of AI in future art creation, proves what multiple AIs working together can achieve, and challenges us to rethink the relationship between human workers and AIs in the field of art.

Behind-the-scenes video of Eclipse:

https://www.youtube.com/watch?v=XZbcxsHb4Y0

Source: 

Eclipse, the world's first AI-produced short film, hits the screens at Cannes

https://variety.com/2017/artisans/production/production-workers-ai-1202447872/ 

https://adage.com/creativity/work/anni-mathison-eclipse-behind-scenes/47918?

Week 1 MLNI – Response to Golan Levin & Presentation Assignment (Cherry Cai)

Reading and Video Assignment Response

According to Golan Levin's article and his talk, developments in technology and algorithms have transformed much of human activity into a form of interactive art. They not only allow people to "communicate" with the machine in various forms, but also let them see another kind of interpretation of their activities through the lens of computer vision. Take, for instance, the use of body gestures, sounds, and other human behaviors as input: through a complex transformation process, people are able to become actors in a new kind of art and perceive it through innovative ideas. Since the machine is able to simulate human behavior, I wondered whether advanced technology could let a machine generate new forms of art on its own, by learning for itself, without human intervention.

Presentation: Faceless Portraits Transcending Time by AICAN + Dr. Ahmed Elgammal

Faceless Portraits Transcending Time was presented by HG Contemporary, New York, from February 13 to March 5, 2019. It was an art collaboration between an artificial intelligence named AICAN and its creator, Dr. Ahmed Elgammal. "The exhibition shows two series of works on canvas portraying uncanny, dream-like imagery generated by AICAN excavating the ageless themes of mortality and representation of the human figure" (HG Contemporary). The two series examine two different aspects of human-machine collaboration: first, "the joint effort between man and machine as a historically specific moment in the chronology of image-making"; second, "a focus on how artificial intelligence serves as a mirror to reflect human (sub)consciousness".

    • AICAN

AICAN is a complex algorithm that draws on psychological theories of the brain's response to aesthetics and on art-historical knowledge to create new artwork without human intervention. It can work without an artist collaborator, automatically choosing the style, subject, composition, colors, and texture of its work. Because it combines this knowledge with independent creativity, AICAN facilitates a new way for artists of the past and present to engage in dialogue across centuries of art history.

    • Summary

The best way to avoid being judged on the merits of a work of art is to make it novel and unexpected. While machine learning can chronologically arrange portraits across styles including Renaissance, Baroque, Realism, and Impressionism, it can also strengthen its abilities and create a new form of art by learning on its own. In addition to this remarkable achievement, technologies such as AICAN can predict upcoming art trends based on currently popular techniques and styles, which makes AICAN and similar artificial intelligence a valuable business in the future art field.

Presentation Link

https://docs.google.com/presentation/d/1K2i6qw1hEAVK-20pA37dUQO68Vrh4Hxd4dHyp1hM7CI/edit?usp=sharing

Reference

HG Contemporary, New York, Faceless Portraits Transcending Time. (2019). [PDF]. Available at: https://uploads.strikinglycdn.com/files/3e2cdfa0-8b8f-44ea-a6ca-d12f123e3b0c/AICAN-HG-Catalogue-web.pdf

Bogost, I. (2019). The AI-Art Gold Rush Is Here. [online] The Atlantic. Available at: https://www.theatlantic.com/technology/archive/2019/03/ai-created-art-invades-chelsea-gallery-scene/584134/ 

Week 1 Machine Learning for New Interfaces case study (Shenshen Lei, sl6899)

Introduction to Deepfake

Academic research related to Deepfake lies predominantly within the field of computer vision, a subfield of computer science often grounded in artificial intelligence that focuses on computer processing of digital images and videos. An early landmark project was the Video Rewrite program, published in 1997, which modified existing video footage of a person speaking to depict that person mouthing the words contained in a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of their face.

Modern research focuses on creating more realistic and more natural images. The term "Deepfake" originated around the end of 2017 from a Reddit user of the same name. The content was quickly banned by YouTube and other websites, because people maliciously used the technique to create pornographic videos, and many celebrities were threatened by it. Deepfake then faded out of the entertainment area.

How does Deepfake work?

Principle: train a neural network to restore someone's distorted face back to the original face, then expect this network to be able to turn any face into the face of that person.
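
A minimal sketch of that idea is shown below: one shared encoder plus a decoder per person, trained to rebuild each person's face from a distorted version; at swap time, person B's face is pushed through person A's decoder. The layer sizes, training loop, and random data are toy stand-ins, not the real faceswap project's code.

```python
# Shared-encoder / per-person-decoder autoencoder, the core deepfake idea.
import torch
import torch.nn as nn

IMG = 64 * 64 * 3  # flattened 64x64 RGB face crop

encoder = nn.Sequential(nn.Linear(IMG, 512), nn.ReLU(), nn.Linear(512, 128))
decoder_a = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, IMG))
decoder_b = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, IMG))

loss_fn = nn.MSELoss()
params = list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

# Toy stand-ins for batches of distorted and original face crops.
distorted_a, original_a = torch.rand(8, IMG), torch.rand(8, IMG)
distorted_b, original_b = torch.rand(8, IMG), torch.rand(8, IMG)

for _ in range(10):  # in practice: many epochs over many real face crops
    opt.zero_grad()
    loss_a = loss_fn(decoder_a(encoder(distorted_a)), original_a)
    loss_b = loss_fn(decoder_b(encoder(distorted_b)), original_b)
    (loss_a + loss_b).backward()
    opt.step()

# Face swap: encode person B's face, then decode it as person A.
swapped = decoder_a(encoder(original_b))
```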

Problems:

- The sharpness of the picture declines under the algorithm.

- The face cannot be recognized at some unusual angles.

- The quality of the generated faces depends heavily on the training material.

- The generated face may not fit the body in videos.

Improvement: Progressive GANs

Progressive GAN training is a new methodology for generative adversarial networks (GANs) developed by Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen.

On their website, they describe the method as follows: "We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality."
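
As a toy illustration of "growing the generator progressively," the sketch below starts a generator at a 4x4 resolution and appends an upsampling block each time it grows. This is only a conceptual sketch under simplified assumptions; the actual method also grows the discriminator and fades new layers in smoothly, which is omitted here.

```python
# Toy "progressively growing" generator: starts at 4x4 output and gains an
# upsampling block (doubling resolution) each time grow() is called.
import torch
import torch.nn as nn


def upsample_block(channels: int) -> nn.Module:
    # Double the spatial resolution, then refine with a convolution.
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2),
    )


class GrowingGenerator(nn.Module):
    def __init__(self, latent_dim: int = 64, channels: int = 32):
        super().__init__()
        self.base = nn.Sequential(            # latent vector -> 4x4 feature map
            nn.Linear(latent_dim, channels * 4 * 4),
            nn.Unflatten(1, (channels, 4, 4)),
        )
        self.blocks = nn.ModuleList()         # grows as training progresses
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=1)

    def grow(self, channels: int = 32) -> None:
        self.blocks.append(upsample_block(channels))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.base(z)
        for block in self.blocks:
            x = block(x)
        return self.to_rgb(x)


g = GrowingGenerator()
z = torch.randn(1, 64)
print(g(z).shape)   # 4x4 images at the start of training
g.grow()
g.grow()
print(g(z).shape)   # 16x16 images after two growth steps
```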

Drawbacks: GAN training introduces many unpredictable factors.

Latest debate: ZAO

ZAO is an application that uses Deepfake technology to swap faces. It maps user-submitted front-facing selfies onto clips from movies and TV programs, replacing the characters' faces and generating realistic videos. The application went viral in a few days. However, the public cast doubt on the user agreement and worried about the safety of personal information. One provision in the agreement said that the portrait rights holder grants ZAO and its affiliates "completely free, irrevocable, permanent, transferable and re-licensable rights" worldwide, including but not limited to the portrait rights contained in the submitted portraits, pictures, and video materials. Users have to accept the agreement before using the app. In China, pornography is strictly restricted, but the bigger concern is that facial payment is widely used. Though the company quickly changed the agreement and claimed that facial payment cannot be broken by a photo, there will be more potential threats.

Presentation: https://drive.google.com/file/d/1CaJ6DsXWfC9OZxWvS1J8ed-MKRM6PZ70/view?usp=sharing

Sources

https://www.bbc.com/zhongwen/simp/chinese-news-49589980

https://en.wikipedia.org/wiki/Deepfake#Academic_research

https://github.com/deepfakes/faceswap

https://arxiv.org/abs/1710.10196