CartoonGAN on the Web | Final Documentation – Casey & Eric

Logistics

Group Members  –  Casey, Eric

Github Repo  –  CartoonGAN-Application

Previous Report  –   Midterm Documentation

Proposal  –  Final Proposal

GIF samples: Original – Chihiro – Paprika – Hayao

Background

Motivation

For this stage of the project, we want to further refine the work done so far, including 1) our web interface/API to the CartoonGAN models and functionality; 2) the web application built on CartoonGAN, which will gain more layers of interaction and possibility through the new features we have planned.

We sincerely hope that through this refinement, CartoonGAN can become a powerful and playful tool for learners, educators, artists, and technicians, so that our contribution to the ml5 library truly helps others and sparks more creativity in this fascinating realm.

Methodology & Experiments

GIF Transformation

Developing GIF transformation in a web application turned out to be more demanding than we imagined. Because there are few efficient, modern GIF encoding/decoding libraries, my partner, who worked on this functionality, went through considerable effort to find usable libraries for handling GIFs in our application.

*This could be a potential direction for future contributions.

On the front end, we implemented a simple but effective piping algorithm that recognizes the type of input the user uploads and triggers the corresponding strategy.
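As a rough illustration, the routing boils down to checking the MIME type of the uploaded file and dispatching to the matching pipeline. The sketch below is a simplified version of that idea; the handler names cartoonizeImage and cartoonizeGif are placeholders rather than our exact implementation.

```javascript
// A minimal sketch of the routing idea; cartoonizeImage and cartoonizeGif
// are illustrative placeholders, not our exact implementation.
function handleUpload(file) {
  if (file.type === 'image/gif') {
    // Animated input: decode frames, transform each one, re-encode.
    return cartoonizeGif(file);
  }
  if (file.type.startsWith('image/')) {
    // Static input: run the model once on the whole image.
    return cartoonizeImage(file);
  }
  throw new Error(`Unsupported input type: ${file.type}`);
}

// Wired to a standard file input element:
document.querySelector('#upload').addEventListener('change', (event) => {
  const file = event.target.files[0];
  if (file) handleUpload(file);
});
```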

Demo gif outputs:

GIF samples (Trump):

Styles: Original – Chihiro – Shinkai – Paprika – Hosoda – Hayao

Some experiments:

This cyberpunk kitty was recorded during one of our experiments with GIF transformation. As shown in the video, the transformation (original style to Miyazaki's Chihiro style) output is glitchy due to a single frame loss. This could result from issues with GIF encoding and decoding in our web application, since we currently process GIFs in the following way:

GIF  ➡️  binary data ➡️  tensor ➡️  MODEL ➡️  tensor ➡️  binary data  ➡️  GIF

Therefore, encoding issues can significantly affect the final output. This is a problem that needs to be looked into in the future.
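To make the pipeline above concrete, the sketch below walks a GIF frame by frame through the same stages. decodeGifToFrames and encodeFramesToGif are hypothetical helpers standing in for whichever GIF library is used, and the model is assumed to be a CartoonGAN model loaded with TensorFlow.js.

```javascript
// Conceptual walk through the pipeline above. decodeGifToFrames and
// encodeFramesToGif are hypothetical stand-ins for a GIF encoding/decoding
// library; each decoded frame is assumed to be an ImageData object.
async function transformGif(gifArrayBuffer, model) {
  const frames = await decodeGifToFrames(gifArrayBuffer); // GIF -> binary frame data
  const outputFrames = [];
  for (const frame of frames) {
    const input = tf.tidy(() =>
      tf.browser.fromPixels(frame)   // binary data -> tensor
        .toFloat()
        .div(127.5).sub(1)           // normalize to [-1, 1]
        .expandDims(0));
    const output = model.predict(input);                          // MODEL
    const normalized = tf.tidy(() => output.squeeze().add(1).div(2));
    const pixels = await tf.browser.toPixels(normalized);         // tensor -> binary data
    outputFrames.push(pixels);
    tf.dispose([input, output, normalized]);
  }
  return encodeFramesToGif(outputFrames);                         // binary data -> GIF
}
```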

Foreground/Background Transformation

Foreground/background transformation is one of our biggest feature updates to the CartoonGAN web application.

The main methodology behind this feature is to use BodyPix to separate people from their background and treat the result as a mask for the input image. This mask is then used to manipulate the image's pixel data, so that cartoonization can be applied to the foreground, the background, or both, depending on the user's choice.
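A minimal sketch of this masking approach is shown below, using the body-pix package's segmentPerson() to build the person mask; the cartoonize() call is a hypothetical stand-in for running CartoonGAN over the full image.

```javascript
// A minimal sketch of the masking approach, using the body-pix package to
// segment the person. cartoonize() is a hypothetical stand-in for running
// CartoonGAN over the full image and returning ImageData.
import * as bodyPix from '@tensorflow-models/body-pix';

async function transformForegroundOnly(canvas) {
  const net = await bodyPix.load();
  const segmentation = await net.segmentPerson(canvas); // data[i] = 1 for person pixels
  const ctx = canvas.getContext('2d');
  const original = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const cartoonized = await cartoonize(original);        // hypothetical CartoonGAN call

  // Copy cartoonized pixels only where the mask marks a person; flipping the
  // comparison gives the background-only mode, and skipping it transforms both.
  const result = ctx.createImageData(canvas.width, canvas.height);
  for (let i = 0; i < segmentation.data.length; i++) {
    const src = segmentation.data[i] === 1 ? cartoonized : original;
    for (let c = 0; c < 4; c++) {
      result.data[i * 4 + c] = src.data[i * 4 + c];
    }
  }
  ctx.putImageData(result, 0, 0);
}
```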

We hope this brings the user experience to another level: users can see themselves in the cartoon world of their choice, either by turning themselves into a cartoonized character or by turning their surroundings into a fusion of reality and fantasy.

Demo foreground/background outputs:

Foreground –

Sample A: input / output

Sample C: input / output

Background –

Sample B: input / outputs

Social Impact 

ml5 library

We wrapped CartoonGAN into an ml5 module and submitted a pull request to merge our work into the ml5 library.
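From the user's side, the goal is an interface in the style of other ml5 models: a factory function that loads a pre-trained style, plus an asynchronous call that stylizes an input element. The sketch below follows that convention; the exact method names, model path, and result shape in the merged library may differ.

```javascript
// Hypothetical usage sketch following ml5's usual factory-function pattern;
// the exact method names and result shape in the merged library may differ.
const cartoon = ml5.cartoon('models/miyazaki/model.json', modelLoaded);

function modelLoaded() {
  console.log('CartoonGAN model loaded');
}

async function stylize(imgElement) {
  // generate() is assumed here to return the stylized output for display.
  const result = await cartoon.generate(imgElement);
  document.querySelector('#output').src = result.src;
}
```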

Pull Request

ml5 Pull Request Screenshot

The reason we included this as part of our project goal is that we hope our work becomes a real contribution to the creative world out there. Machine learning in the browser is still a relatively new and emerging field; the more work and attention it receives, the faster it will grow. Though I am a newbie myself, I really hope that my efforts and contributions help ml5 grow into an amazing tool collection for the brilliant, innovative minds in this realm.

Further Development

There is still work to be done and room for improvement before this project fully meets our expectations.

On the web application side, GIF transformation is still relatively slow and buggy, due to the shortage of existing tools for working with GIFs in the browser. We did our best to work around these issues, but we still want to look into potential improvements, and perhaps new issues to contribute to.

The CartoonGAN ml5 library is still a work in progress. Although we have the bare bones ready, more work is needed. We are currently building tests, examples, guides, and documentation for the library, and on the design side we still need to improve aspects such as error and corner-case handling, image encoding, and support for other input formats. These are all necessary elements for CartoonGAN to become an easy-to-use and practical library, which is our ultimate hope.

Final Documentation for CartoonGAN – Eric & Casey

Website: https://cartoon.steins.live/

Recap

In the midterm, we implemented CartoonGAN in the browser, allowing users to upload or take a photo and transform it into a cartoon-like style. We also trained two models, one for Miyazaki and the other for Aku no Hana.

Current Solution

Having done some experiments with generative art, we decided to continue working on the CartoonGAN project by adding features including GIF transformation, foreground/background-only transformation, and exporting CartoonGAN as an ml5 function/model.


Week 4 Writing Assignment

While neural networks are in many ways inspired by how the human brain, and in particular its neurons, functions and learns, machine learning technologies are increasingly moving away from neuron-like behavior and into a field of their own as they develop. Artificial Neural Networks (ANNs) were originally inspired by the way neurons work, mimicking how a neuron receives incoming connections, applies a decision or function, and passes an outgoing signal to the next neuron, or in the case of an ANN, to the next layer of the network. Neural networks have also been designed to mimic the plasticity of the human brain, programmed to modify and strengthen parts of the model based on the data received, just as the human brain strengthens certain connections and recognizes patterns in data.

However, despite the similarity in structure, human neurons remain much more complex than artificial ones: they can adapt to and complete many different tasks, and they change the speed and strength of signal transmission based on many factors. Neural networks, by contrast, pass information along fixed routes through the network, and these routes cannot be rerouted to perform the same kind of adaptive computation. Machine learning so far depends largely on supervised learning, where labeled pairs of data are fed to a program to teach it to recognize patterns and match data, while most human learning involves unsupervised and reinforcement learning, or learning through experience. That level of learning is as yet inaccessible to machine learning programs, which still rely on human checks and supervision to ensure that they match the correct data and produce useful results.

While the human brain is the ultimate inspiration for machine learning programs, newer architectures like GANs and RNNs build on different principles and are not necessarily modeled after the human brain. The future of machine learning will likely diverge from neuron-style modeling, but the goal – to create programs that can learn, think, and reach conclusions the way humans do – remains the same.

Sources:

https://medium.com/swlh/do-neural-networks-really-work-like-neurons-667859dbfb4f

https://www.youtube.com/watch?v=P4wI938mx00

Week 02: ml5.js exploration

I really like the “StyleTransfer_Video” example, and I think it could have very interesting artistic uses. The video aspect is fun and interactive, and I like the stylistic possibilities.

https://ml5js.github.io/ml5-examples/p5js/StyleTransfer/StyleTransfer_Video/

How it works

The program takes a style image and uses a pre-trained style transfer model to map its stylistic elements onto the webcam feed. The software applies the color scheme and patterns of the "style" image to whatever the webcam captures, producing a surprisingly similar style while keeping the webcam content recognizable.
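In code, the linked example is put together roughly as follows. This is a simplified outline rather than the exact example source, so some details (model path, how the result is displayed) may differ.

```javascript
// Rough outline of the linked p5.js example; see the example source for the
// exact code, as some details here may differ.
let video;
let style;
let resultImg;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.hide();
  // Load a pre-trained style model and bind it to the webcam stream.
  style = ml5.styleTransfer('models/wave', video, modelLoaded);
  resultImg = createImg('');
}

function modelLoaded() {
  style.transfer(gotResult); // start transferring webcam frames
}

function gotResult(err, result) {
  resultImg.attribute('src', result.src); // show the stylized frame
  style.transfer(gotResult);              // keep transforming new frames
}
```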

Here was the “style” image:

Here are some webcam screencaps of me using this style:

Potential uses

I think it could be a cool public art piece, especially in an area like M50 with a lot of graffiti or outdoor art. The webcam could be displayed on a large screen that places passersby into the art of the location, taking its stylistic inspiration from the art pieces around it. I also think it could be a cool way to make "real-time animations," using cartoon or anime styles to stylize webcam footage. If simple editing features were added to the code, such as slo-mo effects, jump cuts, and zooms, the program could become an interactive game that "directs" people and helps them create their own "animated film."

I’m also curious how the program would work if the “style” images were screencaps of the webcam itself. Would repeated screencaps of the webcam fed through it as the “style” create trippy, psychedelic video? I would love to find out!

Week 1: Case Study Presentation (AI Arts)

For my case study I analyzed the Google Deep Dream project, a fascinating intersection of data analysis and art that sprang from a Google image-recognition project. Developed by Alexander Mordvintsev at Google, the software builds on an image-classification network trained for the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2014 and was originally intended to categorize images based on faces and patterns. The software was open source, which opened up possibilities for developers to tweak it, teaching it to recognize various patterns, faces, and images with different levels of sensitivity. The software can also be used in reverse, by teaching the network to adjust the original image so that the faces or patterns it detects are recognized more strongly. The network can keep adjusting the image, finding patterns and exaggerating them in each new generation of the image, ad infinitum.
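At its core, that loop is gradient ascent on the input image rather than on the network's weights. The sketch below illustrates the idea in TensorFlow.js under stated assumptions (a pre-trained classifier loaded as a tf.LayersModel, input pixel values in [0, 1]); it is a simplified illustration, not Google's implementation.

```javascript
// Simplified DeepDream-style loop: repeatedly nudge the image in the
// direction that increases the network's activations, so the patterns the
// model "sees" get exaggerated. Assumes a pre-trained tf.LayersModel
// classifier and an image tensor with values in [0, 1].
function deepDreamStep(model, image, stepSize = 0.01) {
  const gradFn = tf.grad(img => model.predict(img).mean());
  return tf.tidy(() => {
    const grads = gradFn(image);
    const normalized = grads.div(grads.abs().mean().add(1e-8));
    return image.add(normalized.mul(stepSize)).clipByValue(0, 1);
  });
}

function deepDream(model, image, iterations = 50) {
  let dreamed = image;
  for (let i = 0; i < iterations; i++) {
    const next = deepDreamStep(model, dreamed);
    if (dreamed !== image) dreamed.dispose(); // free the previous iteration
    dreamed = next;
  }
  return dreamed;
}
```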

The result is highly psychedelic imagery that can be tuned toward particular patterns, such as dog or cat faces, with a popular version created for "jeweled birds." The software can be applied to video as well, as seen in Memo Akten's personalized code:

https://vimeo.com/132462576

Using https://deepdreamgenerator.com/, a version of the software made available online with various filters and settings, I experimented with my own photo (of me as a child) and ran it through various iterations to produce some surrealist Deep Dream images.

Link to my presentation: https://drive.google.com/file/d/1hXeGpJuCXjlElFr1kn5yZVW63Qcd8V5x/view?usp=sharing

Sources:

https://www.fastcompany.com/3048274/heres-what-googles-trippy-deep-dream-ai-does-to-a-video-selfie

https://www.fastcompany.com/3048941/why-googles-deep-dream-ai-hallucinates-in-dog-faces

https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

https://www.psychologytoday.com/us/blog/dreaming-in-the-digital-age/201507/algorithms-dreaming-google-and-the-deep-dream-project