Midterm Documentation (Cassie)

Social Impact

The technical simplicity of this project shows that AI doesn't have to be scary or complex to use: it can be a simple tool for artists looking to explore new digital mediums and create something visually interesting.

There is also a lot of discussion surrounding AI art in terms of who the artist is: is it the neural network, or is it the person who programmed the work? A lot of people seem to view this debate as black and white: either the artist is solely the neural network, or the artist is solely the programmer. Regardless of your opinion on this debate, I think this project is an example of AI art where the programmer and the AI components work together equally to create the outcome. It doesn't have to be an all-or-nothing scenario: the point of AI is to help us achieve certain outcomes more easily, so why not use it as a collaborator rather than treating it as something that is taking away human creativity?

Further Development

I can see this project taking two different routes if I were to develop it further. The first route is to make it more user-friendly in order to make this kind of art more accessible to other people. In this case, a better interface would absolutely be necessary. The whole hover-to-start setup worked fine for me, but it might not be as intuitive or useful for others. Some kind of countdown before the drawing process starts would make more sense, as would an option to save the completed piece or record it automatically rather than having to manually take a screen capture video. Additionally, it would be good to make the artwork customizable from the interface side, such as changing the colors, the size of the ellipses, or even the shape entirely, rather than having to go into the style.js code to change these aspects. A rough sketch of what the countdown and save options could look like is below.
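This is only a minimal p5.js sketch of the idea, not the actual project code: drawArtwork() and the settings object are placeholders standing in for the existing drawing logic in style.js.

```javascript
// Hypothetical countdown + save button; drawArtwork() stands in for style.js logic
let countdownStart;           // millis() timestamp when the countdown began
const COUNTDOWN_SECONDS = 3;  // assumed countdown length
let settings = { color: '#ff6f61', ellipseSize: 20 };  // user-adjustable options

function setup() {
  createCanvas(640, 480);
  countdownStart = millis();
  // A simple save button instead of a manual screen capture
  createButton('Save artwork').mousePressed(() => saveCanvas('body-drawing', 'png'));
}

function draw() {
  const elapsed = (millis() - countdownStart) / 1000;
  if (elapsed < COUNTDOWN_SECONDS) {
    // Show the remaining seconds before the drawing starts
    background(0);
    textAlign(CENTER, CENTER);
    textSize(64);
    fill(255);
    text(ceil(COUNTDOWN_SECONDS - elapsed), width / 2, height / 2);
  } else {
    drawArtwork(settings); // placeholder for the pose-driven drawing in style.js
  }
}
```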

The second route would be to further explore the concept as a personal artistic exploration. This option is definitely more open-ended. I could try to apply more machine learning techniques; for example, I still really like the idea of AI generative art, so what if a GAN or DCGAN could make its own similar pieces based on these body-movement pieces? This is conceptually interesting to me because it's like giving a neural network its own set of eyes: some machine is watching you and can predict your movements, turning the artwork into a statement on privacy in today's digital world rather than just an exploration of body movement over time.

Full documentation

(with updated background + motivation for new concept): https://docs.google.com/document/d/1DGs7plWL98vslkEo1t7phG4EcR2uOVikXQ4AFmjzsZI/edit?usp=sharing 

Midterm Project (Casey & Eric)

Social Impact

By bringing the CartoonGAN model into the browser, we make it possible to transfer a real image into the target cartoon style. Users can bring the images they love into the cartoon style they like. Going further, if the input is a GIF or video, we could convert it directly into the cartoon style, so users can get a styled GIF or video without leaving the page. In a word, we are trying to blur the boundary between the cartoon world and the physical world by running the model in the browser.
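A rough sketch of what the in-browser styling could look like with TensorFlow.js is below; the model path and the [-1, 1] normalization are assumptions, since the exact preprocessing depends on how the CartoonGAN generator was converted.

```javascript
import * as tf from '@tensorflow/tfjs';

// Load a converted CartoonGAN generator once (the model path is a placeholder)
const modelPromise = tf.loadGraphModel('model/cartoongan/model.json');

// Style a single image element and draw the result onto a canvas
async function cartoonize(imgElement, canvasElement) {
  const model = await modelPromise;
  const output = tf.tidy(() => {
    // Read pixels, normalize to [-1, 1] (assumed input range), add a batch dimension
    const input = tf.browser.fromPixels(imgElement)
      .toFloat()
      .div(127.5)
      .sub(1)
      .expandDims(0);
    // Run the generator and map the result back to [0, 1] for display
    return model.predict(input).squeeze().add(1).div(2).clipByValue(0, 1);
  });
  await tf.browser.toPixels(output, canvasElement);
  output.dispose();
}
```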

Future Development

In terms of the future plan, we have three primary aspects in mind: realtime performance, more input formats, and ml5 function wrapping.

Realtime Performance:

There are some potential solutions for realtime performance. One is to deploy the model and an inference service on the server side: with a powerful machine, the generation process only takes seconds per image. However, if we are trying to style a video or a long GIF in real time, this solution will not fit the scenario well, since users would have to wait for the media to upload and for the result to come back. A sketch of what the browser-side request to such a service could look like is below.
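This is only an outline of the server-side idea; the '/cartoonize' endpoint, its request format, and its response format are all assumptions.

```javascript
// Hypothetical browser-side call to a server-hosted CartoonGAN service
async function cartoonizeOnServer(fileInput) {
  const formData = new FormData();
  formData.append('image', fileInput.files[0]);

  // Upload the image and wait for the server to run inference
  const response = await fetch('/cartoonize', { method: 'POST', body: formData });
  const styledBlob = await response.blob();

  // Display the styled image returned by the server
  document.querySelector('#result').src = URL.createObjectURL(styledBlob);
}
```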

Another solution is to use native hardware acceleration to speed up inference on the edge, for example with tfjs-node or TensorFlow Lite. However, we still need to test whether such native hardware acceleration can keep up with real-time inference.
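A small sketch of how we might time the model under tfjs-node is below; the model and image paths are placeholders, and the normalization again assumes a [-1, 1] input range.

```javascript
// Requiring @tensorflow/tfjs-node registers the native TensorFlow backend
const fs = require('fs');
const tf = require('@tensorflow/tfjs-node');

async function main() {
  // Placeholder path to the converted CartoonGAN model
  const model = await tf.loadGraphModel('file://./model/cartoongan/model.json');

  // Decode an image file into a tensor and preprocess it
  const imageBuffer = fs.readFileSync('input.jpg');
  const input = tf.node.decodeImage(imageBuffer)
    .toFloat()
    .div(127.5)
    .sub(1)
    .expandDims(0);

  console.time('inference');
  const styled = model.predict(input);
  console.timeEnd('inference'); // check whether this is fast enough for realtime

  styled.dispose();
  input.dispose();
}

main();
```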

More input formats:

As proposed in the previous section, it would make sense to support more input formats, such as GIF and video. For a GIF, it is possible to unpack it into a sequence of frames, run inference over each frame, and then pack the styled frames back into a GIF, as outlined below.
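The outline below only shows the shape of the GIF path; decodeGif(), encodeGif(), and cartoonizeFrame() are hypothetical helpers standing in for whatever GIF library we end up using and for the per-frame model inference.

```javascript
// Outline of the GIF path: decode -> style each frame -> re-encode
async function cartoonizeGif(gifArrayBuffer) {
  // Assumed: decodeGif returns a list of { imageData, delay } frames
  const frames = decodeGif(gifArrayBuffer);
  const styledFrames = [];

  for (const frame of frames) {
    // Run the CartoonGAN model on one frame at a time
    const styled = await cartoonizeFrame(frame.imageData);
    styledFrames.push({ imageData: styled, delay: frame.delay });
  }

  // Pack the styled frames back into a GIF, keeping the original frame delays
  return encodeGif(styledFrames);
}
```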

Video, on the other hand, is something we may not want to support in the browser, since the large number of frames a video can contain could easily crash the page. For video we would need to seek solutions on the server side or the native side.

ML5 wrapping:

Since CartoonGAN itself is interesting, we could also wrap the model in the ml5 structure. This would benefit the community by bringing a more diverse set of models into ml5.
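As a sketch of the user-facing API we would aim for (this is not an existing ml5 function; the name and callback shape simply mirror other ml5 models such as ml5.styleTransfer):

```javascript
// Hypothetical ml5-style wrapper, used inside a p5 sketch
const cartoon = ml5.cartoonGAN('model/cartoongan/', modelLoaded);

function modelLoaded() {
  console.log('CartoonGAN model loaded');
}

// Pass an image (or a p5 image/video element) and get the styled result back
cartoon.generate(inputImage, (err, result) => {
  if (err) return console.error(err);
  image(result, 0, 0); // display the cartoonized output on the p5 canvas
});
```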