Week 12: Final Concept Documentation

Link to concept presentation slides: https://drive.google.com/file/d/1jgxeo-knGx7nLrnWBmPZYLOIwz4mdJ8k/view?usp=sharing

Background

For my final project, I’ll be exploring abstract art with Deepdream in two different mediums: video and print. I plan on creating a series of images (printed in high resolution onto posters so viewers can focus on the detail) and a zooming Deepdream video. I’ll create the original abstract designs with digital tools, then push them one step (or several) further into the abstract with the Deepdream algorithm.

I love creating digital art with tools such as Photoshop, Illustrator, datamoshing, and code such as p5.js, but I’ve realized these tools have limits on how abstract and deconstructed an image can get. I’ve also been interested in Deepdream’s styles and artistic possibilities for a while, and I love the infinite ways it can transform images and video. In the first week, I presented Deepdream as my case study, using different styles in the Deep Dream Generator tool to transform my photos.

Case study slides: https://drive.google.com/file/d/1hXeGpJuCXjlElFr1kn5yZVW63Qcd8V5x/view

I would love to take this exploration to the next level by combining my interest in abstract, digital art with the tools we’ve learned in this course.

Motivation

I’m very interested in playing with how much control different digital tools give me: Photoshop and Illustrator give me the most control over the output image, code lets me randomize certain aspects and generate new designs on its own, and datamoshing takes just a few settings and “destroys” files on its own, generating glitchy images.

Created with p5 with randomization techniques and Perlin noise:

Created with datamoshing, filters and layout in P5:

Abstract image created in Photoshop:

Deepdream, however, takes away almost all of this predictability and control. You can set certain guidelines, such as the “layer” (which determines the style), the octaves (the image scales Deepdream works over), the number of iterations, and the strength, but it is impossible to predict what the algorithm will “see” and produce in the image, creating completely unexpected results that would be nearly impossible to achieve in digital editing tools.

References

I’m very inspired by this Deepdream exploration by artist Memo Akten, capturing the eeriness and infinity of Deepdream: https://vimeo.com/132462576

His article (https://medium.com/@memoakten/deepdream-is-blowing-my-mind-6a2c8669c698) explains in depth his fascination with Deepdream, something I share. As Akten writes, while the psychedelic aesthetic itself is mesmerizing, “the poetry behind the scenes is blowing my mind.” Akten details how Deepdream works: the neural network recognizes various aspects of the reference image based on its previous training and confirms them by choosing a group of neurons and modifying “the input image such that it amplifies the activity in that neuron group,” allowing it to “see” more of what it recognizes. Akten’s own interest comes from how we perceive these images: when we see dogs, birds, swirls, and so on in the results, we are doing the same thing as the neural network, reading deeper into the image than what is actually there. This creates a cycle that requires both the AI and human interference: we set the guidelines, direct the network to amplify its own activity, and then perceive the modified images. Essentially, Deepdream is a collaboration between AI and humans interfering with the system to see deeper into images and produce something together.
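
To make the “amplify the activity in that neuron group” step concrete, here is a minimal sketch of the gradient-ascent idea, assuming TensorFlow and a pretrained InceptionV3; the layer name mixed4 is just an example, and this is not the exact code behind the Deep Dream Generator tool I used:

  import tensorflow as tf

  # Pick a layer (a "neuron group") and nudge the image so that layer's
  # activations get stronger, i.e. the network "sees" more of what it recognizes.
  base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
  dream_model = tf.keras.Model(inputs=base.input,
                               outputs=base.get_layer("mixed4").output)

  def dream_step(img, step_size=0.01):
      # img is a float32 tensor in [-1, 1] with shape (height, width, 3)
      with tf.GradientTape() as tape:
          tape.watch(img)
          activations = dream_model(tf.expand_dims(img, 0))
          loss = tf.reduce_mean(activations)        # how strongly the layer responds
      grads = tape.gradient(loss, img)
      grads /= tf.math.reduce_std(grads) + 1e-8     # normalize so the step size stays stable
      return tf.clip_by_value(img + step_size * grads, -1.0, 1.0)

  # "Octaves" rerun this loop at several image scales, "iterations" is the number
  # of steps per scale, and "strength" maps roughly to the step size.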

Week 11: Deepdream exploration

For my Deepdream exploration, I created a zoom video that is a compilation of two Deepdream sessions. I generated the first zoom on the mixed4c layer, then took the last frame of that video as the starting image for a second session, also on mixed4c, and zoomed in further.

deepdream zoom low quality

I had a lot of challenges with this. At first I didn’t realize I needed to change the video file name to save the video (which led to some confusion), and I also hit errors where img0.jpg wasn’t defined because I didn’t realize I had to run all the cells in order every time for everything to work. I also had trouble with the original photos I tried to use: they were very high resolution, and I didn’t realize the session would close down after 30 minutes, so the videos kept failing to generate and getting stuck loading at one or more of the cells.

After getting some help from Aven, I was able to resolve these errors and generate the videos. I plan on using this technique for my final video project, so I’m excited to explore it more!

I also put together the class video for the BigGAN collaboration, available here:

https://www.youtube.com/watch?v=xxrblvC-VOM 

Week 13: What If U Re Beauty (Final Documentation) – Jinzhong

Name

What If U Re Beauty (BeautyMirror)

GitHub

https://github.com/NHibiki-NYU/AIArts/tree/master/Final

Source

Proposal | Paper | Pretrained Model | Presentation

Tech Specs

In this final project, I separated the work into two main parts:

  • backend: in charge of the computational jobs and the static file server.
  • frontend: interacts with the user, takes the picture, sends it to the server, and displays the result of the GAN network.

So, the overall pipeline is (a sketch of the server side follows the list):

  1. The frontend takes a photo with the webcam
  2. The frontend processes and compresses the image and posts it to the backend
  3. The Python server receives the image and converts it to a NumPy matrix
  4. The Python server passes the matrix as input to the TensorFlow model and gets the result
  5. The backend server converts the output matrix back into an image and encodes it as a base64 string in the response to the frontend
  6. The frontend receives the base64 image and displays it on the canvas
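
A minimal sketch of the server side of steps 3–5, written in Python with Flask; the route name /transfer, the JSON field names, and run_gan() are placeholders for illustration, not the actual code in the repository:

  import base64, io
  import numpy as np
  from PIL import Image
  from flask import Flask, request, jsonify

  app = Flask(__name__)

  def run_gan(matrix):
      # hypothetical stand-in for the TensorFlow forward pass of the GAN
      return matrix

  @app.route("/transfer", methods=["POST"])
  def transfer():
      # 1. decode the base64 image posted by the frontend
      img_bytes = base64.b64decode(request.json["image"])
      img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
      # 2. image -> NumPy matrix, normalized for the model
      matrix = np.asarray(img, dtype=np.float32) / 255.0
      # 3. run the model and get the output matrix
      out = run_gan(matrix)
      # 4. NumPy matrix -> image -> base64 string for the frontend
      out_img = Image.fromarray((np.clip(out, 0.0, 1.0) * 255).astype(np.uint8))
      buf = io.BytesIO()
      out_img.save(buf, format="JPEG")
      return jsonify({"image": base64.b64encode(buf.getvalue()).decode()})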

Development process

First, I used the method navigator.mediaDevices.getUserMedia to get a media stream from the webcam. Then, I created a <video /> element and set its source to that stream, so the realtime camera feed is displayed on the screen.

After that, canvasContext.drawImage(video, x, y, w, h); copies the current video frame onto the canvas. When the user clicks, we only need to pause the canvas updates and send the image on the canvas to the backend server.

Obstacles

There were several problems when I wanted to publish the demo to the world wide web.

Unable to connect to HPC

Since the HPC cluster at NYU Shanghai does not have a static public IP address, it is not possible to access the machine directly from outside the NYU network, so I needed a way to get through this NAT environment. There is an open-source tool called ngrok that lets you run a client on the machine behind NAT and a server on a host that has a public IP bound to it. When a user requests that public address, the host server forwards the request to the NAT machine through the tunnel, so users can reach content on the NAT machine even though it has no public IP.

Link To: ngrok

WebCam is only allowed over https

Another obstacle is that the webcam can only be accessed over https (HTTP over TLS). So, I used letsencrypt to issue a certificate for a domain. (I used my own domain in this scenario, but we could also use public wildcard DNS services like nip.io or xip.io.)

Once we have the certificates and keys from letsencrypt, we need the file privkey.pem for the private key and fullchain.pem for the certificate. The Flask server can then be started like this:
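
(A minimal sketch, assuming the two .pem files sit next to the app and using the port 3003 mentioned below; the real startup code lives in the repository.)

  from flask import Flask

  app = Flask(__name__)

  if __name__ == "__main__":
      # serve over HTTPS using the letsencrypt certificate and private key
      app.run(host="0.0.0.0", port=3003,
              ssl_context=("fullchain.pem", "privkey.pem"))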

Now the server should be running at :3003 with https encryption.

Screenshots

Final Project Concept – Katie

For my final project, I am focusing on the theme of human vs. computer perception. This is something I’ve tried to explore through my midterm concept and initial plan of reconstructing humans from image classification of parts. When I talked with Aven, I realized there were other, less convoluted ways of investigating this that would allow the work of the computer to stand out more. He showed me the examples from the duo Shinseungback Kimyonghun that also follow these ideas; specifically I was more inspired by the works FADTCHA (2013) and Cloud Face (2012), which both involve finding human forms in nonhuman objects.  

fadtcha

Both works expose a gap in perception: a face detection algorithm can detect human faces where humans cannot. Whether that is because the CAPTCHA images are very abstract or because the clouds are fleeting doesn’t matter; the difference is still exposed.

cloud-face

I wanted to continue with this concept by using a human body-detecting algorithm to find human forms in spaces where we cannot see them. Because I’m most familiar and comfortable with the ml5.js example resources, I started by using BodyPix for some initial tests, which was interesting for seeing which parts of buildings get labeled as body segments, but it didn’t read as a clear idea. Then I tried using PoseNet to see where body keypoints could be detected.

test1

test2

This was a little more helpful, but still has a lot of flaws. These two images were the shots where the highest number of body points could be detected (other shots had anywhere from 1–4 points, but nothing resembling a human shape), and this still doesn’t seem concrete enough to use as data. I plan on using a different method for body detection, as well as a better quality camera, to continue working toward the final results.

Final Project Concept – Eric & Casey

For this final project, we have two ideas in mind. I will introduce them one by one:

Refine the CartoonGAN project

This idea continues our midterm project by adding two functionalities: gif conversion and foreground/background-only conversion.

In detail, the first one requires unpacking the gif into frames and piping the frames to the model in batches. The model then produces the output frames, and our app packs them into a new gif. Throughout this process, we need to pay extra attention to avoid running out of memory (see the sketch below).
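
A rough sketch of that unpack/batch/repack idea, written here in Python for clarity even though our app runs in the browser; cartoonize() is a hypothetical stand-in for the CartoonGAN forward pass:

  import numpy as np
  import imageio

  def cartoonize(batch):
      # hypothetical stand-in for the CartoonGAN model (identity here)
      return batch

  def convert_gif(path_in, path_out, batch_size=4):
      frames = imageio.mimread(path_in)             # unpack the gif into a list of frames
      out_frames = []
      for i in range(0, len(frames), batch_size):   # small batches keep memory bounded
          batch = np.stack(frames[i:i + batch_size]).astype(np.float32) / 255.0
          out = cartoonize(batch)                   # model output for this batch of frames
          out_frames.extend((np.clip(out, 0.0, 1.0) * 255).astype(np.uint8))
      imageio.mimsave(path_out, out_frames)         # pack the converted frames into a new gif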

As for the second functionality, I am planning to combine the BodyPix model with our CartoonGAN model. To maximize the efficiency of the computation, I will apply some WebGL tricks to composite the BodyPix and CartoonGAN outputs, so that we can deliver the best user experience.

Generative Art with Sentiment Analysis on Both Audio and Text

The idea is that text can carry certain emotions, and so can audio/speech. What’s more interesting is that even for the same content, the text and the audio can express different emotions. We can use this kind of conflict to make generative art.

However, some problems remain. First, we haven’t found a practical real-time sentiment analysis solution for speech, so we may use facial expression as an alternative.

The other question is how to visualize, or rather generate from, these emotions in a way that will attract the audience. This seems to be a harder and more critical question than the previous one.