Week 14: AI Arts Final Project

Link to video: https://youtu.be/X-HujM0LWVg

Background: 

Without a doubt, the entertainment industry is a big part of everyone's lives. As we become more and more connected to the larger digital landscape, media such as movies and music stimulate our imagination and inspire us even more. Many people reference pop culture in their everyday lives, and I'm no different. As a fan of the sci-fi genre, I have always imagined Shanghai at night to be incredibly breathtaking. When I listen to synthwave and look outside my window, I imagine Shanghai as a city out of a film like TRON or Blade Runner.

One reason for this is that gentrification has become a topic of conversation among the city's residents. Parts of the city have been gentrified to create space for shopping malls and high-rise buildings, and the Chinese government has started to experiment with facial recognition technology and social credit scores. Although technological development is extremely important for society as a whole, it is also reminiscent of dystopian science fiction such as George Orwell's 1984 and Blade Runner. Synthwave and the cyberpunk genre show what the future of society could look like if technological development continues without regard for its impact on humanity. The scenery and aesthetics of the genre are extremely beautiful, but only on the surface; look past the eerie beauty and it is not as perfect as it seems.

Motivation:

I wanted to create a video project using the style transfer model we were given, training it with a series of different images. The images themselves were cityscapes reimagined as futuristic. An example is the image at the top, a frame pulled from one of the videos I incorporated into my project; its style was transferred from a TRON landscape I pulled from the internet. For a portion of the class we focused on style transfer, and it really interested me. Is it truly art if you are just imitating the style of one thing and applying it to another? If not, then what could its purpose be?

My motivation was to explore our newfound ability to reproduce the styles of popular media through machine learning, and to produce a video that conveys, sonically and visually, what I see in Shanghai at night. Furthermore, I wanted to explore how style transfer can interact with media and help shape it. In this case: can style transfer be used for a music video to evoke the same feelings and perceptions as the style being transferred?

Methodology:

The methodology for the project centers on style transfer. The important part is training the model with pictures that are most representative of the cyberpunk genre, images like these:

These images were all pulled off the internet by searching "cyberpunk cityscape". But just transferring images was not enough; I wanted to create videos indicative of my vision for Shanghai. So Adobe Premiere, VLC, Adobe After Effects, and Logic Pro (for the music) were all involved in the process of making the video. DevCloud also played an important role in letting me complete most of my training and conversion in a manageable amount of time.

Experiments:

First Iteration:

For my very first iteration of the project, I ended up creating what was essentially a lo-fi video. I trained the models with the three original photos included in this post. The results were decent, as you can see from the examples below:

For the first time, I was actually converting an image into multiple different styles that I had trained myself. I thought it was amazing! However, it was now time to find a way to convert the videos I had into individual frames that would ideally result in a high-fps video. After consulting with the IMA faculty present at the time, I ended up going with p5.js's saveCanvas() function.

This allowed me to save the frames being played from the transferred video set. The issue with this technique was that it resulted in a very low-fps video that looked and felt extremely glitchy. While the frames were saving, it also captured frames that were still mid-transfer, so for some videos I ended up with more than 500 frames of static imagery. The quality of the style transfer itself didn't help much either. I later imported the frames into Adobe Premiere and assembled them into a full video. I also wanted the video to evoke the feeling of the cyberpunk genre, so I downloaded a song from the internet called "Low Earth Orbit" by Mike Noise. The end result was a video that fundamentally did what I wanted, but at a much lower quality than I needed.
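For reference, here is a minimal sketch of that frame-dumping approach in p5.js. It assumes the styled output is already being drawn to the canvas each frame (here stubbed with a plain video element), and the filenames are placeholders:

let video;

function setup() {
  createCanvas(1280, 720);
  // placeholder clip standing in for the live style-transfer output
  video = createVideo('transferred-clip.mp4');
  video.hide();
  video.loop();
}

function draw() {
  image(video, 0, 0, width, height);
  // save every drawn frame as a numbered PNG, e.g. frame-0001.png
  saveCanvas('frame-' + nf(frameCount, 4), 'png');
}

Because saveCanvas() fires on every draw() call, the browser saves one image per frame, which is exactly why this capture was slow and produced hundreds of files per clip.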

Second Iteration:

For the second iteration, I took into account the advice my critics gave me and incorporated it into my project. More than that, I really wanted to increase the quality of the videos I pulled together.

This time, I started out by experimenting with the inference.py file found in our DevCloud folder for style transfer.

The first time I tried, I didn't quite understand what needed to be written. I knew I had to point the script at the right files, but I didn't realize exactly what I needed to input. I failed quite a few times and got quite disheartened, but after a couple more rounds of failure I cracked it and ended up with absolutely stunning results.

Results:

The quality exceeded my expectations and gave me a way to produce breathtaking results. More importantly, I was able to keep the resolution of the original file without the model squashing it down to something less than ideal.
Later on, I decided I needed to transfer the frames efficiently so that my transferred video could reproduce the high frame rate of the original.

I used VLC and played around with its Scene Filter option, which allowed me to automatically save frames to a designated folder. I ended up saving over 500 frames per video, some even reaching 1,200 frames. The important part of the process was collecting these frames, uploading them to DevCloud, and running the style transfer inference.py on them. This way, I could reproduce three or more styles for every single video I had. Safe to say this process took most of my time: I had to upload, transfer, and download everything back, and DevCloud's connection was sometimes pulling as low as 20 kbps.

After downloading all the style-transferred frames, I brought them into Adobe After Effects and imported all the frames from one style and video as an image sequence. This automatically matches the frame rate of the original video and creates a high-resolution video with the images in the correct order. I did this for all 28 folders. I ended up choosing only certain styles and videos for the final product, as some styles didn't look different enough while others strayed too far from my intended cyberpunk theme.

I also incorporated an additional video from YouTube showing a drone's view of the Pearl Tower, an iconic piece of architecture emblematic of Shanghai. I chose about 800 frames from that part of the video and added them to mine. This clip had a neat detail: the style transfer carried over text from the original video that read "The greatest city of the Far East", which I thought was extremely cool, so I left it in the final cut.

I wanted to add music, so I opened Reason, the program I use for my own music, and tried different ideas.

I tried different sound presets and got close to what I wanted, but not quite. I then moved to Logic Pro on the school's computers to use its presets, which resulted in the music you hear in the final piece. I added reverb and an EQ to cut out some of the lower frequencies that clashed with the dominant bass.

After that, it was just a matter of bringing everything into Adobe Premiere to turn it into a complete video. I added multiple transitions and effects to smooth the cuts from video to video.

I also layered copies of the same video on top of each other and animated the opacity so that it would seamlessly shift styles, from the all-blue TRON look to a more colorful, neon-esque hue from a different style.

The music also lasts a bit longer, and at the end the instruments drop out one by one until only one is left, which then fades out.

The resulting video was what I had hoped for and left me very proud of the work that I did. 

Social Impact:

The social impact of this video is not that significant, in my opinion. The video was created as a way for me to show others how I imagine Shanghai, which makes it a bit more personal than my midterm project.

Nevertheless, the project fundamentally challenges what it means to be artistic. I think that how style transfer is used is extremely important. What is real art when style transfer is involved? Is my video considered art when I incorporated someone else's style into my project? Artistically, the machine learning model can produce some amazing results; however, I believe it also raises very important questions about the use of style transfer, such as originality and plagiarism.

This model shows a lot of promise if used in the right way, such as providing a near-endless source of inspiration for an artist who trains it on their own work; an example is Roman Lipski's Unfinished. I can also see someone curating their own set of images to train the model on and creating their vision of the world around them through style-transferred videos. Artistically, I also think it can help with the creation of movies and music videos.

Further development:

Further development for this project could include refining the model so that it produces output as close to the original image as possible, with minimal loss. It could also include incorporating acting and more shots of people to see how they interact with the style transfer model. More aerial shots would also help the video and produce something more focused on the cityscape.

I mostly used night shots of the city in my video. Style-transferring daytime images of the city produced lackluster results; it mostly turned the screen green or blue, which wasn't very satisfying. I believe this has to do with the style image the model was trained on. The originals I used for training were very dark in overall tonality, which likely helped when I fed in night pictures of the city. Looking further into daytime shots and transferring them into nighttime scenes could also help in the future.

References:

Drone video of Pearl Tower https://www.youtube.com/watch?v=NOO8ba58Fps

Drone video of Shanghai https://www.youtube.com/watch?v=4nIxR_k1l30

Final project concept: EB

slides to presentation: https://drive.google.com/open?id=1kigOl5IQ15UO5NGDD3uHTJ4GrR2BLlJ-OUPsYl3Zp-A

Background:

I have always been a fan of the sci-fi genre. As a child, I would daydream about flying cars and neon lights in the city. However, as I grew up, I increasingly looked past the aesthetic of the genre and at the implications of a cyber-heavy society. Movies such as Blade Runner and Tron show dystopian societies shaped by a lack of human-to-human interaction, partly an effect of technology. The more technological breakthroughs occur without oversight, the higher the chance they affect our day-to-day lives for the worse.

The cyberpunk sub-genre directly reflects the disparity between humans and machines within our society. Its dystopian aspect can be seen in a few common aesthetic themes: dark skies, unnatural neon lights, near-empty streets, and so on.

These aesthetic choices depict a dystopian society through a naturally dark and gloomy setting coupled with unnatural, man-made neon lights. They showcase how humans have deviated from the natural world and attempt to replicate it with their own creations.

Motivation:

I want to be able to show the eerie beauty of the genre to everyone else. I want people to see what I see when I walk around a megacity like Shanghai. The future depicted in these works is truly breathtaking; however, a glimpse past the veil of technology reveals something terrifying.

I want to use my knowledge of machine learning and AI to showcase my vision to the people around me. Sometimes words fail me and I can't clearly explain what I see. Thanks to what we have been learning in class, I can finally show what I mean.

Reference:

I will be using style transfer to train my models to develop the city skylines that I want.

I also want to use BodyPix to separate human bodies from the backgrounds through segmentation. By doing so, I will be able to apply two different style transfer models, which will help bring my vision to life. However, to showcase this, I might need to shoot a video of the city to actually demonstrate what the model can do.
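A rough, proposal-stage sketch of how that segmentation step might look with ml5.js's BodyPix wrapper is below. It assumes segment() hands back backgroundMask and personMask images (the exact property names can differ between ml5 versions), and the style transfer step is only indicated in a comment:

let video;
let bodypix;
let segmentation;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.size(width, height);
  video.hide();
  // load the BodyPix model (assumes the ml5.js library is included on the page)
  bodypix = ml5.bodyPix(video, modelReady);
}

function modelReady() {
  bodypix.segment(gotResults);
}

function gotResults(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  segmentation = result;
  bodypix.segment(gotResults); // keep segmenting frame after frame
}

function draw() {
  background(0);
  if (segmentation) {
    // In the final piece, each mask would be run through its own
    // style transfer model before being drawn; here they are composited raw.
    image(segmentation.backgroundMask, 0, 0, width, height);
    image(segmentation.personMask, 0, 0, width, height);
  }
}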

Week 11 Assignment: Deepdream Experiment (EB)

For this week's assignment, I decided to experiment further with DeepDream. The concept reminded me of the psychedelic trips depicted in contemporary media: Harold and Kumar, Rick and Morty, and others all have moments where characters see their world distorted by drugs. In those scenes the character is usually looking out at nature, so I chose a picture of a rainforest as my initial input.

I thought that the output would result in something interesting and similar to what I have seen in popular media.

I started by playing around with the layers without touching the other variables. The result was actually very interesting: each layer seemed to produce a different style of the initial image. This image used the mixed3a layer.

This image used the mixed3b layer.

This image used the mixed4a layer.

This image used the mixed4c layer.

This image used the mixed5a layer.

In terms of exploration, this experiment allowed me to replicate the look of those psychedelic trip scenes. Each layer altered the image in a different way, giving it a unique style. The different layers seemed to choose the same locations on the image to alter, but the ways in which they did so were interesting to see.

After this experimentation, I wanted to know what it would look like to produce a video similar to the one we saw in class, and I came up with this:

https://youtu.be/mVasZounarc

Overall, I think this experiment gave an interesting insight into what DeepDream can do. I wonder whether it would be possible to preserve the style of a given layer by training a style transfer model on its output, and how different the result would be compared to using DeepDream alone.

I can also see myself using this to produce interesting images of my cyberpunk cityscapes; I imagine the results would be striking.

Week 10: EB’s work

For this week's assignment, I trained a style transfer model using a cyberpunk-style image as my base. I fell in love with this genre thanks to works such as Tron and Beyond the Aquila Rift. I wanted to create a model that would translate normal landscapes into something out of one of these films.

When I began the project, I didn't realize it would take so long to complete. Being on a Windows computer didn't help either. I followed along with the slides provided to us, but some of the commands did not work well in the Windows command terminal, so I resorted to moving my work onto a Mac.

From there, I downloaded the trained model and uploaded it into the folder that was provided to us. Running the localhost server, I then tested the finished work.

Here is some of my finished work:

Personally, I'm very satisfied with the results. The model manages to turn normal, everyday landscapes into convincing Tron images, although the results are quite digitized versions of the landscapes. Understandably there is a lot of loss and the accuracy is not especially high, but it is safe to say the model performed better than expected.

The effect doesn't really work with portraits, as seen in this example, but given pictures of landscapes it works very well. I think this model has potential, and with enough time to develop it could produce even better images.

Overall, I think I was able to recreate the cyberpunk aesthetic with the model. It went really well, but there is a lot of room to improve. A larger set of training images would give better results and translate the landscapes into a more accurate depiction of a cyberpunk, Tron-esque world. In my opinion this is a very feasible idea, and one definitely worth looking into.

Week 9: Midterm Update (EB)

Background:

Although I originally planned to create a framework for WebVR in A-Frame using PoseNet, the process turned out to be too difficult and beyond my coding abilities. While the idea is relatively doable compared to my initial proposal, I still needed more time to understand how A-Frame works and the specific code that goes into a 3D environment. Instead, I wanted to create something doable yet creative, possibly incorporating sonic elements into the project.

Motivation:

For the midterm, I decided to create an interactive sound visualization experiment using PoseNet. I downloaded and used a library called "Simple Tones" containing many different sounds of various pitches. The user chooses what sound to play by moving their left wrist along the x-axis. This project was inspired by programs such as Reason and FL Studio, as I like to create music in my spare time.

Methodology:

I used the professor's week 3 PoseNet example 1 as the basis for my project. It already had code that lets the user paint circles with their nose. I wanted to incorporate music into the project, so I looked online and came across an open-source library of simple sounds called "Simple Tones".

I wanted the position of my hand, as tracked by PoseNet, to play sounds, so I decided that the x-coordinate of my left wrist would determine the pitch.

if (partname == "leftWrist") {
  if (score > 0.8) {
    // play a square-wave tone; the wrist's x position drives the pitch
    playSound(square, x * 3, 0.5);
    // pick a random spot on the canvas for the visual feedback
    let randomX = Math.floor(randomNumber(0, windowWidth));
    let randomY = Math.floor(randomNumber(0, windowHeight));
    console.log('x' + randomX);
    console.log('y' + randomY);
    graphic.noStroke();
    graphic.fill(180, 120, 10);
    // the circle's size also scales with the wrist's x position
    graphic.ellipse(randomX, randomY, x / 7, x / 7);
  }
}

The playSound() call and its arguments come from the library I used. Because the x-coordinate on its own might not reach high enough values to play certain pitches and sounds, I decided to multiply it by 3. Left is high-pitched, while right is low-pitched.
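In hindsight, a more explicit way to do this mapping would be p5's map() function. This is a hypothetical alternative, not the code I used, and it assumes playSound() takes a waveform, a pitch value, and a duration, as in the snippet above:

// inside the same if-block, replacing the playSound line:
// map the wrist's x position onto an explicit pitch range
// (left edge of the canvas = high value, right edge = low value)
let pitch = map(x, 0, width, 880, 110);
playSound(square, pitch, 0.5);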

I ran it by itself and it seemed to work perfectly.

After some experimentation, I also wanted some sort of visual feedback to represent what is being heard. I altered the graphic.ellipse() call so that its size follows the x-coordinate of the left wrist: the higher the pitch (the further left on the axis), the bigger the circle.

The end result is something like this. The color and sounds that it produces give off the impression of an old movie. 

Experience and difficulties:

I really wanted to add a fading effect to the circles, but for some reason the sketch would always crash when I wrote a "for" loop. I looked into different ways to produce the fading effect, but I wasn't able to include it in the code.
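For what it's worth, the usual p5.js way to get this kind of fade avoids the for loop entirely: repaint the canvas each frame with a low-alpha background so older circles dissolve on their own. A minimal, standalone sketch of the idea (not the code from my project):

function setup() {
  createCanvas(640, 480);
  noStroke();
  background(0);
}

function draw() {
  // a translucent black layer each frame makes previous circles fade out
  background(0, 0, 0, 20);
  fill(180, 120, 10);
  // stand-in for the wrist-driven circles in the actual sketch
  ellipse(random(width), random(height), 30, 30);
}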

I would also like to work on the visual appearance of the UI. It looks basic and could use further adjustment; however, this is currently as much as my coding skills allow.

This concept seemed like a very doable task at first, but it required a lot more skill than I expected. Still, I enjoyed the process, especially the breakthrough moment when I could hear the sounds reacting to my movement.

Overall, I have now learned how to use the position of a body part to drive something. Going forward, I still want to work on the WebVR project, and this experience will help with understanding and implementing it.

Social Impact:

In the process of my midterm, I worked on two different projects. The first was pairing WebVR with PoseNet to develop a way to control a VR experience without the usually required equipment. The second was the one I presented in class: a theremin-inspired PoseNet project. Although I only managed to complete one of them, I believe both projects have a lot of potential for social impact.

First, the WebVR project. The initial idea was to make VR more inclusive by letting people who can't afford the equipment experience VR. The HTC Vive and other well-known headsets all cost over 3,000 RMB. By allowing PoseNet to be used inside WebVR, anyone with an internet connection could experience VR. Obviously the experience won't be exactly the same, but it should be close enough.

Second, the theremin-inspired project. I found out about the instrument a while back and thought to myself, "What an interesting instrument!" While the social impact of this project isn't as weighty as the previous one, I can see people using it to get a feel for, or an understanding of, the instrument. The theremin differs from traditional instruments in that it is more approachable for children, or anyone for that matter: it is easy to create sounds with, but mastering it has a very steep learning curve. With this kind of project, people of any background can experience music and sound without buying the instrument.

Future Development:

For the first project, I can see it developing into an add-on that works with any WebVR project. For this to become real, one needs an extensive understanding of the A-Frame framework; with that understanding, one could develop the tools necessary to integrate an external machine learning program.

The machine learning model also needs to become more accurate so that as many functions as possible can be supported.

For the second project, I can see music classes using it to explain the concepts of frequency and velocity to younger children or to beginners in music production, giving them a visual and interactive experience.

In the future, it could be possible to map velocity and volume to each point on the x- and y-axes to make the experience more quantifiable for the person using it. For those who want to