For this project, I just wanted to play around with style transfer and see if I could get something cool working. I wanted to see what kind of style transfers onto an image when the style image is just fire. Developing this idea further, I thought I could use the webcam input feed to simulate a “boom” or an explosion, the result being a fire-styled camera output. To play around a bit more, I tried shaking the canvas to add that extra ‘boom’, and I also added a little audio effect for fun.
Training
I trained the model following Aven’s instructions and what we learned in class. I ran into some really weird errors while trying to train it; even after deleting and reinstalling the environment, the error persisted: “No module named moviepy”. However, a quick pip install moviepy seemed to fix the issue and it worked! I left it training for many hours and got the model. I used this image to train the model for style transfer:
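For reference, the fix plus a training run looked roughly like this. The script name, flags, and paths below are placeholders from memory, not the exact interface of the training repo, so adapt them to whatever Aven’s instructions actually use:

```shell
# Fix the missing dependency that broke training
pip install moviepy

# Kick off training on the fire style image
# (script name and flags are placeholders -- check the repo's README)
python train.py --style images/fire.jpg --checkpoint-dir checkpoints/fire
```

Training like this runs for many hours on a laptop, which is why trying out different hyperparameters later is so painful.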
Inference with ml5
The next step, obviously, was to use ml5 to actually create the end result. I used the same syntax we followed in Aven’s example in class. After fixing many bugs, and surviving several rounds of my computer freezing completely, I got it running.
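The sketch followed the standard ml5 style-transfer pattern from class. Roughly, it looked like the sketch below; the model folder name (./models/fire) and canvas size are my own placeholders, and this assumes the p5.js and ml5.js (v0.x) libraries are loaded on the page:

```javascript
// Webcam style transfer with ml5 (sketch, not the exact class code).
let video;
let style;
let resultImg;
let styling = false;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  video.hide();
  // Load the trained checkpoint (folder name is a placeholder)
  style = ml5.styleTransfer('./models/fire', video, () => {
    console.log('model loaded');
  });
  // Hidden <img> element to hold each styled frame
  resultImg = createImg('');
  resultImg.hide();
}

function draw() {
  if (styling && resultImg.elt.src) {
    image(resultImg, 0, 0, width, height); // styled output
  } else {
    image(video, 0, 0, width, height); // plain webcam feed
  }
}

function keyPressed() {
  if (key === ' ') {
    styling = true;
    transferFrame();
  }
}

function transferFrame() {
  // Style the current webcam frame, then queue up the next one
  style.transfer((err, result) => {
    if (!err) resultImg.attribute('src', result.src);
    if (styling) transferFrame();
  });
}
```

Each transfer() call is slow, which is exactly why the browser bogs down so badly on live video.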
Here’s how it looks:
As you can see, the result is not that great. I really expected some cool flames, but that didn’t happen; it ended up looking more like a sunset-filter kind of picture. I even wrote code to shake the entire canvas when the style transfer is triggered by pressing the spacebar. However, since the style transfer process slows the computer and the browser down so much, you could not really see the shaking as it happened. The shaking was something like this.
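The shake idea itself is simple: a random offset that decays over the frames after the spacebar is pressed, applied with translate() before drawing the image. Sketched as a standalone function (the numbers are just values that felt right, not anything from the class code):

```javascript
// Decaying screen-shake offset. `frame` counts frames since the
// spacebar was pressed; the amplitude shrinks by `decay` each frame.
function shakeOffset(frame, strength = 12, decay = 0.9) {
  const amp = strength * Math.pow(decay, frame);
  return {
    x: (Math.random() * 2 - 1) * amp, // random jitter in [-amp, amp]
    y: (Math.random() * 2 - 1) * amp,
  };
}
```

In the p5 draw loop this becomes `const off = shakeOffset(frameCount - boomFrame); translate(off.x, off.y);` before drawing the styled frame, so the whole canvas jolts and then settles.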
Result
Style transfer is really cool. I’m guessing I need to tweak some parameters to get results closer to what I want. However, the training time is so long that it’s extremely hard to try out different parameters, like the number of iterations, to find the best ones. I’d like to try a different style image or other parameters to get something that looks a little more like what I imagined. Additionally, this technique is not well suited to working with video input, as it requires a lot of processing power and, correspondingly, a lot of energy.