For this midterm project, I want to explore the relationship between image classification and word vectors. To do this, I plan to use ML5.js to build an interactive web page/app that combines ML5’s image classification model with ML5’s word2vec model. For input, I could use either a fixed array of images or a webcam feed. Finding a suitable set of images would be quite difficult, while the results from a webcam would also be limited. I considered using a dataset such as CIFAR-10, but I do not know how I would combine the Python code with the HTML/JS I will be using for the rest of the project. Therefore, while I am building this application, I will use webcam input.
My main inspiration for this project comes from exploring how Artificial Intelligence deals with data that has subjective qualities. I am also intrigued by the idea of word vectors, and I am interested to see how they relate to the classification of images. I was also inspired by some of the projects done by Google’s Research team, namely Semantris and Google Drawing. When playing Semantris, I noticed that some of the answers I gave did not match the AI’s way of thinking, and I want to understand this phenomenon better by creating my own project. Google Drawing is an app that asks a user to draw an image they associate with a given word. The AI then judges how well the drawing matches the word. If, according to the AI, it doesn’t match, the app shows the results that would have been a better match and overlays the user’s input on that output. I find this an interesting way of conveying the relationship, and it is also interesting to see how a drawing made by a human is understood by a machine. While drawing would be quite complicated to work with, I feel that I could do something similar within my project.
Currently, I have been able to set up ML5’s word2vec model and its image classifier model separately, but I am still working on combining them. I have also not coded in JS for over a year, so I am still working out the best structure for the application and looking for ways to incorporate user input into the project.
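One way the combining step might look is sketched below. It is only a rough sketch under a few assumptions: that the classifier hands back an array of results shaped like `{ label, confidence }`, that labels can contain several words or comma-separated synonyms, and that the word2vec lookup can be wrapped in an async function that takes a single word. The helper names (`labelToQueryWords`, `pickTopResult`, `relatedWordsForResults`) and the `nearest` parameter are hypothetical and are not part of ML5; the word lookup is passed in so that the glue logic can be tried without loading either model.

```javascript
// Hypothetical helpers for joining classifier output to word-vector lookups.
// None of these names come from ML5 itself.

// Turn a classifier label such as "tabby, tabby cat" into single-word
// candidates, on the assumption that the word2vec vocabulary is keyed by
// individual lowercase words.
function labelToQueryWords(label) {
  const seen = new Set();
  const words = [];
  for (const part of label.toLowerCase().split(/[,\s]+/)) {
    const word = part.replace(/[^a-z]/g, '');
    if (word && !seen.has(word)) {
      seen.add(word);
      words.push(word);
    }
  }
  return words;
}

// Pick the most confident prediction, or null if none reaches the threshold.
// Assumes each result looks like { label: string, confidence: number }.
function pickTopResult(results, minConfidence = 0.3) {
  let best = null;
  for (const r of results) {
    const clears = r.confidence >= minConfidence;
    if (clears && (best === null || r.confidence > best.confidence)) {
      best = r;
    }
  }
  return best;
}

// Glue step: top prediction -> cleaned words -> neighbours for each word.
// `nearest` is an injected async function (word) => neighbours, standing in
// for the word2vec lookup. It is assumed to return null (or another falsy
// value) when the word is not in the vocabulary.
async function relatedWordsForResults(results, nearest, minConfidence = 0.3) {
  const top = pickTopResult(results, minConfidence);
  if (top === null) {
    return { label: null, related: {} };
  }
  const related = {};
  for (const word of labelToQueryWords(top.label)) {
    const neighbours = await nearest(word);
    if (neighbours) {
      related[word] = neighbours;
    }
  }
  return { label: top.label, related };
}
```

In the browser, `results` would come from the image classifier's callback on the webcam feed, and `nearest` would be a small wrapper around the word2vec model's nearest-word lookup. The exact callback and return shapes differ between ML5 versions, so that wrapper is the part to check against the version actually loaded on the page.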