Featured Works
1. Dissection of a CNN
Independent Project – Convolutional Neural Networks for Visual Recognition, Spring 2015
This project explored the internal mechanisms of Convolutional Neural Networks (CNNs), focusing on how layered representations emerge in visual recognition tasks. The study began with a theoretical foundation in convolutional operations, pooling strategies, and activation functions, followed by practical experiments on image classification benchmarks. A key objective was to “open the black box” of CNNs by visualizing feature maps, filter activations, and intermediate representations across convolutional layers. Systematic dissection showed that early layers primarily capture low-level features such as edges, corners, and simple textures, while deeper layers detect increasingly abstract patterns like object parts and semantic structures. The project also examined the effect of hyperparameters, including kernel size, stride, and depth, on accuracy and generalization. Techniques such as occlusion sensitivity analysis and deconvolutional visualization were employed to evaluate how CNNs assign importance to different image regions. The findings highlighted both strengths and limitations of CNNs: while highly effective at learning hierarchical representations, they remain vulnerable to noise, adversarial perturbations, and dataset biases. This independent project provided a deeper appreciation of CNN interpretability and laid the groundwork for subsequent research in explainable deep learning models.
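As an illustration of the occlusion-sensitivity technique mentioned above, the following minimal sketch slides a grey patch over an input image and records how much the target-class probability drops at each position (PyTorch is used here for convenience; the 2015 project would have used a different framework, and the model/input are assumed to be any standard image classifier):

```python
# Minimal occlusion-sensitivity sketch (hypothetical model and input).
import torch
import torch.nn.functional as F

def occlusion_sensitivity(model, image, target_class, patch=16, stride=8):
    """Slide a grey patch over `image` (C, H, W) and record how much the
    target-class probability drops at each patch position."""
    model.eval()
    _, H, W = image.shape
    heatmap = torch.zeros((H - patch) // stride + 1, (W - patch) // stride + 1)
    with torch.no_grad():
        base = F.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
        for i, y in enumerate(range(0, H - patch + 1, stride)):
            for j, x in enumerate(range(0, W - patch + 1, stride)):
                occluded = image.clone()
                occluded[:, y:y + patch, x:x + patch] = 0.5  # grey square
                p = F.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target_class]
                heatmap[i, j] = base - p  # large drop => important region
    return heatmap
```

Regions whose occlusion causes the largest probability drop are the ones the network relies on most, which is exactly the importance map the dissection used.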
2. Multi-resolution Image Recognition
Research Project – Summer/Fall 2024
This project investigated how image recognition systems can leverage multi-resolution representations to improve the accuracy, robustness, and efficiency of classification tasks. The central idea is that visual features appear at different spatial scales: fine-grained edges, textures, and contours at high resolutions, and broader semantic structures at lower resolutions. By integrating multi-resolution inputs, the system can better capture both local and global patterns, improving performance on complex visual recognition benchmarks. The project implemented a hierarchical pipeline in which input images were processed at varying resolutions, with feature maps fused at intermediate network layers. Convolutional Neural Networks (CNNs) were extended with multi-scale kernels, pyramid pooling, and feature concatenation to enable cross-resolution learning. Experiments demonstrated that multi-resolution fusion outperformed single-resolution baselines, particularly in scenarios with cluttered backgrounds, occlusions, and low-quality inputs. Key findings emphasized the importance of balancing resolution trade-offs: while high-resolution inputs improved fine-detail recognition, lower resolutions reduced computational costs and enhanced generalization. The research contributed to understanding how scalable, resolution-aware architectures can be applied in domains such as medical imaging, remote sensing, and real-world object detection.
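A minimal sketch of the fusion idea (an illustrative architecture, not the project's exact network): the same convolutional backbone processes the input at full and half resolution, and the resulting feature maps are concatenated before classification.

```python
# Illustrative multi-resolution fusion: shared backbone, two scales, concat.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(          # weights shared across scales
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        f_hi = self.backbone(x)                                   # full resolution
        f_lo = self.backbone(F.interpolate(x, scale_factor=0.5))  # half resolution
        f_lo = F.interpolate(f_lo, size=f_hi.shape[-2:])          # re-align scales
        fused = torch.cat([f_hi, f_lo], dim=1)                    # cross-resolution fusion
        pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
        return self.head(pooled)
```

Sharing backbone weights keeps the parameter count of the two-scale model close to the single-resolution baseline, which isolates the effect of the fusion itself.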
3. Plane Rectification on Android
Research Project – Computer Vision: From 3D Reconstruction to Recognition, Fall 2013 / Winter 2014
This project focused on developing a mobile computer vision application for plane rectification, transforming an image of a planar surface into a fronto-parallel view to correct perspective distortions. Implemented on the Android platform, the project demonstrated how rectification techniques enable real-time image correction and facilitate downstream tasks such as text recognition, augmented reality alignment, and 3D reconstruction from monocular images. The research integrated homography estimation methods using feature detection (SIFT/SURF/ORB) and robust point matching via RANSAC to compute transformation matrices. Rectified images were generated by warping input planes into a normalized coordinate space. Performance was tested on images of posters, documents, and building facades under varying perspectives and illumination conditions. Key contributions included adapting computationally intensive rectification algorithms to mobile hardware constraints, optimizing memory usage, and ensuring real-time processing speeds. The project highlighted the potential of deploying advanced computer vision algorithms directly on smartphones, bridging the gap between theoretical 3D reconstruction methods and practical applications in mobile augmented reality, document scanning, and scene understanding.
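A minimal desktop sketch of that pipeline with OpenCV (matching against a fronto-parallel reference view is an assumption for illustration, and file names are placeholders; the Android version performed the same ORB matching, RANSAC homography, and warping steps on-device):

```python
# ORB features + RANSAC homography + perspective warp for rectification.
import cv2
import numpy as np

img = cv2.imread("poster_skewed.jpg")     # distorted view of a planar surface
ref = cv2.imread("poster_frontal.jpg")    # fronto-parallel reference view

orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img, None)
k2, d2 = orb.detectAndCompute(ref, None)

# Brute-force Hamming matching suits ORB's binary descriptors
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Robust homography via RANSAC, then warp into the fronto-parallel frame
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
rectified = cv2.warpPerspective(img, H, (ref.shape[1], ref.shape[0]))
cv2.imwrite("rectified.jpg", rectified)
```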
4. Polyphonic Piano Transcription
Research Project – Machine Learning
This project explored the challenge of automatic music transcription, focusing on converting raw audio recordings of piano performances into symbolic musical notation. Unlike monophonic transcription, polyphonic transcription requires identifying multiple notes played simultaneously, often with overlapping harmonics and complex temporal structures. The research applied machine learning techniques for feature extraction and classification. Mel-frequency cepstral coefficients (MFCCs), chroma features, and spectrogram representations were used to capture harmonic and temporal cues. Various models—including Hidden Markov Models (HMMs), Support Vector Machines (SVMs), and early deep learning architectures—were tested for their ability to predict note onsets and sustain durations. The system was evaluated on annotated datasets of classical piano pieces, achieving promising results in distinguishing concurrent notes and tracking note sequences despite noise and resonance. This project demonstrated how machine learning can address the inherently ambiguous problem of polyphonic transcription, paving the way for applications in music information retrieval, digital sheet music generation, and intelligent music tutoring systems.
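A sketch of the feature-extraction stage (librosa is a modern stand-in for the project's original tooling, and the file name is a placeholder):

```python
# Spectrogram, chroma, and MFCC features plus onset candidates for transcription.
import librosa
import numpy as np

audio, sr = librosa.load("piano_recording.wav", sr=22050)

spec = np.abs(librosa.stft(audio, n_fft=2048, hop_length=512))
chroma = librosa.feature.chroma_stft(S=spec, sr=sr)      # 12 pitch classes per frame
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)   # timbral envelope

# Frame-level onset strength locates candidate note starts for the classifier
onset_env = librosa.onset.onset_strength(y=audio, sr=sr, hop_length=512)
onsets = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr, hop_length=512)
print(f"{len(onsets)} candidate onsets; chroma matrix shape {chroma.shape}")
```

Frame-level features like these become the inputs to the HMM/SVM note classifiers; the polyphonic difficulty is that several of the twelve chroma bins are active simultaneously and must be disentangled.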
5. AI Agent for Light Cycle Racing
Course Project – Artificial Intelligence: Principles and Techniques
This project involved designing and implementing an autonomous AI agent to play the classic Light Cycle Racing game, where players control motorcycles that leave behind impassable trails and aim to trap opponents while avoiding collisions. The focus was on applying AI search algorithms, heuristics, and game-theoretic reasoning to create a competitive and adaptive agent. The agent leveraged state-space search, including minimax with alpha–beta pruning, to anticipate opponent moves and optimize survival strategies. Heuristic evaluation functions incorporated spatial reasoning, board partitioning, and mobility analysis to balance offensive trapping with defensive escape planning. Additionally, reinforcement learning methods were explored to refine strategies through self-play and adapt to different opponent styles. Performance was benchmarked against rule-based and random agents, with the AI consistently demonstrating superior planning depth, situational awareness, and adaptability. The project showcased how core AI principles—search, planning, adversarial reasoning, and learning—can be combined to create intelligent, game-playing agents, providing insights applicable to broader domains such as robotics, navigation, and multi-agent systems.
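The search core can be sketched as depth-limited minimax with alpha–beta pruning; `GameState` and its methods below are hypothetical stand-ins for the project's actual game interface:

```python
# Depth-limited minimax with alpha-beta pruning for a two-player grid game.
def minimax(state, depth, alpha, beta, maximizing):
    """Return the heuristic value of `state`, searching `depth` plies ahead."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()          # e.g. reachable-space / mobility heuristic
    if maximizing:
        value = float("-inf")
        for move in state.legal_moves(player=0):
            value = max(value, minimax(state.apply(move, player=0),
                                       depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:            # beta cutoff: opponent avoids this branch
                break
        return value
    else:
        value = float("inf")
        for move in state.legal_moves(player=1):
            value = min(value, minimax(state.apply(move, player=1),
                                       depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:            # alpha cutoff
                break
        return value
```

In Light Cycle games, an evaluation based on the area reachable before the opponent (board partitioning) tends to dominate simple distance heuristics, since cutting the opponent off from open space wins the game.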
6. Blood Vessel Segmentation
Course Project – Digital Image Processing
This project focused on developing algorithms for segmenting blood vessels from retinal fundus images, a crucial step in the early detection of diseases such as diabetic retinopathy, glaucoma, and hypertension. The challenge lay in accurately distinguishing thin, low-contrast vessels from the surrounding retinal tissue while preserving fine vessel structures. Techniques implemented included contrast enhancement through adaptive histogram equalization, noise reduction via Gaussian filtering, and edge detection using matched filters optimized for vessel-like structures. Morphological operations and region-growing methods were applied to refine vessel connectivity and eliminate false positives. Experimental evaluation was performed on benchmark retinal image datasets, comparing segmentation outputs with expert-annotated ground truth. The project demonstrated the effectiveness of classical image processing pipelines in medical imaging, while also revealing their limitations in handling complex vessel branching and varying illumination. These insights point to integrating machine learning and deep learning methods in future extensions to achieve robust, fully automated blood vessel segmentation.
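An illustrative sketch of such a classical pipeline in OpenCV: CLAHE contrast enhancement, Gaussian smoothing, and an oriented matched filter tuned to vessel-like line profiles (kernel sizes, the orientation step, and the thresholding are assumptions, not the project's exact parameters):

```python
# Classical vessel segmentation: CLAHE + smoothing + oriented matched filter.
import cv2
import numpy as np

img = cv2.imread("fundus.png")
green = 255 - img[:, :, 1]   # green channel, inverted so dark vessels become bright

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(green)
smoothed = cv2.GaussianBlur(enhanced, (5, 5), 0)

def line_kernel(angle_deg, length=15, sigma=2.0):
    """Zero-mean template of a bright line, rotated to angle_deg."""
    profile = cv2.getGaussianKernel(length, sigma)      # 1-D Gaussian cross-section
    k = np.tile(profile.T, (length, 1))                 # bright vertical stripe
    k = (k - k.mean()).astype(np.float32)               # zero-mean matched filter
    M = cv2.getRotationMatrix2D((length / 2, length / 2), angle_deg, 1.0)
    return cv2.warpAffine(k, M, (length, length))

# Vessels occur at arbitrary angles: keep the best response over orientations
responses = [cv2.filter2D(smoothed.astype(np.float32), -1, line_kernel(a))
             for a in range(0, 180, 15)]
vesselness = np.max(responses, axis=0)

norm = cv2.normalize(vesselness, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
_, mask = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("vessels.png", mask)
```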
7. Camera Forensics
Course Project – Applied Vision and Image Systems
This project investigated techniques in digital image forensics to identify the source camera of a given image and detect potential tampering or manipulation. The focus was on leveraging intrinsic image features and camera-specific artifacts that serve as unique “fingerprints” for forensic analysis. Methods included analyzing sensor pattern noise (SPN), color filter array (CFA) interpolation artifacts, and JPEG compression traces to distinguish between different camera models and verify image authenticity. Statistical approaches and feature extraction techniques were applied to construct forensic signatures, while machine learning classifiers were used to automate camera identification. Experimental evaluation was carried out on datasets of images captured with multiple consumer cameras under varying conditions. The system demonstrated strong performance in distinguishing cameras and detecting inconsistencies indicative of tampering, such as splicing or copy-move operations. The project highlighted how computer vision and statistical signal processing can contribute to reliable image forensics, with applications in digital security, law enforcement, and media verification.
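A hedged sketch of the sensor-pattern-noise idea (a Gaussian low-pass residual stands in for whatever denoiser the project actually used; all function names here are illustrative): the noise left after denoising is averaged over many images from one camera to form a fingerprint, and a test image is attributed by correlating its residual with each stored fingerprint.

```python
# PRNU-style camera fingerprinting via noise residual correlation.
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img):
    """Approximate sensor noise: image minus its low-pass content."""
    f = img.astype(np.float64)
    return f - gaussian_filter(f, sigma=2)

def fingerprint(images):
    """Average residual over many well-exposed images from one camera."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def correlation(a, b):
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def identify(test_img, fingerprints):
    """Attribute `test_img` to the camera whose fingerprint correlates best."""
    r = noise_residual(test_img)
    return max(fingerprints, key=lambda cam: correlation(r, fingerprints[cam]))
```

The same correlation, computed block by block, flags spliced regions: patches pasted in from another camera correlate poorly with the claimed source's fingerprint.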
8. WOBBLE
Independent Project – Interactive 3D Visualization
This project explored the use of parallax as a depth perception cue by creating an interactive 3D visualization inspired by animated GIFs that simulate motion-based depth. The system allows users to experience depth perception interactively in a virtual 3D world, with the camera focus automatically locked on the central object to enhance the sense of immersion. Built using Unity, the application enables manual adjustment of camera oscillation frequency, giving users control over the strength and smoothness of the parallax effect. By coupling object-centric focus with controlled camera motion, the project demonstrates how subtle visual cues can be leveraged to create compelling depth illusions without stereoscopic displays. This work highlights the intersection of perception, graphics, and interaction design, showing how relatively simple visual manipulations can produce strong perceptual effects for applications in virtual reality, interactive art, and visualization systems. Demo
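The core of the effect reduces to a simple camera path; the following is a minimal Python sketch of the math, not the Unity implementation, and the parameter values are illustrative:

```python
# Oscillating camera with a locked look-at target: the source of the parallax cue.
import numpy as np

def camera_pose(t, freq_hz=1.0, amplitude=0.2, distance=5.0):
    """Camera position and look-at target at time t (seconds)."""
    x = amplitude * np.sin(2 * np.pi * freq_hz * t)   # lateral oscillation
    position = np.array([x, 0.0, -distance])
    target = np.zeros(3)                              # focus locked on the central object
    return position, target
```

Rendering with the camera aimed at `target` while its position oscillates makes near objects sweep across the view faster than far ones, which is the motion-parallax depth cue the project exposes to user control.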
9. 3D From Head Tracking
Independent Project – Computer Vision & Interactive Graphics
Inspired by the well-known 2007 head-tracking demonstration, this project implemented a real-time head-tracking system to enhance depth perception in virtual 3D environments. Using a standard webcam and the clmtrackr library, the system detects the user’s head position by extracting eye coordinates and tracking motion continuously. The captured head position dynamically adjusts the rendered scene, simulating a window-like 3D perspective where the viewpoint shifts as the user moves. This effect strengthens depth cues and creates a highly immersive experience without requiring specialized hardware. The system was built using three.js for interactive 3D rendering and tested on Chrome and Firefox for cross-platform compatibility. The project demonstrates the potential of combining lightweight computer vision techniques with web-based graphics frameworks to make immersive depth-enhanced experiences accessible directly in the browser. Demo
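A minimal sketch of the head-to-camera mapping (the actual project implements this in JavaScript with clmtrackr and three.js; the function below is an illustrative Python rendering, and the scale factor is an assumption):

```python
# Map tracked eye coordinates to a virtual-camera offset for the window effect.
def head_to_camera(eye_left, eye_right, frame_w, frame_h, scale=2.0):
    """eye_left/eye_right: (x, y) pixel positions from the face tracker.
    Returns (dx, dy): camera displacement in scene units, centered at 0."""
    cx = (eye_left[0] + eye_right[0]) / 2 / frame_w - 0.5   # normalized to -0.5..0.5
    cy = (eye_left[1] + eye_right[1]) / 2 / frame_h - 0.5
    return -scale * cx, scale * cy   # mirror x so the scene shifts opposite the head
```

Shifting the rendered viewpoint opposite to the head motion is what makes the screen behave like a window into the scene rather than a flat picture.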
10. Nonlinear Control Signal Integration for Dynamic Trajectory Generation
This project addresses the mathematical modeling and simulation of nonlinear control-driven trajectories in autonomous systems. Formulating the system dynamics as continuous-time state equations, the study investigates how discrete steering inputs and velocity variations shape trajectory evolution. Specifically, the work demonstrates how control signals defined by piecewise-constant and linearly varying steering angles generate complex motion patterns such as circular paths, counter-clockwise loops, and spiral trajectories. A parametric velocity sweep further reveals the sensitivity of trajectory formation to initial conditions and speed scaling. Integrating the nonlinear system equations provides a rigorous framework for analyzing motion behaviors, with direct implications for mobile robotics, intelligent transportation, and control theory. The project will also compare the predictive accuracy of a neural network trained to approximate these dynamics against the ground-truth mathematical model, investigating how factors such as network architecture and activation functions (e.g., ReLU, tanh) influence performance.
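A minimal sketch of the integration setup (the kinematic equations and the steering profile below are standard assumptions standing in for the project's exact formulation):

```python
# Integrating nonlinear kinematics under a piecewise-constant steering signal.
import numpy as np
from scipy.integrate import solve_ivp

def steering(t):
    """Piecewise-constant steering angle (rad): straight, then a constant turn."""
    return 0.0 if t < 2.0 else 0.3

def dynamics(t, state, v=1.0, L=1.0):
    """State = (x, y, theta); v = speed, L = wheelbase."""
    x, y, theta = state
    return [v * np.cos(theta), v * np.sin(theta), (v / L) * np.tan(steering(t))]

sol = solve_ivp(dynamics, (0.0, 20.0), [0.0, 0.0, 0.0],
                t_eval=np.linspace(0.0, 20.0, 500))
x, y, theta = sol.y   # constant steering after t = 2 s traces a circular arc
```

Sweeping `v` over a range of values while holding the steering profile fixed reproduces the kind of velocity-sensitivity experiment described above, and the resulting `(x, y)` trajectories double as ground truth for the planned neural-network comparison.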
11. Toward Trustworthy Neural Architectures for Real-Time Decision-Making
This project investigates the design and analysis of neural architectures that balance predictive accuracy with interpretability and fairness. At its core, the framework consists of a prediction module (Pred), a latent representation (h), a decoder (Dec), and an evaluation/correction unit (C) that ensures the final output aligns with ground-truth labels. The pipeline emphasizes not only prediction accuracy but also systematic calibration through corrective mechanisms. The study explores activation functions, including leaky ReLU variants, tanh, and sigmoid, to evaluate their role in the stability, gradient flow, and robustness of deep learning models under noisy or high-dimensional data. Additionally, the project examines energy-based formulations (E) to better understand representation learning and optimization landscapes. The goal is to create a theoretical and practical foundation for real-time, socially responsible AI systems that are robust, fair, and transparent, enabling deployment in domains such as healthcare, autonomous systems, and digital governance. By integrating mathematical rigor with system-level design, this project advances the field toward explainable, efficient, and accountable AI architectures.
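A small PyTorch sketch of the Pred/h/Dec/C pipeline with a configurable activation (the layer sizes and the gradient-flow comparison are illustrative assumptions, not the project's actual architecture):

```python
# Pred -> h -> {Dec, C} pipeline with swappable activations.
import torch
import torch.nn as nn

ACTIVATIONS = {"leaky_relu": nn.LeakyReLU(0.1), "tanh": nn.Tanh(), "sigmoid": nn.Sigmoid()}

class PredDecC(nn.Module):
    def __init__(self, d_in=32, d_h=16, act="leaky_relu"):
        super().__init__()
        self.pred = nn.Sequential(nn.Linear(d_in, d_h), ACTIVATIONS[act])  # Pred -> h
        self.dec = nn.Linear(d_h, d_in)    # Dec: reconstruct the input from h
        self.c = nn.Linear(d_h, 1)         # C: evaluation/correction output

    def forward(self, x):
        h = self.pred(x)                   # latent representation h
        return self.dec(h), self.c(h)

# Compare gradient flow through each activation on the same batch
x = torch.randn(64, 32, requires_grad=True)
for name in ACTIVATIONS:
    model = PredDecC(act=name)
    recon, out = model(x)
    loss = ((recon - x) ** 2).mean() + out.abs().mean()
    loss.backward()
    print(f"{name:10s} input-grad norm: {x.grad.norm():.4f}")  # proxy for gradient flow
    x.grad = None
```

Saturating activations (sigmoid, tanh) tend to shrink this gradient norm relative to leaky ReLU, which is the stability/gradient-flow trade-off the study examines.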
12. Nonlinear State Transition Modeling for Tricycle-Based Robotic Navigation
This project focuses on the mathematical formulation, simulation, and analysis of nonlinear state-transition models for tricycle-based robotic navigation. The tricycle kinematic framework is adopted, in which the system state is characterized by position, orientation, and velocity, while the control vector consists of steering angle and acceleration. By deriving and integrating the nonlinear differential equations of motion, the study examines how structured control inputs such as stepwise steering signals and velocity sweeps affect the evolution of system trajectories. The results demonstrate diverse trajectory behaviors, including circular arcs, counter-clockwise loops, and spirals, highlighting the sensitivity of tricycle-type nonholonomic systems to input variations. Through numerical simulations, this work establishes a rigorous link between control-signal design and geometric path generation, offering insights for trajectory optimization, motion planning, and stability analysis in autonomous robotic systems and intelligent transportation applications.
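A standard tricycle kinematic model matching this description (a sketch; the project's exact equations may include additional terms) uses state (x, y, θ, v), controls (φ, a) for steering angle and acceleration, and wheelbase L:

```latex
\begin{aligned}
\dot{x} &= v\cos\theta, &\qquad \dot{y} &= v\sin\theta, \\
\dot{\theta} &= \frac{v}{L}\tan\phi, &\qquad \dot{v} &= a.
\end{aligned}
```

Holding φ constant with a = 0 yields a circular arc of radius L/tan φ, while ramping φ or v over time produces the loop and spiral behaviors reported above.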