My Projects

My research interests are computer vision, robotics, and machine learning. Below are some of the academic projects undertaken during my master's and undergraduate studies. See the Publications page for my recent research work, and the My Robots page for other robots I built as an undergraduate. Videos of my projects can be found here.

Major Projects

Joint Semantic Segmentation and 3D Reconstruction

Spring 2002 - Spring 2014

Publications: ECCV'14

Multibody Visual SLAM

Spring 2002 - Spring 2014

Publications: ICCV'11

Moving Object Detection with Monocular Camera

October 2008 - June 2009

In another project, I developed a robust and efficient method for detecting independently moving objects in a monocular image sequence. We introduced a novel two-view geometric constraint capable of detecting moving objects followed by a moving camera in the same direction, a so-called degenerate configuration where the commonly used epipolar constraint fails. This is made possible by exploiting knowledge of the camera motion to estimate a bound on an image feature's position along the epipolar line. A probabilistic framework propagates the uncertainties in the system and recursively updates the probability of a feature being stationary or dynamic. We achieved successful and repeatable detection on various challenging real image sequences. Computing the 3D structure of the environment within this framework allows a tighter bound on the geometric constraints, which results in more accurate independent-motion detection. The work received very appreciative reviews and resulted in publications at IROS'09 and ROBIO'10. It has also been integrated into my multibody visual SLAM framework described above, and was selected for oral presentation at ICVGIP'10 with strong positive reviews. We are currently consolidating these results for submission to a premier journal.

Video: WMV (13.7 MB). Publications: ICVGIP'10, ROBIO'10, IROS'09.
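The recursive probability update described above can be sketched as a simple binary Bayes filter over per-frame constraint checks. The likelihood values below are illustrative placeholders, not the ones used in the actual system:

```python
def update_dynamic_probability(prior_dynamic, violation,
                               p_violation_given_dynamic=0.8,
                               p_violation_given_static=0.1):
    """One recursive Bayes update of P(feature is dynamic).

    violation: True if the feature broke the geometric constraint
    (e.g. left the predicted bound along the epipolar line) this frame.
    """
    if violation:
        l_dyn = p_violation_given_dynamic
        l_sta = p_violation_given_static
    else:
        l_dyn = 1.0 - p_violation_given_dynamic
        l_sta = 1.0 - p_violation_given_static
    evidence = l_dyn * prior_dynamic + l_sta * (1.0 - prior_dynamic)
    return l_dyn * prior_dynamic / evidence

# A feature that keeps violating the constraint drifts toward "dynamic".
p = 0.5
for _ in range(5):
    p = update_dynamic_probability(p, violation=True)
```

Repeated evidence accumulates multiplicatively, so a consistently violating feature is quickly labeled dynamic while occasional noisy violations are absorbed.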

Computer Vision CS5765: Cricket Match Video Analysis

Spring, 2009

The objective of this assignment was to extract different kinds of information from a video of a cricket match. Students were given the freedom to decide which tasks to attempt. We performed several tasks, including video shot detection, pitch detection, score detection, person/object detection, non-interrupting ads, and ball-by-ball video segmentation. One special task was placing non-interrupting advertisements in the video: most advertisements that appear during matches either interrupt the view or appear only occasionally, like physical advertisements painted on the field. We instead place advertisements on the cricket field itself, so that they allow natural, uninterrupted viewing. The only input to the system is a rectangular image of the ad. The top two images in the adjacent figure show the output of the system for the 'Pepsi' and 'MTV' ads. Please see the report for more details.
Report: PDF
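Placing a planar ad on the field amounts to warping the ad image by a homography from the ad's corners to four points on the ground plane. A minimal NumPy sketch of that mapping; the DLT solve is standard, but the corner coordinates are illustrative, not the assignment's actual values:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct Linear Transform: solve H such that dst ~ H @ src (4 pairs)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)          # null-space vector is H (up to scale)

def apply_homography(H, pts):
    """Map 2D points through H with perspective division."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Corners of a 200x100 ad image and where they should land on the field.
ad_corners = np.array([[0, 0], [200, 0], [200, 100], [0, 100]], float)
field_quad = np.array([[320, 400], [480, 390], [500, 450], [300, 460]], float)
H = homography_from_points(ad_corners, field_quad)
```

In practice the per-pixel warp and blending would be done with a library routine (e.g. OpenCV's perspective warp), but the geometry is exactly this 3x3 transform.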

Computer Vision CS5765: GIST Feature Descriptors on GPU

Spring, 2009

GIST is a computational model of real-world scene recognition that bypasses segmentation and the processing of individual objects or regions. It is a holistic statistical signature of the image, yielding abstract scene classification and layout. The procedure is based on a very low-dimensional representation of the scene, termed the Spatial Envelope: a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. These dimensions can be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. We implemented the GIST model (Oliva & Torralba, IJCV'01) to compute the 'gist' descriptor of an image. If a GPU is present in the system, the highly parallel portions of the code are executed on the GPU using the Nvidia CUDA API.
Report: PDF
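The core of the descriptor is to filter the image with a bank of oriented filters and average the response energy over a coarse spatial grid. A rough CPU-only sketch with NumPy; the filter bank here is plain oriented gradients, far simpler than the multi-scale Gabor bank of the actual GIST model:

```python
import numpy as np

def gist_like_descriptor(img, grid=4):
    """Coarse spatial averages of oriented-gradient energy.

    img: 2D grayscale float array. Returns a grid*grid*2 vector
    (horizontal and vertical gradient energy per grid cell).
    """
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    feats = []
    for channel in (np.abs(gx), np.abs(gy)):
        for i in range(grid):
            for j in range(grid):
                cell = channel[i * h // grid:(i + 1) * h // grid,
                               j * w // grid:(j + 1) * w // grid]
                feats.append(cell.mean())
    return np.asarray(feats)

desc = gist_like_descriptor(np.random.rand(64, 64))
```

The per-cell filtering and averaging is embarrassingly parallel, which is why it maps well onto a GPU with CUDA.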

Vision based Collision Avoidance & Occupancy Mapping

Spring, 2009 to present

This work aims to find a vision-based obstacle avoidance and navigation solution for robots with a single mounted camera. The idea is to generate 2D range information from a monocular camera, similar to the output of a 2D laser scanner, so that it can be used with existing laser-based navigation algorithms. A graph-based image segmentation algorithm (Pedro et al.) is used to segment the ground plane, and a fast vanishing-point estimation method (Tardif et al.) improves the segmentation. The ground plane is parametrized by a homography matrix, which is also used to filter points lying on the ground plane and to generate a top-down view and a mosaic of the ground plane. Range information similar to a laser scan can then be extracted from the top view of the navigable area along the camera's FOV. I use this range information to run a Nearness Diagram (ND) collision avoidance algorithm, and we have also used it for occupancy grid mapping of our lab. The work is still in its preliminary stages; we are currently improving the range information to make it feasible for more tasks.
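Extracting laser-like ranges from the top-down view can be sketched as ray casting through a free-space grid. This is a simplified stand-in for the actual pipeline (grid size, FOV, and beam count below are made-up values):

```python
import numpy as np

def topview_to_ranges(free, origin, fov_deg=60.0, n_beams=15, max_range=100.0):
    """Cast beams through a top-down free-space grid, like a 2D laser.

    free: 2D bool array (True = navigable ground), indexed [row, col].
    origin: (row, col) of the camera footprint in the grid.
    Returns one range per beam (distance in cells to first non-free cell).
    """
    r0, c0 = origin
    angles = np.deg2rad(np.linspace(-fov_deg / 2, fov_deg / 2, n_beams))
    ranges = np.full(n_beams, max_range)
    for k, a in enumerate(angles):
        for d in range(1, int(max_range)):
            r = int(round(r0 - d * np.cos(a)))   # beams point "up" the grid
            c = int(round(c0 + d * np.sin(a)))
            if not (0 <= r < free.shape[0] and 0 <= c < free.shape[1]) \
                    or not free[r, c]:
                ranges[k] = d
                break
    return ranges

grid = np.ones((50, 50), dtype=bool)
grid[10, :] = False                      # a wall 30 cells ahead of the robot
scan = topview_to_ranges(grid, origin=(40, 25))
```

The resulting range vector has the same shape as a laser scan, which is what lets algorithms like Nearness Diagram consume it unchanged.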

Undergraduate Research

Vision based Person following for Mobile Robots

Spring, 2008

To develop socially appropriate skills, it is very important for a robot to be able to follow a person. In this project, I implemented an efficient person-following feature using only a single webcam on the robot, and successfully tested the approach on a mobile robot. Initially the person needs to be introduced to the system: during this phase, it tries to detect a human face, and upon successful detection it learns the color histogram of the person's upper body (chest area), just below the detected face. The color histogram thus obtained, along with some other analyses, is used to track and predict the person's motion.

Video: WMV (3.6 MB)
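The histogram-based tracking step can be sketched as: learn a normalized color histogram of the chest patch once, then score candidate patches by histogram similarity while tracking. A minimal NumPy sketch assuming hue values in [0, 180), as OpenCV represents them; the bin count and Bhattacharyya measure are reasonable choices, not necessarily the project's exact ones:

```python
import numpy as np

def hue_histogram(patch, bins=16):
    """Normalized hue histogram of an image patch (hue values in [0, 180))."""
    hist, _ = np.histogram(patch, bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)

def histogram_similarity(h1, h2):
    """Bhattacharyya coefficient: 1.0 for identical distributions."""
    return float(np.sum(np.sqrt(h1 * h2)))

# Learn the chest patch once, then score candidate patches while tracking.
chest = np.full((40, 40), 30.0)          # mostly one hue (e.g. a red shirt)
model = hue_histogram(chest)
candidate_same = hue_histogram(np.full((40, 40), 30.0))
candidate_diff = hue_histogram(np.full((40, 40), 120.0))
```

A histogram is robust to the person turning around or changing pose, since it discards spatial layout and keeps only color statistics.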

Ability to read text and symbols for Mobile Robots

Spring, 2008

Giving mobile robots the ability to read textual messages and symbols is highly desirable for increasing their autonomy when navigating the real world. Beyond a map of the environment, direction symbols, name-plates, and room numbers can certainly help robot localization, similar to the way we humans navigate inside office buildings and along roads. The main challenge is to find the areas likely to contain a message and to get a good view of the message before applying conventional OCR techniques to those areas of the image. A mobile robot (MERP) navigating a small space with the help of written direction messages on the walls has been successfully demonstrated.

Video: WMV (6.7 MB)

Face, Person detection and tracking

Fall, 2007

In this project, I studied and implemented various face detection and face recognition algorithms. Face detection using a boosted cascade of Haar features, and principal component analysis (PCA) based face recognition using eigenfaces, were implemented on a conventional laptop onboard a mobile robot, using images acquired from the robot's webcam. A robust object tracker was also implemented to help the robot track people around it. The objective of the project was to enable a mobile robot to detect and identify humans around it, a step towards a cognitive social robot.

Video: WMV (4.8 MB)
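The eigenfaces part of the pipeline is plain PCA on flattened face images, with recognition as nearest neighbour in the projected "face space". A self-contained NumPy sketch on a tiny synthetic gallery (the image size and component count are arbitrary):

```python
import numpy as np

def train_eigenfaces(faces, k=4):
    """PCA on flattened face images: returns mean face and top-k eigenfaces."""
    X = faces.reshape(len(faces), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]                  # each row of vt is one eigenface

def project(face, mean, eigenfaces):
    """Coordinates of a face in eigenface space."""
    return (face.ravel().astype(float) - mean) @ eigenfaces.T

# Tiny synthetic gallery: recognition = nearest neighbour in face space.
rng = np.random.default_rng(0)
gallery = rng.random((10, 8, 8))
mean, eig = train_eigenfaces(gallery, k=4)
codes = np.array([project(f, mean, eig) for f in gallery])
query = project(gallery[3], mean, eig)
match = int(np.argmin(np.linalg.norm(codes - query, axis=1)))
```

Projecting onto a handful of eigenfaces makes the nearest-neighbour comparison cheap enough to run on a laptop alongside the Haar-cascade detector.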

MERP: a Mobile Robot Platform

Spring, 2008

MERP is a mobile robot platform designed and developed as a testbed for my robotics learning and research. The design objective was to create a modular, easy-to-debug robot platform, at both the hardware and software level. On this platform I have implemented and developed various algorithms (see above), towards a mobile robot that can interact with and assist people in home/office environments. It has been built with cheap off-the-shelf equipment that hobbyists and home users can afford. The main components of the platform include an onboard laptop, three AVR ATmega microcontrollers, a webcam, two DC geared motors, a pan-tilt mechanism for mounting the webcam, and DC-DC converters for power management. The onboard laptop performs the higher-level tasks, using OpenCV for vision routines and the Microsoft Speech SDK for speech recognition. For lower-level routines like motor control, the pan-tilt mechanism, and sensor interfacing, I designed separate control boards using AVR ATmega16 microcontrollers. Communication between the microcontrollers and the onboard laptop uses UART and I2C serial links.

Video: MPG (47.4 MB) | on Vimeo
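Laptop-to-microcontroller links like MERP's UART connection typically exchange small framed packets with a checksum. The byte layout below is a hypothetical sketch of such a protocol, not MERP's actual one:

```python
START = 0xAA

def encode_packet(cmd, payload):
    """Frame: start byte, command, payload length, payload, checksum."""
    body = bytes([cmd, len(payload)]) + bytes(payload)
    checksum = sum(body) & 0xFF
    return bytes([START]) + body + bytes([checksum])

def decode_packet(frame):
    """Return (cmd, payload), or raise ValueError on a corrupted frame."""
    if len(frame) < 4 or frame[0] != START:
        raise ValueError("bad frame")
    body, checksum = frame[1:-1], frame[-1]
    if sum(body) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    cmd, length = body[0], body[1]
    if length != len(body) - 2:
        raise ValueError("bad length")
    return cmd, bytes(body[2:])

# e.g. a hypothetical command 0x01 = set wheel speeds (left, right)
frame = encode_packet(0x01, [120, 120])
```

The start byte and checksum let the microcontroller resynchronize after dropped bytes, which matters on a shared UART/I2C bus with several boards.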