Debanjan Nandi

Graduate Student
Computer Science and Engineering
The Ohio State University

Contact: debanjan(dot)com@gmail(dot)com
nandi(dot)20@osu(dot)edu

Research Projects

Predicting Human Trajectories with LSTM using an Adaptive Attention Framework [pdf]

Master's Project, Advisor: Dr. James W. Davis, The Ohio State University
Abstract: Rapid growth over the last few years in efforts to build autonomous vehicles and social robots for human-centric environments has heightened the importance of understanding human behavior. In this work, we present a data-driven architecture that predicts the future trajectories of people in a crowd. We use a feed-forward, fully differentiable, and jointly trained recurrent neural network (RNN) mixture model, augmented with a novel pedestrian weighting scheme, to model the trajectories of all humans in the crowd. Our integrated attention module can adapt its neighborhood of influence based on the pedestrian’s behavior, and it learns the attention from the data itself.

Title Block Analysis & Retrieval of Information from Scanned Engineering Drawing Images [pdf]

Master's Thesis, Advisor: Dr. Jayanta Mukhopadhyay, IIT Kharagpur, India
Abstract: In this modern digital age, the increasing use of computer-aided design (CAD) and computer-aided manufacturing (CAM) systems has prompted the move from paper-based documentation towards computerized storage and retrieval systems. Document updates and revisions are handled efficiently in this computerized form. However, it remains a nightmare for enterprises working with digital drawings to reap the full benefits of digitization, especially search and storage based on the content of these documents, since the computer sees only images.
As part of the thesis, we propose a novel pre-processing operation for extracting relevant text information from the title block of architectural drawing documents, possibly followed by character recognition. The extracted data will be stored as metadata for the specific image and will be used for processing. Title block extraction will be very useful for the reprographics industry, which handles large numbers of architectural documents.

A Comparative Analysis of Nitride based Compound Semiconductor Resonant Tunnelling Diode [pdf]

Bachelor's Thesis, Advisor: Dr. Dhrubes Biswas, IIT Kharagpur, India
Abstract: Resonant tunnelling diodes (RTDs) are commonly formed from heterostructures consisting of layers of GaAs/AlAs, InAs/AlSb, or InAs/GaSb on GaAs or InP substrates, depending upon criteria such as lattice matching. However, III-nitrides such as AlGaN/GaN have recently gained a lot of interest for RTDs. This is primarily because of their wide bandgap, large conduction band offset, and high carrier mobility. These properties enable larger peak-to-valley ratios (PVRs) and quantum behaviour at much higher temperatures than other III-V systems. Such devices can therefore offer higher-power, higher-frequency room-temperature operation than other members of the III-V family.
In this thesis, I analyse the current-voltage characteristics of AlGaN/GaN resonant tunnelling diodes through Matlab and ATLAS simulation. A current-voltage simulation of an AlInN/GaN resonant tunnelling diode follows, and then a comparison of Al0.2Ga0.8N/GaN and Al0.83In0.17N/GaN RTDs. Polarization charges, which play a crucial role in all III-V nitride based heterostructures, have been taken into consideration in the calculations.

Other Select Projects

Visual Question Answering (CSE 5194.01) [pdf]

Our Visual QA project attempted to run and modify a Keras implementation of the original baseline model for Visual Question Answering. Our model uses a Convolutional Neural Network (CNN) for image recognition and a neural language model for modeling questions. Embedding features for the image and the question are combined using point-wise multiplication and processed together by a multilayer perceptron. Our best model uses GloVe vectors to generate embeddings for each word in the question and then uses a 1D Convolutional Neural Network to generate an embedding for each question. The standard ResNet-152 model, trained on ImageNet, is used to generate image features. This model achieves an accuracy of about 61%, compared to the baseline accuracy of 58%.
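
The fusion step described above (point-wise multiplication of the two embeddings, followed by a multilayer perceptron) can be illustrated with a minimal sketch. This is not the project's Keras code: the function names, layer sizes, tanh activation, and random weights are all assumptions chosen for illustration, and the two embeddings are assumed to have already been projected to the same dimension.

```python
import math
import random

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(image_emb, question_emb, w1, b1, w2, b2):
    """Point-wise multiply the embeddings, then apply a one-hidden-layer MLP."""
    fused = [i * q for i, q in zip(image_emb, question_emb)]
    hidden = [math.tanh(h) for h in dense(fused, w1, b1)]
    return softmax(dense(hidden, w2, b2))

def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights, used only to make the sketch runnable."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
EMB, HIDDEN, ANSWERS = 8, 16, 4  # illustrative sizes, not the project's

image_emb = [rng.uniform(-1, 1) for _ in range(EMB)]     # stand-in for CNN features
question_emb = [rng.uniform(-1, 1) for _ in range(EMB)]  # stand-in for question features

w1, b1 = random_matrix(HIDDEN, EMB, rng), [0.0] * HIDDEN
w2, b2 = random_matrix(ANSWERS, HIDDEN, rng), [0.0] * ANSWERS

probs = fuse_and_classify(image_emb, question_emb, w1, b1, w2, b2)
```

Here `probs` is a probability distribution over a small set of candidate answers. In a trained model the weights would be learned jointly rather than drawn at random.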

Recurrent Query Patch Generation for Visual Question Answering (CSE 5194.01) [pdf]

Convolutional Neural Networks have made great strides in object detection and classification, but often at great computational cost. Meanwhile, attention mechanisms in image captioning and question-answering tasks, as well as LSTM networks in language, point to the value of sequential processing for the understanding of complex signals. We propose a network architecture inspired by (but ultimately quite divergent from) biology for the processing of large images in sophisticated tasks.

Human Pose-Estimation Controlled Mario (CSE 5524) [ppt]

The traditional Mario game uses keyboard input to make Mario jump, crouch, shoot, etc. As a team of 3, we developed a MATLAB-based application that takes webcam input of a person in real time, estimates the person's pose as standing, crouching, jumping, or shooting using computer vision techniques, and moves Mario accordingly.

EMI Music Data Mining (CSE 5243)

As a team of 2, we designed an algorithm which compares users' demographics, artist and track ratings, answers to questions about their preferences for music, and words that they use to describe EMI artists in order to predict how much they like tracks they have just heard.
We also developed a custom collaborative filtering method which predicts the ratings of a track based on ratings given by other similar users.
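
The custom method itself is not described here. As a rough illustration of the general idea only, the sketch below shows plain user-based collaborative filtering: a track's rating is predicted as the similarity-weighted mean of ratings from other users. The cosine similarity measure, the function names, and the toy ratings are all assumptions and are not taken from the project.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity over the tracks that both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, track):
    """Similarity-weighted mean of other users' ratings for `track`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or track not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[track]
        weight_total += sim
    if weight_total == 0:
        return None  # no similar user has rated this track
    return weighted_sum / weight_total

# Hypothetical toy data: user -> {track: rating}
ratings = {
    "alice": {"t1": 80, "t2": 20},
    "bob": {"t1": 80, "t2": 20, "t3": 90},
    "carol": {"t1": 20, "t2": 80, "t3": 10},
}

predicted = predict_rating(ratings, "alice", "t3")
```

Because alice's ratings match bob's more closely than carol's, the prediction for `t3` lands nearer to bob's rating than to carol's.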