Sauradip Nag


I am currently a third-year PhD student, focusing on Computer Vision and Deep Learning, in Prof. Xiang's group at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, United Kingdom. My primary supervisor is Prof. Tao (Tony) Xiang, and my co-supervisors are Prof. Yi-Zhe Song and Prof. Yongxin Yang. I also work closely with Dr. Xiatian (Eddy) Zhu.

Prior to this, I was a Project Associate in the Visualization and Perception Lab, IIT Madras, working on a DRDO project under Prof. Sukhendu Das that involved localizing hidden locations in unknown environments. I have also collaborated with Prof. Umapada Pal of the Indian Statistical Institute (ISI), Kolkata, and Prof. Palaiahnakote Shivakumara of the University of Malaya, Malaysia, on various Computer Vision research problems during my undergraduate studies.

Office : You can usually find me at Desk 3, Room 01BB01, CVSSP (behind AP Plaza), University of Surrey, UK. I am always happy to meet and chat over a cup of coffee beside the majestic lake!

Email  /  Resume  /  Google Scholar  /  LinkedIn  /  GitHub  /  Twitter

Research Interests

I am broadly interested in Computer Vision and Deep Learning. In particular, I have focused on Visual Scene Understanding (VSU) from images and videos, effective methods of Transfer Learning for VSU, and building systems that learn with minimal or no supervision in real and diverse scenarios. I currently work on Vision-Language Modeling, Few-Shot/Zero-Shot/Meta-Learning, Egocentric Video Understanding, Video Action Detection/Recognition, Object Detection, Semantic Segmentation, and Depth Estimation. Recently, I have shifted my focus to Generative Modeling and 3D Human Body/Action Modeling.
Collaboration : If you are interested in my research topics and would like to collaborate on some cool ideas, please contact me via Email/LinkedIn.


Education and Research Experience
University of Surrey, United Kingdom

Position : Doctor of Philosophy (PhD) in Electrical Engineering
Under Prof. Tao Xiang and Prof. Yi-Zhe Song
Jul 2020 - Present

Indian Institute of Technology Madras, India

Position : Project Associate in Visualization and Perception Lab and CAIR, DRDO
Under Prof. Sukhendu Das, Prof. K Mitra and Prof. B Ravindran
Jul 2018 - Jul 2020

Indian Statistical Institute Kolkata, India

Position : Research Associate in Computer Vision and Pattern Recognition Unit
Under Dr. Umapada Pal and Dr. P Shivakumara
Jul 2016 - Jul 2018

Kalyani Government Engineering College, India

Position : Bachelor of Technology (B.Tech) in Computer Science and Engineering
Under Dr. Kousik Dasgupta
Thesis : Interacting with Softwares using Gestures
Jul 2014 - Jul 2018

Journal Publications
A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

Sauradip Nag , P Shivakumara , Umapada Pal , Tong Lu , Michael Blumenstein
Pattern Recognition, Elsevier [IF : 7.196]

Abstract / Code / BibTex

An Episodic Learning Network for Text Detection on Human Bodies in Sports Images

P Chowdhury , P Shivakumara , R Ramachandra , Sauradip Nag , Umapada Pal , Tong Lu , Daniel Lopresti
IEEE Transactions on CSVT [IF : 4.6]

Abstract / BibTex

Conference Publications
Multi-Modal Few-Shot Temporal Action Detection
via Vision-Language Meta-Adaptation

Sauradip Nag , Mengmeng Xu , Xiatian Zhu , Juan Perez-Rua , Bernard Ghanem , Yi-Zhe Song , Tao Xiang
ArXiv, 2022

Abstract / Code / ArXiv / BibTex

Post-Processing Temporal Action Detection

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
ArXiv, 2022

Abstract / Code / ArXiv / BibTex

Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

Abstract / Code / ArXiv / Project Page / BibTex

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

Abstract / Code / ArXiv / Project Page / BibTex

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

Abstract / Code / ArXiv / Project Page / BibTex

Few-Shot Temporal Action Localization with Query Adaptive Transformer

Sauradip Nag , Xiatian Zhu , Tao Xiang
British Machine Vision Conference (BMVC), 2021
Manchester, Virtual [ H5-Index : 66 ]

Abstract / Code / ArXiv / BibTex / Slides

What's There in the Dark

Sauradip Nag , Saptakatha Adak , Sukhendu Das
International Conference on Image Processing (ICIP), 2019 (Spotlight Paper)
Taipei, Taiwan [ H5-Index : 45 ]

Abstract / Code / BibTex

Facial Micro-Expression Spotting and Recognition using Time Contrasted Feature with Visual Memory

Sauradip Nag , Ayan Kumar Bhunia , Aishik Konwer, Partha Pratim Roy
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
Brighton, United Kingdom [ H5-Index : 80 ]

Abstract / arXiv / BibTex

CRNN based Jersey-Bib Number/Text Recognition in Sports and Marathon Images

Sauradip Nag , Raghavendra Ramachandra , Palaiahnakote Shivakumara , Umapada Pal , Tong Lu , Mohan Kankanhalli
International Conference on Document Analysis and Recognition (ICDAR), 2019
Sydney, Australia [ H5-Index : 26 ]

Abstract / BibTex

A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

Sauradip Nag , Palaiahnakote Shivakumara , Wu Yirui , Umapada Pal , Tong Lu
International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018
Niagara Falls, USA [ H5-Index : 18 ]

Abstract / arXiv / BibTex

Workshops and Challenges
Large-Scale Product Retrieval with Weakly Supervised Representation Learning

X Han*, K.W Ng* , Sauradip Nag , Z Qu
eBay eProduct Visual Search Challenge
Fine-Grained Visual Categorization Workshop (FGVC9), CVPR 2022
New Orleans, USA

Abstract / Code / ArXiv / BibTex

How Far Can I Go ? : A Self-Supervised Approach for Deterministic Video Depth Forecasting

Sauradip Nag* , Nisarg Shah* , Anran Qi* , R Ramachandra
Machine Learning for Autonomous Driving Workshop (ML4AD), NeurIPS 2021
Australia, Virtual

Abstract / Code / BibTex

Academic Projects
Temporal Action Localization Visualization Tool

Impressive progress has been reported in the recent literature on action recognition. This trend motivates another challenging topic, temporal action localization: given a long untrimmed video, when does a specific action start and end? This problem is important because real applications usually involve long untrimmed videos, which can be highly unconstrained in space and time, and one video can contain multiple action instances plus background scenes or other activities. However, there is practically no code available to visualize the results and compare them with the ground truth for a given video; the only outputs currently available are quantitative results computed by the evaluation code released with each dataset. This is a visualization tool designed to bridge that gap and inspect the performance of any PyTorch model on temporal action localization. It is built with HTML, CSS, JavaScript and Python.
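As a flavour of the comparison such a tool visualizes, here is a minimal, hypothetical sketch of temporal-IoU matching between predicted and ground-truth action segments (the function names are illustrative and not taken from the actual tool):

```python
def temporal_iou(pred, gt):
    """Temporal intersection-over-union between two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def match_segments(preds, gts, thresh=0.5):
    """Greedily match each predicted segment to an unused ground-truth
    segment whose temporal IoU exceeds the given threshold."""
    matches, used = [], set()
    for p in preds:
        best, best_iou = None, thresh
        for i, g in enumerate(gts):
            if i in used:
                continue
            iou = temporal_iou(p, g)
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            used.add(best)
            matches.append((p, gts[best], best_iou))
    return matches
```

Matched and unmatched segments can then be drawn as coloured bars on a shared timeline next to the ground truth.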


Computer Vision based Robot Locomotion

Vision-based robot navigation has long been a fundamental goal in both robotics and computer vision research. However, a robot does not require a full semantic labelling of an environment in order to move: since we are dealing with ground-based robots, it only needs the floor region of the environment to plan its path. Depth information is particularly useful in predicting how far the robot can move in a given direction. The marriage of the two vision tasks therefore yields a free-space map that tells the robot how freely it can move in each direction. On top of this, we designed a novel motion-control algorithm that enables the robot to navigate around obstacles in its path. This is implemented in PyTorch.
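A rough sketch of how a free-space map could be derived from a floor segmentation mask and a depth map; the inputs, the vertical-sector scheme, and the max-depth heuristic are simplifying assumptions for illustration, not the actual pipeline:

```python
import numpy as np

def free_space_map(floor_mask, depth, n_directions=5):
    """Estimate how far the robot can travel toward each heading.

    floor_mask : (H, W) bool array, True where the floor is visible.
    depth      : (H, W) array of metric depth values.

    Splits the image into vertical sectors (left ... right) and, for each,
    returns the largest depth observed on floor pixels -- a rough proxy
    for the traversable distance in that direction.
    """
    h, w = floor_mask.shape
    bounds = np.linspace(0, w, n_directions + 1, dtype=int)
    distances = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        sector = floor_mask[:, lo:hi]
        d = depth[:, lo:hi][sector]          # depth on floor pixels only
        distances.append(float(d.max()) if d.size else 0.0)
    return distances  # index 0 = leftmost sector, -1 = rightmost
```

The motion controller can then steer toward the sector with the largest free distance.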


Autonomous Robot Locomotion in prespecified path

Robotics has greatly helped humans in achieving everyday tasks. Robots are designed to work in any environment and perform tasks on behalf of humans, operating under real-world and real-time constraints where sensors and effectors with specific physical characteristics must be controlled. In many cases, such robots are controlled manually to move from one destination to another. An Unmanned Ground Vehicle (UGV) is a vehicle that operates while in contact with the ground and without an onboard human presence. We used one such robot to demonstrate custom paths that the user can choose; currently we have implemented two such paths. This has been implemented in Python and ROS.


Light-weight Salient Object Detection

Salient object detection is a prevalent computer vision task with applications ranging from abnormality detection to abnormality processing. Context modelling is an important criterion in saliency detection: global context helps determine the salient object in an image by contrasting it against the other objects in the overall scene, while local context features detect the boundaries of the salient object with higher accuracy within a given region. To incorporate the best of both worlds, our proposed SaLite model uses both global and local contextual features. It is an encoder-decoder architecture in which the encoder uses a lightweight SqueezeNet and the decoder is modelled with convolutional layers. This has been implemented in PyTorch.

Code / Arxiv
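A simplified PyTorch sketch of the global+local context fusion idea; a tiny convolutional encoder stands in for the SqueezeNet backbone to keep the sketch self-contained, and all layer sizes are illustrative rather than the actual SaLite configuration:

```python
import torch
import torch.nn as nn

class SaliencySketch(nn.Module):
    """Encoder-decoder saliency sketch.

    Local context comes from the spatial feature map of the encoder;
    global context is its spatially pooled summary. Both are fused and
    decoded into a per-pixel saliency map in [0, 1].
    """
    def __init__(self, ch=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        local_feat = self.encoder(x)                             # local context
        global_feat = local_feat.mean(dim=(2, 3), keepdim=True)  # global context
        fused = torch.cat(
            [local_feat, global_feat.expand_as(local_feat)], dim=1)
        return self.decoder(fused)                               # (B, 1, H, W)
```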

Boundary Growing Algorithm for Recovery of Torso from Corrupt Face

Automatic face detection has been intensively studied for human-related recognition systems, but there has been very little work on recovering a face from a corrupted face image. We designed a boundary-growing algorithm in which we incrementally grew the boundary of the corrupted face region and passed the crop to a Haar cascade classifier to obtain a confidence score, repeating this until the confidence peaked. We then used standard tailor measurements to recover the torso of the person. This has been implemented using OpenCV and Python.
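The boundary-growing loop can be sketched as follows; `score_fn` is a stand-in for the Haar cascade confidence, and the names and step size are illustrative assumptions:

```python
def grow_boundary(image, bbox, score_fn, step=4):
    """Boundary-growing sketch: expand a crop window around a corrupted
    face region until the detector's confidence stops improving.

    image    : 2-D array-like indexed as image[y][x].
    bbox     : (x0, y0, x1, y1) initial window around the corrupted face.
    score_fn : stand-in for the Haar cascade confidence on a crop.
    """
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = bbox
    best = score_fn(image, (x0, y0, x1, y1))
    while True:
        # Grow the window outward by `step`, clipped to the image bounds.
        grown = (max(0, x0 - step), max(0, y0 - step),
                 min(w, x1 + step), min(h, y1 + step))
        score = score_fn(image, grown)
        if score <= best or grown == (x0, y0, x1, y1):
            return (x0, y0, x1, y1), best   # confidence has peaked
        x0, y0, x1, y1 = grown
        best = score
```

The window returned at the confidence peak then anchors the tailor-measurement step that estimates the torso region.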


Baseline Remover for Document Images

This is a small MATLAB implementation for removing baselines from document images using the edge-directional kernel described in the paper "Edge enhancement algorithm for low-dose X-ray fluoroscopic imaging" by Lee et al. Baseline removal is an important topic in document image analysis. Lee et al. proposed removing noise from X-ray images using an edge-directional kernel combined with a high-pass filter; since our only noise is the baseline, we used a clever trick to apply just the edge-directional kernel, and the results are quite neat. The method works well for half-page documents and cropped line images, and may also work well when full-page images are preprocessed first.

Code / Paper
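The baseline-suppression trick can be illustrated with a Python sketch (the original implementation is in MATLAB). This simplification keeps only the horizontal-direction accumulation of the edge-directional-kernel idea, and the `min_run` threshold is an assumed parameter:

```python
import numpy as np

def remove_baselines(binary, min_run=0.8):
    """Remove horizontal ruled baselines from a binarized document image.

    binary  : (H, W) array, 1 = ink, 0 = background.
    min_run : fraction of the page width a row's ink must cover to be
              treated as a ruled baseline rather than text.

    Ink is accumulated along each row (a horizontal directional kernel);
    rows whose response exceeds min_run * width are cleared.
    """
    img = binary.copy()
    h, w = img.shape
    row_response = img.sum(axis=1)            # horizontal accumulation
    baseline_rows = row_response >= min_run * w
    img[baseline_rows, :] = 0                 # erase detected baselines
    return img
```

Text strokes rarely span a full row, so they fall below the threshold and survive while ruled lines are removed.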

Academic Services
  • COM3013: Computational Intelligence, Teaching Assistant (2022-23), University of Surrey

  • Technical Programme Committee:
  • ML for Autonomous Driving Workshop, NeurIPS

  • Conference Review Services:

  • Journal Review Services:
  • Springer Nature Computer Science
  • International Journal of Human-Computer Interaction
  • IEEE Transactions on Circuits and Systems for Video Technology
  • IEEE Transactions on Pattern Analysis and Machine Intelligence


Copyright © Sauradip Nag. Last updated Aug 2022 | Template provided by Dr. Jon Barron