Sauradip Nag

What Goes Around
Comes Around.


I am currently a Postdoctoral Researcher at GrUVi, Simon Fraser University (SFU), Canada. I am working on Multi-Modal Generative Modeling with Prof. Richard (Hao) Zhang and Dr. Ali-Mahdavi Amiri. During my postdoc, I have also worked with Prof. Daniel Cohen-Or.

Prior to this, I completed my Doctor of Philosophy (PhD) in 2023, focusing on Computer Vision and Deep Learning, in Prof. Xiang's PhD group at the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, England, United Kingdom. My primary supervisor was Prof. Tao (Tony) Xiang and my co-supervisor was Prof. Yi-Zhe Song. During my PhD, I also worked closely with Dr. Xiatian (Eddy) Zhu.

I am actively looking for full-time Research/Applied Scientist roles in industry.


Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub  /  Twitter



Work Experience
https://sauradip.github.io/
Simon Fraser University, Canada

Position : Postdoctoral Researcher in GrUVi Lab, School of Computing Science
Worked on 3D Articulation, Segmentation and Hybrid Representations
Feb 2024 - Present

Indian Institute of Technology Madras, India

Position : Research Engineer/Associate in ICSR and CAIR, DRDO
Deployed RGBD-SLAM and Distilled Computer Vision Models on Robots
Jul 2018 - Feb 2020

GISCLE Systems, India

Position : Research Engineer Intern in Autonomous Driving Team
Curated Data and Trained Computer Vision Models for Autonomous Driving
Jan 2018 - Aug 2018


Education
University of Surrey, United Kingdom

Position : Doctor of Philosophy (PhD) in Electrical Engineering
Under Prof. Tao Xiang and Prof. Yi-Zhe Song
Thesis : Towards Efficient Temporal Activity Detection
Jul 2020 - Jan 2024

Kalyani Government Engineering College, India

Position : Bachelor of Technology (B.Tech) in Computer Science and Engineering
Under Dr. Kousik Dasgupta
Thesis : Interacting with Softwares using Gestures
Jul 2014 - Jul 2018


Publications
Articulate That Object Part: 3D Part Articulation via Text and Motion Personalization

Aditya Vora , Sauradip Nag , Kai Wang , Richard (Hao) Zhang
ACM Transactions on Graphics (TOG), 2026 (Provisional Accept) [ H5-Index : 114 ]

Seminal work on Controlled 3D Mesh articulation via personalization of Part Motion using Diffusion models. Existing Video Diffusion models still lack articulation awareness; this method is a drop-in replacement for them.

Abstract / Code / ArXiv / BibTex

Advances in 4D Representation: Geometry, Motion and Interaction

Mingrui Zhao , Sauradip Nag , Kai Wang , Aditya Vora , Guangda Ji , Peter Chun , Ali-Mahdavi Amiri , Richard (Hao) Zhang
ArXiv, 2025

A new survey on 4D representations of objects and scenes. We review the literature from the perspective of the underlying geometry, the motion types used to drive that geometry, and the mechanisms for interacting with such geometry and motion.

Abstract / Code / ArXiv / BibTex

In-2-4D: Inbetweening from Two Single-View Images to 4D Generation

Sauradip Nag , Daniel Cohen-Or , Richard (Hao) Zhang , Ali-Mahdavi Amiri
ACM SIGGRAPH Asia, 2025
Hong Kong, China [ H5-Index : 114 ]

The first work to solve Generative 4D Interpolation from sparse view images. We divide the motion trajectory into rigid parts and model the 4D motion with 3D Gaussian Splatting as geometry.

Abstract / Code / ArXiv / BibTex

ASIA: Adaptive 3D Segmentation using Few Image Annotations

Sai Raj Kishore , Aditya Vora* , Sauradip Nag* , Ali-Mahdavi Amiri, Richard (Hao) Zhang
ACM SIGGRAPH Asia, 2025
Hong Kong, China [ H5-Index : 114 ]

The first work to perform adaptive Part Segmentation of 3D Meshes from user-annotated Images. Unlike semantic Part Segmentation, this offers more user control and editing power over the 3D Mesh.

Abstract / Code / ArXiv / BibTex

Cora: Correspondence-aware image editing using few step diffusion

Amir AM.* , Aryan Mikaieli* , Sauradip Nag , Negar H. , Andrea Tagliasacchi , AM Amiri
ACM SIGGRAPH North America, 2025
Vancouver, Canada [ H5-Index : 114 ]

A new image editing method that enables flexible and accurate edits, such as pose changes, object insertions, and background swaps, using only 4 diffusion steps. It uses semantic correspondences between the original and edited image to preserve structure and appearance where needed.

Abstract / Code / ArXiv / BibTex

SMITE: Segment Me in Time

Amir AM. , Sauradip Nag , Negar H. , Andrea Tagliasacchi , Ghasan H. , Ali-Mahdavi Amiri
International Conference on Learning Representations (ICLR), 2025
Singapore [ H5-Index : 362 ]

A new Customized Video Segmentation approach which follows the segmentations from user annotated images of the object. It uses the emergent properties of Video Diffusion models to segment non-semantic annotations in the video in a consistent manner.

Abstract / Code / ArXiv / BibTex

RespoDiff: Dual-Module Bottleneck Transformation for Responsible T2I Generation

Silpa V.S , Sauradip Nag , Muhammad Awais , Serge Belongie , Anjan Dutta
Conference on Neural Information Processing Systems (NeurIPS), 2025
San Diego, USA [ H5-Index : 371 ]

A Novel Framework for responsible T2I generation via dual transformations on the intermediate representations of a diffusion model.

Abstract / Code / ArXiv / BibTex

CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

Anindya Mondal , Ayan Banerjee , Sauradip Nag , Josep Lados , Xiatian Zhu , Anjan Dutta
ArXiv, 2025

A Novel High-Instance Image Generation method that uses an MLLM as a designer and critic to generate aesthetically pleasing images that do not suffer from semantic leakage.

Abstract / Code / ArXiv / BibTex

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

Anindya Mondal* , Sauradip Nag* , Joaquin Prada , Xiatian Zhu , Anjan Dutta
AAAI, 2025
Philadelphia, USA [ H5-Index : 232 ]

A novel VLM-based zero-shot Image counting approach that uses geometric properties of the image to count objects in dense scenes and under partial occlusion. We additionally propose the largest Image counting benchmark, OmniCount-191, which has 191 Categories and over 30K annotations.

Abstract / Code / ArXiv / BibTex

DiffSED: Diffusion-based Sound Event Detection

Swapnil Bhosale* , Sauradip Nag* , Diptesh Kanojia , Jiankang Deng , Xiatian Zhu
AAAI Conference on Artificial Intelligence (AAAI), 2024 (Oral Paper)
Vancouver, Canada [ H5-Index : 212 ]

This work reformulated the discriminative Sound-Event Detection task into a Generative Learning paradigm using Noise-to-Latent Denoising Diffusion.

Abstract / Code / ArXiv / BibTex

DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion

Sauradip Nag , Xiatian Zhu , Jiankang Deng , Yi-Zhe Song , Tao Xiang
IEEE International Conference on Computer Vision (ICCV), 2023
Paris, France [ H5-Index : 239 ]

This work introduced the first DETR-based Diffusion framework for the Human Activity Detection task. It introduces a new Noise-to-Proposal denoising paradigm of Diffusion that uses a Transformer Decoder as the denoiser. This can be extended to any detection task.

Abstract / Code / ArXiv / BibTex

PersonalTailor: Personalizing 2D Pattern Design from 3D Point Clouds

Sauradip Nag , Anran Qi , Xiatian Zhu , Ariel Shamir
ArXiv, 2023

This work introduced a multi-modal latent-space disentanglement pipeline for 2D Garment Pattern editing from 3D point clouds. Disentangling the latent space gives the flexibility to add/edit/delete individual panel latents, whose composition forms new Garment Styles.

Abstract / Code / ArXiv / BibTex

Multi-Modal Few-Shot Temporal Action Detection

Sauradip Nag , Mengmeng Xu , Xiatian Zhu , Juan Perez-Rua , Bernard Ghanem , Yi-Zhe Song , Tao Xiang
ArXiv, 2023

This work introduced a novel Multi-Modal Few-Shot setting for the Human Activity Detection task, where each Support Set consists of both Videos and associated Captions/Text. This work also shows how Video-to-NullText inversion is done, similar to DreamBooth.

Abstract / Code / arXiv / BibTex

Post-Processing Temporal Action Detection

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Vancouver, Canada [ H5-Index : 389 ]

This work introduced a new Parameter-Free learnable Post-Processing technique for the Human Action Detection task. It uses Gaussian-based refinement of start/end points, where the refined shift is estimated using a Taylor expansion.

Abstract / Code / ArXiv / BibTex

Proposal-Free Temporal Action Detection via Global Segmentation Mask

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

This is the first work that introduces a new Proposal-Free paradigm for the Human Action Detection task. It reformulates action start/end regression into an action-mask prediction problem. This makes it 30x faster in training and 2x faster in inference than existing approaches.

Abstract / Code / ArXiv / Project Page / BibTex

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

This is the first work that introduces Vision-language models for the Zero-Shot Action Detection task. Off-the-shelf CLIP models are not meant for detection tasks; they need class-agnostic masking to generalize to the zero-shot setting, as illustrated in this work.

Abstract / Code / ArXiv / Project Page / BibTex

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Sauradip Nag , Xiatian Zhu , Yi-Zhe Song , Tao Xiang
European Conference on Computer Vision (ECCV), 2022
Tel Aviv, Israel [ H5-Index : 187 ]

This work shows that a two-stage pipeline for the Human Action Detection task suffers from the Proposal Error-Propagation problem. It proposed a new single-stage framework coupled with a novel self-supervised pre-training task to curb this error.

Abstract / Code / ArXiv / Project Page / BibTex

Few-Shot Temporal Action Localization with Query Adaptive Transformer

Sauradip Nag , Xiatian Zhu , Tao Xiang
British Machine Vision Conference (BMVC), 2021
Manchester, Virtual [ H5-Index : 66 ]

Existing Few-Shot Action Detection methods deal with trimmed videos and have different designs for different few-shot algorithms. This work proposed a Model-Agnostic approach that uses Untrimmed Videos for adapting Query Videos using Support Samples.

Abstract / Code / ArXiv / BibTex / Slides

An Episodic Learning Network for Text Detection on Human Bodies in Sports Images

P Chowdhury , P Shivakumara , R Ramachandra , Sauradip Nag , Umapada Pal , Tong Lu , Daniel Lopresti
IEEE Transactions on CSVT [IF : 4.6]

Introduces a new and improved Human-Centric approach to detecting Bib Numbers in sports video by taking motion-influenced Human Clothing and Camera Pose into consideration.

Abstract / BibTex

A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

Sauradip Nag , P Shivakumara , Umapada Pal , Tong Lu , Michael Blumenstein
Pattern Recognition, Elsevier [IF : 7.196]

Introduces a new way of Detecting Bib Numbers from sports video by taking Human Torso, Skin and Head into consideration.

Abstract / Code / BibTex

What's There in the Dark

Sauradip Nag , Saptakatha Adak , Sukhendu Das
International Conference on Image Processing (ICIP), 2019 (Spotlight Paper)
Taipei, Taiwan [ H5-Index : 45 ]

This is the first work that introduced Semantic Segmentation for Night-Time scenes. This approach used CycleGANs to generate Night-Time segmentations and used a comparator network as a discriminator to distinguish real from fake night-time samples.

Abstract / Code / BibTex

Facial Micro-Expression Spotting and Recognition using Time Contrasted Feature with Visual Memory

Sauradip Nag , Ayan Kumar Bhunia , Aishik Konwer, Partha Pratim Roy
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
Brighton, United Kingdom [ H5-Index : 80 ]

This work introduced a new spatio-temporal network for the Facial Micro-Expression detection task. It introduced a time-contrasted feature extraction module that greatly improved the spotting of micro-expressions from inconspicuous facial movements.

Abstract / arXiv / BibTex

CRNN based Jersey-Bib Number/Text Recognition in Sports and Marathon Images

Sauradip Nag , Raghavendra Ramachandra , Palaiahnakote Shivakumara , Umapada Pal , Tong Lu , Mohan Kankanhalli
International Conference on Document Analysis and Recognition (ICDAR), 2019
Sydney, Australia [ H5-Index : 26 ]

This work further improves Bib Detection from images. It uses 2D Human Pose Keypoints to identify different possible locations for Bib numbers and then individually extracts them using an LSTM-based text recognition pipeline.

Abstract / BibTex

A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

Sauradip Nag , Palaiahnakote Shivakumara , Wu Yirui , Umapada Pal , Tong Lu
International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018
Niagara Falls, USA [ H5-Index : 18 ]

This is the first work that can identify Ethnicity from Handwriting in documents. It uses a Cloud-of-Line distribution based feature representation whose dimension is reduced by PCA to select the prominent differences, which are then used by an SVM to classify ethnicity.

Abstract / arXiv / BibTex

Workshops and Challenges
DreamPet: Text Driven Controllable 3D Animal Generation using Gaussian Splatting

Vysakh Ramakrishnan , Sauradip Nag , Xiatian Zhu , Amal Dev Parakkat , Anjan Dutta
CV4Animals Workshop, CVPR 2024

This work introduces RAG-based Text-to-3D Animal Generation and models realistic fur or animal skin using 3DGS.

Abstract / Code / ArXiv / BibTex

Adaptive-Labeling for Enhancing Remote Sensing Cloud Understanding

Jay Gala , Sauradip Nag , Huichou Huang , Ruirui Liu , Xiatian Zhu
Tackling Climate Change with Machine Learning Workshop, NeurIPS 2023

This work introduces a new algorithm that iteratively improves existing noisy annotations and extracts the best performance from any model via a combination of Dynamic Thresholding and FixMatch-style optimization.

Abstract / Code / ArXiv / BibTex

Actor-Agnostic Multi-Label Action Recognition with Multi-Modal Query

Anindya Mondal* , Sauradip Nag* , Joaquin M. Prada , Xiatian Zhu , Anjan Dutta
New Ideas in Vision Transformers Workshop (NIVT), ICCV 2023

This work shows that Action recognition need not be Actor-specific if Language embeddings are used. Whether the action is performed by an Animal or a Human, a single unified model recognizes it without requiring any actor-specific information.

Abstract / Code / ArXiv / BibTex

Large-Scale Product Retrieval with Weakly Supervised Representation Learning

X Han*, K.W Ng* , Sauradip Nag , Z Qu
eBay eProduct Visual Search Challenge
Fine-Grained Visual Categorization Workshop (FGVC9), CVPR 2022
New Orleans, USA

We achieved the runner-up position in this retrieval challenge. Additionally, we proposed novel solutions for mining pseudo-attributes and treating them as labels, along with innovative training recipes and novel post-processing solutions for the large-scale product retrieval task.

Abstract / Code / ArXiv / BibTex

How Far Can I Go ? : A Self-Supervised Approach for Deterministic Video Depth Forecasting

Sauradip Nag* , Nisarg Shah* , Anran Qi* , R Ramachandra
Machine Learning for Autonomous Driving Workshop (ML4AD), NeurIPS 2021
Australia, Virtual

This work introduced the first self-supervised Video Depth Forecasting solution for autonomous driving. It proposed a new Feature Forecasting paradigm of generative modeling for generating future depth maps from RGB frames.

Abstract / Code / BibTex

Academic Services
  • Teaching:
  • COM3013: Computational Intelligence (2022), University of Surrey
  • EEEM004: Advanced Topics in Computer Vision (2023), University of Surrey

  • Technical Programme Committee:
  • ML for Autonomous Driving Workshop, NeurIPS
  • Conflict of Interest Coordinator, SIGGRAPH North America

  • Conference Review Services:
  • CVPR, ICCV, ECCV, ICLR, NeurIPS, AAAI, ACCV, Eurographics

  • Journal Review Services:
  • Springer Nature Computer Science
  • International Journal of Human-Computer Interaction
  • IEEE Transactions on Circuits and Systems for Video Technology
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • IEEE Transactions on Image Processing
  • Elsevier Computer Vision and Image Understanding
  • IEEE Transactions on Visualization and Computer Graphics
Invited Talks
  • Dec, 2023 at Eizen AI : "Modern Approaches in Video Understanding" [Slides]

  • Jan, 2024 at Adobe : "Future of Video Editing"

  • April, 2024 at CMU RI : "CreativeX: Future of Creative Generation/Editing"





Copyright © Sauradip Nag. Last updated Aug 2022 | Template provided by Dr. Jon Barron