SetPieceRAG: Domain-Specific RAG for Knowledge-Intensive Soccer VQA with Large Language Models

Young Seon Kim, Jongmin Lee

[CVPR 2026 Workshop (CVSports), Spotlight] A SoccerWiki-grounded, multi-source RAG pipeline with CLIP-based retrieval and LLM ensembling achieves state-of-the-art results on the SoccerNet VQA benchmark.

Exaone 4.0 VL: Vision-Language Foundation Model for Enterprise AI Agent

Jongmin Lee, LG AI Research

[LG AI Talk Concert 2025] Exaone 4.0 VL, a 32B-parameter multimodal foundation model, achieves state-of-the-art results on vision-language benchmarks and powers enterprise AI agents.

3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction

Jongmin Lee, Minsu Cho

[NeurIPS 2024] Directly regressing Wigner-D coefficients in the frequency domain enables consistent single-image 3D orientation estimation.

Learning Rotation-Equivariant Features for Visual Correspondence

Jongmin Lee, Byungjin Kim, Seungwook Kim, Minsu Cho

[CVPR 2023] Group-equivariant CNNs with a group-aligning technique yield rotation-invariant local descriptors for robust visual correspondence.

Self-Supervised Equivariant Learning for Oriented Keypoint Detection

Jongmin Lee, Byungjin Kim, Minsu Cho

[CVPR 2022] Rotation-equivariant CNNs with dense orientation alignment detect robust oriented keypoints for image matching and pose estimation.

Self-supervised Learning of Image Scale and Orientation Estimation

Jongmin Lee, Yoonwoo Jeong, Minsu Cho

[BMVC 2021] Self-supervised histogram alignment estimates patch scale and orientation, boosting image matching and 6-DoF pose estimation.

Learning to Distill Convolutional Features Into Compact Local Descriptors

Jongmin Lee, Yoonwoo Jeong, Seungwook Kim, Juhong Min, Minsu Cho

[WACV 2021] Distilling multi-level CNN features produces local descriptors up to 100× smaller while retaining state-of-the-art matching performance.

Learning to Compose Hypercolumns for Visual Correspondence

Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho

[ECCV 2020] Dynamic Hyperpixel Flow adaptively composes hypercolumn features per image pair for state-of-the-art semantic correspondence.

Hyperpixel Flow: Semantic Correspondence with Multi-layer Neural Features

Juhong Min, Jongmin Lee, Jean Ponce, Minsu Cho

[ICCV 2019] Hyperpixels fuse multi-level CNN features with Hough voting for real-time, state-of-the-art semantic correspondence on SPair-71k.

Attentive Semantic Alignment with Offset-Aware Correlation Kernels

Paul Hongsuck Seo, Jongmin Lee, Deunsol Jung, Bohyung Han, Minsu Cho

[ECCV 2018] An attention model with offset-aware correlation kernels filters out distractors and captures local transformations, achieving state-of-the-art semantic correspondence.