DisCo-FLoc: Using Dual-Level Visual-Geometric Contrasts to Disambiguate Depth-Aware Visual Floorplan Localization

Since floorplan data is readily available, long-term persistent, and robust to changes in visual appearance, visual Floorplan Localization (FLoc) has garnered significant attention. Existing methods either ingeniously match geometric priors or utilize sparse semantics to reduce FLoc uncertainty. However, they still suffer from ambiguous FLoc caused by repetitive structures within minimalist floorplans. Moreover, expensive but limited semantic annotations restrict their applicability. To address these issues, we propose using dual-level visual-geometric contrasts to disambiguate depth-aware visual FLoc without requiring additional semantic labels. Our solution begins with a ray regression predictor tailored for ray-casting-based FLoc, which predicts high-accuracy FLoc candidates using depth estimation expertise. In addition, we propose a novel contrastive learning method with position-level and orientation-level constraints that strictly matches depth-aware visual features with the corresponding geometric structures in the floorplan. Such matches effectively eliminate FLoc ambiguity and determine the optimal imaging pose from among the FLoc candidates. Exhaustive comparative studies on two standard visual FLoc benchmarks demonstrate that our method outperforms the state-of-the-art semantic-based method, achieving significant improvements in both robustness and accuracy.
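To make the ray-casting formulation concrete, the following is a minimal, self-contained sketch (not the authors' code) of how candidate poses can be scored on a floorplan: rays are marched through a binary occupancy grid from each candidate pose, and the resulting floorplan depths are compared against the depths predicted from the image. All function names (`cast_ray`, `score_pose`, `best_candidate`) and parameters are illustrative assumptions.

```python
import math

# Hypothetical sketch: ray-casting-based FLoc candidate scoring.
# `grid` is a binary occupancy floorplan (1 = wall), a pose is (x, y, yaw).

def cast_ray(grid, x, y, theta, step=0.05, max_dist=10.0):
    """March a ray from (x, y) along theta until it hits an occupied cell."""
    d = 0.0
    while d < max_dist:
        cx = int(x + d * math.cos(theta))
        cy = int(y + d * math.sin(theta))
        if cx < 0 or cy < 0 or cy >= len(grid) or cx >= len(grid[0]) or grid[cy][cx]:
            return d
        d += step
    return max_dist

def score_pose(grid, pose, pred_depths, fov=math.radians(90)):
    """Mean absolute error between floorplan ray depths and predicted depths."""
    x, y, yaw = pose
    n = len(pred_depths)
    angles = [yaw - fov / 2 + fov * i / (n - 1) for i in range(n)]
    rays = [cast_ray(grid, x, y, a) for a in angles]
    return sum(abs(r - p) for r, p in zip(rays, pred_depths)) / n

def best_candidate(grid, candidates, pred_depths):
    """Pick the candidate pose whose floorplan rays best match the prediction."""
    return min(candidates, key=lambda p: score_pose(grid, p, pred_depths))
```

Repetitive structures make several candidates score similarly under this depth-only criterion, which is exactly the ambiguity the contrastive module is designed to resolve.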
The DisCo-FLoc pipeline: our method begins with a ray regression predictor for candidate pose generation, followed by a visual-geometric contrastive learning module that achieves strict matches between depth-aware visual features and floorplan structures.
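The dual-level contrastive idea in the pipeline above can be sketched as an InfoNCE-style objective with one term per level: a position-level term pulling the depth-aware image embedding toward the floorplan-geometry embedding at the true position, and an orientation-level term doing the same across orientation hypotheses. This is a hedged illustration, not the released implementation; `info_nce`, `dual_level_loss`, and the convention that index 0 is the positive key are all assumptions.

```python
import math

# Hypothetical sketch of a dual-level contrastive objective.
# Embeddings are plain Python lists; keys[0] is the positive match.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, keys, tau=0.1):
    """-log( exp(q.k+/tau) / sum_j exp(q.k_j/tau) ), with keys[0] positive."""
    logits = [dot(query, k) / tau for k in keys]
    m = max(logits)  # shift for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[0]

def dual_level_loss(img_feat, pos_feats, ori_feats, w_ori=0.5):
    """Position-level term plus a weighted orientation-level term."""
    return info_nce(img_feat, pos_feats) + w_ori * info_nce(img_feat, ori_feats)
```

Minimizing both terms jointly forces the image embedding to agree with the floorplan geometry at the correct position *and* orientation, which is what disambiguates between candidates that look alike under depth matching alone.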
Table 1: Comparative studies between our visual FLoc method and baselines on the Gibson(f) and Gibson(g) datasets.
| Method (Venue) | Gibson(f) R@0.1 m | Gibson(f) R@0.5 m | Gibson(f) R@1 m | Gibson(f) R@1 m 30° | Gibson(g) R@0.1 m | Gibson(g) R@0.5 m | Gibson(g) R@1 m | Gibson(g) R@1 m 30° |
|---|---|---|---|---|---|---|---|---|
| PF-net (CoRL 2018) | 0 | 2.0 | 6.9 | 1.2 | 1.0 | 1.9 | 5.6 | 1.9 |
| MCL (ICRA 1999) | 1.6 | 4.9 | 12.1 | 8.2 | 2.3 | 6.2 | 9.7 | 7.3 |
| LASER (CVPR 2022) | 0.4 | 6.7 | 13.0 | 10.4 | 0.7 | 7.0 | 11.8 | 9.5 |
| F³Loc (CVPR 2024) | 4.7 | 28.6 | 36.6 | 35.1 | 4.3 | 26.7 | 33.7 | 32.3 |
| 3DP (ACM MM 2025) | 5.3 | 33.2 | 39.8 | 38.4 | 9.4 | 37.4 | 43.1 | 41.5 |
| RSK (AAAI 2026) | 8.3 | 38.5 | 45.3 | 43.6 | 8.7 | 36.4 | 42.3 | 40.4 |
| 3DP & RSK | 10.9 | 42.7 | 47.9 | 46.5 | 10.7 | 38.8 | 44.4 | 42.8 |
| Ours w/o Dis. | 12.0 | 45.8 | 50.6 | 49.2 | 12.3 | 45.0 | 49.9 | 48.2 |
| Ours | 13.1 | 50.9 | 56.7 | 55.4 | 12.4 | 47.0 | 52.5 | 51.3 |
Table 2: Comparative studies between our visual FLoc method and baselines on the Structured3D (full) dataset.
Oracle denotes localization using ground-truth geometric and semantic rays; the semantic labels cover doors, windows, and walls.
| Method (Venue) | R@0.1 m | R@0.5 m | R@1 m | R@1 m 30° | Sem. |
|---|---|---|---|---|---|
| PF-net (CoRL 2018) | 0.2 | 1.3 | 3.2 | 0.9 | no |
| MCL (ICRA 1999) | 1.3 | 5.2 | 7.8 | 6.4 | |
| LASER (CVPR 2022) | 0.7 | 6.4 | 10.4 | 8.7 | |
| F³Loc (CVPR 2024) | 1.5 | 14.6 | 22.4 | 21.3 | |
| 3DP (ACM MM 2025) | 5.6 | 27.4 | 55.5 | 24.0 | |
| RSK (AAAI 2026) | 6.4 | 28.6 | 56.9 | 25.2 | |
| 3DP & RSK | 6.7 | 26.8 | 54.7 | 24.2 | |
| SemRayLocₛ (ICCV 2025) | 5.4 | 41.9 | 53.5 | 52.6 | yes |
| + 3DP (ACM MM 2025) | 5.5 | 46.6 | 56.2 | 56.7 | |
| + RSK (AAAI 2026) | 6.2 | 48.1 | 59.9 | 58.8 | |
| + 3DP & RSK | 7.1 | 48.9 | 61.5 | 60.0 | |
| SemRayLocᵣ (ICCV 2025) | 5.7 | 45.5 | 58.8 | 57.5 | |
| Ours w/o Dis. | 5.5 | 34.2 | 40.4 | 39.3 | no |
| Ours | 10.0 | 59.0 | 67.0 | 66.0 | |
| Oracle w/ sem | 61.0 | 93.9 | 94.9 | 94.6 | yes |
Qualitative example: input image (left) and localization result (right).
@article{meng2026discofloc,
author = {Shiyong Meng and Tao Zou and Bolei Chen and Chaoxu Mu and Jianxin Wang},
title = {DisCo-FLoc: Using Dual-Level Visual-Geometric Contrasts to Disambiguate Depth-Aware Visual Floorplan Localization},
journal = {arXiv preprint arXiv:2601.01822},
year = {2026},
}