


Oral Session 5C: Visual and Spatial Computing

Sun 15 Jun 7 a.m. PDT — 8:15 a.m. PDT

Sun 15 June 7:00 - 7:15 PDT

Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks

Junying Wang · Hongyuan Zhang · Yuan Yuan

Recent personalized portrait generation methods, which take a facial image and a textual prompt as inputs, have attracted substantial attention. Although these methods generate high-fidelity portraits, they fail to prevent the generated portraits from being tracked and misused by malicious face recognition systems. To address this, we propose a Customized Portrait Generation framework with facial Adversarial attacks (Adv-CPG). Specifically, to achieve facial privacy protection, we devise a lightweight local ID encryptor and an encryption enhancer. They implement progressive double-layer encryption protection by directly injecting the target identity and adding additional identity guidance, respectively. Furthermore, to accomplish fine-grained and customized portrait generation, we develop a multi-modal image customizer capable of generating controllable fine-grained facial features. To the best of our knowledge, Adv-CPG is the first study to introduce facial adversarial attacks into customized portrait generation. Extensive experiments demonstrate the superiority of Adv-CPG: its average attack success rate is 28.1% and 2.86% higher than those of SOTA noise-based and unconstrained attack methods, respectively.
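
A minimal sketch of the progressive double-layer identity injection described above, assuming a simple embedding-space formulation; the module names, shapes, and fusion operators are hypothetical and not Adv-CPG's actual architecture:

```python
import torch
import torch.nn as nn

class DoubleLayerIDProtection(nn.Module):
    """Hypothetical sketch: a lightweight encryptor first injects a target
    identity embedding into the source face embedding (layer 1), then an
    enhancer adds a second round of identity guidance (layer 2)."""

    def __init__(self, dim=512):
        super().__init__()
        self.encryptor = nn.Linear(2 * dim, dim)  # layer 1: direct target-ID injection
        self.enhancer = nn.Linear(2 * dim, dim)   # layer 2: additional ID guidance

    def forward(self, src_id, target_id):
        # Progressively push the protected embedding toward the target identity.
        mixed = self.encryptor(torch.cat([src_id, target_id], dim=-1))
        return self.enhancer(torch.cat([mixed, target_id], dim=-1))
```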

Sun 15 June 7:15 - 7:30 PDT

Gromov–Wasserstein Problem with Cyclic Symmetry

Shoichiro Takeda · Yasunori Akagi

We propose novel fast algorithms for the Gromov–Wasserstein problem (GW) that exploit cyclic symmetry of the input data. GW with cyclic symmetry naturally arises as the object-matching task underlying various real-world computer vision applications, e.g., image registration, point cloud registration, stereo matching, and 3D reconstruction. Gradient-based algorithms have been used to solve GW, and our main idea is to use the following remarkable and non-trivial property: by setting the initial solution to have cyclic symmetry, all intermediate solutions and matrices appearing in the gradient-based algorithms retain the same cyclic symmetry until convergence. Based on this property, our gradient-based algorithms restrict the solution space to cyclically symmetric solutions and update only one of the symmetric parts of the solutions and matrices at each iteration, which results in fast computation. Furthermore, both the original gradient-based algorithms and ours must solve an Optimal Transport problem (OT) at each iteration, but only in ours does this subproblem exhibit cyclic symmetry. This cyclic OT can be solved efficiently, and as a result, the total computation time of our algorithms is dramatically lower than that of the original ones. Experiments show the effectiveness of our algorithms on synthetic and real-world data with strict and approximate cyclic symmetry, respectively.
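
As an illustration of why the symmetry restriction saves work, the sketch below (plain NumPy, with a hypothetical helper name) stores only the first block of rows of a transport plan that is invariant under a simultaneous cyclic shift of both marginals, and rebuilds the full block-circulant plan on demand; a gradient-based GW solver exploiting this structure only ever needs to update that one block:

```python
import numpy as np

def full_plan_from_block(block, k):
    """Rebuild a transport plan T with T[i+n, j+n] = T[i, j] (indices mod k*n)
    from its first n rows. block has shape (n, k*n)."""
    n = block.shape[0]
    return np.vstack([np.roll(block, shift=i * n, axis=1) for i in range(k)])

# Example: k = 3 symmetric parts of n = 2 points each (N = 6 points total).
rng = np.random.default_rng(0)
block = rng.random((2, 6))
T = full_plan_from_block(block, k=3)
# The full plan is invariant under a simultaneous cyclic shift by n = 2.
assert np.allclose(T, np.roll(np.roll(T, 2, axis=0), 2, axis=1))
```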

Sun 15 June 7:30 - 7:45 PDT

Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields

Runfeng Li · Mikhail Okunev · Zixuan Guo · Anh H Duong · Christian Richardt · Matthew O’Toole · James Tompkin

We present a method to reconstruct dynamic scenes from monocular continuous-wave time-of-flight cameras using raw sensor samples that is as accurate as past methods and is 100× faster. Quickly achieving high-fidelity dynamic 3D reconstruction from a single viewpoint is a significant challenge in computer vision. Recent 3D Gaussian splatting methods often depend on multi-view data to produce satisfactory results and are brittle in their optimizations otherwise. In time-of-flight radiance field reconstruction, the property of interest, depth, is not directly optimized, causing additional challenges. We describe how these problems have a large and underappreciated impact upon the optimization when using a fast primitive-based scene representation like 3D Gaussians. Then, we incorporate two heuristics into our optimization to improve the accuracy of scene geometry for under-constrained time-of-flight Gaussians. Experimental results show that our approach produces accurate reconstructions under constrained sensing conditions, including for fast motions like swinging baseball bats.
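
For context on why depth is only indirect here: a continuous-wave ToF camera records phase-shifted correlation samples, and depth follows from the recovered phase. The sketch below shows textbook four-bucket decoding with a hypothetical helper name; it is not the paper's raw-sample optimization, which fits the scene representation to the samples themselves:

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def cw_tof_depth(s0, s90, s180, s270, f_mod):
    """Depth from four correlation samples taken at 0, 90, 180, 270 degree
    phase offsets of a continuous-wave ToF sensor with modulation frequency
    f_mod (Hz). Standard four-bucket decoding; sign conventions vary by sensor."""
    phase = np.arctan2(s270 - s90, s0 - s180) % (2 * np.pi)
    return C * phase / (4 * np.pi * f_mod)

# Example: a 30 MHz modulation frequency gives an unambiguous range of ~5 m.
print(cw_tof_depth(0.2, 0.8, 0.9, 0.3, f_mod=30e6))
```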

Sun 15 June 7:45 - 8:00 PDT

Award Candidate
Zero-Shot Monocular Scene Flow Estimation in the Wild

Yiqing Liang · Abhishek Badki · Hang Su · James Tompkin · Orazio Gallo

Foundation models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such model exists for scene flow. Even though scene flow has wide potential use, it is not used in practice because current predictive models do not generalize well. We solve three challenges to fix this problem. First, we create a method that jointly estimates geometry and motion for accurate prediction. Second, we alleviate scene flow data scarcity with a data recipe that affords us 1M annotated training samples across diverse synthetic scenes. Third, we evaluate different parameterizations for scene flow prediction and identify a natural and effective parameterization. Our resulting model outperforms existing methods as well as baselines built on foundation models in terms of 3D end-point error, and shows zero-shot generalization to casually captured videos from DAVIS and robotic manipulation scenes from RoboTAP. Overall, this makes scene flow prediction significantly more practical for in-the-wild use.
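
For reference, the 3D end-point error used in the comparison above is simply the mean Euclidean distance between predicted and ground-truth per-point flow vectors; a minimal sketch with a hypothetical helper name:

```python
import numpy as np

def epe_3d(pred_flow, gt_flow):
    """Mean 3D end-point error between predicted and ground-truth scene flow,
    both arrays of shape (N, 3) holding per-point 3D displacement vectors."""
    return np.linalg.norm(pred_flow - gt_flow, axis=1).mean()
```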

Sun 15 June 8:00 - 8:15 PDT

Award Candidate
3D Student Splatting and Scooping

Jialin Zhu · Jiangbei Yue · Feixiang He · He Wang

Recently, 3D Gaussian Splatting (3DGS) has provided a new framework for novel view synthesis and sparked a new wave of research in neural rendering and related applications. As 3DGS is becoming a foundational component of many models, any improvement to 3DGS itself can bring huge benefits. To this end, we aim to improve the fundamental paradigm and formulation of 3DGS. We argue that, as an unnormalized mixture model, it need be neither Gaussian nor restricted to splatting. We subsequently propose a new mixture model consisting of flexible Student's t distributions, with both positive (splatting) and negative (scooping) densities. We name our model Student Splatting and Scooping, or SSS. While providing better expressivity, SSS also poses new challenges in learning. Therefore, we also propose a new principled sampling approach for optimization. Through exhaustive evaluation and comparison across multiple datasets, settings, and metrics, we demonstrate that SSS outperforms existing methods in terms of quality and parameter efficiency, e.g., achieving matching or better quality with similar numbers of components, and obtaining comparable results while reducing the component count by as much as 82%.
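
A minimal sketch of the kind of signed Student's t mixture the abstract describes: an unnormalized multivariate Student's t kernel combined with both positive (splatting) and negative (scooping) weights. Function names and the component format are illustrative, not the authors' implementation, and the projection and rendering pipeline is omitted:

```python
import numpy as np

def student_t_kernel(x, mu, cov_inv, nu):
    """Unnormalized multivariate Student's t kernel evaluated at points x (N, d)."""
    d = x.shape[1]
    diff = x - mu
    maha = np.einsum('nd,de,ne->n', diff, cov_inv, diff)  # squared Mahalanobis distance
    return (1.0 + maha / nu) ** (-(nu + d) / 2.0)

def signed_mixture_density(x, components):
    """Sum of signed kernels: positive weights 'splat' density in, negative
    weights 'scoop' it out. components: list of (weight, mu, cov_inv, nu)."""
    total = np.zeros(x.shape[0])
    for w, mu, cov_inv, nu in components:
        total += w * student_t_kernel(x, mu, cov_inv, nu)
    return total
```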