
Capturing the chaotic glory of judo in 3D, from just a single image

Munkhtulga Battogtokh, Rita Borgo

07 October 2022

Technology & Science

In the summer of 2019, right before my final year of undergraduate studies was about to start, I watched the judo world championships with my father. This was the year the International Judo Federation used a 4DREPLAY system for the first time, replaying from every angle the important moments in which players executed successful techniques. We were simply awed to see those unrepeatable moments captured in such unprecedented fullness and I was immediately inspired to go a step further and make those images into 3D models. However, none of the existing tools I tried really worked. This was the perfect start to my BSc thesis project.

It initially came as a surprise that the 4DREPLAY images, despite providing “plenty” of views, did not suffice for a successful 3D reconstruction when we used existing software. However, we quickly came to appreciate that 3D reconstruction is a difficult problem in general, since a single 2D view corresponds to infinitely many 3D scenes (2D ambiguity). Approaches that use multiple images require not only a large number of high-quality images but also the metadata of the viewpoints. This high input requirement made those approaches unusable in our context. On the other hand, single-image reconstruction approaches are especially challenged by 2D ambiguity and must rely on initial approximations or learnt biases.
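To make the 2D ambiguity concrete, here is a minimal sketch assuming an idealised pinhole camera placed at the origin and looking along the z-axis. The function name `project` and the `focal_length` parameter are illustrative assumptions and are not taken from our project.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto the image plane of a pinhole camera."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points that lie on the same ray through the camera centre.
near = (0.5, 0.25, 2.0)
far = tuple(3.0 * c for c in near)  # three times further away along the ray

# Both points land on the same 2D image location, so the image alone
# cannot tell us which of them (or any other point on the ray) was seen.
print(project(near), project(far))
```

Every point along that ray gives the same image coordinates, which is why a single image needs an initial approximation or a learnt bias to pick one 3D answer.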

In particular, we refer to differentiable inverse rendering, which optimises an initial 3D approximation against a target image by rendering the model with a differentiable rendering algorithm, and human body 3D reconstruction, which relies on learning with 3D data. While existing human body reconstruction techniques were closest to what we were looking for and worked in some limited settings, we identified clear bottlenecks in their performance due to reliance on 2D-pose annotation and learning. The learning methods had seen massive amounts of motion capture data, but they had never seen judo. Judo was chaotic unlike anything else and served as the perfect stress test not only for human body reconstruction but for all approaches.

Thankfully, we found a way to bypass the challenges by combining the two single-image reconstruction approaches. Inverse rendering allows us to optimise a 3D model directly against the target image and alleviate reliance on the 2D pose and prior bias, while human body 3D reconstruction provides the initial approximation that inverse rendering requires. However, simply applying the former atop the latter is under-constrained and violates body shape constraints. Therefore, we proposed our technique of upstream parameter optimisation, with which we exploit the fact that the human body model itself is parameterised and differentiable. More specifically, we backpropagate error from the inverse rendering target further up from the 3D geometry, all the way to the upstream parameters that parameterise the geometry itself (primarily body pose).

Furthermore, we proposed our second technique of selective inverse rendering, which exploits the naturalistic differentiable rendering algorithm. This technique allows us to focus our optimisation on a selected area of the body (e.g., lower body only). The selection itself is simple and intuitive. One must simply “look” at the area to select by pointing the rendering algorithm’s camera at that area.
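The two ideas can be sketched on a deliberately tiny toy problem. This is not our implementation: the "body model" here is a hypothetical two-segment leg in 2D driven by two joint angles, the "renderer" is replaced by directly comparing joint positions with their targets, the gradients are written out by hand with the chain rule instead of using a differentiable renderer, and a per-point `mask` stands in for pointing the camera at a body area. All names (`forward`, `loss_and_grad`, `optimise`, `mask`) are assumptions made for illustration.

```python
import math


def forward(thetas, lengths=(1.0, 1.0)):
    """Toy body model: joint angles -> 2D positions of knee and foot (hip at origin)."""
    t1, t2 = thetas
    l1, l2 = lengths
    knee = (l1 * math.cos(t1), l1 * math.sin(t1))
    foot = (knee[0] + l2 * math.cos(t1 + t2), knee[1] + l2 * math.sin(t1 + t2))
    return [knee, foot]


def loss_and_grad(thetas, target, mask, lengths=(1.0, 1.0)):
    """Masked squared error and its gradient with respect to the joint angles."""
    t1, t2 = thetas
    l1, l2 = lengths
    points = forward(thetas, lengths)
    # Derivatives of each point with respect to (t1, t2): geometry -> parameters.
    jac = [
        ((-l1 * math.sin(t1), l1 * math.cos(t1)), (0.0, 0.0)),  # knee
        (
            (-l1 * math.sin(t1) - l2 * math.sin(t1 + t2),
             l1 * math.cos(t1) + l2 * math.cos(t1 + t2)),
            (-l2 * math.sin(t1 + t2), l2 * math.cos(t1 + t2)),
        ),  # foot
    ]
    loss, grad = 0.0, [0.0, 0.0]
    for p, q, w, j in zip(points, target, mask, jac):
        ex, ey = p[0] - q[0], p[1] - q[1]
        loss += w * (ex * ex + ey * ey)
        for k in range(2):
            # Chain rule: error on the geometry flows up to the pose parameters.
            grad[k] += w * 2.0 * (ex * j[k][0] + ey * j[k][1])
    return loss, grad


def optimise(thetas, target, mask, lr=0.05, steps=1000):
    """Plain gradient descent on the upstream parameters, never on the points."""
    thetas = list(thetas)
    for _ in range(steps):
        _, grad = loss_and_grad(thetas, target, mask)
        thetas = [t - lr * g for t, g in zip(thetas, grad)]
    return thetas


target = forward((0.9, 0.5))   # pose we pretend to observe in the image
start = (0.6, 0.2)             # initial approximation, e.g. from a learnt model

full = optimise(start, target, mask=(1.0, 1.0))        # whole leg selected
upper_only = optimise(start, target, mask=(1.0, 0.0))  # only the knee selected

print("full:", full)
print("upper only:", upper_only)
```

Because only the angles are updated, the segment lengths can never be distorted, which mirrors how optimising upstream parameters keeps the result within the body model. With the knee-only mask, the second angle receives no gradient and stays at its initial value, which is the toy analogue of restricting the optimisation to a selected area.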

With these techniques, we were able to improve the overlap between 3D models and target images in a controlled manner. The results in our short paper presented at Eurographics 2022 demonstrate that our techniques achieve better performance than both the baseline human body reconstruction technique and a concurrent technique that uses stronger constraints for judo images.

Ultimately, our techniques are powerful ways to grip and control the difficult problem of 3D reconstruction. We are interested in extending the techniques to other domains and would love to see them incorporated into existing tools.

This project, originally my BSc thesis, won the Alan Fairbourn Memorial Prize for the Most Meritorious Final Year Project. My supervisor Dr Rita Borgo and I published our findings as a short paper at Eurographics 2022.