Given an observed image,
GST first automatically samples multiple suitable camera poses,
which then serve as conditions for generating the corresponding novel-view images.
Relative Pose Estimation
Given a pair of images depicting the same scene,
GST can infer the relative camera pose between them and generalizes well,
even when the two images were captured or created under markedly different conditions.
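
To make the quantity being estimated concrete, the following minimal sketch computes a relative camera pose from two known poses. It is only an illustration of the definition, not GST's inference procedure. It assumes world-to-camera extrinsics of the form x_cam = R x_world + t, and the helper names (`rot_y`, `relative_pose`, `rotation_angle_deg`) are hypothetical and chosen for this example.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose a matrix stored as nested lists."""
    return [list(row) for row in zip(*m)]


def rot_y(deg):
    """Rotation matrix for a rotation of `deg` degrees about the y-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def relative_pose(R1, t1, R2, t2):
    """Return (R_rel, t_rel) such that x_cam2 = R_rel x_cam1 + t_rel.

    Assumed convention (not taken from the source): each camera maps
    world points to its own frame via x_cam = R x_world + t.
    """
    R_rel = matmul(R2, transpose(R1))            # R2 * R1^T
    rotated_t1 = matmul(R_rel, [[v] for v in t1])
    t_rel = [t2[i] - rotated_t1[i][0] for i in range(3)]
    return R_rel, t_rel


def rotation_angle_deg(R):
    """Rotation angle of R in degrees, recovered from its trace."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Illustrative values only: two cameras that differ by a rotation about y.
R1, t1 = rot_y(10.0), [0.0, 0.0, 2.0]
R2, t2 = rot_y(40.0), [0.5, 0.0, 2.0]
R_rel, t_rel = relative_pose(R1, t1, R2, t2)
print(rotation_angle_deg(R_rel), t_rel)
```

With these illustrative values the relative rotation is about 30 degrees around the y-axis. A pose estimator takes only the two images as input and predicts `R_rel` and `t_rel`. From images alone, the translation is typically recoverable only up to scale.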