In the 2000s MTV had a show about how music videos are shot. The song is played on set during filming, so the singer hears both the rhythm and the words and sings along to match it. For different shots and scenes they film several takes with the song playing, sometimes from beginning to end, then throw out the excess and edit the rest together. If you can find that MTV show, it should answer all your questions.
I'm not sure if I understood the question correctly, so sorry if the answer is not what you asked.
Even the most basic editing software can separate the audio track from the video. The audio then stays perfectly in sync with the picture but sits on its own track, so at any point you can cut out a piece of video and insert something else of the same duration as the piece you removed. The singer's footage before and after the cut will still be in sync with the song.
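If it helps to see why the sync survives, here is a rough sketch in Python. It is not how any particular editor works internally, only a toy model of a video track as a list of clips laid end to end over a separate, untouched audio track. The names `Clip`, `start_times`, and `replace_clip`, and the clip names themselves, are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A hypothetical piece of video on the timeline."""
    name: str
    duration: float  # seconds


def start_times(clips):
    """Return the start time (in seconds) of each clip on the video track."""
    starts, t = [], 0.0
    for clip in clips:
        starts.append(t)
        t += clip.duration
    return starts


def replace_clip(clips, index, new_clip):
    """Swap one clip for another of the same duration, leaving the rest as is."""
    if new_clip.duration != clips[index].duration:
        raise ValueError("replacement must match the duration of the cut piece")
    return clips[:index] + [new_clip] + clips[index + 1:]


# Assumed example timeline: the song itself lives on a separate audio track.
video = [
    Clip("singer_wide", 12.0),
    Clip("singer_closeup", 8.0),
    Clip("singer_wide_2", 10.0),
]
edited = replace_clip(video, 1, Clip("dancers", 8.0))

print(start_times(video))
print(start_times(edited))
```

Both printed lists are identical, which is the whole point: every clip after the replaced one still starts at the same moment of the song, so the singer's lips still match the audio.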