As you already know, an on-camera video doesn't need to be a single, fully linear take. Nor does it need to be scripted. Cutting a single, extemporaneous 10-minute take down to something much shorter is certainly viable. There's also no reason the subject has to face the camera head-on. Looking off-camera and addressing an unseen party makes the video feel less like a news-anchor delivery, and can help give viewers the feeling of being included in a conversation.
Additionally, there's no reason anyone has to appear on-camera. This is a simple example of using text (and a scripted VO, though it can also be done with a casual conversation) to reinforce an audio message...
...and this is an example of how a long, unscripted audio recording can become a short video set to relevant still images. (Of course, this subject matter isn't much like anything you'd be doing, but it does help illustrate the potential for a casual conversation set to still images.)