Delays in the sound (latency)

To keep transmission costs to a minimum, the data rate for videoconferencing is generally very low. This means that the vision signals need significant compression to squeeze into the limited bandwidth available. Compression requires a considerable amount of electronic processing, and one penalty for this is the time taken for the vision signals to pass through all the circuitry. The delays are appreciable and can be of the order of 0.25 second. The delays introduced in compressing the sound signals are very much smaller, as far less signal processing is needed. As a result, sound and vision from a site will be transmitted (and received) out of synchronisation unless the situation is corrected. Even small errors are objectionable, as demonstrated on television by films transmitted with poor lip synchronisation.

To overcome this problem in videoconferencing, the sound signals are delayed to synchronise them with the vision. Two consequences of this delayed sound are that an appreciable delay (latency) can be introduced when conferencing with a remote site, and that an echo can be generated. This echo is most objectionable and can render a conference unintelligible. For more detail, see Appendix C.
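
As an illustration only, the sound delay described above can be sketched as a simple delay line that holds back each sound sample by the difference between the vision and sound processing times. The 0.25 second vision delay is the figure quoted in the text; the 8 kHz sample rate, the 0.02 second sound delay, and the names delay_samples and AudioDelayLine are assumptions made for this sketch and do not describe any particular codec.

```python
from collections import deque


def delay_samples(sample_rate_hz, video_delay_s, audio_delay_s):
    """Return how many sound samples to hold back to match the vision.

    The skew is the vision delay minus the sound delay; it is clamped at
    zero because sound cannot be advanced, only delayed.
    """
    skew_s = max(0.0, video_delay_s - audio_delay_s)
    return round(skew_s * sample_rate_hz)


class AudioDelayLine:
    """Fixed-length delay line: each sample emerges n_samples calls later."""

    def __init__(self, n_samples):
        # Pre-fill with silence so the first outputs are zeros.
        self._buf = deque([0.0] * n_samples)

    def process(self, sample):
        # Push the newest sample in and release the oldest one.
        self._buf.append(sample)
        return self._buf.popleft()


# Assumed figures: 8 kHz sound sampling, 0.02 s sound coding delay.
# The 0.25 s vision delay is the order of magnitude given in the text.
n = delay_samples(8000, 0.25, 0.02)
line = AudioDelayLine(n)
```

With these assumed figures the sound is held back by 0.23 second. The same 0.23 second is added to the path a talker's voice takes to the remote site, which is the latency referred to above.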