Many recent prototypes for video collaboration, digital media sharing, and gesture interfaces display one video signal on a screen or surface while capturing another video signal through the same screen or surface. The video captured in such systems, whether for transmission or for gesture user interfaces, must be separated from the displayed video; otherwise, video cross-talk occurs. The widely used prior approach, temporal multiplexing, avoids cross-talk by synchronizing camera capture with screen display so that the camera captures only when the screen is not displaying a signal. This approach suffers from light loss (in both the displayed and the captured signals) and from increased display flicker due to the lower duty cycle of the displayed signal. This paper describes a new method, computational temporal modulation, that temporally modulates the displayed signal. The signals, intentionally mixed at the camera, are subsequently separated computationally. Our approach yields a brighter display with less flicker and more signal captured by the camera. Experiments using a prototype collaboration system show good-quality cross-talk reduction with lightweight real-time computation.
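The separation idea can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper; it is a toy model under assumptions that the abstract does not state: the scene is static across two consecutive captures, the displayed content contributes linearly to the camera image, and the two modulation weights `m1` and `m2` are known and different. The function name `separate` and all parameters are hypothetical.

```python
def separate(c1, c2, m1, m2):
    """Recover scene and cross-talk from two mixed captures (toy model).

    Assumed per-pixel model (an illustration, not the paper's method):
        c1 = scene + m1 * crosstalk
        c2 = scene + m2 * crosstalk
    where m1 and m2 are the known display modulation weights.
    """
    if m1 == m2:
        raise ValueError("modulation weights must differ to separate signals")
    # Subtracting the two captures cancels the scene term.
    crosstalk = [(a - b) / (m1 - m2) for a, b in zip(c1, c2)]
    # Remove the estimated cross-talk from the first capture.
    scene = [a - m1 * x for a, x in zip(c1, crosstalk)]
    return scene, crosstalk


# Synthetic example: three pixels, display shown at full and half intensity.
true_scene = [10.0, 20.0, 30.0]
true_crosstalk = [4.0, 8.0, 2.0]
m1, m2 = 1.0, 0.5
c1 = [s + m1 * x for s, x in zip(true_scene, true_crosstalk)]
c2 = [s + m2 * x for s, x in zip(true_scene, true_crosstalk)]
scene, crosstalk = separate(c1, c2, m1, m2)
```

In this toy model the display is never fully off, which is the intuition behind a brighter display with a higher duty cycle than temporal multiplexing; a real system would also have to handle scene motion, sensor noise, and camera response.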