In the classical scenario of augmented reality in minimally invasive surgery, computer-generated graphics are superimposed onto live video from an endoscope, offering the surgeon visual information that is otherwise hidden in the native scene. Over the past few decades, research efforts have made considerable progress on the challenges of infusing a priori knowledge into endoscopic streams. As framed, these contributions emulate the perception of the expert surgeon, perpetuating debates on the technical, clinical, and societal viability of the proposition. We herein introduce interactive endoscopy, which transforms passive visualization into an interface that allows the surgeon to label noteworthy anatomical features in the endoscopic video and have the virtual annotations remember their tissue locations during surgical manipulation. The streamlined interface combines vision-based tool tracking and speech recognition to enable interactive selection and labeling, followed by tissue tracking and optical flow for label persistence. These discrete capabilities have matured rapidly in recent years, suggesting the technical viability of the system; its clinical viability lies in helping clinicians offload the cognitive demands of visually deciphering soft tissue; and its societal viability follows from engaging, rather than emulating, surgeon expertise. Through a video-assisted thoracotomy use case, we develop a proof-of-concept that improves workflow by tracking surgical tools and visualizing tissue, while serving as a bridge to the classical promise of augmented reality in surgery.
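The abstract names the building blocks of label persistence without detailing an implementation. As a rough illustration only, the sketch below propagates surgeon-placed annotation points between consecutive endoscopic frames using sparse optical flow; the use of OpenCV, the pyramidal Lucas-Kanade tracker, the function name, and the parameter values are assumptions for illustration, not the authors' method.

```python
import cv2
import numpy as np

def propagate_labels(prev_frame, next_frame, label_points):
    """Hypothetical sketch: carry annotation points from one endoscopic
    frame to the next with pyramidal Lucas-Kanade sparse optical flow."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

    # Points must be float32 with shape (N, 1, 2) for calcOpticalFlowPyrLK.
    pts = np.asarray(label_points, dtype=np.float32).reshape(-1, 1, 2)
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None,
        winSize=(31, 31), maxLevel=3)

    # Keep only points whose flow was found; a full system would re-detect
    # or re-localize lost labels rather than silently dropping them.
    return [tuple(p.ravel()) for p, s in zip(new_pts, status) if s[0] == 1]
```

In a complete system, such frame-to-frame propagation would be complemented by longer-term tissue tracking to recover labels after occlusion by instruments or smoke, as the abstract's mention of both tissue tracking and optical flow suggests.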