Image filtering is useful for numerous multimedia, computer vision, and graphics tasks. Linear translation-invariant filters with manually designed kernels have been widely used. However, their performance suffers from content-blindness, that is, they treat noise, textures, and structures identically. To mitigate content-blindness, a family of filters called joint/guided filters has attracted much attention from the community; their principle is to transfer the structure of a reference image to the target image. The main drawback of most joint/guided filters is that they ignore structural inconsistency between the reference and target signals, which may be, for example, color, infrared, and depth images captured under different conditions. Simply adopting such guidance is very likely to lead to unsatisfactory results. To address these issues, this paper designs a simple yet effective filter, named the mutually guided image filter (muGIF), which jointly preserves mutual structures, avoids being misled by inconsistent structures, and smooths flat regions. The proposed muGIF is very flexible and can operate in one of three modes: dynamic only (self-guided), static/dynamic, and dynamic/dynamic. Although the objective of muGIF is inherently non-convex, by carefully decomposing it we can solve it effectively and efficiently. The advantages of muGIF in effectiveness and flexibility over other state-of-the-art alternatives are demonstrated on a variety of applications.