In recent years, intelligent surveillance systems have attracted interest in many application domains, owing to the increasing demand for security and safety. Unmanned Aerial Vehicles (UAVs) offer a reliable, low-cost solution for mobile sensor node deployment, localization, and measurement collection. This paper presents a UAV-based surveillance system aimed at understanding the scene. The system collects raw data from the environment through several possible sensor modalities (CCTV camera, infrared camera, thermal camera, radar, etc.), fuses these data, and produces a semantic, high-level description of the scenario. The UAV is able to recognize objects and their spatio-temporal relations with other objects and with the environment. Moreover, the UAV can identify alarming situations and recommend an intervention to human operators. To this end, a Fuzzy Cognitive Map (FCM) model is embedded in the UAV: starting from the semantic description of the scenario, the UAV can deduce the causal effects of the situations that occur. This improves scenario understanding, especially when alarming situations are detected.
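To make the FCM reasoning step more concrete, the following is a minimal sketch of generic iterative FCM inference. It is not the model used by the system described here. Several choices are assumptions made only for illustration: the concept names, the causal weights, the sigmoid squashing function with slope `lam`, and an update rule in which each concept also keeps its own previous activation (a common FCM variant). The sketch also clamps observed concepts to their sensed values. At each step, every non-clamped concept is updated from the weighted activations of its causes. The loop stops when the activations converge or when a step limit is reached.

```python
import math

def sigmoid(x, lam=1.0):
    # Squashing function mapping any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_infer(concepts, weights, state, clamped=(), steps=50, tol=1e-5, lam=1.0):
    """Iterate a Fuzzy Cognitive Map until activations stabilise.

    concepts: list of concept names.
    weights:  dict mapping (cause, effect) -> causal weight in [-1, 1].
    state:    initial activation of every concept, in [0, 1].
    clamped:  concepts held fixed at their observed (sensed) values.
    """
    current = dict(state)
    for _ in range(steps):
        nxt = {}
        for dst in concepts:
            if dst in clamped:
                # Observed concepts keep the value reported by the sensors.
                nxt[dst] = current[dst]
                continue
            # Own previous activation plus weighted influence of every cause.
            total = current[dst]
            for src in concepts:
                if src != dst:
                    total += weights.get((src, dst), 0.0) * current[src]
            nxt[dst] = sigmoid(total, lam)
        # Synchronous update: all concepts use the previous step's values.
        converged = max(abs(nxt[c] - current[c]) for c in concepts) < tol
        current = nxt
        if converged:
            break
    return current

# Hypothetical surveillance concepts and weights, for illustration only.
CONCEPTS = ["person_near_fence", "night_time", "intrusion_risk", "alert"]
WEIGHTS = {
    ("person_near_fence", "intrusion_risk"): 0.8,
    ("night_time", "intrusion_risk"): 0.4,
    ("intrusion_risk", "alert"): 0.9,
}

if __name__ == "__main__":
    observed = {"person_near_fence": 1.0, "night_time": 1.0,
                "intrusion_risk": 0.0, "alert": 0.0}
    result = fcm_infer(CONCEPTS, WEIGHTS, observed,
                       clamped={"person_near_fence", "night_time"})
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

In this toy map, the activation of `alert` is higher when a person is observed near the fence than when none is observed. In a real system, the concepts and weights would come from domain experts or from learning. Raw activations would also usually be compared against application-specific thresholds before an intervention is recommended.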