Computing in space, e.g., in miniaturized satellites, must cope with special physical and operational constraints, including a limited energy budget. These constraints impose strict conditions on the on-board data processing system and its capability to handle sophisticated workloads such as Machine Learning (ML). At the same time, the breakthroughs achieved with Deep Neural Networks (DNNs) over the last decade promise innovative solutions that expand the functional capabilities of on-board data processing and drive the space industry forward. Given these requirements, novel, performance- and power-efficient solutions and architectures for deploying ML via, e.g., FPGA-enabled SoCs, particularly Commercial-Off-The-Shelf (COTS) devices, are attracting significant interest in the space industry. It is therefore essential to conduct extensive benchmarking as well as feasibility and efficiency analyses from different perspectives; such analyses require investigating the available options for programming and deployment as well as various real-world models and datasets. To this end, a research and development activity funded by the European Space Agency (ESA) General Support Technology Programme and led by Airbus Defence and Space GmbH aims to develop an ML Application Benchmark (MLAB) that covers the benchmarking aspects mentioned above. In this invited paper, we provide an overview of the MLAB project and discuss development and progress in various directions, including framework analyses and the investigation of models and datasets. We elaborate on a benchmarking methodology developed in the context of this project to enable the analysis of various hardware platforms and deployment options. Finally, we focus on a particular use case, aircraft detection, as a real-world example and provide an analysis of key performance indicators, including accuracy, throughput, latency, and power consumption.
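
The benchmarking methodology itself is detailed in the paper; purely as an illustration of how indicators such as latency and throughput can be derived from repeated inference runs, the following minimal Python sketch times a model invocation and aggregates the measurements. The `run_inference` function is a hypothetical placeholder (not part of MLAB or any specific framework), and power consumption is omitted since it requires external instrumentation of the target device.

```python
import statistics
import time


def run_inference(sample):
    """Hypothetical placeholder for a deployed DNN inference call,
    e.g., an FPGA-accelerated aircraft-detection model on a COTS SoC."""
    time.sleep(0.005)  # stand-in for real on-device execution


def benchmark(samples, warmup=10):
    """Estimate per-sample latency statistics and sustained throughput."""
    # Warm-up runs exclude one-off costs such as bitstream or model loading.
    for s in samples[:warmup]:
        run_inference(s)

    latencies = []
    for s in samples:
        start = time.perf_counter()
        run_inference(s)
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    return {
        "mean_latency_ms": 1e3 * statistics.mean(latencies),
        "p99_latency_ms": 1e3 * latencies[int(0.99 * (len(latencies) - 1))],
        "throughput_fps": len(latencies) / sum(latencies),
    }


if __name__ == "__main__":
    # 200 dummy samples; in practice these would be preprocessed images.
    print(benchmark([None] * 200))
```

A sketch like this only covers timing; a complete methodology of the kind developed in MLAB must additionally pin down accuracy evaluation on representative datasets and synchronized power measurement to make results comparable across hardware platforms.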