Efficient execution of loops is one of the most important challenges facing high-performance computer architectures. Loop scheduling involves handling a partially ordered set of operations that are performed repetitively over a number of iterations. In this paper we use Petri nets to study loop scheduling because of their unique power for modeling both partial orders and cycles. The behavior of a loop can be modeled by constructing, at compile time, a Petri-net behavior graph that exhibits a repetitive firing sequence known as a cyclic frustum. The main contributions of this paper are as follows. First, we develop a Petri-net loop model called an SDSP-PN. Loops are first translated into a class of static dataflow graphs known as static dataflow software pipelines (SDSPs), and each SDSP is then translated into an SDSP-PN. When an SDSP-PN is executed according to the earliest firing rule, a cyclic frustum appears in the behavior graph within a bounded number of steps. We show that (1) in an SDSP-PN having one critical cycle, a polynomial bound can be established for the cyclic frustum to occur for all nodes in the loop under the earliest firing rule, whereas in an SDSP-PN having multiple critical cycles, such a polynomial bound can be established only for nodes on the critical cycles; and (2) a time-optimal schedule for the corresponding loop can be derived from a cyclic frustum. Second, we present a methodology for integrating resource limitations into our model. We demonstrate how a timed Petri-net model known as an SDSP-SCP-PN can be constructed to model the execution of an SDSP on dataflow architectures having a single clean execution pipeline (SCP). Third, the mechanism for detecting cyclic frustums has been implemented in a prototype compiler testbed.
Simulation results on a number of Livermore loops, both with and without loop-carried dependences, show that the cyclic frustum for both the SDSP-PN and the SDSP-SCP-PN can be determined at compile time in O(n) time, where n is the number of instructions in the loop body. This demonstrates the feasibility of determining the cyclic frustum at compile time. We also describe how to determine the minimum amount of storage needed by a loop to maintain its optimal computation rate.
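To make the idea of a cyclic frustum concrete, the following is a minimal sketch of how a repeating firing pattern might be detected by simulating a small Petri net under the earliest firing rule. It is an illustration under stated assumptions, not the implementation used in the prototype compiler: every transition is assumed to take one time unit, the net is assumed to be a marked graph (each place has exactly one producer and one consumer), and the function name `find_cyclic_frustum`, the place names, and the three-node example pipeline are all hypothetical.

```python
def find_cyclic_frustum(places, marking, max_steps=1000):
    """Simulate a marked graph under the earliest firing rule.

    places:  dict mapping place name -> (producer transition, consumer transition)
    marking: dict mapping place name -> initial token count
    Returns (start_step, period, firing_sets) for the first repeated marking,
    or None if no repetition is seen within max_steps (assumed cut-off).
    """
    transitions = sorted({t for arc in places.values() for t in arc})
    inputs = {t: [p for p, (_, dst) in places.items() if dst == t] for t in transitions}
    outputs = {t: [p for p, (src, _) in places.items() if src == t] for t in transitions}
    order = sorted(places)

    seen = {}        # marking -> step at which it was first observed
    schedule = []    # schedule[i] = transitions fired at step i
    m = dict(marking)
    for step in range(max_steps):
        key = tuple(m[p] for p in order)
        if key in seen:
            start = seen[key]
            # The firing sets between the two identical markings repeat forever.
            return start, step - start, schedule[start:step]
        seen[key] = step
        # Earliest firing rule: every enabled transition fires immediately.
        fired = [t for t in transitions if all(m[p] > 0 for p in inputs[t])]
        for t in fired:
            for p in inputs[t]:
                m[p] -= 1
        for t in fired:
            for p in outputs[t]:
                m[p] += 1
        schedule.append(fired)
    return None


# Hypothetical pipeline A -> B -> C with one-slot buffers: each data arc
# (d_*) is paired with an acknowledgement arc (ack_*) in the reverse direction.
places = {
    "d_ab": ("A", "B"), "ack_ba": ("B", "A"),
    "d_bc": ("B", "C"), "ack_cb": ("C", "B"),
}
marking = {"d_ab": 0, "ack_ba": 1, "d_bc": 0, "ack_cb": 1}

print(find_cyclic_frustum(places, marking))
# -> (1, 2, [['B'], ['A', 'C']])
```

In this toy example the net passes through one transient step and then settles into a pattern of period two, in which B fires alone and then A and C fire together. Reading the repeating firing sets as time slots gives a steady-state schedule for the loop body, which is the sense in which a schedule can be derived from the repeating portion of the behavior graph.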