The design of many-core neuromorphic hardware is becoming increasingly complex as these systems are now expected to execute large machine-learning models. A predictable design flow is needed to guarantee real-time performance metrics such as latency and throughput without significantly increasing the buffer requirements of the computing cores. Synchronous Data Flow Graphs (SDFGs) have previously been used for predictable mapping of streaming applications to multiprocessor systems. We propose an SDFG-based design flow for mapping spiking neural networks (SNNs) to many-core neuromorphic hardware, with the objective of exploring the tradeoff between throughput and buffer-size requirements. The proposed design flow integrates an iterative partitioning approach, based on the Kernighan-Lin graph partitioning heuristic, that creates SNN clusters such that each cluster can be mapped to a core of the hardware. The partitioning approach minimizes inter-cluster spike communication, which improves latency on the shared interconnect of the hardware. Next, the design flow maps clusters to cores using Particle Swarm Optimization (PSO), an evolutionary algorithm, while exploring the design space of throughput and buffer size. Pareto-optimal mappings are retained, allowing system designers to select a mapping that satisfies the throughput and buffer-size requirements of the design. We evaluated the design flow using five large-scale convolutional neural network (CNN) models. Results demonstrate 63% higher maximum throughput and 10% lower buffer-size requirements compared to state-of-the-art dataflow-based mapping solutions.
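To make the notion of retaining Pareto-optimal mappings concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes each candidate mapping has already been reduced to a hypothetical (throughput, buffer size) pair, where higher throughput and smaller buffer size are both preferred; the names `Mapping`, `dominates`, and `pareto_front`, as well as the sample values, are invented for illustration.

```python
from typing import List, Tuple

# Hypothetical summary of a candidate mapping: (throughput, buffer_size).
# Higher throughput is better; smaller buffer size is better.
Mapping = Tuple[float, float]


def dominates(a: Mapping, b: Mapping) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Mapping]) -> List[Mapping]:
    """Keep only the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up sample values, for illustration only.
candidates = [(10.0, 64.0), (12.0, 80.0), (9.0, 70.0), (12.0, 96.0), (7.0, 32.0)]
front = pareto_front(candidates)
print(front)
```

In this sample, (9.0, 70.0) is dropped because (10.0, 64.0) is better on both objectives, and (12.0, 96.0) is dropped because (12.0, 80.0) achieves the same throughput with a smaller buffer. A designer would then choose from the remaining mappings according to the throughput and buffer-size constraints of the target system.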