Despite its widespread adoption, the WireGuard tunneling mechanism in the Linux kernel cannot provide high-speed connectivity in a site-to-site setup when traffic is routed through a standard single-tunnel configuration. Its ability to scale with the number of available CPU cores is limited, even though its software architecture is intrinsically parallel. In this paper we investigate the multi-core scalability of WireGuard, identify its current limitations, and propose an improved design that scales effectively, with throughput increasing near-linearly with the number of CPU cores involved. Furthermore, we propose a multi-tunnel approach to parallelize the stages of the WireGuard pipeline that are limited to a single core per tunnel, together with a modified architecture tailored to multi-tunnel support. This architecture shows an almost 2x performance improvement over a multi-tunnel deployment of vanilla WireGuard, and supports 18x the throughput of a single-tunnel setup on our machines.