The Hazelcast clustering components are quite sensitive to network latency and perform best between nodes that are on the same LAN (or high-speed backbone). If you want to distribute your servers among multiple sites, your best bet will be to set up separate XMPP domains (or subdomains) and federate them via the S2S protocol. You can then use clustering within each site/domain to provide scaling and redundancy. I have done this with good results in multiple deployments.
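For what it's worth, here is a rough sketch of that layout in Python. It is only an illustration of the idea (one cluster and one subdomain per site, S2S between sites), not actual Openfire or Hazelcast configuration. The site names, hostnames, and the example.com domain are all made up.

```python
from collections import defaultdict

# Hypothetical inventory of (hostname, site). All names are invented for illustration.
NODES = [
    ("xmpp1.nyc.example.com", "nyc"),
    ("xmpp2.nyc.example.com", "nyc"),
    ("xmpp1.lon.example.com", "lon"),
    ("xmpp2.lon.example.com", "lon"),
]

BASE_DOMAIN = "example.com"  # assumed parent domain


def plan_topology(nodes, base_domain):
    """Group nodes by site; each site becomes one cluster serving its own subdomain."""
    clusters = defaultdict(list)
    for host, site in nodes:
        clusters[f"{site}.{base_domain}"].append(host)
    return dict(clusters)


def domain_of(jid):
    """Extract the domain part of a JID of the form local@domain[/resource]."""
    return jid.split("@", 1)[1].split("/", 1)[0]


def route(sender_jid, recipient_jid):
    """Say whether a stanza stays inside one site's cluster or crosses sites via S2S."""
    if domain_of(sender_jid) == domain_of(recipient_jid):
        return "local-cluster"
    return "s2s"


if __name__ == "__main__":
    for domain, hosts in plan_topology(NODES, BASE_DOMAIN).items():
        print(domain, "->", hosts)
    print(route("alice@nyc.example.com", "bob@nyc.example.com"))
    print(route("alice@nyc.example.com", "carol@lon.example.com/phone"))
```

The point is that the latency-sensitive cluster traffic never leaves a site; only regular S2S traffic crosses the WAN.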
If you really want to tune Hazelcast to work over a WAN, you may find some additional guidance in the Hazelcast documentation, but several people have reported that a single cluster does not perform well across multiple sites.
Hope that helps.