Optimize XMAD instruction sequence into a single 32-bit multiply when possible
Unmerged branch/PR from gdkchan. I expect that collapsing the XMAD instruction sequence into a single 32-bit multiply improves performance in some situations, but I haven't tested it.
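For context, here is a rough sketch of why a sequence of 16-bit multiply-adds can be replaced by one 32-bit multiply. It only shows the underlying arithmetic identity and does not model the real XMAD operand modes or the code in this branch. The function names are made up for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_via_16bit_parts(a: int, b: int) -> int:
    """Low 32 bits of a * b, built only from 16x16-bit products.

    Illustrative only: this mirrors the general shape of a multi-step
    16-bit multiply-add sequence, not the exact XMAD semantics.
    """
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Product of the two low halves contributes directly.
    low = a_lo * b_lo

    # Cross terms are shifted left by 16, so only their low 16 bits
    # can land inside the 32-bit result.
    cross = (a_lo * b_hi + a_hi * b_lo) & MASK16

    # a_hi * b_hi would be shifted left by 32 and is dropped entirely.
    return (low + (cross << 16)) & MASK32


def mul32_direct(a: int, b: int) -> int:
    """The same result as a single wrapping 32-bit multiply."""
    return (a * b) & MASK32
```

Both functions return the same value for any pair of 32-bit inputs, which is the equivalence the optimization relies on when it recognizes the pattern.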