Introduce byte `.B' union type to `union uint512_u'.
Change `CTX.buffer' type from `unsigned char' to `union uint512_u'.
Change `data' argument of `stage2()' to `union uint512_u *'.
Change `g()' arguments to `union uint512_u *' with `RESTRICT'
allowing compiler to optimize more.