bcrypt has now been implemented for both GPUs and FPGAs. See the article "Bcrypt password cracking extremely slow? Not if you are using hundreds of FPGAs!"
The GPU implementation described there is only barely faster than the CPU implementation. The FPGA implementation, however, is much more cost-effective and draws over an order of magnitude less power, although so far it only seems to run on discontinued FPGA boards.
In particular, they first compare two systems, each costing on the order of a thousand dollars: a CPU (AMD EPYC 7401P, 24 cores, 3.0 GHz) and a high-end GPU (Nvidia RTX-2080Ti). Both are pretty slow for bcrypt at work factor 12 (2^12 iterations per hash): 197 and 219 hashes/sec, respectively.
But the FPGA implementation (using open source code from John the Ripper) can do about a thousand work-factor-12 hashes/sec on a single ZTEX 1.15y board, using just 3-5% of the power.
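To put those rates in perspective, here is a rough back-of-the-envelope sketch in Python. The rates are the ones quoted above. The one-million-candidate wordlist and all function names are made up for illustration, and the scaling to other work factors assumes the cost simply doubles with each increment, which is only an approximation.

```python
# Hash rates quoted above, all for bcrypt at work factor 12 (hashes/sec).
RATES_AT_12 = {"cpu": 197.0, "gpu": 219.0, "fpga_board": 1000.0}


def iterations(work_factor: int) -> int:
    """Number of key-setup iterations bcrypt performs for a given work factor."""
    return 2 ** work_factor


def scaled_rate(rate_at_12: float, work_factor: int) -> float:
    """Estimate the hash rate at another work factor.

    Assumption: cost doubles with every +1 in work factor, so the rate halves.
    """
    return rate_at_12 * 2.0 ** (12 - work_factor)


def seconds_to_try(candidates: int, rate: float) -> float:
    """Time to hash every candidate password against a single salted hash."""
    return candidates / rate


if __name__ == "__main__":
    wordlist_size = 1_000_000  # hypothetical wordlist, not from the article
    print(f"work factor 12 = {iterations(12)} iterations per hash")
    for name, rate in RATES_AT_12.items():
        minutes = seconds_to_try(wordlist_size, rate) / 60
        print(f"{name:>10}: {rate:7.0f} hashes/sec, {minutes:6.1f} min per salt")
```

With these numbers the single FPGA board is roughly five times faster than either the CPU or the GPU: about 17 minutes per salted hash for the hypothetical wordlist, versus about 85 minutes on the CPU. That is before taking the power difference into account.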