This workflow uses a long first pass (Text-to-Image, T2I) with an advanced SDXL CLIP encoder. After the T2I pass, it runs Ultimate SD Upscale with a 2x2 tile grid as a HiRes Fix.
As a result, it fixes most issues with fingers and eyes without needing special YOLO-based or VL-based nodes to generate masks for the face and hands.
This workflow also uses a few secondary nodes from other packages. All of them can be replaced with Comfy.Core nodes except Anything Everywhere. I recommend installing the Anything Everywhere nodes to reduce connection spaghetti.
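
To give a feel for what the 2x2 tile set used for the HiRes Fix means in pixels, here is a minimal sketch of the arithmetic. It is not code from the upscale node: the `tile_grid` helper and the `Tile` type are hypothetical, and the 1024x1024 base resolution and 2x upscale factor are assumptions, since the description above does not state them. Tile padding and seam fixing are ignored.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    left: int
    top: int
    width: int
    height: int


def tile_grid(base_width, base_height, upscale_by=2.0, cols=2, rows=2):
    """Split the upscaled image into a cols x rows grid of tiles.

    Hypothetical helper for illustration only; upscale_by=2.0 is an
    assumption, not a value taken from the workflow.
    """
    # Size of the image after the upscale step.
    out_w = round(base_width * upscale_by)
    out_h = round(base_height * upscale_by)
    # Nominal tile size; the last row/column is clipped to the image edge.
    tile_w = math.ceil(out_w / cols)
    tile_h = math.ceil(out_h / rows)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile_w, r * tile_h
            tiles.append(
                Tile(left, top, min(tile_w, out_w - left), min(tile_h, out_h - top))
            )
    return tiles


if __name__ == "__main__":
    # Assumed base resolution of 1024x1024 from the T2I pass.
    for tile in tile_grid(1024, 1024):
        print(tile)
```

Under these assumptions each of the four tiles has the same size as the first-pass image, so every tile is refined at a resolution the SDXL model handles well. If your upscale node is configured by tile width and height rather than by grid count, the tile size computed here is the value to aim for.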