Fractal Forward is the name of my current Chess Engine. It's a strange beast: it doesn't do things the way they are usually done, and that's interesting in many ways.
Forward, as in the preceding Chess Engine "Fast Forward": going deep into the tree. That part is totally classical, you will find it in any major current Chess Engine, nothing to worry about; but at today's GPU speeds, it goes deeper and deeper. Fast Forward is dumber than the other engines, though. It sees more, but doesn't understand. "The sage points at the moon and the idiot looks at the finger"?
Fractal because it considers the tree as a dynamic tree, and moreover its understanding of the tree is itself dynamic, changing over time and over iterations, with each in-depth iteration being identical to its tree iteration. What does that mean in practice?
Every major Chess Engine today evaluates positions and intermediate nodes with algorithms different from the tree search itself: position evaluation, quiescence search, static exchange evaluation, deciding that some nodes need a deeper look. Each of these is a different algorithm, while all clearly try to do the same thing: see deeper into the tree without parsing it. If static exchange evaluation works so well, or the others, why not use it at the root of the tree? Because they don't really work at all; they just hide the fact that we don't have the processing resources to parse the tree and get a correct view of what's happening. They do marvels at this, but at the cost of algorithmic and implementation complexity. Something that translates badly to GPU!
On the other side, if we have a good algorithm to traverse the tree, why not apply it recursively at each node? Recursion, with a deeper and deeper view. It's exactly like looking at a sponge closer and closer: an endless task. What's interesting is that if we have MORE processing power by using a GPU, and a simple, effective algorithm, then instead of implementing specific algorithms for differently characterized nodes of the tree, we can just throw the tree at it, and it will grow naturally with a simple and unique view, one that is homogeneous whether you are at the root or considering an 18-ply-deep move.
Fast-Forward's search will be a totally dumb search engine: brute-force, fixed depth, with negamax. Its only purpose is to search the tree at maximum speed, not to be a good Chess Engine (it won't be, for sure).
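For reference, fixed-depth negamax fits in a few lines. Here is a minimal sketch over a toy explicit game tree; a real engine would generate children from the position on the fly instead of storing them, so the Node struct here is purely illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a chess position: the static score is from the
// side to move's point of view, children are reachable positions.
struct Node {
    int score;
    std::vector<Node> children;
};

// Brute-force fixed-depth negamax: no pruning, no move ordering.
int negamax(const Node& n, int depth) {
    if (depth == 0 || n.children.empty())
        return n.score;                       // leaf: static evaluation
    int best = -1000000;
    for (const Node& child : n.children)
        // the child's score is from the opponent's view, so negate it
        best = std::max(best, -negamax(child, depth - 1));
    return best;
}
```

The only "intelligence" is the sign flip: each side maximizes its own score, so one value function serves both players.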
I based my work on Srdja's, because some of the ideas in his implementation are great; it is based on Glaurung and thus on Magic Bitboards, and I found a way to avoid divergence as much as possible while limiting memory reads/writes and register usage.
The basic principle is to be as simple as possible. Negamax will be implemented, but that's all for the first iteration of the project. Evaluation will be pure material evaluation at first, done incrementally during deepening, not at the leaves. So in fact there will be no leaf evaluation per se in the first iteration.
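The incremental idea can be sketched as follows: the running material delta is adjusted when a capture is made and restored symmetrically when it is unmade, so a leaf never needs a separate evaluation pass. The piece codes and centipawn values below are conventional placeholders, not taken from the engine:

```cpp
#include <cassert>

// Illustrative piece codes and centipawn values (assumptions, not
// the engine's actual tables).
enum Piece { NONE = 0, PAWN, KNIGHT, BISHOP, ROOK, QUEEN };
static const int VALUE[] = {0, 100, 320, 330, 500, 900};

// Update the running material delta: add the captured piece's value
// when doing the move, subtract the same value when undoing it.
// Because the update is symmetric, the "evaluation" is always current.
int applyCapture(int materialDelta, Piece captured, bool undo) {
    int v = VALUE[captured];
    return undo ? materialDelta - v : materialDelta + v;
}
```

A do/undo pair is a round trip back to the starting delta, which is exactly why no evaluation work is left to do at the leaf.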
The main difference from Srdja's work is the move generation. Removing move ordering is mandatory for full speed, and it means there is no need to construct a list of moves: a bbWork+bbMove pair of variables (16 bytes) is enough to obtain the next move in any position; we just have to store that pair at each depth of the tree for each thread. The Bitboards representing the board, as well as the bonus/malus or material delta, can live in registers and be updated in reverse when climbing back up the tree. This will let a maximum number of threads fly on the GPU, relying on the L1 cache (and L2 cache) to store the bbWork+bbMove pairs.
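As a rough sketch of what serving moves from a bbWork+bbMove pair could look like (this is my reading of the idea, with a toy knight move generator standing in for the real Magic Bitboard lookups): bbWork holds the pieces still to be served, bbMove the remaining target squares of the current piece, and each call hands out exactly one move. No move list is ever built.

```cpp
#include <cassert>
#include <cstdint>

// Pop the least significant set bit and return its square index (0..63).
// __builtin_ctzll is a GCC/Clang intrinsic.
static int popLsb(uint64_t& bb) {
    int sq = __builtin_ctzll(bb);
    bb &= bb - 1;
    return sq;
}

// Toy attack generator (knight moves only); the real engine would use
// Magic Bitboard lookups here.
static uint64_t attacksFrom(int sq) {
    int r = sq / 8, f = sq % 8;
    const int dr[] = {1, 2, 2, 1, -1, -2, -2, -1};
    const int df[] = {2, 1, -1, -2, -2, -1, 1, 2};
    uint64_t a = 0;
    for (int i = 0; i < 8; ++i) {
        int nr = r + dr[i], nf = f + df[i];
        if (nr >= 0 && nr < 8 && nf >= 0 && nf < 8)
            a |= 1ULL << (nr * 8 + nf);
    }
    return a;
}

// Serve one move per call from bbWork (pieces left) + bbMove (targets
// left for the current piece). The caller keeps `from` alive between
// calls; it stays valid while bbMove is nonzero. Returns false when
// the position has no further moves.
bool nextMove(uint64_t& bbWork, uint64_t& bbMove, int& from, int& to) {
    while (bbMove == 0) {
        if (bbWork == 0) return false;   // no pieces left: done
        from   = popLsb(bbWork);         // next piece to serve
        bbMove = attacksFrom(from);      // its target squares
    }
    to = popLsb(bbMove);                 // one target consumed = one move
    return true;
}
```

The whole generation state is those two 64-bit words per ply, which is what makes storing it per thread per depth cheap enough for the GPU's L1/L2 caches.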
Symmetry = little divergence
Doing a move and undoing it are symmetrical operations in Fast-Forward. Essentially, you take the next move using bbWork+bbMove, with two small points of divergence.
When you get the next move, you don't update bbWork+bbMove; you keep them as-is, so that you can get the same move again (and update the registers this time) when coming back up a level of the tree, in order to undo it.
Given a move, the difference in the Bitboard update is just a matter of swapping from and to, and so on; those are fast operations, and the slow 64-bit logical operations can then be totally identical!
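One way to make the two directions share exactly the same 64-bit operations is to update the bitboards by XOR: flipping the from and to bits is its own inverse, so doing and undoing a quiet move is literally the same instruction sequence. The two-bitboard board representation below is a simplified assumption:

```cpp
#include <cassert>
#include <cstdint>

// Deliberately minimal board: one bitboard for the moving side's
// pieces, one for all occupied squares.
struct Board {
    uint64_t own;
    uint64_t occ;
};

// XOR with (from|to) toggles both squares, so calling this a second
// time with the same arguments undoes the move: doMove and undoMove
// share the slow 64-bit part with no divergent branch.
void toggleQuietMove(Board& b, int from, int to) {
    uint64_t mask = (1ULL << from) | (1ULL << to);
    b.own ^= mask;
    b.occ ^= mask;
}
```

Captures need a little more bookkeeping, but the same XOR trick applies to the captured piece's bitboard.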
You may add that the bonus/malus or material value should be added when going deep and subtracted when going back toward the root of the tree; no matter!
If your doMove() and undoMove() are essentially identical except for a few intermediate instructions (the bbWork+bbMove update on one side, some register swapping on the other), you can group these operations at the same point in the code: we are folding the code so that every core (or each SIMD unit) follows the same execution path whether it goes deep into the tree or goes back.
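Folded, the two paths can look like a single routine with one tiny conditional region, so a warp whose threads are split between going deep and going back still executes the same heavy instructions together. Everything below is schematic, my own illustration rather than the real Fast-Forward code:

```cpp
#include <cassert>
#include <cstdint>

// Per-ply state, simplified: the move-serving pair, two board
// bitboards, and the running material delta.
struct State {
    uint64_t bbWork, bbMove;
    uint64_t own, occ;
    int material;
};

// One routine for both directions. `forward` guards only a few cheap
// scalar instructions; the slow 64-bit XOR updates are shared, so
// divergent threads reconverge for the expensive part.
void step(State& s, int from, int to, int capturedValue, bool forward) {
    if (forward) {
        // going deep: leave bbWork/bbMove untouched (the move must be
        // re-derivable on the way back), just add the capture value
        s.material += capturedValue;
    } else {
        // going back: now consume the move from bbMove and subtract
        s.bbMove &= s.bbMove - 1;
        s.material -= capturedValue;
    }
    uint64_t mask = (1ULL << from) | (1ULL << to);  // shared by both paths
    s.own ^= mask;
    s.occ ^= mask;
}
```

The point is not this exact split but the shape: the divergent region is a couple of scalar operations, while the 64-bit work that dominates the cost sits on the common path.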
Parsing the search tree is essentially that: going forward to the next level as fast as possible, down to a leaf, then, using the dynamic evaluation and the bonus/malus, going back, possibly cutting a branch along the way using negamax.
The target is to parse a maximum number of nodes and leaves at warp speed, and to get warp speed on a GPGPU you should limit the number of registers and the memory usage, and avoid divergence as much as possible.
Because the Engine never really has backward operations, it always proceeds with the same operation, searching for the next move and executing it; only a few instructions will be conditional, reversing their behavior so that the search can move back toward the root of the tree.