Fractal Forward is the name of my current chess engine. It’s a strange beast: it doesn’t do things the way they used to be done, and that’s interesting in many ways.
Forward, like the preceding chess engine “Fast Forward”, because it goes deep into the tree. That’s totally classical, and you will find it in any major current chess engine, nothing to worry about, but at current GPU speed it goes deeper and deeper. But Fast Forward is dumber than the other engines. It sees more, but doesn’t understand. “The sage points at the moon and the idiot looks at the finger”?
Fractal because it considers the tree as a dynamic tree, and moreover its understanding of the tree is itself dynamic, changing over time, over iterations, with each in-depth iteration being identical to its tree iteration. What does that mean in practice?
Any major current chess engine evaluates positions and intermediate nodes with different algorithms than the one used for the tree itself: position evaluation, quiescence search, quick exchange evaluation, deeper evaluation of selected nodes. Each of these is a different algorithm, while all are clearly trying to do the same thing: see deeper into the tree without parsing it. If quick exchange evaluation, or any of the others, works well, why not use it at the root of the tree? Because they don’t work at all; they just hide the fact that we don’t have the processing resources to parse the tree and get a correct view of what’s happening. They do marvels at this, at the cost of algorithmic and implementation complexity. Something that translates badly to GPU!
On the other hand, if we have a good algorithm to traverse the tree, why not apply it recursively to each node? Recursion, with a deeper and deeper view? It is exactly like looking at a sponge closer and closer: an endless task. But what’s interesting is that if we have MORE processing power using the GPU, and a simple, effective algorithm, then instead of implementing specific algorithms for differently characterized nodes of the tree, we could just throw the tree at it, and it will grow naturally with a simple and unique view that is homogeneous whether you are at the root or considering a move 18 plies deep.
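To make the idea concrete, here is a minimal sketch in Python. It is not the Fractal Forward code, only an illustration under stated assumptions: the game is a toy (take 1, 2 or 3 stones from a pile, whoever takes the last stone wins), and the names `moves`, `search` and `deepen` are hypothetical. The point is that one single routine is applied at every node, with no separate quiescence or exchange evaluator; at the horizon it simply answers "unknown", and the view sharpens only because the same routine is rerun deeper.

```python
from typing import List


def moves(stones: int) -> List[int]:
    """Legal moves in the toy game: take 1, 2 or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]


def search(stones: int, depth: int) -> int:
    """Score for the side to move: +1 win, -1 loss, 0 unknown at this depth.

    The same routine is used at the root and at every node below it.
    """
    if stones == 0:
        # The opponent took the last stone: the side to move has lost.
        return -1
    if depth == 0:
        # Horizon reached: no special-case evaluator, just "unknown".
        return 0
    # Negamax: my score is the best of the negated scores of my replies.
    return max(-search(stones - take, depth - 1) for take in moves(stones))


def deepen(stones: int, max_depth: int) -> List[int]:
    """Rerun the same search with a deeper and deeper view (depth 1..max_depth)."""
    return [search(stones, depth) for depth in range(1, max_depth + 1)]


# With 5 stones, the win only becomes visible at depth 3.
scores_for_5 = deepen(5, 3)
# With 4 stones, the loss becomes visible at depth 2.
scores_for_4 = deepen(4, 3)
```

In this toy, `scores_for_5` goes from unknown to a proven win as the depth grows, without any node ever being treated by a different algorithm than the root.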
I have been spied on by local and foreign agencies since the mid-2000s. I had to strengthen security on my network and isolate some of my development computers. Given the recent information Edward Snowden gave us all, I might just have wasted my time, since the NSA and other agencies have incredibly efficient spying tools.
I am going back to OpenCL development, and I don’t want my developments handed to foreign companies, or even examined by foreign agencies…
Through nVidia K20x (and now K40), I have received proposals to run my OpenCL developments on supercomputers based in the USA and paid for by US agencies or the US Army. They never offered me computers running in my country (Canada). If I had accepted that to put my (virtual) hands on a K20x or K40, I would have offered my code to any US agency, with a possible “leak” to US companies, which might use it, or even patent it as their own.
They are working hard to gain insight into my personal work. They might already be able to spy on my main computers. I am seriously thinking about building a strongly protected computer, non-networked, bought used, without sound I/O, to work on my personal projects for 2014. My code is MY code, not theirs. Happy 2014 NSA year!
I am back on OpenCL development. Having worked for a big Canadian media company in 2013, I will have time in 2014 to work on OpenCL, and I think it’s time for OpenCL to go mainstream!
The signal is Apple’s commitment to OpenCL technology, with the new Mac Pro and its dual GPUs. Maybe these 2 GPUs are overstated or overrated on many websites, with performance levels ranging from the (current) Radeon R9 280X ($300 street price) to the Radeon HD 7990 on a customized Mac Pro. This is not the level of performance expected from a $4000+ computer, especially with non-pro hardware (no ECC, for example).
The main point is software. Apple’s new Final Cut knows how to use OpenCL across at least 2 GPUs, and this is big news, as it is much more complex to handle multiple OpenCL devices than just one: you have to synchronize them and use them at their best. Apple is making a strong point with Final Cut, showing how OpenCL and multiple GPUs can offload the CPU and may offer unprecedented performance levels for software that makes good use of OpenCL.
At the same time, Intel’s offering for its Haswell integrated GPUs is mature, with impressive hardware, Iris Pro HD5200 being an incredible iGPU for small datasets, and solid OpenCL drivers. Yes, AMD is offering good iGPUs, but we are all waiting for them to be built on GCN 1.1 and to have the same incredible memory/cache bandwidth (sorry AMD, you seem to lag behind on iGPUs).
nVidia is still playing its game with Kepler, which is anything but impressive in real-world GENERAL PURPOSE GPU usage, but it may unleash a new architecture in 2014 that could put it back in the game. CUDA is dead outside the HPC world and OpenCL is leading the GPGPU world; that’s what I was expecting. nVidia must come back with strong OpenCL development tools (based on its current impressive CUDA tools) to re-establish itself as a leader in GPGPU for all. I remember my GeForce 8800GTS 320MB, from the first generation of GPGPU, and an impressive performer for its time: a game changer.
I wish you all an awesome 2014. I know that for me and other OpenCL developers, there will be incredible opportunities.
I was on vacation in New Orleans two weeks ago, and I had the chance to visit Saint Louis Cemetery #1, where you can find the tomb of the Voodoo priestess Marie Laveau, but also that of the chess prodigy and unofficial world chess champion Paul Morphy.
I appreciated the chess pieces that were left as offerings to his memory. This cemetery is part of the history of New Orleans. You absolutely must visit it, during the day, with a guide!
Intel stated that its Xeon Phi would be more efficient than GPGPU for HPC: that Xeon Phi would have a better Linpack/watt ratio and also a better Linpack/raw TFlop ratio.
The first 2 supercomputers on the June 2013 Top500 list use Intel Xeon Phi and nVidia K20x respectively, so we can compare their metrics, and especially their efficiency.
The first point is Linpack TFlop/Watt. The Xeon Phi system delivers 33862 Linpack Tflops with 17808 kW (1.90 Tflop/kW), while the K20x system delivers 17590 Linpack Tflops with 8209 kW (2.14 Tflop/kW): the K20x-based supercomputer is 12.7% more power efficient!
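The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch that reuses the Linpack and power numbers quoted above; the function and variable names are mine.

```python
def tflops_per_kw(linpack_tflops: float, power_kw: float) -> float:
    """Power efficiency: Linpack Tflops delivered per kW consumed."""
    return linpack_tflops / power_kw


# Figures quoted above for the two systems.
xeon_phi_eff = tflops_per_kw(33862, 17808)  # Xeon Phi based system
k20x_eff = tflops_per_kw(17590, 8209)       # K20x based system

# Relative advantage of the K20x system, in percent.
k20x_gain_percent = (k20x_eff / xeon_phi_eff - 1) * 100

print(f"Xeon Phi: {xeon_phi_eff:.2f} Tflop/kW")
print(f"K20x:     {k20x_eff:.2f} Tflop/kW")
print(f"K20x advantage: {k20x_gain_percent:.1f}%")
```

Note that Tflop/kW is numerically the same as Gflop/W, the unit more often seen in power-efficiency rankings.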
The second point is the ratio of Linpack performance to RAW (peak) processing power, which measures the effectiveness of an architecture. GPUs are known to be less efficient than CPUs on this metric, and Intel claimed that with its x86 cores, the Xeon Phi would be much more efficient than the nVidia K20x, providing real-world performance closer to its theoretical peak performance.
Alas, on this second point, as I expected, Xeon Phi lags behind K20x once again, with 61.7% efficiency where the nVidia GPU obtains 64.9%, showing that the Intel architecture is not mature enough and still needs iterations.
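The efficiency here is simply Linpack performance divided by raw peak performance. The peak figures are not given in this post, so the sketch below only works backwards from the numbers that are: it derives the raw peak implied by each efficiency percentage and the Linpack results quoted earlier. The names are mine, and the implied peaks are approximations limited by the rounding of the percentages.

```python
def implied_peak_tflops(linpack_tflops: float, efficiency: float) -> float:
    """Raw peak implied by efficiency = linpack / peak, i.e. peak = linpack / efficiency."""
    return linpack_tflops / efficiency


# Linpack results and efficiency ratios as quoted in this post.
xeon_phi_peak = implied_peak_tflops(33862, 0.617)  # about 54,900 Tflops
k20x_peak = implied_peak_tflops(17590, 0.649)      # about 27,100 Tflops

# Gap between the two architectures, in percentage points of efficiency.
efficiency_gap_points = (0.649 - 0.617) * 100

print(f"Implied Xeon Phi system peak: {xeon_phi_peak:.0f} Tflops")
print(f"Implied K20x system peak:     {k20x_peak:.0f} Tflops")
print(f"Efficiency gap: {efficiency_gap_points:.1f} points")
```

In other words, the Xeon Phi system needs roughly twice the raw Tflops of the K20x system to deliver roughly twice the Linpack result, while converting a slightly smaller share of that raw power into real work.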
Clearly, the K20x consumes less power, and its architecture (hardware and software) is more mature and efficient than Intel’s Xeon Phi!