Lately I've been doing thought experiments, and tonight has been no exception. I set out to look into real supercomputers and came across China's Tianhe-2 system, the current world speed record holder at 33.86 petaflops (source: https://en.wikipedia.org/wiki/Tianhe-2 and major news outlets found through a Google search).
I also searched for scientific publications by Professor Henry Markram, because he is investigating how a group of 100 minicolumns works in the mouse brain, using his 8192-core IBM Blue Gene/L system, which can crunch numbers at 22.4 teraflops (source: http://www.nature.com/nrn/journal/v7/n2/fig_tab/nrn1848_F1.html). Prof. Markram has succeeded in modelling the activity of 100 columns, about 1 000 000 neurons, on this computer.
The male brain hosts 20 to 23 billion neocortical neurons and the female brain 19 billion on average (source: http://www.ncbi.nlm.nih.gov/pubmed/9215725), so to simulate it we need a machine able to compute at a rate of about 1 exaflops.
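The jump from the Blue Gene figures to 1 exaflops can be checked with a quick back-of-the-envelope calculation. The sketch below assumes that the compute cost grows linearly with the number of neurons, which is my simplification and not something the cited sources state, and the names are only illustrative.

```python
# Back-of-the-envelope scaling from the Blue Gene run to a whole brain.
# Assumption: compute cost grows linearly with the number of neurons.
BLUE_GENE_FLOPS = 22.4e12       # 22.4 teraflops
SIMULATED_NEURONS = 1_000_000   # neurons modelled on that machine
BRAIN_NEURONS = 23e9            # upper estimate quoted in the text


def required_flops(neurons):
    """Scale the per-neuron cost of the Blue Gene run to `neurons` neurons."""
    flops_per_neuron = BLUE_GENE_FLOPS / SIMULATED_NEURONS
    return flops_per_neuron * neurons


estimate = required_flops(BRAIN_NEURONS)
print(f"{estimate / 1e18:.2f} exaflops")
```

Under that assumption, 23 billion neurons come out at roughly half an exaflops, so a 1 exaflops target leaves about a factor of two of headroom.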
Such a machine would be feasible using 4096-core Parallella chips. These chips, which are not yet designed (a 1024-core chip is in design at the moment), would offer 6.4 teraflops each (the 1024-core chip offers 1.6 teraflops) and are very competitive with graphics cards: the current champion graphics system, found in the Apple Mac Pro, offers 7 teraflops from 2 AMD FirePro cards.
Building an exaflops machine with the Parallella processor takes 163840 of them, which is a huge undertaking in itself. Fortunately, the intellectual property surrounding these chips is available on GitHub, so if I wanted to, I could take that design, along with the design of a 64-bit ARM core from Altera (http://www.altera.com/devices/fpga/stratix-fpgas/stratix10/stx10-index.jsp?ln=devices_processor&l3=SoCs&l4=Stratix%2010%20SoC), and have Altera make 163840 custom chips combining a 4096-core Parallella circuit with a 4-core 64-bit ARM circuit on the same ASIC die. It is possible to put 4 of them on an average 1U server board.
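The chip count can be reproduced in a few lines. This is only a sketch: it takes 1 exaflops as 2**30 gigaflops and assumes each core completes 2 floating-point operations per clock cycle at 0.8 GHz. The 2 operations per cycle is my assumption, chosen because it matches the per-chip figure quoted earlier.

```python
# Chip and board counts for the proposed machine.
TARGET_GFLOPS = 1_073_741_824   # 2**30 gigaflops, roughly 1 exaflops
CORES_PER_CHIP = 4096
CLOCK_GHZ = 0.8
FLOPS_PER_CYCLE = 2             # assumed: 2 floating-point operations per core per cycle
CHIPS_PER_BOARD = 4

# Peak throughput of one chip, in gigaflops.
chip_gflops = CORES_PER_CHIP * CLOCK_GHZ * FLOPS_PER_CYCLE

# round() absorbs floating-point noise in the division.
chips = round(TARGET_GFLOPS / chip_gflops)
boards = chips // CHIPS_PER_BOARD

print(f"{chip_gflops:.1f} gigaflops per chip, {chips} chips, {boards} boards")
```

Each chip comes out at 6553.6 gigaflops, which is the 6.4 teraflops figure if a teraflops is counted as 1024 gigaflops. That gives 163840 chips and, at 4 chips per board, 40960 boards.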
As for power consumption, such boards are very efficient: I wouldn't need more than 1000 W per server. I entered my system parameters into the Parallella calculator (source: http://www.adapteva.com/calculators/epiphany-performance-calculator/) and came out with a power consumption of 42 megawatts, which is 2.5 times the 17 megawatts of Tianhe-2 for 30 times the performance. The Parallella and ARM cores are very power efficient (ARM cores are what mobile phones use), hence the low power consumption.
The parameters I used in the calculator are 1073741824 gigaflops, 0.8 gigahertz, 4096 cores, 4 Epiphany chips per board, and a fudge factor of 2.5 applied to the power consumption of each processor to derive the system consumption.
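The power figures can be cross-checked without the calculator, using only the numbers quoted in this post. The sketch below does not reproduce the calculator's own formula, which I don't know. It assumes one 4-chip board per server and treats 1000 W as the per-server draw.

```python
# Power sanity check using only figures quoted in the text.
CHIPS = 163_840
CHIPS_PER_BOARD = 4              # assumed: one board per server
WATTS_PER_SERVER = 1000          # upper bound per server
CALCULATOR_MW = 42               # result reported by the calculator
TIANHE2_MW = 17
TIANHE2_PFLOPS = 33.86
TARGET_PFLOPS = 1_073_741_824 / 1e6   # gigaflops -> petaflops


def gflops_per_watt(pflops, megawatts):
    """Energy efficiency in gigaflops per watt."""
    return (pflops * 1e6) / (megawatts * 1e6)


servers = CHIPS // CHIPS_PER_BOARD
server_estimate_mw = servers * WATTS_PER_SERVER / 1e6
power_ratio = CALCULATOR_MW / TIANHE2_MW
speed_ratio = TARGET_PFLOPS / TIANHE2_PFLOPS

print(f"servers: {servers}, per-server estimate: {server_estimate_mw:.2f} MW")
print(f"power ratio: {power_ratio:.2f}, speed ratio: {speed_ratio:.1f}")
print(f"Tianhe-2: {gflops_per_watt(TIANHE2_PFLOPS, TIANHE2_MW):.2f} gigaflops/W")
print(f"proposed: {gflops_per_watt(TARGET_PFLOPS, CALCULATOR_MW):.2f} gigaflops/W")
```

The per-server bound gives about 41 megawatts, in line with the calculator's 42 megawatts. The power ratio is about 2.5 and the speed ratio about 32, which works out to roughly 13 times the flops per watt of Tianhe-2.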
The hardest part of such a system is designing how the processors communicate with each other. Companies make big money from the interconnect between processors, but they have to pay a very large team of professional engineers to design it. In the case of Tianhe-2, it cost China about $500 million to design the supercomputer. Part of that cost came from the reliance on powerful Intel chips, and part from the 1300 engineers working on the system, although they designed more than the interconnect and some custom chips are included.
As for the Parallella architecture, it is the work of 2 engineers and a number of interns. If I had the money to design such a supercomputer, I would enlist the help of a computer engineering professor, who would submit the interconnect design problem to his or her research team.
Getting the chips (Parallella + ARM on the same die) would cost less than $10 million, and we have the power infrastructure needed for such a system here in Quebec, so I figure $20 million for the system, excluding the building needed to host it.
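To see what this budget implies per part, here is a rough breakdown. It is only a sketch of the arithmetic: it uses the 163840 chips and 4 chips per board from above, treats both budget figures as exact, and the variable names are mine.

```python
# What the budget implies per chip and per board.
CHIP_BUDGET = 10_000_000     # upper bound quoted for the chips
SYSTEM_BUDGET = 20_000_000   # figure quoted for the whole system
CHIPS = 163_840
CHIPS_PER_BOARD = 4

max_cost_per_chip = CHIP_BUDGET / CHIPS
boards = CHIPS // CHIPS_PER_BOARD
# Whatever is not spent on chips has to cover everything else on each board.
remaining_per_board = (SYSTEM_BUDGET - CHIP_BUDGET) / boards

print(f"at most {max_cost_per_chip:.2f} per chip")
print(f"about {remaining_per_board:.2f} per board for everything else")
```

That works out to about 61 per custom chip and about 244 per board left over for the board itself, memory, power supply and interconnect.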
In retrospect, the human brain is a remarkably efficient machine, considering that it takes 42 megawatts to emulate it.