I recently visited my friend MJ in Taipei City, Taiwan, and we made our way to Guanghua Digital Plaza, an awesome place for new and cheap electronics. I ended up walking out with 18kg of gear for a crypto-currency mining platform, which I figured would be easy to convert to a neural network server.
Initial Setup
For about $500 I came home with the beginning of a GPU server:
- Segotep case to hold 6 GPUs
- 3 fans
- Super Flower Computer Inc 850 watt power supply w/ cables
- ASUS ROG Strix H270F Gaming motherboard, with 6 PCIEX slots
- 8 GB Ram
- Pentium CPU
I'm not a hardware guy and it's been literally years since I've looked at a BIOS, but with no OS installed, the Strix BIOS pops right up. It looks amazing, with lots of great information, including fan speed and CPU temperature.
Once back in Tokyo, I bought a 6TB HD for about $180. It turns out the drive mount that came with the case only had two holes that matched the drive. I would have preferred four screws holding the drive, but two will do. I then installed Ubuntu 17.03. Up and running!
I'm still not sure how I'm going to run this inside our apartment. At 78 dB it is pretty loud.
Adding a GPU
I wasn't sure how the GPU was supposed to connect to the motherboard, as the PCIEX16 slots on the motherboard are far from the card mounts on the case. After some googling, it was apparent that I needed a riser, something that connects the PCIEX slot on the motherboard to the PCIEX16 connector of the GPU. Searching further, I saw that Tsukumo eX in the Akihabara district of Tokyo might have a riser.
Luckily Tats Beniya works there. He is a fluent English speaker and a super knowledgeable hardware guru. With his help I obtained a CUDA 8 compatible GPU and a riser. It turns out that Nvidia doesn't sell any cards in Japan, so I bought a locally produced GPU and riser for 18700 yen. Tats also suggested that I spend an extra 1000 yen for an option to exchange the GPU for another GPU of greater value within one month. I'm glad I did, as I started to have problems later on.
I installed the riser and GPU. I could see right away that there might be some issues with the riser separating from the GPU, as there is space between the bottom of the GPU and the case. I could just imagine that over time the riser would become loose and cause problems. The riser uses a USB 3.0 cable to connect to the attachment card in the PCIEX slot.
After I installed the GPU with the riser, about every third reboot I would see a massive number of PCI errors appear on the screen, which prevented booting into the OS. When I googled the problem, a lot of people said it could be related to the riser.
Powering down, removing the USB cable from the PCIEX card, powering up, then powering down again and reattaching the USB cable would only fix the problem occasionally. Sometimes it would work for days, only for the problem to rear its ugly head again.
Looking through the BIOS I saw this message:
For best performance of your graphics card(s), use the following configuration according to the number of graphics card(s) you want to install:
To use 1 graphics card, we recommend you to install the graphics card on the PCIE_X16_1 slot.
To use 2 graphics cards, we recommend you to install the graphics cards onto the PCIE_X16_1 and PCIE_X16_2 slots.
So I wasn't sure if my PCI errors were being generated from:
- The riser card
- Something with the motherboard (using the PCIEX1_1 slot and not the PCIEX16_1 slot)
- A faulty GPU
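One way to tell these three suspects apart is to swap a single component at a time and count the errors after each boot. Here is a minimal sketch of that counting step, assuming the errors also land in the kernel log (for example via `dmesg`) and look roughly like the Linux PCIe error lines in the sample. The pattern, the function name `count_pci_errors`, and the sample log are my own illustrations, not output captured from this rig.

```python
import re
from collections import Counter

# Assumed shape of a kernel log line reporting a PCIe error. The exact
# wording varies by kernel version, so adjust the pattern to what you see.
PCI_ERROR = re.compile(
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]).*"
    r"(?:PCIe Bus Error|AER)",
    re.IGNORECASE,
)


def count_pci_errors(log_text):
    """Count PCIe error lines per device address in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PCI_ERROR.search(line)
        if match:
            counts[match.group("device").lower()] += 1
    return counts


# Illustrative sample only, not a log from the machine in this post.
SAMPLE_LOG = """\
pcieport 0000:00:1c.0: AER: Corrected error received: 0000:00:1c.0
pcieport 0000:00:1c.0: PCIe Bus Error: severity=Corrected, type=Physical Layer
usb 1-1: new high-speed USB device number 2 using xhci_hcd
pcieport 0000:00:1d.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer
"""

if __name__ == "__main__":
    for device, n in count_pci_errors(SAMPLE_LOG).most_common():
        print(device, n)
```

If the counts follow the riser from slot to slot, the riser is the likely culprit. If they stay with one slot, the motherboard or slot choice is worth a closer look.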
New GPU
Back at Tsukumo eX, I spoke with Tats, who again was super helpful.
He had nice solutions to my problems:
To use the PCIEX16_1 slot, the riser attachment card for the PCIEX_1 could be reversed and used in the PCIEX16_1 slot.
To test if my riser was a problem, I bought a slightly more expensive PCIEX16 to PCIEX16 ribbon. I figured that, as I would be installing up to five more GPUs, it would be best to have some way of testing whether problems are with the risers or the GPUs.
I also traded in my GTX1050 for a GTX1060. I decided to get an ASUS GPU so it would be the same brand as the motherboard manufacturer (though Tats said it probably didn't matter). Even better, this GPU has two fans that only turn on at 60 degrees Celsius. Thus by turning off my three fans on the case, and letting the GPU decide when to cool itself, I dramatically reduced the amount of sound the rig produces. It does mean that I leave the top off the case so that the heat can dissipate more easily. Nice to have some quiet finally.
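Since the case fans are now off, it seems worth keeping an eye on the GPU temperature. Below is a rough sketch of how that could be checked once the Nvidia driver is installed, using the `nvidia-smi` query for `temperature.gpu` and `fan.speed`. The helper names and the `>= 60` comparison are my assumptions for illustration. I only know that the fans are supposed to start at around 60 degrees Celsius.

```python
import subprocess

FAN_ON_CELSIUS = 60  # threshold quoted for this card; exact behavior assumed


def parse_gpu_status(csv_text):
    """Parse 'temperature, fan%' lines into a list of (temp, fan) tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        temp_field, fan_field = [f.strip() for f in line.split(",")]
        temp = int(temp_field)
        # Some cards report a placeholder instead of a fan percentage.
        fan = int(fan_field) if fan_field.isdigit() else None
        readings.append((temp, fan))
    return readings


def read_gpu_status():
    """Ask nvidia-smi for temperature and fan speed (needs the driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu,fan.speed",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_status(out)


def fans_expected_on(temp_celsius):
    """Return True when the GPU fans should be spinning (assumed rule)."""
    return temp_celsius >= FAN_ON_CELSIUS
```

Calling `read_gpu_status()` every minute or so and comparing each temperature with `fans_expected_on` would show whether the card really stays quiet below the threshold with the case top off.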
Tats also pointed out that this GPU needs its own power supply in addition to the riser power supply. As he expected, my power supply had all the cables I needed.
Finally, to close the gap between the bottom of the riser board and the case, Tats suggested adding screws to the bottom of the riser board so that it rests against the case. As he didn't have non-conducting screws in stock, he advised insulating them with electrical tape.
Altogether the new GPU and riser cost me 34,600 yen.
Alright! A quiet server, up and running without any errors. Now onto getting CUDA installed...