In my previous blog post, "Build an All-Flash Share Nothing High Available Scale-Out Storage for Private Cloud", I mentioned that I built an all-flash, share-nothing storage cluster based on Storage Spaces Direct (S2D) for an Ignite demo. Each of those physical nodes has 1 Mellanox ConnectX-3 10Gb NIC, which is RDMA capable. As you know, we recommend using 10Gb or 40Gb RDMA NICs in a production Storage Spaces Direct deployment. So I guess most people have this question in mind: how bad is it if I use an ordinary 10Gb NIC instead of RDMA?
I performed some tests in my lab environment; hopefully they help answer that question.
Test Environment
Hardware:
- WOSS-H2-10 (DELL 730xd, E5-2630-v3 X2, 128GB Memory, 2TB SATA HDD X2 RAID1, 1 Micron P420m 1400GB PCIe SSD, 2 Mellanox ConnectX-3 56Gb NIC + 2 Mellanox ConnectX-3 10Gb NIC)
- WOSS-H2-12 (DELL 730xd, E5-2630-v3 X2, 128GB Memory, 2TB SATA HDD X2 RAID1, 1 Micron P420m 1400GB PCIe SSD, 2 Mellanox ConnectX-3 56Gb NIC + 2 Mellanox ConnectX-3 10Gb NIC)
- WOSS-H2-14 (DELL 730xd, E5-2630-v3 X2, 128GB Memory, 2TB SATA HDD X2 RAID1, 1 Micron P420m 1400GB PCIe SSD, 2 Mellanox ConnectX-3 56Gb NIC + 2 Mellanox ConnectX-3 10Gb NIC)
Software:
- Windows Server 2016 Technical Preview 4 Datacenter Edition
- The 3 PCIe SSDs (one per node) were in the same storage pool. The virtual disk on top of them was configured as a two-way mirror with a column count of 1.
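For reference, a pool and virtual disk matching that description could be created roughly as in the minimal sketch below. This is not the exact set of commands I ran; the pool name and the storage subsystem wildcard are assumptions, while "P420mVD10" is the virtual disk name used in the tests.

    # Minimal sketch: pool the eligible PCIe SSDs, then carve out a two-way
    # mirror virtual disk with a single column. Names are illustrative.
    $ssd = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName "P420mPool" -StorageSubSystemFriendlyName "*Cluster*" -PhysicalDisks $ssd
    New-VirtualDisk -StoragePoolFriendlyName "P420mPool" -FriendlyName "P420mVD10" -ResiliencySettingName Mirror -NumberOfDataCopies 2 -NumberOfColumns 1 -UseMaximumSize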
Test Case:
- Test target: The whole virtual disk "P420mVD10".
- Test Duration: 30min
- IO Size: 4KB
- Read/Write Ratio: 100/0
- Cache: all hardware and software caches disabled
- Threads: 16
- Queue Depth: 10
Test Steps:
- Disable RDMA by running the following cmdlets on all three nodes (a sketch for verifying the RDMA state follows these test steps).
- Set-NetOffloadGlobalSetting -NetworkDirect Disabled
- Update-SmbMultichannelConnection
- Install FIO and run the following command on WOSS-H2-10
- fio.exe --name=4kreadtest --rw=randrw --direct=1 --iodepth=10 --blocksize=4k --ioengine=windowsaio --filename=\\.\PhysicalDrive10 --numjobs=16 --refill_buffers --norandommap --randrepeat=0 --rwmixread=100 --group_reporting --runtime=1805 --thread
- Enable RDMA by running the following cmdlets on all three nodes.
- Set-NetOffloadGlobalSetting -NetworkDirect Enabled
- Update-SmbMultichannelConnection
- Run the following command on WOSS-H2-10
- fio.exe --name=4kreadtest --rw=randrw --direct=1 --iodepth=10 --blocksize=4k --ioengine=windowsaio --filename=\\.\PhysicalDrive10 --numjobs=16 --refill_buffers --norandommap --randrepeat=0 --rwmixread=100 --group_reporting --runtime=1805 --thread
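Before each run, it is worth confirming that the RDMA state actually changed end to end. A minimal verification sketch, using only the built-in network-offload and SMB cmdlets, could look like this on any of the nodes:

    # Confirm the global NetworkDirect (RDMA) setting matches what was just set.
    Get-NetOffloadGlobalSetting | Select-Object NetworkDirect

    # Check whether the SMB client still reports RDMA-capable interfaces.
    Get-SmbClientNetworkInterface | Select-Object FriendlyName, RdmaCapable

    # Check whether the active SMB connections to the other nodes negotiated RDMA.
    Get-SmbMultichannelConnection | Select-Object ServerName, ClientRdmaCapable, ServerRdmaCapable

If the toggle took effect, the RDMA-related fields should report False after RDMA is disabled and True again after it is re-enabled.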
Test Results:
Comparing IOPS on the Cluster Shared Volume with and without RDMA, RDMA increased IOPS by 22.43% and decreased average latency by around 18.32%.
If that still doesn't sound like a huge improvement, look at the consistency of the performance: RDMA reduced the standard deviation of latency by around 40%.
In other words, in terms of performance consistency, RDMA is much better than non-RDMA.
For more information, please check the attached outputs from the two test cases above (including the fio command outputs and performance logs).