It is 2021 and my two-year-old NAS boxes are undergoing extended maintenance and upgrades. In this article I will also share what I have learned from managing these machines for 2 years while trying to provide uninterrupted service to my users.
1. Throwing money at SATA cables isn’t always S.M.A.R.T.
Pun intended. In the past two years I have regularly used these cables in 3 of my builds:
The first model is a super-slim SATA cable made by AKASA. These cables bend easily and take up a lot less space, and I justified their higher price by the significant improvement in case airflow. They are also black, which looks better. Out of about 30 cables I had in use, about 10 failed over the course of two years. These cables fail regularly, unexpectedly and in the worst style imaginable: instead of simply ceasing to work, they develop intermittent read/write issues, and it is up to the error detection mechanism of the operating system or management software to degrade the drive and inform the administrator of a problem. These issues are normally reported in S.M.A.R.T. as CRC errors, and in HDTune the CRC error is accompanied by a description along the lines of: “CRC errors are usually indicative of failing cables.” During investigation and reseating I would bump into a neighboring drive, which would regularly cause its cable to stop working as well. Having two degraded disks in an array that is only protected against a single drive failure is not good. The ease with which these cables bend is not a benefit: bending them is more likely than not to damage them. These slim SATA cables are absolutely terrible and I do not recommend using them at all.
The other pictured product is an AliExpress-sourced SATA cable that saves space and improves airflow by bundling multiple SATA cables together and only splitting the ends. These cables regularly arrive DOA or develop the same intermittent read/write issues as the AKASA slim cables mentioned above. It is simply not worth dealing with these problems when managing servers. My advice is to never put these weird, novelty or designer products into a build that serves multiple people and has a clear purpose in its design.
I will always use regular SATA cables with thick insulation moving forward. Those have rarely failed on me in the past, and they can be managed well enough to look decent and work well in terms of airflow.
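The CRC errors described above are exposed in S.M.A.R.T. as attribute 199 (named UDMA_CRC_Error_Count on most drives). Below is a minimal Python sketch of how that counter could be watched using the attribute table printed by `smartctl -A` from smartmontools. The sample output, the helper names and the device path are illustrative assumptions, not readings from the builds described here. Because the counter is cumulative over the life of the drive, the sketch treats growth between two readings, rather than any nonzero value, as the warning sign.

```python
import subprocess

# Illustrative excerpt in the layout of `smartctl -A` (made-up values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       16543
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       37
"""

def crc_error_count(smartctl_output):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: the ID comes first, the raw value last.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None

def cable_suspect(previous, current):
    """True when the cumulative CRC counter grew between two readings."""
    return current > previous

def read_attributes(device):
    """Run smartctl on a device such as /dev/sda (hypothetical path, needs root)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return result.stdout

if __name__ == "__main__":
    count = crc_error_count(SAMPLE)
    print(count)                     # 37
    print(cable_suspect(30, count))  # True: the counter grew since the last check
```

In practice the previous reading would be stored somewhere between runs; a counter that keeps climbing after a reseat is a good reason to swap the cable before the array degrades.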
2. Do not over-build the NAS
My primary NAS has nine 10TB drives. It is running Storage Spaces in a PARITY config with 1 column set for parity. It is currently storing about 20TB of data that is used daily by a multitude of users. This suggests the architecture of the build is overly complex. First off, if any one of the drives fails, the entire array loses redundancy, and an issue with any of the remaining 8 drives will cause all of the data to be lost. Given that all drives are made by the same manufacturer and were bought in bulk, the chance of more than 1 disk failing in a short span cannot be ruled out. Additionally, there’s no point in having 90TB of capacity in a box that barely uses 30TB incl. parity. This will be demonstrated below: I have finished maintenance on my secondary NAS, where I made changes to obey this single rule. Whatever is not absolutely required for the projected use of the machine shall not be present.
Unnecessary parts in a NAS still have to be managed, and should they fail or develop an issue, they can cause a service interruption that could have been avoided.
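The over-provisioning argument is simple arithmetic, sketched here in Python with the figures quoted above (nine 10TB drives, about 30TB occupied including parity). The function name and the "idle drives" measure are illustrative choices for this sketch, not a Storage Spaces concept.

```python
def utilization(drive_count, drive_tb, used_tb):
    """Return raw capacity in TB, the fraction in use, and whole drives' worth of idle space."""
    raw_tb = drive_count * drive_tb
    idle_drives = (raw_tb - used_tb) // drive_tb
    return raw_tb, used_tb / raw_tb, idle_drives

if __name__ == "__main__":
    raw, fraction, idle = utilization(drive_count=9, drive_tb=10, used_tb=30)
    # prints: 90 TB raw, 33% in use, 6 drives' worth idle
    print(f"{raw} TB raw, {fraction:.0%} in use, {idle} drives' worth idle")
```

Every one of those idle drives is still a part that can fail and take redundancy away from the whole array.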
Secondary NAS Upgrades
My secondary NAS had the following configuration prior to the upgrade:
- Intel Pentium G5400 w/ stock cooler
- 2x 8GB DDR4 non-ECC RAM in Dual Channel
- 1x 500GB SATA SSD
- 5x 6TB WD RED EFRX
- 4x 3TB WD RED EFRX
- 9x AKASA slim SATA cables
- 1x Marvell 88SE9215 chip SATA III 4 ports PCI-Express controller from AliExpress
- Various SATA power extension cables
In the above list, you can see the Marvell SATA expansion card, a cheap way to plug more SATA drives into a consumer board. With 10 SATA devices in total and only 6 SATA ports on the board, the expansion card was needed to provide enough connections. AliExpress has a lot of offers on SATA cards and one needs to exercise caution. Some chips have fewer lanes than they have ports, so some ports share bandwidth. In other cases, the chip stands as a middleman between the board and the drive, which can cause issues with the OS or with RAID management tools, which often rely on being connected directly to the disk. Optional RAID and other smart features are also counterproductive, since they are not needed and increase complexity, which in turn increases the chance of failure. The above-mentioned chip is optimal: 4 lanes, 4 ports, no smart features, direct SATA passthrough. That being said, this is not an enterprise-grade controller and die-hard data hoarders would never touch it. I have had many of these cards in use in the past 2 years and have not yet experienced an issue with a single one. I still trust this product.
The upgrade. I have finally had it with Microsoft’s unbearable determination to install Candy Crush on Windows 10 and to install updates with automatic and unavoidable restarts, all of this against the user’s will and without their knowledge or consent. Microsoft Storage Spaces and the Microsoft operating system were removed from this box altogether in favor of TrueNAS CORE 12.0. I have reached the point where *nix experience is unavoidable, and the sooner I learn how to operate and reasonably manage the thing, the better. The following changes have been carried out:
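To illustrate the point about lanes and ports, here is a rough Python sketch of the worst-case bandwidth each port gets when all drives on a card are busy at once. It assumes a PCIe 2.0 host link at about 500 MB/s per lane and SATA III at about 600 MB/s per port (both after 8b/10b encoding), and it ignores protocol overhead. The lane and port combinations passed in are illustrative inputs, not datasheet values for any particular chip, so the lane count of a specific card should be checked against its datasheet.

```python
PCIE2_LANE_MB_S = 500  # assumption: PCIe 2.0, 5 GT/s per lane with 8b/10b encoding
SATA3_PORT_MB_S = 600  # assumption: SATA III, 6 Gb/s per port with 8b/10b encoding

def per_port_mb_s(lanes, ports, lane_mb_s=PCIE2_LANE_MB_S):
    """Worst-case MB/s per SATA port when every port transfers at the same time."""
    host_link_mb_s = lanes * lane_mb_s
    # A port can never exceed its own SATA link speed, however wide the host link is.
    return min(SATA3_PORT_MB_S, host_link_mb_s / ports)

if __name__ == "__main__":
    for lanes, ports in [(4, 4), (2, 4), (1, 4)]:
        print(f"{lanes} lanes / {ports} ports: {per_port_mb_s(lanes, ports):.0f} MB/s per port")
```

Under these assumptions a card with as many lanes as ports gives each drive about 500 MB/s, while a single-lane card with four ports drops to about 125 MB/s per drive under full load, which is below what a modern hard drive can sustain sequentially.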
- Removed four 3TB drives. TrueNAS and ZFS do not like mixed-capacity disks in a single pool and I was advised against it. I had no requirement for an additional, standalone pool in this machine, so I decided to stick to rule 2 mentioned above and simply removed these disks.
- Removing 4 drives let me remove the AliExpress Marvell SATA expansion card, as it was no longer needed.
- All remaining AKASA SATA cables were replaced with regular PremiumCord SATA cables, and the cables were tied and organized.
- Samsung 850 EVO SATA SSD removed in favor of 500GB WD Blue M.2 NVMe SSDs.
- Stock CPU cooler replaced with Noctua NH-U12S chromax.black. This helped airflow considerably. Push-pull fans in the front compartment of the Node 804 case no longer suffer from the stock CPU cooler breaking the flow and trying to pull some air onto itself. Instead, the Noctua cooler uses the same airflow direction and accelerates the air on its way to the exhaust fan.
- RAM expanded from 2x 8GB to 4x 8GB Dual Channel non-ECC DDR4 memory to better satisfy ZFS memory requirements and give some headroom for running jails and VMs in TrueNAS.
- Remaining five 6TB drives configured as a 4-disk raidz2 vdev with 1 drive as a hot spare. This means the data is considerably more secure, with protection against failure of any 2 drives at once and 1 drive ready at all times to begin resilvering, should any drive fail.
- A GTX 960 4G inserted in order to test TrueNAS transcoding performance with a dedicated GPU accelerator. Since I have been running this GPU in a dedicated Windows Plex server, I can compare the performance between the two.
Two years ago, I would have frowned at having 30TB of space on disks and only 11TB of usable space to occupy. Today, many sleepless nights, recoveries and user calls later, I put no price on peace of mind and data protection against hardware failures. This box can satisfy its workload requirements with 11TB of space just fine. The WD REDs are reporting good health and this unit is ready for setup and deployment. I’m still struggling a little bit with some TrueNAS settings and features. As with any free and open source software, research, fixing and configuration are requirements for success. I am at a point where this inconvenience is more acceptable than trusting Microsoft and its trashware distribution platform with my precious data and workloads.
The secondary NAS has been running TrueNAS for over a week now and it’s working as expected. It hasn’t restarted once, doesn’t complain about a mandatory restart schedule, doesn’t restart without permission only to get stuck on the login screen with services not running and, most importantly, has not installed anything I haven’t explicitly asked for. It is unfortunate that the bar is set this low for operating systems these days. Its uptime has already exceeded that of the primary NAS, which wasn’t able to reach more than 6 days of uptime in over 2 years.
The primary NAS will also undergo extended maintenance in the following weeks. Apart from reducing the number of disks, replacing and managing SATA cables, expanding RAM and introducing caching SSD units, its GPU for Plex transcoding will be upgraded from a GTX 960 4G to an RTX 2060 6G, which should provide a massive increase in transcoding performance for 4K streams. I have not yet decided whether the primary NAS will remain on Windows for the time being, purely because of my own level of competency. I do not like the idea of moving a user-facing machine and primary work/media server to TrueNAS until I am confident I can manage it competently. Time will tell.
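The gap between 30TB on disks and roughly 11TB usable can be reproduced with a short Python sketch. It assumes the simple rule that a raidz vdev gives up one disk's worth of capacity per parity level, that the hot spare holds no data, and that the reported figure is in binary units (TiB). ZFS metadata and padding overhead are ignored, so the real number will be somewhat lower. The function names are illustrative.

```python
def raidz_usable_tb(disks, disk_tb, parity):
    """Approximate usable capacity of one raidz vdev in decimal TB, ignoring ZFS overhead."""
    return (disks - parity) * disk_tb

def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40

if __name__ == "__main__":
    raw_tb = 5 * 6                                          # five 6TB drives in the box
    usable_tb = raidz_usable_tb(disks=4, disk_tb=6, parity=2)  # 4-disk raidz2, spare excluded
    print(f"{raw_tb} TB raw")                               # 30 TB raw
    print(f"{usable_tb} TB usable = {tb_to_tib(usable_tb):.1f} TiB")  # 12 TB usable = 10.9 TiB
```

Two disks of parity plus one spare means three of the five drives exist purely for protection, which is exactly the trade described above.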