In a copy-on-write system, new blocks appear constantly, while old ones cannot always be freed. It is common for an old region to be only partially released: some blocks are no longer needed, small "holes" of free space appear, and so on.
When free space runs low, another allocation mode kicks in. It is more expensive in terms of performance and increases fragmentation: the allocator simply takes the first suitable free region it finds. In the worst case, when no contiguous region large enough for the block remains, the block is split up and written in pieces, which also hurts performance.
By default, ~200 metaslabs are created per Vdev. Whenever something changes, the metadata for these ~200 metaslabs has to be rewritten, and this happens constantly on every Vdev. Shortly before the release, however, a patch landed that records metaslab changes as a log on one of the Vdevs and then periodically applies that log. This is somewhat similar to a database WAL. Accordingly, the write load from persisting metaslab metadata to disk is reduced.
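On Linux, the two things described above can be inspected from the shell. This is a sketch, assuming a recent OpenZFS version: the module parameter name `zfs_vdev_default_ms_count` and the pool name `tank` are assumptions, and the log feature is the `log_spacemap` pool feature; check your version's documentation.

```shell
# Default number of metaslabs created per vdev
# (parameter name assumed from recent OpenZFS; verify on your system):
cat /sys/module/zfs/parameters/zfs_vdev_default_ms_count

# Check whether the metaslab change log ("spacemap log") feature
# is enabled on the pool ("tank" is a placeholder pool name):
zpool get feature@log_spacemap tank
```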
Of course, a problem arises when the space fills up completely. For this reason, any copy-on-write file system (and traditional ones, too) reserves a certain percentage of space for working in this situation; with dynamic allocation there is no way around it.
By default, reads in ZFS are almost always random, but random writes can be turned into nearly sequential ones, since every write goes to a new location. Any copy-on-write system, including ZFS, is an excellent choice if you need storage for a write-heavy, rarely-read workload. Data is written in transaction groups (txg, short for transaction group), and writes can be aggregated within a group.
There is a caveat here, called write throttling: we can use a large (in principle unlimited) amount of RAM to accumulate a txg and thereby absorb sharp write bursts by buffering everything in memory. Naturally, this applies to asynchronous writes, when we can afford them. The data can then be laid out on disk sequentially and very efficiently.
Suppose synchronous write integrity is not important — for example, you are running not a large, expensive PostgreSQL instance but a server for a single user. Then synchronous writes can be disabled with a single setting, making them equivalent to asynchronous ones (zfs set sync=disabled).
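The setting is a per-dataset property, so it can be applied narrowly rather than pool-wide. A minimal sketch, where the dataset name `tank/scratch` is a placeholder:

```shell
# Make synchronous writes behave like asynchronous ones
# for this dataset only (data-loss window appears on power failure):
zfs set sync=disabled tank/scratch

# Verify the current value, and revert to the default later:
zfs get sync tank/scratch
zfs set sync=standard tank/scratch
```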
Thus, having assembled a pool of HDDs, you can use them as cheap SSDs in terms of IOPS: you get roughly as many IOPS as RAM can absorb. At the same time, ZFS ensures integrity in any case — in the event of a power loss, a rollback to the last committed transaction occurs, and everything remains consistent. In the worst case, we lose the last few seconds of writes, depending on how the txg_timeout parameter is configured; by default it is up to 5 seconds, and often less in practice, because there is also a limit on the buffer size, so the data may be flushed earlier.
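On Linux, the txg commit interval mentioned above is exposed as an OpenZFS module parameter. A sketch of reading and changing it at runtime (requires root):

```shell
# Current txg commit interval in seconds (default: 5):
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Widen the interval to 10 seconds; this enlarges the window of
# asynchronous writes that can be lost on power failure:
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout
```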
How ZFS Speed Depends On The Number Of Disks
One block of data always goes to one Vdev. If we divide a file into small blocks of 128 KB each, every such block lands on a single Vdev; on top of that, the data is protected with a Mirror or another redundancy scheme. So even having filled the pool with hundreds of Vdevs, a single-threaded writer will write to only one of them at a time.
With a multi-threaded load — say, 1,000 clients — many Vdevs can be used in parallel at once, and the load is distributed. Adding disks does not give perfectly linear growth, but a parallel load is spread effectively across the Vdevs.
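One way to observe this scaling is to compare a single-threaded and a parallel write load with the fio benchmarking tool. A sketch, assuming the pool is mounted at `/tank` (a placeholder path) and fio is installed:

```shell
# Single writer: effectively exercises one Vdev at a time.
fio --name=single --directory=/tank --rw=write --bs=128k \
    --size=1G --numjobs=1 --group_reporting

# Sixteen parallel writers: the load spreads across Vdevs,
# so aggregate throughput should scale (sub-linearly) with disks.
fio --name=parallel --directory=/tank --rw=write --bs=128k \
    --size=1G --numjobs=16 --group_reporting
```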
Handling Write Requests
When there are many Vdevs and many write requests, the requests are distributed, scheduled, and prioritized. You can see which Vdev is being loaded, with which blocks, and with what latency: the zpool iostat command has a number of flags for viewing various statistics.
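For example, the per-Vdev view described above can be obtained like this (the pool name `tank` is a placeholder; the interval argument repeats the report every N seconds):

```shell
# Per-vdev bandwidth and operation counts, refreshed every 5 seconds:
zpool iostat -v tank 5

# Average latencies per vdev (queue wait, disk wait, etc.):
zpool iostat -l tank 5

# Queue depth statistics per vdev:
zpool iostat -q tank 5
```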
ZFS takes into account which Vdev is overloaded, where the load is lighter, and what the media access latency is. If a disk begins to die and shows high latency, the system will eventually react to this, for example by taking it out of use. With a Mirror, ZFS tries to distribute the load, reading different blocks from both sides of the mirror in parallel.