Documentation updates #8849

Open · wants to merge 14 commits into base: release/2.2
57 changes: 29 additions & 28 deletions docs/overview/architecture.md
@@ -5,10 +5,10 @@ high bandwidth and high IOPS storage containers to applications and enables
next-generation data-centric workflows combining simulation, data analytics,
and machine learning.

Unlike the traditional storage stacks that are primarily designed for
rotating media, DAOS is architected from the ground up to exploit new
NVM technologies and is extremely lightweight since it operates
End-to-End (E2E) in user space with complete OS bypass. DAOS offers a shift
away from an I/O model designed for block-based and high-latency storage
to one that inherently supports fine-grained data access and unlocks the
performance of the next-generation storage technologies.
@@ -30,10 +30,10 @@ directories over the native DAOS API is also available.

DAOS I/O operations are logged and then inserted into a persistent index
maintained in SCM. Each I/O is tagged with a particular timestamp called
epoch and associated with a specific dataset version. No
read-modify-write operations are performed internally. Write operations
are non-destructive and not sensitive to alignment. Upon read request,
the DAOS service walks through the persistent index. It creates a
complex scatter-gather Remote Direct Memory Access (RDMA) descriptor to
reconstruct the data at the requested version directly in the buffer
provided by the application.
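The epoch-tagged, non-destructive write path described above can be sketched as a toy model. This is illustrative Python only; `VersionedIndex` and its methods are hypothetical names, not the DAOS API, and the real index lives in SCM rather than a Python list:

```python
from bisect import insort

class VersionedIndex:
    """Toy model of an epoch-tagged, append-only write log (illustrative only)."""

    def __init__(self):
        # Writes are logged as (epoch, offset, data) records; never overwritten.
        self.records = []

    def write(self, epoch, offset, data):
        # No read-modify-write: the record is inserted as-is, any alignment allowed.
        insort(self.records, (epoch, offset, data))

    def read(self, offset, length, epoch):
        # Walk the index and reconstruct the requested version, taking, for each
        # byte, the newest record at or before `epoch`.
        out = [None] * length
        for rec_epoch, rec_off, rec_data in self.records:  # sorted by epoch
            if rec_epoch > epoch:
                break
            for i, byte in enumerate(rec_data):
                pos = rec_off + i - offset
                if 0 <= pos < length:
                    out[pos] = byte
        return bytes(b for b in out if b is not None)

idx = VersionedIndex()
idx.write(1, 0, b"hello")
idx.write(2, 0, b"HE")          # a later epoch overwrites the first two bytes
print(idx.read(0, 5, epoch=1))  # b'hello'
print(idx.read(0, 5, epoch=2))  # b'HEllo'
```

In the real service, the per-byte walk is replaced by building a scatter-gather RDMA descriptor so the version is reconstructed directly in the application's buffer.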
@@ -43,18 +43,18 @@ DAOS service that manages the persistent index via direct load/store.
Depending on the I/O characteristics, the DAOS service can decide to
store the I/O in either SCM or NVMe storage. As represented in Figure
2-1, latency-sensitive I/Os, like application metadata and byte-granular
data, will typically be stored in the former, whereas checkpoints and
bulk data will be stored in the latter. This approach allows DAOS to
deliver the raw NVMe bandwidth for bulk data by streaming the data to
NVMe storage and maintaining internal metadata index in SCM. The
Persistent Memory Development Kit (PMDK) allows managing
transactional access to SCM, and the Storage Performance Development Kit
(SPDK) enables user-space I/O to NVMe devices.
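A placement policy of this kind can be sketched as follows. The 4 KiB threshold and the function name are hypothetical values chosen for illustration, not the actual DAOS heuristic:

```python
SCM_THRESHOLD = 4096  # bytes; hypothetical cutoff for this sketch

def place_io(size, is_metadata):
    """Route latency-sensitive I/O to SCM and bulk data to NVMe (illustrative policy)."""
    if is_metadata or size < SCM_THRESHOLD:
        return "SCM"   # byte-granular data and metadata stay in SCM
    return "NVMe"      # bulk data streams to NVMe at raw bandwidth

print(place_io(64, is_metadata=True))        # SCM
print(place_io(1 << 20, is_metadata=False))  # NVMe
```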

![](../admin/media/image1.png)
Figure 2-1. DAOS Storage

DAOS aims to deliver:

- High throughput and IOPS at arbitrary alignment and size

@@ -96,34 +96,35 @@ DAOS aims at delivering:
## DAOS System

A data center may have hundreds of thousands of compute instances
interconnected via a scalable, high-performance network, where all, or
Collaborator: (style) trailing whitespace

Author: fixed
a subset of the instances called storage nodes, have direct access to NVM
storage. A DAOS installation involves several components that can be
either collocated or distributed.

A DAOS *system* is identified by a system name and consists of a set of
DAOS *storage nodes* connected to the same network. The DAOS storage nodes
run one DAOS *server* instance per node, which in turn starts one
DAOS *Engine* process per physical socket. Membership of the DAOS
servers is recorded into the system map, which assigns a unique integer
*rank* to each *Engine* process. Two different DAOS systems comprise
two disjoint sets of DAOS servers and do not coordinate with each other.

The DAOS *server* is a multi-tenant daemon running on a Linux instance
(either natively on the physical node or in a VM or container) of each
*storage node*. Its *Engine* sub-processes export the locally-attached
SCM and NVM storage through the network. It listens to a management port
(addressed by an IP address and a TCP port number) plus one or more fabric
endpoints (addressed by network URIs).

The DAOS server is configured through a YAML file in /etc/daos,
including the configuration of its Engine sub-processes.
The DAOS server startup can be integrated with different daemon management or
orchestration frameworks (for example, a systemd script, a Kubernetes service,
or even via a parallel launcher like pdsh or srun).

Inside a DAOS Engine, the storage is statically partitioned across
multiple *targets* to optimize concurrency. To avoid contention, each
target has its private storage, its pool of service threads, and its
dedicated network context that can be directly addressed over the fabric
independently of the other targets hosted on the same storage node.

@@ -133,24 +134,24 @@ independently of the other targets hosted on the same storage node.

!!! note
When mounting the PMem devices with the `dax` option,
the following warning is logged in dmesg:
`EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk`
This warning can be safely ignored; it is issued because
DAX does not yet support the `reflink` filesystem feature,
but DAOS does not use this feature.

* When *N* targets per engine are configured,
each target uses *1/N* of the `fsdax` SCM capacity
of that socket, independently of the other targets.

* Each target also uses a fraction of the NVMe capacity of the NVMe
drives attached to this socket. For example, in an engine
with 4 NVMe disks and 16 targets, each target will manage 1/4 of
a single NVMe disk.
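The arithmetic in these bullets can be expressed directly. This is a sketch; the function name and the 512 GiB SCM figure are assumed example values:

```python
from fractions import Fraction

def per_target_shares(targets, nvme_drives, scm_capacity_gib):
    """Each target gets 1/N of the socket's fsdax SCM and a matching slice of NVMe."""
    scm_share_gib = scm_capacity_gib / targets
    # With more targets than drives, several targets share one drive:
    nvme_fraction_of_one_drive = Fraction(nvme_drives, targets)
    return scm_share_gib, nvme_fraction_of_one_drive

scm, nvme = per_target_shares(targets=16, nvme_drives=4, scm_capacity_gib=512)
print(scm)   # 32.0  (GiB of SCM per target)
print(nvme)  # 1/4   (each target manages 1/4 of a single NVMe disk)
```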

A target does not implement any internal data protection mechanism
against storage media failure. As a result, a target is a single point
of failure and the unit of fault.
A dynamic state is associated with each target: its state can be either
"up and running" or "down and not available".

@@ -163,7 +164,7 @@ configurable, and depends on the underlying hardware (in particular,
the number of SCM modules and the number of NVMe SSDs that are served
by this engine instance). As a best practice, the number of targets
of an engine should be an integer multiple of the number of NVMe drives
that this engine serves.

## SDK and Tools

@@ -173,20 +174,20 @@ to administer a DAOS system and is intended for integration with
vendor-specific storage management and open-source
orchestration frameworks. The `dmg` CLI tool is built over the DAOS management
API. On the other hand, the DAOS library (`libdaos`) implements the
DAOS storage model. It primarily targets application and I/O
middleware developers who want to store datasets in a DAOS system. User
utilities like the `daos` command are also built over the API to allow
users to manage datasets from a CLI.

Applications can access datasets stored in DAOS either directly through
the native DAOS API, through an I/O middleware library (e.g., POSIX
emulation, MPI-IO, HDF5), or through frameworks like Spark or TensorFlow
that have already been integrated with the native DAOS storage model.

## Agent

The DAOS agent is a daemon residing on the client nodes that interacts
with the DAOS library to authenticate the application processes. It is a
trusted entity that can sign the DAOS library credentials using
certificates. The agent can support different authentication frameworks
and uses a Unix Domain Socket to communicate with the DAOS library.
66 changes: 37 additions & 29 deletions docs/overview/data_integrity.md
@@ -10,33 +10,34 @@ attempt to recover the corrupted data using data redundancy mechanisms
## End-to-end Data Integrity

In simple terms, end-to-end means that the DAOS Client library will calculate a
checksum for data sent to the DAOS Server. The DAOS Server will
store the checksum and return it upon data retrieval. Then the client verifies
the data by calculating a new checksum and comparing it to the checksum received
from the server. There are variations on this approach depending on the type of
data being protected, but the following diagram shows the basic checksum flow.
![Basic Checksum Flow](../graph/data_integrity/basic_checksum_flow.png)
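The basic flow can be sketched end to end. This is illustrative Python using zlib's CRC32 as a stand-in for the configured checksum type; the function names are hypothetical, not the libdaos API:

```python
import zlib

def client_update(data):
    # The client computes a checksum before sending the data to the server.
    return data, zlib.crc32(data)

def server_store(payload):
    data, csum = payload
    # The server keeps the checksum alongside the data and returns both on fetch.
    return {"data": data, "csum": csum}

def client_fetch_verify(stored):
    # On fetch, the client recomputes the checksum and compares it with the
    # one returned by the server.
    if zlib.crc32(stored["data"]) != stored["csum"]:
        raise IOError("checksum mismatch: data corrupted in flight or at rest")
    return stored["data"]

stored = server_store(client_update(b"daos"))
print(client_fetch_verify(stored))  # b'daos'
```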

## Configuring

Data integrity is configured for each container.
See [Storage Model](./storage.md) for more information about how data is
organized in DAOS. See the Data Integrity section in
the [Container User Guide](../user/container.md#data-integrity) for details on
setting up a container with data integrity.

## Keys and Value Objects

Because DAOS is a key/value store, the data for both keys and values is
protected; however, the approach is slightly different. For the two different
value types, single and array, the approach is also slightly different.

### Keys

On an update and fetch, the client calculates a checksum for the data used
as the distribution and attribute keys and will send it to the server within the
RPC. The server verifies the keys with the checksum.
While enumerating keys, the server will calculate checksums for the keys and
pack them within the RPC message to the client. The client will verify the keys
received.

!!! note
@@ -47,13 +48,14 @@
has reliable data integrity protection.

### Values

On an update, the client will calculate a checksum for the data of the value and
will send it to the server within the RPC. If "server verify" is enabled, the
server will calculate a new checksum for the value and compare it with the checksum
received from the client to verify the integrity of the value. If the checksums
don't match, then data corruption has occurred, and an error is returned to the
client indicating that the client should try the update again. Whether or not
"server verify" is enabled, the server will store the checksum.
See [VOS](https://github.com/daos-stack/daos/blob/release/2.2/src/vos/README.md)
for more info about checksum management and storage in VOS.
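A minimal sketch of this update path follows. The names are hypothetical and zlib's CRC32 stands in for the configured checksum type:

```python
import zlib

def server_update(data, client_csum, server_verify=True):
    """Sketch of the server side of an update with optional "server verify"."""
    if server_verify and zlib.crc32(data) != client_csum:
        # Corruption on the wire: ask the client to retry the update.
        return "retry"
    # The checksum is stored either way, for later fetch-side verification.
    return {"data": data, "csum": client_csum}

ok = server_update(b"value", zlib.crc32(b"value"))
bad = server_update(b"valXe", zlib.crc32(b"value"))
print(ok["csum"] == zlib.crc32(b"value"))  # True
print(bad)                                 # retry
```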

@@ -62,40 +64,40 @@ values fetched so the client can verify the values received. If the checksums
don't match, then the client will fetch from another replica if available in
an attempt to get uncorrupted data.

There are slight variations to this approach for the two different types
of values. The following diagram illustrates a basic example.
(See [Storage Model](storage.md) for more details about the single value
and array value types)

![Basic Checksum Flow](../graph/data_integrity/basic_checksum_flow.png)

#### Single Value

A Single Value is an atomic value, meaning that writes to a single value will
update the entire value and reads retrieve the entire value. Other DAOS features
such as Erasure Codes might split a Single Value into multiple shards to be
distributed among multiple storage nodes. Either the whole Single Value (if
going to a single node) or each shard (if distributed) will have a checksum
calculated, sent to the server, and stored on the server.

Note that it is possible for a single value, or shard of a single value, to
be smaller than the checksum derived from it. Therefore, if an
application needs many small single values, it is advised to use an Array Type instead.

#### Array Values

Unlike Single Values, Array Values can be updated and fetched at any part of
an array. In addition, updates to an array are versioned so that a fetch can include
parts from multiple versions of the array. Each of these versioned parts of an
array is called an extent. The following diagrams illustrate a couple of examples
(also see [VOS Key Array Stores](https://github.com/daos-stack/daos/blob/release/2.2/src/vos/README.md#key-array-stores) for
more information):


A single extent update (blue line) from index 2-13. A fetched extent (orange
line) from index 2-6. The fetch is only part of the original extent written.

![](../graph/data_integrity/array_example_1.png)


Many extent updates and different epochs. A fetch from index 2-13 requires parts
from each extent.
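The multi-epoch fetch in this second example can be modeled as a simple overlay, where the newest extent covering each index wins. This is a toy sketch with hypothetical names, not the VOS implementation:

```python
def fetch(extents, start, end):
    """Compose a fetch from the newest extent covering each index (toy model).

    `extents` is a list of (epoch, lo, hi, fill) tuples; ranges are inclusive.
    """
    out = {}
    for epoch, lo, hi, fill in sorted(extents):  # ascending epoch: newer wins
        for i in range(max(lo, start), min(hi, end) + 1):
            out[i] = fill
    return [out.get(i) for i in range(start, end + 1)]

# Three extents written at epochs 1-3; a fetch of 2-13 takes parts from each:
extents = [(1, 2, 13, "A"), (2, 6, 9, "B"), (3, 12, 15, "C")]
print("".join(fetch(extents, 2, 13)))  # AAAABBBBAACC
```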

@@ -106,9 +108,9 @@ The nature of the array type requires that a more sophisticated approach to
creating checksums is used. DAOS uses a "chunking" approach where each extent
will be broken up into "chunks" with a predetermined "chunk size." Checksums
will be derived from these chunks. Chunks are aligned with an absolute offset
(starting at 0), not an I/O offset. For example, the following diagram illustrates a chunk
size configured to be 4 (units are arbitrary in this example). Though not all
chunks have the full size of 4, an absolute offset alignment is maintained.
The gray boxes around the extents represent the chunks.

![](../graph/data_integrity/array_with_chunks.png)
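Under the stated assumptions (chunk size 4, inclusive index ranges), the chunk alignment can be sketched as follows; the function name is hypothetical:

```python
def chunk_ranges(lo, hi, chunk_size=4):
    """Split an extent [lo, hi] (inclusive) into chunks aligned to absolute offset 0."""
    ranges = []
    i = lo
    while i <= hi:
        # The chunk ends at the next multiple of chunk_size minus one,
        # regardless of where the extent itself starts.
        boundary = (i // chunk_size + 1) * chunk_size - 1
        ranges.append((i, min(boundary, hi)))
        i = boundary + 1
    return ranges

# An extent from index 2 to 13: the edge chunks are partial, but every
# boundary stays aligned to the absolute offsets 0, 4, 8, 12, ...
print(chunk_ranges(2, 13))  # [(2, 3), (4, 7), (8, 11), (12, 13)]
```

Each of these ranges would get its own checksum, so a later overlapping update only invalidates the chunks it actually touches.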
@@ -118,21 +120,24 @@ See [Object Layer](https://github.com/daos-stack/daos/blob/release/2.2/src/objec
for more details about the checksum process on object update and fetch)

## Checksum calculations

The actual checksum calculations are done by the
[isa-l](https://github.com/intel/isa-l)
and [isa-l_crypto](https://github.com/intel/isa-l_crypto) libraries. However,
these libraries are abstracted away from much of DAOS, and a common checksum
library is used with appropriate adapters to the actual isa-l implementations.
[common checksum library](https://github.com/daos-stack/daos/blob/release/2.2/src/common/README.md#checksum)

## Performance Impact

Calculating checksums can be CPU-intensive and will impact performance. To
mitigate this impact, checksum types with hardware acceleration should
be chosen. For example, CRC32C is supported by recent Intel CPUs, and many
checksum types are accelerated via SIMD.

## Quality

Unit and functional testing is performed at many layers.

| Test executable | What's tested | Key test files |
| --- | --- | --- |
@@ -142,15 +147,18 @@
| daos_test | daos_obj_update/fetch with checksums enabled. The -z flag can be used for specific checksum tests. Also --csum_type flag can be used to enable checksums with any of the other daos_tests | src/tests/suite/daos_checksum.c |

### Running Tests
#### With daos_server not running

```bash
./common_test
./vos_test -z
./srv_checksum_tests
```

#### With daos_server running

```bash
export DAOS_CSUM_TEST_ALL_TYPE=1
./daos_server -z
./daos_server -i --csum_type crc64