Züs Architecture Overview

Thomas H. Austin, Saswata Basu, and Züs Team

October 2022

1 Introduction

Züs seeks to provide resilient storage curated on the blockchain. Miners are responsible for the creation of the blocks, whereas sharders store these blocks long term. Blobbers provide the actual storage for client data, and are paid in Züs Cloud Network (ZCN) tokens.

This document gives an overview of Züs's key protocols. In Section 2, we detail the process of mining for the Züs blockchain. Section 3 describes payment in the Züs network, including token pools and markers (Section 3.2). Section 4 gives an overview of the storage protocol and how blobbers are rewarded for their work. Section 5 describes our token bridge protocol for converting between native ZCN tokens and ERC-20 WZCN tokens on the Ethereum blockchain.

2 Mining on the Züs Blockchain

We first review some key terms related to mining. A round is the process of producing a single block, divided into one or more subrounds. In general, a block is produced in a single subround for all of the miners. However, if the process times out for a miner, the process restarts and a new subround begins.

The two major entities involved are the miners and the sharders; the miners are responsible for the production and verification of blocks, whereas the sharders store the blocks. Miners that are currently taking part in block production are referred to as being in the active set. Periodically, there is a view change round, which produces a magic block defining the miners and sharders that will become the new active set for the following view change. (For the initial launch, the set of miners and sharders will be fixed; we include the discussion on view changes for future reference.) The miners that have been selected to start at the next view change are referred to as the incoming miners, or as the DKG miners, referring to the distributed key generation process outlined in Section 2.5.

Every round, certain miners are selected as the generators responsible for creating the block.
There is a priority to the generators, so that the second generator's block is only used if the first generator fails to produce a block. All miners serve as verifiers, who replay the generators' blocks to ensure they are valid and then produce signed verification tickets as their guarantee that the block is valid. The miners then tally the votes from all verifiers; once the block has received sufficient votes, the miners send out a notarization including the verification tickets received from other miners.

Miners are expected to store the state of the blockchain and a few recent blocks, but are not expected to store the full blockchain history. The state of the blockchain is stored as a Merkle-Patricia Trie (MPT).

In contrast to miners, sharders do not participate in producing or voting for blocks, but are tasked with long-term storage of the blocks. Note that an individual sharder does not need to store all blocks, but only the subset of blocks assigned to them. (However, for our initial launch, all sharders receive all the blocks and store the entire history.) Similarly, sharders may opt to write older, infrequently requested blocks to a hard disk. More recent blocks may be cached following an LRU scheme. This design allows sharders to handle the large number of blocks produced for the blockchain.

2.1 Onboarding a New Miner or Sharder

When new miners want to join the Züs set of miners, they begin by getting the newest magic block using 0DNS [2]. The magic block is defined as the block from the last view change round; it thus specifies the latest active set of miners and sharders for the network. 0DNS is a centralized entity that gets the magic blocks from the sharders, links them back to the genesis block, and posts the latest set of miners and sharders for the blockchain so that clients can interact with them easily. From the magic block, we get the set of current sharders.
The miner then requests the state of the view change (which of the five phases it is in) from the sharders. These phases are discussed in more detail in Section 2.5. The new miner sends a transaction to the blockchain staking tokens and requesting to be added to the list of available miners. Once that transaction is accepted, the miner listens to the sharders for when it is included in the list of incoming miners in the contribute phase of a view change. See Section 2.5 for more details on this process.

Similarly, a new sharder gets the latest magic block and waits until it is selected in a view change round. Once selected, they request the blocks that they need to store from the other sharders.

2.2 Block Production Protocol

In this section, we give a more detailed step-by-step description of our protocol for block production. Züs's protocol is patterned after Dfinity [5]; reviewing their design may offer insight into our own.

Every miner has a secret public/private keypair given to it through the distributed key generation (DKG) process, as detailed in Section 2.5. In every round, a miner takes the following steps:

1. When a new round begins, the miner calculates the precedence of generators for this round using a verifiable random function (VRF).

2. If selected as a generator, the miner produces a block of transactions and sends it to all other miners.

3. The miner then waits to collect block proposals from the generators.

4. After waiting for block proposals, the miner selects the highest priority block received, as determined by the ordering of generators.

5. The miner sends out a signed verification ticket. This verification ticket contains:

• Round number.
• Block id (block hash).
• Verifier id.
• Signature.

6. The miner then waits to collect verification tickets.

7.
When the miner receives sufficient tickets for a block, it broadcasts a notarization to both the miners and sharders. The notarization contains:

• Verification tickets.
• Block id.
• Round number.

8. The miner begins a new round to work on the next block in the blockchain.

Generators are ordered by the priority of their blocks for the round. If the highest priority miner fails to produce a block, the block produced by the second highest priority miner is used, and so on down to the lowest priority generator. In the unlikely event that no generator is able to produce a block (or that a miner has not received blocks from any generators), the round is said to time out, and the round is restarted, beginning a new subround. A new set of generators is selected, and the process repeats as normal.

If a miner receives a notarized block from a different subround, it changes its own subround to match the subround of the notarized block. At this point, the miner can proceed to the next round to make the next block in the blockchain.

Figure 1: Block Notarization and Finalization

2.3 Finalization Process

After a miner notarizes a block, it attempts to determine which blocks are finalized; once a block is considered finalized, a miner will not accept any alternate blocks to this block or earlier blocks. The miner begins at the most recently produced block and walks back through the previous blocks to check for notarizations. If a block has received a notarization from one node (notarization from a single node is sufficient, since every notarization contains verification tickets from 2/3 of the nodes), it is marked as final; by extension, so are all previous blocks in that blockchain.

Figure 1 shows the block notarization and finalization process. Each row represents the precedence of blocks, with the top row representing the block produced by the highest priority miner for the round. Each column represents the round number. Each node represents a single block.
Yellow blocks are notarized, but not yet finalized; green blocks have been finalized; and grey blocks have been produced but neither notarized nor finalized. The notation “R57,9,M4” indicates that the block was produced in round 57, that it has 9 notarizations, and that it was produced by miner 4. The figure shows that in round 61, miner 3's block failed to get sufficient verification, and the second highest priority block was notarized and finalized in its place. In round 63, the highest priority block producer failed to make a block; therefore, the second highest priority block was again notarized and finalized in its place.

2.4 Merkle Patricia Trees and Recovery

When a miner or sharder is missing state, either because it is newly added to the active set or because it was temporarily unavailable, it must sync its state with the rest of the mining network. To do this, the miner or sharder requests the MPT updates from the other miners. This section provides a more in-depth description of this data structure and how it is used within Züs.

Züs uses a speculative consensus algorithm, resulting in a slightly lagging finality. This trade-off achieves high throughput, and is similar to how hardware optimizes the instruction pipeline with speculative branch prediction to move as fast as possible. Operating at very high speed with speculative branching requires a sophisticated data structure to manage the state. Züs uses the same Merkle Patricia Trie (MPT) as Ethereum [4]. An MPT contains a single root node and several intermediate nodes and leaf nodes. The intermediate nodes can be either full nodes or extension nodes (see the Ethereum wiki [4] for details on these node types). All these nodes are persisted into RocksDB [10]. Due to speculation, not every computed node is immediately saved to the database. It is necessary to operate on both persisted nodes and in-memory nodes.
Further, among the in-memory nodes, it is also necessary to maintain several parallel paths, where following each path can result in different state values. To deal with all these different types of nodes, and to support both persistent and in-memory branches of nodes, an abstraction called NodeDB is created. There are three implementations of this abstraction:

1. MemoryDB. This is used to track any insert/update/delete changes in the current computation. The current computation can be the entire block or a single transaction. This is important because it allows us to support transaction rollbacks in case of any execution failure of a transaction, without rolling back the entire block.

2. LevelDB. This is used to support multiple simultaneous branches. That is, each speculative branch path will have a separate LevelDB for each block within the branch. A LevelDB is like a linked list with previous and current members, each of which is a NodeDB. The current member is either a MemoryDB or a PersistDB (see below for the latter). The previous member can either be a LevelDB pointing to the state as of the prior block that is still in memory, or a PersistDB where any unmodified state beyond this LevelDB is already persisted.

3. PersistDB. This provides an interface to interact with the underlying RocksDB to query and save the MPT nodes.

When a block is created (to propose or validate), its state starts with a LevelDB whose previous member points to the LevelDB of the prior block and whose current member points to a brand new MemoryDB. As transactions are executed, the state changes are captured in the MemoryDB. When a block is finalized, its state is persisted. At this time, the LevelDB of the finalized block is updated with both the current and previous members pointing to the PersistDB. This design ensures that there is no need to traverse back any further.
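To make the layering concrete, here is a minimal sketch of this three-level design. The class and method names (NodeDB, MemoryDB, LevelDB, PersistDB, get/put/commit) mirror the terms above but are illustrative assumptions, not the actual Züs implementation.

```python
# Illustrative sketch of the NodeDB abstraction described above;
# names and signatures are assumptions, not the real Züs code.

class NodeDB:
    """Abstract interface for looking up MPT nodes by key."""
    def get(self, key):
        raise NotImplementedError

class MemoryDB(NodeDB):
    """Tracks insert/update changes for the current block or transaction."""
    def __init__(self):
        self.nodes = {}
    def get(self, key):
        return self.nodes.get(key)
    def put(self, key, node):
        self.nodes[key] = node

class PersistDB(NodeDB):
    """Stands in for the RocksDB-backed store of finalized nodes."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def commit(self, memdb):
        self.store.update(memdb.nodes)

class LevelDB(NodeDB):
    """One level per block: current changes plus a link to the prior state."""
    def __init__(self, current, previous):
        self.current = current    # MemoryDB (or PersistDB once finalized)
        self.previous = previous  # prior block's LevelDB, or PersistDB
    def get(self, key):
        # Look in the current block's changes first, then fall back
        # to the prior block's view of the state.
        node = self.current.get(key)
        if node is None:
            node = self.previous.get(key)
        return node

# Usage: two speculative blocks stacked on top of persisted state.
persisted = PersistDB()
persisted.store["a"] = "state-at-finalized-block"
block1 = LevelDB(MemoryDB(), persisted)
block1.current.put("b", "written-in-block-1")
block2 = LevelDB(MemoryDB(), block1)
block2.current.put("a", "overwritten-in-block-2")

print(block2.get("a"))  # overwritten-in-block-2
print(block2.get("b"))  # written-in-block-1 (found via the previous level)
```

The lookup chain stops at the first PersistDB it reaches, which is why pointing both members of a finalized block's LevelDB at the PersistDB means no deeper traversal is ever needed.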
These two ways of managing the block's database (at creation and at finalization) ensure that (a) speculative state is accumulated across blocks in memory, and (b) finalized state is committed to persistent storage. Older blocks eventually get deleted, freeing up memory for new blocks and ensuring continuous progress of the blockchain without any memory pressure. Since the number of blocks retained in memory at any given time has some upper bound, the memory requirement depends mostly on the number of transactions per block.

The MPT is a versioned data structure. That is, any change to any of the leaves results in changes that propagate all the way up to the root. As a result, as the blockchain progresses, nodes that are no longer used quickly accumulate. As these are persisted, the database grows over time, which would have a severe impact on performance. To deal with this problem, we have a state pruning mechanism that runs in the background. Periodically the state is pruned, getting rid of unused MPT nodes. This is done with an algorithm similar to the mark-and-sweep algorithm used for garbage collection. Each leaf node has an origin field which indicates the block round when that node was introduced. During the “mark” phase, the entire MPT is traversed from the root (as of a given block in the past, to allow a sufficient buffer) and all reachable nodes are updated with an UpdatedVersion indicating the round of the root; any node that is still reachable from the given root should be retained. During the “sweep” phase, all nodes within the PersistDB are traversed, and any node whose UpdatedVersion is less than the root version being pruned is deleted. It should be noted that when traversing the MPT from the root, it is possible to discover missing nodes, which indicates that the state is not fully available; these nodes are synced from other miners/sharders.
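The mark and sweep phases described above can be sketched as follows. The node fields (children, origin, updated_version) follow the description in the text, but the exact schema is an assumption for illustration only.

```python
# Illustrative mark-and-sweep pruning of persisted MPT nodes;
# field names are assumptions based on the description above.

class Node:
    def __init__(self, key, children=(), origin=0):
        self.key = key
        self.children = list(children)
        self.origin = origin           # round when the node was introduced
        self.updated_version = origin  # stamped during the mark phase

def mark(root, version):
    """Mark phase: walk from the root and stamp every reachable node."""
    stack = [root]
    while stack:
        node = stack.pop()
        node.updated_version = version
        stack.extend(node.children)

def sweep(all_nodes, version):
    """Sweep phase: keep only nodes stamped with the current root version."""
    return [n for n in all_nodes if n.updated_version >= version]

# A tiny trie: root -> a, b are reachable; c is a stale, unreachable node.
c = Node("c", origin=5)
a = Node("a", origin=7)
b = Node("b", origin=9)
root = Node("root", children=[a, b], origin=9)

mark(root, version=10)
kept = sweep([root, a, b, c], version=10)
print(sorted(n.key for n in kept))  # ['a', 'b', 'root']
```

In the real system the sweep iterates over the PersistDB rather than an in-memory list, and the mark traversal is also the point where missing nodes are discovered and synced from peers.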
There are times when the prior state is not available. This can happen due to (a) a miner joining as part of a view change, (b) a temporary network outage, or (c) a miner going down due to a crash, scheduled upgrade, and so on. Züs has a robust way to cope with such incomplete state for miners. The Züs blockchain progresses in an optimistic manner without waiting for the entire StateDB to be synced. Sharders, however, must sync all blocks up to the current block.

When a prior state is not available, multiple things are tried. First, any missing previous blocks are fetched and their state is computed. This process is done only up to a certain number of blocks, currently 10 in our configuration. Beyond that, the logic falls back to requesting the delta state change for the block (this delta change can be securely verified). In addition, delta changes are applied optimistically, directly on top of what is already present. This makes it possible to compute and validate state changes even when the entire state is not available (so long as the changes don't require retrieving any missing intermediate nodes). Because of all the above-mentioned algorithms, it is important to make sure that the state changes are done properly, by dealing with issues like concurrent updates.

It should be noted that the current implementation of the MPT is not fully general, but is implemented to satisfy the requirements of the global state. For example, a general MPT can contain values on the full nodes, but such a scenario never arises for the global state. Similarly, all paths used to access the MPT are of fixed size for the global state, where the path is the client id (a 64-byte hash). Adoption of the global state MPT for other use cases should keep this in mind, or make changes appropriately to support additional use cases.

2.5 View Change and Distributed Key Generation

Periodically, the active set of miners and sharders is updated. This process is referred to as a view change.
Züs's blockchain relies on a publicly verifiable random value calculated with each round of block generation. This is achieved by using distributed key generation (DKG), which provides the ability to recover a group signature using a t-of-n threshold signature scheme. That is, in order to defend against Byzantine and other faults, only a threshold number of signatures is required to reconstruct the group signature that everyone can verify and agree upon. The group signature is used as a seed to create a random number for each round, and the signature for a given round is based on the random number from the previous round.

Züs uses the Joint-Feldman protocol by Pedersen [9]. This protocol starts with n parties but eventually ends up with q parties, referred to as qualified parties. The qualification step is needed because distributed key generation can result in Byzantine conditions. The original protocol suggests using complaints and revealing individual secret shares to narrow down the qualified set of parties that are all verified to have shared their secrets correctly. The protocol requires broadcasting on the order of O(n²) messages for the complaints and revealing. In addition to the verbosity of the protocol, the timing of those messages and the reliability of coordinating them across all the parties can be a challenge.

Züs takes a novel approach by using the blockchain itself to solve this problem. All the publicly available information is submitted as transactions to the blockchain; each of the n parties initially needs to submit one transaction on the blockchain with the public information of the protocol. After this, the parties privately send the party-specific information among themselves, which amounts to O(n²) messages. However, there is no complaint or revealing.
Instead, each party, as part of distributing its secret shares to the others, collects a digital signature acknowledging that the receiver has received and verified its secret share against the public information already available on the blockchain. At the end of distributing the secrets to all the parties, each party ends up with either:

• a digital signature verifying a valid distribution, or
• no digital signature, because the receiving party either did not provide one (Byzantine) or was not responsive (temporary network failure).

After some amount of trying to distribute the shares, each party eventually publishes another transaction on the blockchain with an array of either:

• digital signatures confirming the receipt of valid secret shares, or
• the secret share value and the corresponding party that did not provide or receive the secret share.

That is, the onus is on each party to submit this information to the blockchain. Any party that is not able to gather at least t confirmation signatures is automatically disqualified. For any party that received fewer than the threshold of confirmations, their corresponding secret shares are revealed, but these cannot be used to reconstruct the secret information of that party. The revealed secret share values can be used by honest parties that genuinely did not receive the secret shares at all, or were given incorrect values, as they are now publicly available on the blockchain and verifiable.

Using the above logic, the qualified set can be deterministically computed by each party using the publicly verifiable information available on the blockchain. Hence, this design eliminates some of the complexities of identifying the qualified set in the original DKG protocol.
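The deterministic qualification rule above can be sketched in a few lines: each party's published transaction yields, per counterpart, either a confirmation signature or a revealed share, and any party with fewer than t confirmations is dropped. The data shapes here are illustrative assumptions, not the on-chain format.

```python
# Sketch of computing the qualified set from published confirmations;
# the mapping shape is an assumption for illustration.

def qualified_set(published, t):
    """published maps each party to the set of parties that confirmed
    (signed for) the secret shares it distributed. A party stays
    qualified only if it gathered at least t confirmations."""
    return {party for party, confirmers in published.items()
            if len(confirmers) >= t}

# Four parties with threshold t = 2; "p4" gathered only one confirmation
# (the rest were unresponsive or Byzantine), so it is disqualified.
published = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2", "p4"},
    "p4": {"p1"},
}
print(sorted(qualified_set(published, t=2)))  # ['p1', 'p2', 'p3']
```

Because every party reads the same published transactions, every honest party computes the same qualified set without any complaint round.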
2.5.1 The View Change Process

The view change process is divided into five phases: in the start phase, the incoming miners are selected; in the contribute phase, they calculate their public and private keys and write the public information to the blockchain; in the share phase, the miners distribute shares of the secret key to all miners; in the publish phase, the miners send their shares and signatures to the smart contract; finally, in the wait phase, the magic block is created for the next view change.

Start: During this phase, the smart contract gets the list of all miners who have registered. The list is sorted by stake, and n miners are selected to become the incoming miners. If there are more than n miners in the list, only those with the most tokens staked join the set of miners. Once the list is made, it is saved in the smart contract. This design allows miners who are not part of the current view change to request the list from the sharders through a REST API.

Contribute: The incoming miners list specifies, in part:

• The maximum number of miners, n.
• The threshold of signatures needed for the view change to be successful, k, which should be greater than t but less than n.
• The threshold of signatures needed to verify a signature, t.

Each miner in the miners list uses the n and t parameters to create a miner public key (MPK), which is sent to the smart contract. Any miner who fails to send an MPK during the contribute period is removed from the incoming miners list. This is done because the MPK will be used to verify the signatures during the share phase, and the shares or signatures during the publish phase.

Note that there must be at least k signatures, or the process will fail and need to restart. This design provides some redundancy in case an active set miner later fails to perform its duties. If the minimum of k miners are selected during the view change process, up to k − t miners could fail without halting block production.
Share: During this phase, the miners communicate only with each other. Each miner computes a secret share for every miner in the incoming miners list and sends them the share. When a miner receives a share, it uses the published MPK from the miner who sent the share to verify it. Once the share is verified, the receiving miner uses its private key to sign a message and sends it back to the original miner. The original miner collects the message and signature from all the other miners in the incoming miners list. If a miner doesn't receive a signature, it uses the secret share instead for that miner.

Publish: Every miner sends its collection of shares or signatures to the smart contract. The smart contract verifies that the shares and signatures are correct. If enough shares for one miner come in, that miner is removed from the incoming miners list. Likewise, if a miner doesn't publish its shares or signatures, then it is also removed from the list.

Wait: At the beginning of this phase, a magic block is created for the next view change. Every miner uses the list on the magic block to determine the secret shares used for its personal private key for the verifiable random function (VRF) in the next view change.

3 Payment

In this section, we review how payment works in the Züs network. The native token of Züs is ZCN. The total supply of ZCN is capped at 400 million tokens.

3.1 Service Providers, Staking, and Delegates

Züs's ecosystem includes a variety of service providers, including miners, sharders, and blobbers. We refer to them generically as service providers. Service providers must stake a certain amount of tokens to ensure their proper behavior; if they do not perform their duties adequately, a portion of these tokens may be slashed (destroyed). When tokens can be unstaked (reclaimed) depends on the role of the service provider. For instance, miners' and sharders' tokens can only be unstaked during a view change.
Blobbers can unstake tokens at any time for unallocated storage, but cannot unstake tokens for allocated storage.

Similar to protocols such as EOS [6], Züs includes the role of delegates. A delegate is a client that wishes to stake tokens to share in the rewards, but who does not directly participate in the service provided. They stake tokens on behalf of the service provider, and share in any rewards or punishments that result. When rewards are paid out, the service provider takes a service charge percentage of the total rewards. The remainder of the rewards is divided between the delegates according to the amount of tokens that they have staked. (Note that the service provider itself may also be a delegate if it has staked tokens.) In some cases, rewards may be divided between multiple types of service providers. For instance, the transaction fees and block reward for a block are divided between the miners who produce the block and the sharders who store it, according to a specified share ratio.

3.2 Token Pools and Markers

A client may set aside tokens into a token pool. The tokens in these pools retain a connection to the client, but cannot be accessed except by the rules specified for the pool. For instance, a stake pool holds tokens that a delegate may have staked; the delegate can reclaim them eventually, but the rewards may be slashed if the terms of service are not met.

In some cases, a client may wish to set aside funds for some ongoing service that a provider can draw from when they prove work has been done. Züs provides this behavior with token pools and markers. A marker is a message signed by the client authorizing the release of tokens when some condition is met. Token pools and markers have a relationship roughly akin to bank accounts and checks. Not all token pools allow the release of funds through markers, and the nature of the markers may differ between types of token pools.
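The account-and-check relationship can be sketched as follows, with an HMAC standing in for the client's real digital signature; the field names and helper functions here are illustrative assumptions, not the Züs wire format.

```python
# Sketch of a token pool and a signed marker authorizing a release.
# HMAC stands in for a real signature scheme; all names are illustrative.

import hashlib
import hmac
import json

CLIENT_KEY = b"client-secret"  # stands in for the client's signing key

def sign_marker(pool_id, provider_id, amount):
    """The client signs a message authorizing a release of tokens."""
    body = json.dumps({"pool": pool_id, "to": provider_id, "amount": amount},
                      sort_keys=True).encode()
    sig = hmac.new(CLIENT_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def redeem(pool, marker):
    """The pool releases tokens only for a validly signed marker."""
    expect = hmac.new(CLIENT_KEY, marker["body"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expect, marker["sig"]):
        raise ValueError("bad signature")
    fields = json.loads(marker["body"])
    if fields["amount"] > pool["balance"]:
        raise ValueError("insufficient funds")
    pool["balance"] -= fields["amount"]
    return fields["amount"]

pool = {"balance": 100}
marker = sign_marker("pool-1", "blobber-7", 30)
print(redeem(pool, marker), pool["balance"])  # 30 70
```

As with a check, the provider can redeem the marker later, and can also inspect the pool's balance beforehand to confirm that funds are available.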
The markers may include additional data, and can thereby serve as a formal commitment by the service provider to some service for the client. For instance, write markers (discussed in Section 4) commit a blobber to storing specific data; if the blobber is later found to be missing that data, it may be penalized.

An additional benefit of this design is that the service provider can observe the balance of the token pools, and thereby know that there are sufficient funds available to pay for the service, with two caveats:

• The client may decide to close the token pool and, depending on the design of the pool, recover (some portion of) the tokens; however, there is a delay before the client recovers their funds, allowing service providers to redeem any outstanding markers. Depending on the service, the client may need to meet additional conditions before reclaiming their tokens. Section 4 illustrates an example of how this process can work.

• If multiple providers can draw from the same token pool, there may be outstanding markers for the other providers that have not yet been redeemed.

4 Storage

In Züs's network, storage is provided by specialized entities called blobbers. A blobber is responsible for storing data in exchange for rewards. Our design relies on the use of signed markers, as described in Section 3.2. Critically for storage, when the blobber redeems these markers on the blockchain, they also serve as a public commitment to store the data provided. Our protocol was first outlined in a DAPPCON paper [8], as well as a related technical report [3].

For storage, there are different amounts of storage that must be understood for a blobber:

• capacity: The total storage that a blobber physically offers.

• staked capacity: The amount of capacity that is also backed by staked tokens, either from the blobber or delegates.

• free capacity: The amount of staked capacity that has not already been purchased by a client.
• purchased capacity: Of the staked capacity, the amount that clients have currently purchased, whether they are storing anything or not.

• used storage: Of the purchased capacity, the amount that the client is currently using.

A stake pool stores tokens backing a specific blobber's offer of storage. After the offer of storage expires, the stake pool tokens return to the delegates. Note that a stake pool is actually a collection of delegate pools, where each pool represents the tokens belonging to a specific delegate.

A blobber may offer additional capacity at any time. However, capacity can only be lowered if it has not already been staked. Similarly, while a delegate can request to unstake tokens at any time, the request can only be granted when it would not drop the staked capacity below the purchased capacity.

To provide resiliency, Züs uses erasure coding. When confidentiality is also needed, the data can be encrypted as well; in this case, we use proxy re-encryption (Section 4.12), allowing the data to be re-encrypted by the blobber without revealing its contents. Therefore, we use an encode-then-encrypt setup.

4.1 Storage Structure

Before we describe the different actions that can be taken with storage, we first review the organization of the files that a blobber stores. The file system is organized in a Merkle-tree-like structure called a Git tree, patterned after Git's organization of files. The root of the Git tree is stored in write markers in an AllocationRoot field. Since the write marker is signed by the client, the write marker can be used to verify the state of the blobber's system. The leaves of the Git tree are metadata files relating to the files being stored, allowing us to validate a variety of properties of a file against the write marker. These different aspects will be detailed below in this section.

4.1.1 Git Trees

A Git tree is a Merkle-tree-like structure similar to those used by Git.
The leaves of the Git tree are the hashes of the metadata files for the different files stored by the blobber. Any information stored in these files can thus be verified against the write marker. Directories are mappings of file names to hash values; they are likewise named by their hash values.

Verifying that a metadata file is contained in a Git tree is straightforward. Given the hash of the metadata file, we only need to send the path to the root of the tree. This is essentially a Merkle path. To avoid confusion with the Merkle path of a file's contents, we refer to this path as a Git path.

4.1.2 Write Marker Format

A write marker is signed by the client paying for storage, allowing us to verify that the blobber is storing the files that the client wanted stored. Write markers contain the following fields:

• AllocationRoot
• PreviousAllocationRoot
• FileMetaRoot
• AllocationID
• BlobberID
• ClientID
• Size
• Timestamp
• ViewNumber - Indicates the “version” of the data.
• Signature - The signature of the client that owns the allocation.

The AllocationRoot serves to verify the agreement between the client and the blobber on the file system contents. Since the client signed the write marker, we can be sure that it agrees. Redeeming the write marker on the blockchain serves as the blobber's handshake agreeing to the data stored.

4.1.3 Metadata Files

The metadata files are the leaves of the Git tree, and may thus be tied to the write marker. The metadata files currently contain:

• AllocationID
• Path
• Size
• ActualFileHash
• ValidationRoot - a standard Merkle tree hash of the file contents, used by a client reading data to verify that the data is correct.
• MerkleRoot - a modified Merkle tree hash used specifically for challenges, also referred to as the challenge hash or the FixedMerkleTree hash.
• ViewNumber

4.2 Initializing Blobber

When a blobber registers to provide storage, it specifies its total storage capacity, its pricing for both reads and writes, and the duration (max offer duration) for which its pricing is valid. The duration starts from the timestamp of the transaction where the offer of storage was first made.

Note that the blobber cannot offer storage immediately. First, there must be enough tokens staked to guarantee service, as discussed in Section 3.1. These tokens do not have to be staked by the blobber itself, although we expect that the blobber will provide at least some of the stake. Other clients may serve as delegates, staking tokens on behalf of the blobber and sharing in the rewards offered.

The blobber goes through the same process when it wishes to expand or decrease the storage that it offers, increasing or decreasing the amount of staked tokens needed. A blobber can specify a capacity of 0 if it wishes to stop providing storage altogether. It should be noted that a blobber cannot abandon its existing storage agreements. The blobber must maintain those allocations until the user releases them, or until the duration of the storage offer elapses. Currently, all allocation periods are fixed at 1 year.

4.3 New Allocation

An allocation is a volume of data associated with a client, and may potentially be stored with many blobbers. To set up a new allocation, a client specifies the price range that they are willing to pay for reads and writes, the size of storage they require, and the duration that they need for the storage (specified as an expire parameter).

For each allocation and geolocation, a client must have two token pools of funds that blobbers can draw on to be rewarded for the work they have done. The pools are:

• A write pool, used to pay blobbers for any data uploaded and stored.

• A read pool, used to pay blobbers for any reads of the data stored.
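An illustrative shape for a new-allocation request, bundling the parameters and the two pools just described, might look as follows. The function and field names are assumptions for the sketch, not the actual Züs API.

```python
# Sketch of a new-allocation request; names and shapes are illustrative.

def new_allocation(read_price_range, write_price_range, size, expire,
                   preferred_blobbers=None):
    return {
        "read_price_range": read_price_range,    # (min, max) price for reads
        "write_price_range": write_price_range,  # (min, max) price for writes
        "size": size,                            # bytes of storage requested
        "expire": expire,                        # when storage is released
        "preferred_blobbers": preferred_blobbers or [],
        # Pools that blobbers draw on as they prove work has been done:
        "write_pool": {"balance": 0},  # pays for uploads; tied to allocation
        "read_pool": {"balance": 0},   # pays for reads; tied to the wallet
    }

# A 10 GiB allocation with no preferred blobbers specified.
alloc = new_allocation((1, 5), (2, 10), size=10 * 2**30, expire=1735689600)
print(alloc["size"], len(alloc["preferred_blobbers"]))
```

The network would then match the request against blobbers whose posted prices fall inside the given ranges.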
The read pool is associated with the client's wallet so that the client can read from any blobber. The write pool is tied to an allocation and its specific set of blobbers. When requesting a new allocation, the client must specify:
• The price ranges that it is willing to pay for both reads and writes.
• The size of data that it needs to store.
• The expiration time for when that storage will no longer be required.
• (Optionally) A list of preferred blobbers.

4.4 Uploading Data

When data is uploaded, blobbers may be in different states, even after a blobber has received a message to commit changes. For instance, a client might crash after sending commit messages to some blobbers and fail to send commits to the remaining blobbers. If not handled with care, the client's data could be left in a state where it could no longer be reconstructed. To provide a way to recover to the correct state (S0), even if a change is committed on only part of the network, we use a two-phase commit approach. Figure 2 helps to visualize this state for multiple blobbers. The old state (S0) is shown in green, with the new state (S2) in red. The portion in yellow (S1) represents what each blobber considers the "official" state. The first two blobbers know that all other blobbers have received the new data, but not necessarily the message to pre-commit. They store the changes in a pre-commit directory until either a commit message or a rollback message is received. The other blobbers have received the data, but have not received the pre-commit message, and might discard the data if they do not receive the pre-commit. Once all blobbers have received the pre-commit message, the client sends commit messages to all blobbers. The blobbers may then safely discard the old state and consider S2 to be the new official state.
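The pre-commit/commit/rollback behavior just described amounts to a small state machine per blobber. The following is a minimal sketch of that state machine, not the blobber implementation: new data is staged until a commit arrives, and a rollback discards the staged data while preserving the old official state.

```python
# Illustrative state machine for the two-phase commit described above.
# States are represented as opaque values (e.g. "S0", "S2").

class BlobberState:
    def __init__(self, committed):
        self.committed = committed    # the "official" state (e.g. S0)
        self.precommitted = None      # staged new state, held in the
                                      # pre-commit directory

    def precommit(self, new_state):
        # Data is staged until a commit or rollback message arrives.
        self.precommitted = new_state

    def commit(self):
        if self.precommitted is None:
            raise RuntimeError("nothing to commit")
        self.committed = self.precommitted  # new state becomes official
        self.precommitted = None

    def rollback(self):
        self.precommitted = None      # discard staged data; keep old state
```

Because a blobber holds at most one committed and one pre-committed state, this mirrors the guarantee that no blobber ever stores more than two versions of the file system.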
In the event that the commit operation does not succeed on all blobbers, there may be a temporary discrepancy about the official state until the client is able to complete the commit operation, but the allocation will never be left in an unrecoverable state. A ViewNumber field is included in the write marker. This field is set to 0 for the initialization of the allocation and incremented for each write marker accepted by a blobber. We also include save points in the ViewNumber field after a '.'. For example, "42.3" would be version 42, save point 3.

Figure 2: State of Data Across Blobbers

We use a two-marker system, meaning that a blobber precommits data when it receives the corresponding write marker. When a subsequent write marker is received, it commits the precommitted data and precommits the next batch of data. As a result, a blobber never needs to store more than 2 versions of state for a file system. (Note that when a session concludes, the client should commit the last batch of precommitted data.) Figure 3 shows the process of a client uploading data to its blobbers. Though our diagram only shows two blobbers, the steps are repeated for all blobbers storing the allocation's data. Here are the steps of the process, following the sequence diagram: In steps 1 and 3, the client requests locks from the blobbers, who respond with their version information in steps 2 and 4. If the blobbers are ahead, the client needs to catch up first. If the blobbers are not in sync, we might need to trigger a rollback, discussed in Section 4.11. Once the client acquires a lock with a blobber, this begins a new session. In steps 5-8, the client sends some operations to the blobbers. Once completed, the client (steps 9 and 12) sends write markers to the blobbers. The blobbers then (steps 10 and 13) commit the previous version (if needed), and precommit the new version. The blobbers then (steps 11 and 14) reply with an acknowledgement.
At this point, the client still retains the lock and may send more operations. In steps 15-16, the client sends an additional operation to the blobbers. The client and blobber repeat the process of sending and committing write markers (steps 17-22). Note that write marker b1 sp1 replaces write marker b1 sp0, so that blobber 1 only needs to commit the second write marker to the blockchain to receive its rewards. Also, note that the client will not send write marker b1 sp1 if it has not received the acknowledgement for write marker b1 sp0. As a result, blobber 1 and blobber 2 should never be more than one version/save point apart for their write markers. Once the client has ensured that all blobbers have moved to the latest version, it (steps 23-24) notifies the blobbers to release the lock and to commit data from the last write marker. On commit (steps 10, 13, 18, 21), the blobbers can delete 2 revisions back (if available), trusting that the client would not have sent a write marker if the blobbers were not in sync for steps 2 and 4. The previous state of the system should not be deleted yet, since we might need to roll back, but we should be able to assume that we won't need to roll back 2 versions. After step 22, the client has verified that its blobbers are in sync. In steps 23-24, the client notifies the blobbers to release the locks. The blobbers may now delete the old data safely.

Figure 3: Client Uploading Data to Blobbers

4.5 Challenges

Blobbers and their delegates receive rewards for reads immediately. Writes, however, are paid through challenges, in which the blobber must prove that it is storing the data that it is paid to store. Token rewards for writes are first transferred to a challenge pool. Tokens in this pool are not made immediately available to the blobber or its delegates; instead, they receive those rewards after passing a challenge proving that they are storing the data that they claim.
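The write-reward flow above can be sketched as a single settlement function. This is an illustrative simplification under stated assumptions (the function name, the release of the whole pool at once, and the slash fraction are ours, not protocol constants): a passed challenge releases challenge-pool tokens to the blobber, while a failed or unanswered one slashes the blobber's stake.

```python
# Hedged sketch of challenge settlement: write payments sit in a
# challenge pool and are only released after a passed challenge.
# The 10% slash_fraction is an illustrative value, not a Züs parameter.

def settle_challenge(challenge_pool, blobber_reward, stake, passed,
                     slash_fraction=0.1):
    """Return (challenge_pool, blobber_reward, stake) after one challenge."""
    if passed:
        blobber_reward += challenge_pool  # tokens released from the pool
        challenge_pool = 0
    else:
        stake -= stake * slash_fraction   # stake slashed on failure/timeout
    return challenge_pool, blobber_reward, stake
```

The point of the structure is incentive alignment: the blobber cannot touch write rewards until it has demonstrated possession of the data, and failing to demonstrate it costs staked tokens rather than merely forfeiting the reward.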
Outsourcing attacks, where the blobber stores its data with another storage provider, are of particular concern. Our protocol ensures that while the content provided for verification is only 64 kB, the content required to create this verified content is the full file fragment. Our process is illustrated in Figure 4. The file is divided into n 64 kB fragments based on n storage servers. Each of these 64 kB fragments is further divided into 64-byte chunks, so that there are 1024 such chunks in each 64 kB block, addressable by an index from 1 to 1024. The data at each of these indexes across the blocks is treated as a continuous message and hashed. The 1024 resulting hashes then serve as the leaf hashes of the Merkle tree. The root of this Merkle tree is used to roll up the file hashes further to the directory/allocation level. The Merkle proof provides the path from the leaf to the file root and from the file root to the allocation level. In this model, in order to pass a challenge for a file for a given index (between 1 and 1024), a dishonest blobber first needs to download all the content and do the chaining to construct the leaf hash. This approach discourages blobbers from outsourcing the content and faking a challenge response.

Figure 4: Fixed Merkle Tree

There are 3 hashes for a file stored with a blobber. The actual file hash is used by a client to verify the checksum of a downloaded file. This hash is the hash of the original file. It is stored in the ActualFileHash field of the file metadata. The validation hash is used by a client who downloads data to verify that the data is correct. It is a Merkle hash, thus allowing segments of data in a file to be validated without needing to download the entire file. This hash is stored in the ValidationRoot field of the file metadata. Finally, the challenge hash is used to verify challenges, and is stored in the MerkleRoot field of the file metadata.

Figure 5: Write Payment Overview
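The fixed Merkle tree construction described above can be sketched directly from the numbers in the text: 64 kB blocks, 64-byte chunks, and 1024 leaves formed by hashing the chunk at each index chained across all blocks. The sketch below is illustrative, assuming SHA-256 and zero-padding of the final block; the production hash function and padding rules may differ.

```python
# Sketch of the fixed Merkle (challenge) tree. Assumes SHA-256 and
# zero-padding; both are illustrative choices, not Züs specifications.

import hashlib

CHUNK = 64
CHUNKS_PER_BLOCK = 1024
BLOCK = CHUNK * CHUNKS_PER_BLOCK  # 64 kB

def leaf_hashes(fragment: bytes) -> list:
    # Pad the fragment to a whole number of 64 kB blocks.
    if len(fragment) % BLOCK:
        fragment += b"\0" * (BLOCK - len(fragment) % BLOCK)
    blocks = [fragment[i:i + BLOCK] for i in range(0, len(fragment), BLOCK)]
    leaves = []
    for i in range(CHUNKS_PER_BLOCK):
        # Chunk i from every block, chained as one continuous message.
        h = hashlib.sha256()
        for b in blocks:
            h.update(b[i * CHUNK:(i + 1) * CHUNK])
        leaves.append(h.digest())
    return leaves

def merkle_root(leaves: list) -> bytes:
    # 1024 leaves is a power of two, so pairing is always exact.
    level = leaves
    while len(level) > 1:
        level = [hashlib.sha256(level[j] + level[j + 1]).digest()
                 for j in range(0, len(level), 2)]
    return level[0]
```

Note how the chaining enforces the anti-outsourcing property: leaf i depends on chunk i of every block, so reconstructing even a single leaf requires the entire fragment.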
Unlike the ValidationRoot, the challenge hash is computed with a modified version of the Merkle tree called a fixed Merkle tree. This structure is designed to make challenges difficult to pass if a blobber is not storing the file locally. Figure 5 shows a high-level view of the payment process. A client must have funds committed to a write pool before uploading data. Then, when uploading files to the blobber, the client must include write markers along with the files. Critically, the challenge hash is used to build the MerkleRoot field of the file metadata; thus, when the blobber commits those markers to the blockchain, it serves as the blobber's commitment to the stored data. Redeeming the markers transfers the corresponding tokens to the challenge pool. When the blobber is challenged to prove that the data is stored correctly and successfully passes the challenge, the tokens are transferred from the challenge pool to the blobber. When a new block is produced, miners will slash the stake of blobbers who either failed a challenge or who have not responded within the allowed time. Every block also provides a new challenge based on the VRF (discussed in Section 2). There are 10 validators selected from other blobbers that may verify the challenge (though the blockchain may be configured to require more or fewer validators). Critically, the validators do not need to have any pre-existing knowledge of the data stored, since it can be verified against the write marker stored by the challenged blobber. At a high level, the challenge protocol involves three phases:
• Using the VRF result, a single block of a file stored by one specific blobber is selected⁵. We refer to this stage as the challenge issuance.
• In the justification phase, the blobber broadcasts the data to the validators along with the metadata needed to verify the challenge.
• Finally, in the judgment phase, the validators share their results.
We now detail the justification phase.
When a file for a given index (between 1 and 1024) is challenged:
1. The blobber generates the input to the hash function (designated as hash blocks in Figure 4).
2. The blobber broadcasts to all validators:
• the hash block
• the Merkle path to the MerkleRoot of the file
• the corresponding file metadata
• the Git path to the file metadata
• the latest signed write marker.
3. The validators verify that the write marker is the latest one committed to the blockchain and that the signature on the write marker is valid.
4. If the write marker is valid, the validators hash the hash block, and verify that the resulting hash and the Merkle path match the MerkleRoot field of the file metadata. They then verify that the hash of the file metadata and its Git path match the AllocationRoot field in the write marker.
For a more detailed discussion of the challenge protocol, see [3].

4.5.1 Time Allowed for Challenges

The challenge time (CT) is the amount of time allowed for a blobber to respond to a challenge. Since the file size may significantly affect the blobber's time to respond to the challenge, we factor it into our formula, shown below:

CT = M · FS + K

Since it will take longer to provide the data for larger files, the time equals the size of the file (FS) times a fixed multiplier (M), plus an additional time allotment constant (K) representing the outer bound of the time expected for the first block to be transmitted.

⁵ Specifically, we use the VRF to randomly select a partition of the blobbers, then randomly select a blobber from that partition, then a random non-empty allocation stored by the blobber, then a random file in that allocation, and finally a random block within that file. At all steps, the VRF provides the random seed.

4.6 Updating Allocation

A client can change the size of an allocation. If extending the allocation, the client must negotiate new terms.
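Returning to the challenge-time formula from Section 4.5.1, a worked example makes the linear relationship concrete. The parameter values below are illustrative only; the actual multiplier and constant are network configuration, not values given in this paper.

```python
# Worked example of CT = M * FS + K from Section 4.5.1.
# M and K here are illustrative values, not Züs protocol parameters.

def challenge_time(file_size_mb: float, m: float, k: float) -> float:
    """Seconds allowed for a blobber to answer a challenge."""
    return m * file_size_mb + k

# e.g. with M = 0.5 s/MB and K = 30 s, a 100 MB file allows 80 s
```

Doubling the file size doubles only the variable term, while K keeps even tiny files from having an unreasonably short deadline.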
If a client reduces the size of the allocation, they may continue to use the existing terms of the allocation.

4.6.1 Extending Allocation

For a client to extend their allocation, they must have sufficient tokens in their write pool and the blobbers must have sufficient storage capacity. Otherwise, the operation will fail. The client will continue to pay the original rate for their first allocation, but will pay the new rate for the extended period.

4.6.2 Reducing/Closing Allocation

When the client reduces its allocation, it may reclaim some of its tokens. However, there is a delay allowing blobbers to still claim the tokens for the services that they have already provided. Note that any tokens in the challenge pool are not returned to the client; once they leave the write pool, they are considered to have been paid to the blobber and its delegates. The client may cancel an allocation at any time, though they pay a penalty for doing so. If the client cancels the allocation, then the allocation is finished and the blobbers may stop storing the client's data.

4.7 Adding/Removing Blobbers

Occasionally, a blobber may need to be replaced. This replacement might be triggered by the client who owns the data, or it could be the result of repeated failed challenges (which the client could observe). In either case, this process is initiated by the client. First, the client writes a transaction to update their allocation to add a new blobber. At this point, the new blobber will accept writes (though it might need to queue them up). However, the new blobber won't respond to reads until it has been able to sync up the data. The client must acquire the data to give to the blobber. The client might already have the data cached locally. If not, they must acquire it, either by reading from the old blobber if it is still available, or by reconstructing the data from the other blobbers. The client then uploads the data to the new blobber.
Note that while the client must pay for these writes, they may have previously recovered tokens from failed challenges if the old blobber was not performing adequately. After the new blobber has been able to sync up, it writes a transaction to cash in its write markers, effectively declaring itself online. At this point, the new blobber is now available for reads and challenges. Finally, the client writes a transaction to the blockchain to drop the old blobber. The old blobber will no longer be selected for reads or writes, and may safely discard the data. However, it may still redeem outstanding markers.

4.8 Reading from Allocation

Similar to how writes are handled, clients write special read markers to pay blobbers for providing data. Section 3.2 details the philosophy behind markers in more depth. Read markers contain the following fields:
• ClientID - the reader of the file.
• ClientPublicKey
• BlobberID
• AllocationID
• OwnerID - the owner of the allocation.
• Timestamp
• ReadCounter - used to prevent the read marker from being redeemed multiple times.
• Signature

When the ReadCounter is incremented, the price is determined by multiplying the increase in ReadCounter by the size of the block and the read price. The blobber is paid immediately when the read marker is redeemed. The client reading the data may elect to validate the data that they receive from the blobber against the ValidationRoot field in the file metadata files. The metadata files themselves can be validated against the latest write marker. Note that the metadata files are signed, and provide some degree of validation; however, a malicious blobber could provide a stale metadata file. The sequence diagram in Figure 6 outlines how a file can be verified while it is being downloaded. In steps 1 and 2, the client requests the file from the blobbers. In steps 3 and 4, the blobbers respond with the write marker, the Git path to the metadata file, and the metadata file.
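The read-pricing rule described above (payment equals the increase in ReadCounter times the block size times the read price) can be stated in a few lines. This sketch assumes a 64 kB block and a per-byte read price; both units are our assumptions for illustration, not values fixed by the paper.

```python
# Sketch of the read-marker payment rule. The 64 kB block size and
# per-byte pricing unit are illustrative assumptions.

BLOCK_SIZE = 64 * 1024  # bytes per data block

def read_payment(prev_counter: int, new_counter: int,
                 read_price_per_byte: float) -> float:
    """Amount owed to a blobber when a read marker is redeemed."""
    if new_counter <= prev_counter:
        # A non-increasing counter would allow replaying an old marker.
        raise ValueError("ReadCounter must strictly increase")
    return (new_counter - prev_counter) * BLOCK_SIZE * read_price_per_byte
```

The strictly increasing counter is what makes each marker single-use: redeeming the same marker twice would require the same counter value twice, which the check rejects.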
The client then (step 5) verifies the metadata files against the write markers and Git paths. If everything is valid, the client is ready to receive data. They may now rely on the ValidationRoot values stored in the metadata files. In steps 6 and 7, the blobbers send the first block of data from the file along with the corresponding Merkle paths to the root of the file. In step 8, the client verifies the downloaded data, checking that the blocks match the Merkle paths, and that the Merkle paths match the ValidationRoots. If anything is amiss, the misbehaving blobber can be identified.

Figure 6: Data Validation

Steps 9-11 repeat the same process, with one notable difference. Since many of the nodes in the Merkle paths may match the previously sent Merkle tree, the blobbers only need to send the additional nodes needed for the Merkle path.

4.9 Livestreaming and Videos

Züs provides support for livestreaming, allowing a client to upload audio/video data to Züs's network on a continuous basis so that other clients can watch it continuously. We use the M3U8 format for our livestreaming. The client providing the data divides the livestream into chunks of a specified duration (configured to one second at the time of this writing) and uploads them to the blobbers. The client viewing the livestream downloads the chunks locally and allows the viewer to watch the livestream. For other videos, our files may be much larger than the files provided for livestreaming. In order to allow the viewer to jump around in the video file, the client viewing the data can download 64 kB data blocks from within the file without needing to download the entire file. Once downloaded, these are converted into a byte stream.

4.10 Blobber Support for General Data Protection Regulation

In an effort to give people greater control over their personal data, the European Union introduced the General Data Protection Regulation (GDPR).
The Züs network includes functionality to produce privacy reports about the usage of a customer's data on their request. With Züs, each blobber stores usage statistics in a local database. Therefore, the Züs network promises a best effort, relying on the blobbers to report accurate results. This feature is optional, and a gdpr boolean flag allows smart contracts to find blobbers that support it. For blobbers that do support it, the feature is enabled for all users by default. Of course, blobbers might charge a slightly higher price for this service.

4.11 Repair Protocol

With Züs, if a client device fails mid-operation, only a minority of blobbers might have received a write marker for any given update. In the worst case, the client device might have lost the data and be unable to complete the change. For illustration, we refer back to Figure 3. Following the sequence diagram, if blobber 1 commits (step 10), but blobber 2 does not (step 13), it is possible that blobber 1 would be a commit ahead of blobber 2. If there are not enough blobbers to reconstruct the latest state, blobber 1 would need to roll back. The process would be:
1. The client realizes that a rollback is needed.
2. The client sends a rollback message to the affected blobbers, indicating the version to revert to.
3. The blobber reverts to the previous state, following Git's approach. The newer versions of files must still be stored temporarily.
Note that the blobber cannot roll back two versions, but as long as the client is following the protocol, that issue should not arise. If the client does not follow the protocol, it is possible that it could corrupt its data.

4.11.1 Challenges and Rollbacks

If a blobber has previously committed a write marker to the blockchain and it is rolled back, it is possible that the blobber could be challenged for a file that is no longer relevant.
The blobber must store these files until a new write marker is committed to the blockchain with a version number that is the same or higher.

4.12 Proxy Re-Encryption

Proxy re-encryption (PRE) allows a user to store confidential data in the cloud without having to trust the storage provider. The data is encrypted under the data owner's public key; when the owner wishes to share their data, they derive a re-encryption key from their own key pair and the receiver's public key. This re-encryption key allows the data to be re-encrypted for the receiver's public key without ever decrypting the data. As a result, the cloud provider can convert the data without being given an opportunity to read the confidential data. We use the approach outlined in Selvi et al. [11].

Figure 7: Uploading Data with Proxy Re-Encryption

Figure 7 shows how data is uploaded when using proxy re-encryption. The client first (1) erasure codes the data into fragments, with one fragment per blobber. For each blobber, the client then (2,5) generates a public/private keypair (if it does not already have a keypair associated with that storage provider). The client then (3,6) encrypts the corresponding fragment with the public key, and (4,7) sends the encrypted data to the storage provider. For data transfer, the client requesting the data must first request the data from the client that owns the data. (For convenience, we will refer to the client owning the data as the seller and the client requesting the data as the buyer, even if the seller does not actually request any compensation for allowing access to their data.)

Figure 8: Data Transfer with Proxy Re-Encryption

Figure 8 shows an overview of this process. The buyer (1) requests data from the seller, specifying its public key and the details of the data desired. For each storage provider, the seller then (2,4) calculates the re-encryption key from the buyer's public key and the keypair associated with the blobber.
The seller then (3,5) sends the re-encryption key and the ID of the buyer to the corresponding storage provider. The storage provider retains this information. Once the initial phase is complete, the seller (6) sends a confirmation to the buyer including the list of blobbers. The buyer then (7,10) requests the data from each storage provider, specifying its ID and the data requested. Each storage provider (8,11) re-encrypts the data with the re-encryption key, sending the results to the buyer. The buyer (9,12) decrypts the fragments and (13) reconstructs the original data.

5 Token Bridge Protocol

In this section, we describe our design for transferring ERC-20 tokens for Züs on the Ethereum blockchain to native tokens on the Züs blockchain. In this discussion, we refer to the ERC-20 [12] tokens as WZCN, and to native Züs tokens as ZCN. Our design is inspired by the Rainbow Bridge protocol [1]; reviewing their design can be beneficial to understanding our own. In our design, an Ethereum smart contract is needed to track the WZCN tokens, provably destroying (burning) them when a client wishes to convert to ZCN, and minting new WZCN when a client has provably destroyed ZCN. We refer to this smart contract as the WZCN mint. On the Züs end, Züs miners must verify special ZCN mint transactions and then generate new ZCN. We also introduce the role of authorizers, trusted entities responsible for verifying transactions and sharing the results between the blockchains. The membership set of the authorizers must be maintained on both blockchains; we omit the details of tracking authorizer identities in this paper.

5.1 Converting WZCN to ZCN

Figure 9 details the process of converting WZCN to ZCN. The steps are outlined below:
1. The client owning WZCN writes a transaction to the WZCN mint to burn tokens; we refer to this transaction as the WZCN burn transaction. This transaction contains:
• The amount of WZCN burned.
• The client's ID on the Züs blockchain.
• A sequential nonce. (This nonce is distinct from the nonce used in Ethereum's protocol.)
2. The Ethereum network accepts the transaction and includes it in the blockchain. Note that it will only be accepted if the nonce is one greater than the previous nonce. The authorizers monitor the Ethereum blockchain for WZCN burn transactions.
3. Each authorizer verifies that the transaction has been accepted on the Ethereum blockchain. If the request is valid, the authorizer sends the client a proof-of-WZCN-burn ticket. This ticket contains:
• The Ethereum transaction ID.
• The amount of ZCN to be minted.
• The client ID to receive the ZCN.
• The nonce provided by the client in step 1 above.
• The authorizer's signature.
4. Once the client has gathered sufficient tickets, they write a ZCN mint transaction containing these tickets. The Züs miners verify the validity of the tickets, checking the signatures and the nonces. If the ZCN mint transaction is valid, the miners accept the transaction and allocate new ZCN to the client's account.

Figure 9: Converting WZCN to ZCN

5.2 Converting ZCN to WZCN

The reverse process is similar. A client burns ZCN and presents proof to the authorizers, who then authorize the generation of new WZCN tokens. The steps for this process are as follows:
1. The client writes a ZCN burn transaction, destroying ZCN tokens. This transaction includes:
• The amount of ZCN to burn.
• The Ethereum address to receive the new WZCN.
• The nonce value, using the same sequence as in Section 5.1.
2. The Züs miners accept this transaction if the nonce is valid and the client has sufficient ZCN.
3. Each authorizer monitors the Züs blockchain for a ZCN burn transaction. Once the transaction is accepted, the authorizer sends the client a proof-of-ZCN-burn ticket, containing similar information as the proof-of-WZCN-burn ticket:
• The Züs transaction ID.
• The nonce value.
• The amount of WZCN to be minted.
• The Ethereum address to receive the WZCN.
• The authorizer's signature.
4. Once the client has gathered sufficient tickets, they write a transaction to the WZCN mint, including the tickets from the authorizers.
5. The WZCN mint verifies the validity of the tickets, checking the signatures and the nonce values. If the transaction is valid, the WZCN mint creates new WZCN for the client.

5.3 Handling Failures

It is possible that the process could break down due to a failure at one step. For instance, the client machine might crash and be unable to collect the proof-of-burn tickets. In these cases, it is possible for a client to request replacement tickets from the authorizers. The client must provide the transaction burning the ZCN or WZCN. The authorizers must verify that the transaction has been accepted, and then can re-issue the tickets. Note that the authorizers are not responsible for tracking the nonces or the funds, instead relying on the Züs miners or the Ethereum smart contract to do that work.

5.4 Paying/Punishing the Authorizers

Each authorizer must stake some amount of both ether and ZCN to join the set of authorizers. The details of how authorizers are selected fall within the purview of our governance protocol [7]. Whenever a proof-of-WZCN-burn ticket from the authorizer is included in a ZCN mint transaction, the authorizers are rewarded with transaction fees paid in ZCN. The amount of these fees is likewise selected by the governance protocol, and not negotiable by either the authorizers or the clients. Similarly, when a proof-of-ZCN-burn ticket from the authorizer is included in a call to the WZCN smart contract, the authorizers are paid in ether by the WZCN mint smart contract. Again, the amount of these transaction fees is specified in the smart contract and non-negotiable.

References

[1] Rainbow Bridge CLI, 2020. URL: https://github.com/near/rainbow-bridge.
[2] 0dns GitHub repo. https://github.com/0chain/0dns.
[3] Thomas H. Austin, Paul Merrill, Siva Dirisala, and Saswata Basu.
0chain storage protocol, 2018.
[4] Ethereum wiki: Modified Merkle Patricia trie specification. https://eth.wiki/en/fundamentals/patricia-tree.
[5] Timo Hanke, Mahnush Movahedi, and Dominic Williams. DFINITY technology overview series, consensus system. CoRR, abs/1805.04548, 2018. URL: http://arxiv.org/abs/1805.04548.
[6] Daniel Larimer. Delegated proof-of-stake (DPOS), 2014.
[7] Paul Merrill, Thomas H. Austin, Justin Rietz, and Jon Pearce. Ping-pong governance: token locking for enabling blockchain self-governance. In International Conference on Mathematical Research for Blockchain Economy (MARBLE), 2019.
[8] Paul Merrill, Thomas H. Austin, Jenil Thakker, Younghee Park, and Justin Rietz. Lock and load: A model for free blockchain transactions through token locking. In IEEE International Conference on Decentralized Applications and Infrastructures (DAPPCON). IEEE, 2019.
[9] Torben Pryds Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In Joan Feigenbaum, editor, Advances in Cryptology — CRYPTO '91, pages 129-140, Berlin, Heidelberg, 1992. Springer Berlin Heidelberg.
[10] RocksDB: A persistent key-value store for fast storage environments. https://rocksdb.org/.
[11] S. Sharmila Deva Selvi, Arinjita Paul, Siva Dirisala, Saswata Basu, and C. Pandu Rangan. Sharing of encrypted files in blockchain made simpler. Pages 45-60, Springer, 2020.
[12] Fabian Vogelsteller and Vitalik Buterin. EIP-20: ERC-20 token standard, 2015. URL: https://eips.ethereum.org/EIPS/eip-20.