Storage on the 0Chain Network: the Blockchain Observable Storage System

Paul Merrill and Thomas H. Austin

December 8, 2018

Abstract

Since its creation by Satoshi Nakamoto, the blockchain has revolutionized cryptocurrencies. The blockchain provides a means of establishing consensus, so that a group of machines can act as a trusted third party without trusting any specific machine. The blockchain is the backbone for many decentralized applications (dApps), providing many services that were traditionally only offered by large, centralized institutions. However, the development of dApps is often held back by the limitations of the blockchain; many mining consensus algorithms are slow, and adding additional functionality to the blockchain can slow it further. 0Chain seeks to provide a blockchain with rapid consensus that still offers the functionality needed for the development of powerful dApps. This paper addresses that tension of concerns for storage. By offloading most of the storage work to a different group of machines than the miners, the miners can focus on rapid consensus. The storage providers, known as blobbers, prove that they are storing the correct contents by writing a Merkle root to the blockchain. Special signed receipts called markers ensure that the blobbers are rewarded for any work that they do; a challenge protocol verifies that the blobbers continue to store the data.

1 Introduction

Bitcoin [Nak09] introduced the world to decentralized digital currency. It relies on the blockchain, which provides decentralized consensus while guaranteeing both immutability and finality of transactions. Critically, the protocol is designed so that each actor's economic incentives are aligned with the proper behavior according to the protocol.

Since its advent, many have realized that the blockchain technology that Bitcoin is built on can be useful in other domains. 0Chain promises a blockchain-based architecture for distributed storage and computation. By owning 0Chain tokens, clients can build distributed applications (dApps) without relying on a single source for providing their storage. Additionally, in the 0Chain economy, tokens are not consumed when employed, allowing 0Chain clients to continue to use these tokens indefinitely.

As part of its architecture, 0Chain offers data storage, which is the focus of this paper. A key distinction of our system from other blockchain storage solutions is that we divorce the role of mining from that of providing storage. Computers that provide storage are referred to as blobbers. Blobbers are neither responsible for nor required to perform mining. In this manner, we lighten the load on our mining network and enable fast transactions on a lightweight blockchain.

As the client and blobber interact, the client generates special signed receipts called markers. These markers act like checks that the blobber can later cash in with the blockchain. Once the interaction between client and blobber has concluded, the blobber writes an additional transaction to the blockchain, which redeems the markers for 0Chain tokens and commits the blobber to a Merkle root [Mer87] matching the data stored. The leaves of the Merkle tree must match markers sent from the client, preventing either the client or the blobber from defrauding the other.

After a file has been stored, a challenge protocol ensures both that the blobber continues to store the file and that it continues to be paid for that work.
The mining network posts a transaction challenging the blobber to prove that it still possesses the data that it was paid to store. The blobber must provide that data, the relevant system metadata, and the client-signed marker to prove that the right data is stored. The blobber is then rewarded or punished accordingly.

With our design, the majority of the work between clients and blobbers happens off-chain. The mining network is only involved enough to ensure that clients pay blobbers for their work and that the blobbers are doing the work that they have been paid to do.

This version of our storage solution involves some significant departures from our previous version [AMDB18], largely due to a revised token reward protocol [MA18]. Previously, our design relied on generating tokens to reward different service providers [RB18]. To trigger these rewards, clients locked tokens at different rates dictated by the 0Chain network. However, configuring these locking rates proved to be problematic. In our new model, clients lock tokens to directly generate token rewards for service providers, allowing the client and blobber to negotiate rates directly without following any dictates from on high by the 0Chain network. The end result is a much more flexible and adaptable system. We encourage our readers to first review our revised token reward protocol [MA18] to aid understanding of this paper.

As a consequence of this change in token rewards, certain attacks are no longer a serious threat. Allowing clients to directly select their blobbers is no longer problematic. We have also taken the opportunity to add other improvements, such as reducing the number of required transactions and giving clients more flexibility in the permissions for their data.

2 BOSS Overview

Our system is named the Blockchain Observable Storage System (BOSS), emphasizing that the blockchain is able to provide evidence allowing the storage providers' work to be validated.

A key distinction of our system from other blockchain storage solutions is that we divorce the role of mining from that of providing storage. Blobbers are neither responsible for nor required to perform mining. In this manner, we lighten the load on our mining network and enable fast transactions on a lightweight blockchain.

Our design assumes that the client is using erasure codes to ensure greater resiliency. While this is not a strict requirement, it does enable a client to recover if a blobber proves to be unreliable. From the perspective of the blobbers, they neither know nor care whether the data is erasure coded. In our discussion, we will mention erasure codes only as far as they affect any potential attacks that could arise.
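To make the erasure-coding assumption concrete, the following Go sketch splits a file into 10 data fragments and 6 parity fragments, so that any 10 of the 16 fragments suffice to rebuild the original; this mirrors the 10-of-16 example used in Section 3. The sketch is illustrative only: the protocol does not prescribe a particular code, and the choice of the github.com/klauspost/reedsolomon library is our own assumption.

// Illustrative only: the protocol does not mandate a particular erasure code.
// This sketch assumes the third-party library github.com/klauspost/reedsolomon
// and a 10-of-16 configuration (10 data fragments + 6 parity fragments).
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 10, 6
	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		panic(err)
	}

	original := []byte("file contents that the client wants to spread across 16 blobbers")

	// Split into 10 data fragments and compute 6 parity fragments.
	shards, err := enc.Split(original)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate six unreliable blobbers by dropping their fragments.
	for _, lost := range []int{1, 4, 7, 11, 13, 15} {
		shards[lost] = nil
	}

	// Any 10 surviving fragments are enough to rebuild the data.
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(original)); err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(out.Bytes()[:len(original)], original)) // true
}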
First, the client and blobber negotiate the price and terms of service, establishing a read price and write price that determine the tokens required per gigabyte of data read or written, respectively. At the end of their interaction, the client has an agreement (footnote 1) signed by both the client and blobber. The client writes a transaction to the blockchain committing both the client and the blobber to their agreement.

(Footnote 1: Contract would be a more appropriate term, but we wish to avoid confusion with smart contracts.)

The client locks tokens, giving the interest to a blobber reward pool earmarked specifically for the blobber. The blobber will not provide service if the corresponding pool has insufficient funds. Since no other entity has access to the tokens in this pool, the blobber may cache its information about the pool, avoiding the need to check the blockchain for its balance.

To upload a file, a client contacts the blobber and uploads the file, including a signed write marker that authorizes the blobber to invoke the storage smart contract to transfer tokens from the blobber reward pool to a challenge pool; when the blobber later proves that it is providing storage, the tokens are paid from the challenge pool to the blobber. This transaction also serves to commit the blobber to the storage contents, so that the act of arranging payment also makes it possible to catch any cheating on the blobber's part.

File downloads work similarly: the client sends a signed read marker to a blobber as part of their request for data. The read marker authorizes the blobber to draw tokens from the blobber reward pool. Unlike with writes, the blobber receives the pay for providing reads immediately. An additional difference is that the client reading the data might not be the owner of the data.

The mining network will periodically challenge the blobber to provide randomly chosen blocks of that data. These challenges involve a carrot-and-stick approach; the blobber is punished if it fails the challenge, but receives its token rewards when it passes.

3 Related Work and Relevant Attacks

Filecoin [Fil17] discusses many interesting attacks that can arise in blockchain-based storage systems, such as Sybil attacks, outsourcing attacks, and generation attacks. We provide an overview of these attacks to aid discussion of our protocols.

An outsourcing attack arises when a blobber claims to store data without actually doing so. The attacker's goal in this case is to be paid for providing more storage than is actually available. For example, if Alice is a blobber paid to store a file, but she knows that Bob is also storing it, she might simply forward any file requests she receives to Bob. Since the cheater must pay the other blobbers for the data, this attack is not likely to be profitable, particularly if erasure codes are used. For instance, if the erasure code settings were 10 of 16, then to be paid for a single read, the cheater would need to pay 10 other blobbers for their shares of the data.

Another attack may occur if two blobbers collude, both claiming to store a copy of the same file. For example, Alice and Bob might both be paid to store file123 and file456. However, Alice might offer to store file123 and provide it to Bob on request, as long as Bob provides her with file456. In this manner, they may free up storage to make additional tokens. In essence, collusion attacks are outsourcing attacks that happen using back-channels.

A Sybil attack in the context of storage is a form of collusion attack where Alice pretends to be both herself and Bob. The concerns are similar, but the friction of coordinating multiple partners goes away.

Finally, generation attacks may arise if a blobber poses as a client to store data that can be regenerated easily or that they know will never be requested. By doing so, they hope to be paid for storing this data without actually needing the resources to do so. (Filecoin's authors are concerned with miners exploiting this approach to increase their voting power on the blockchain, which is not a concern in our design.) Our primary defense against generation attacks is our token reward protocol [MA18]; essentially, blobbers would end up paying themselves for their work.
By the design of our token reward protocol, they could lock tokens and pay themselves anyway, so this attack gives them no advantage. Our challenge protocol, discussed in Section 8, offers further protection. Essentially, blobbers are periodically required to provide the files that they store. Moreover, the mechanism for challenging them is also the mechanism that pays them; if a blobber found a way to avoid being challenged, it would never be paid.

Previous work has also explored storage systems and the blockchain. In Filecoin [Fil17], miners prove that they are storing data through special proofs-of-spacetime [MO16], which are themselves based on proofs-of-replication [BDG17]. Essentially, Filecoin miners store the data in an encrypted format using a modified form of cipher-block chaining. This design deliberately introduces latency into their "Storage Market" network, preventing cheating miners from being able to produce the data in time. A second "Retrieval Market" relies on a gossip protocol to provide the needed data off-chain.

Both Sia [VC14] and Storj [WBB+16] provide decentralized storage in peer-to-peer networks. These systems rely on Merkle trees made of the hashes of file contents, allowing them to (probabilistically) verify that the storage provider is actually storing the data that they claim to store without verifying the entire file. They guarantee the reliability of their systems by using erasure codes to divide data between the data storage providers.

MaidSafe ties its currency to data, paying data providers whenever data is requested. Its proof-of-resource [LMI15] mechanism uses zero-knowledge proofs to verify that data is actually stored. Unfortunately, details about the proof-of-resource design are scant, making it difficult to evaluate.

Miller et al. [MJS+14] discuss how to reduce Bitcoin's proof-of-work overhead with a proof-of-retrievability solution, which they use in a modified version of Bitcoin called Permacoin. Essentially, Permacoin miners prove that they are storing some portion of archival data. Since the proof itself provides part of the data, miners inherently serve as backup storage. In contrast, with our approach the storage providers do not serve as miners, and hence can specialize their machines for storage.

Ali et al. [ANSF16] show how a distributed public key infrastructure (PKI) can be built on top of a blockchain. Their initial experiments use Namecoin [Nam]. Due to concerns about the low number of miners in the Namecoin network, they migrate their system to the Bitcoin network. Their Blockstack naming and storage system relies on a virtualchain to introduce new functionality without changing the underlying blockchain. While their approach is similar to ours in that storage is done off-chain, there is much less focus on ensuring that the storage providers are actually providing the data.

Other approaches, such as Burst's proof-of-capacity (PoC) [GvAS17] and SpaceMint's proof-of-space [PPK+15], use storage for their consensus algorithms. The data itself is not intended to be useful outside of consensus, though Burst has plans to store "dual-use" data in their PoC3 consensus algorithm [GvAS17].

Verifying that a storage provider is actually providing the storage remains a challenging problem. While hash functions can be used to verify the data, transmitting the entire file contents from the provider to the verifier is undesirable, especially for large files.
Merkle trees [Mer87] address this issue: only a single block and log n hashes need to be sent across the network. Juels and Kaliski [JJ07] build on this approach with their proofs of retrievability (PORs) for archival data. PORs embed special sentinel values in the data (hidden by encryption), which the verifier can use to probabilistically verify the data storage. Shi et al. [SSP13] develop a dynamic POR scheme, meaning that the stored data can be updated rather than being limited to static, archival data. Shacham and Waters [SW13] discuss how to make these proofs more compact.

Provable data possession (PDP) [ABC+07] is an alternate strategy for verifying data storage. The client maintains secret challenges and responses, which can be kept to a constant amount through the use of homomorphic encryption. As with PORs, work has been done to support dynamic data [EKPT15] and to improve efficiency [APMT08].

The reliance on secret data to verify storage makes many of these approaches unsuitable for our system. Since the blockchain has no support for secret data, and neither the client nor the blobber can be trusted, there is no way for the validators to determine whether the blobber is cheating or the client lied about the data being stored. However, there are publicly verifiable [ABC+07, SSP13] variants of both PORs and PDPs that seek to avoid this limitation.

4 File System

Our protocol is agnostic regarding how the blobber stores data. We review our file system, but note that clients and blobbers could agree to use other systems.

Our architecture is similar to Git [Git]. In the case of any errors between the client and blobber, following Git's structure facilitates negotiation of the corrections between the two parties. Also, if a file is updated while a client still happens to be reading from it, the blobber is able to provide the original file without interruption. (Chacon [Cha08] provides an excellent overview of Git, and may be helpful for understanding the design of our system.)

We assume that a client uses erasure coding to provide greater resiliency for their data. Entities in the file system include directories, files, and fragments; a file's metadata contains information about the original file, while the corresponding fragment's metadata includes details about the blobber's share of the original file.

Directories are named by the hash of their content. A directory stores a list of the files it contains; for each entry, it stores:

• type, either "file" or "directory".
• name, the name of the file in the directory.
• hash, the cryptographic hash of the file contents; this hash identifies the corresponding metadata file or file contents on the system.

For files, additional information is stored about the fragment:

• merkle root, the Merkle root [Mer87] of the fragment.
• size, the size of the fragment in bytes.

In addition, the blobber must keep special write markers, which are signed commitments from the client as to the contents of the file system. The client can request the write marker in order to verify the file system contents, freeing the client from needing to store any data on their own system. Write markers also serve to track version information and to authorize payment to the blobber. Write markers are discussed in more detail in Section 6.3.
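To illustrate why a fragment's Merkle root lets a verifier check a single block with only about log n hashes, the following Go sketch builds a Merkle root over a fragment's blocks, produces the path of sibling hashes for one block, and verifies that block against the root. The hash choice (SHA-256), the duplication of a lone last node, and the helper names are illustrative assumptions, not the 0Chain implementation.

// Minimal Merkle sketch: a fragment's blocks are hashed into leaves, a root is
// computed, and a single block is verified with only its path of ~log n
// sibling hashes.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func hashPair(a, b []byte) []byte {
	h := sha256.Sum256(append(append([]byte{}, a...), b...))
	return h[:]
}

// nextLevel pairs up nodes, duplicating a lone last node.
func nextLevel(level [][]byte) [][]byte {
	var next [][]byte
	for i := 0; i < len(level); i += 2 {
		j := i + 1
		if j == len(level) {
			j = i
		}
		next = append(next, hashPair(level[i], level[j]))
	}
	return next
}

// merklePath returns the root and the sibling hashes needed to rebuild it from leaf idx.
func merklePath(leaves [][]byte, idx int) (root []byte, path [][]byte) {
	level := leaves
	for len(level) > 1 {
		sib := idx ^ 1
		if sib >= len(level) {
			sib = idx // lone node is its own sibling
		}
		path = append(path, level[sib])
		level = nextLevel(level)
		idx /= 2
	}
	return level[0], path
}

// verify recomputes the root from one block and its path; a verifier needs
// only the block and ~log n hashes, never the whole fragment.
func verify(block []byte, idx int, path [][]byte, root []byte) bool {
	h := sha256.Sum256(block)
	cur := h[:]
	for _, sib := range path {
		if idx%2 == 0 {
			cur = hashPair(cur, sib)
		} else {
			cur = hashPair(sib, cur)
		}
		idx /= 2
	}
	return bytes.Equal(cur, root)
}

func main() {
	blocks := [][]byte{[]byte("b0"), []byte("b1"), []byte("b2"), []byte("b3"), []byte("b4")}
	leaves := make([][]byte, len(blocks))
	for i, b := range blocks {
		h := sha256.Sum256(b)
		leaves[i] = h[:]
	}
	root, path := merklePath(leaves, 2)
	fmt.Println(verify(blocks[2], 2, path, root)) // true
}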
A note on block-level storage. Our previous storage paper [AMDB18] mentioned both block-level and file-level storage. Both approaches have merit, but we have elected to emphasize file-level storage in this paper, since our public release will rely on that approach. However, we note that nothing in this protocol strictly requires file-level storage. If desired, the client and blobber can use block storage and treat the storage as a single file from the perspective of the blockchain. This highlights a strength of our design: as long as the client and blobber are able to meet the public dictates of the protocol, they are free to innovate in the actual storage design and in their interaction.

5 Storage Agreement Negotiation

The client and blobber must negotiate a price for writes and a price for reads, both in terms of tokens per gigabyte of data. Other criteria may be negotiated between the client and blobber as needed, allowing the blockchain to serve as a public record of their agreement. Once terms have been established, the client writes a transaction to the blockchain with the terms of their agreement. We refer to this transaction as the storage agreement transaction. This transaction includes:

• The id of the client (client id).
• The id of the blobber (blobber id).
• The allocation id identifying this storage allocation, referring to the data that the blobber stores for the client. This globally unique ID is a function of client id, blobber id, and a timestamp.
• The tokens of reward paid to the blobber per gigabyte read (read price).
• The tokens of reward paid to the blobber per gigabyte uploaded (write price).
• A params field for any additional requirements.
• The signatures of both the client and blobber.
• The offer expiration time, to ensure that the client does not invoke an old agreement that is no longer profitable for the blobber.
• The storage duration, determining how long the blobber needs to provide storage. After this period has elapsed, the blobber no longer needs to store the client's files; of course, the client and blobber can negotiate to extend the storage period (footnote 2).

(Footnote 2: From the perspective of the blockchain, the renewal is treated as a completely new agreement; no special support is needed. The client can generate a write marker (discussed in Section 6.3) to pay the blobber for files that the blobber is already storing.)

This transaction also initializes a read counter and a write counter for the client and blobber to use in their interactions, both initially set to 0. These values increase with each transaction according to the amount of data uploaded or downloaded. By comparing the last counter value with the new counter value, the amount of reward the blobber has earned is determined easily. Section 6 and Section 7 review these counters' utility in rewarding the blobber on behalf of the client.

This transaction also creates two new pools:

1. The blobber reward pool, containing the interest that the client generated as the reward for the blobber to store and serve data.
2. The challenge pool; when the blobber verifies that it is storing the data, it may receive some portion of the reward stored in this pool.

When the funds in the blobber reward pool are depleted, the client may lock additional tokens to replenish it. The challenge pool is initially empty, but gains tokens with every write that the client does. (Reads, in contrast, are paid to the blobber directly.)
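As a rough illustration of the storage agreement transaction described above, the Go sketch below collects the listed fields into a single structure and derives a globally unique allocation id from the client id, blobber id, and a timestamp. Field names, Go types, and the exact hash input are assumptions for exposition; they are not the on-chain encoding.

// Rough sketch of the storage agreement transaction payload from Section 5.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

type StorageAgreement struct {
	ClientID     string
	BlobberID    string
	AllocationID string            // globally unique, derived below
	ReadPrice    uint64            // tokens per gigabyte read
	WritePrice   uint64            // tokens per gigabyte uploaded
	Params       map[string]string // any additional requirements
	OfferExpiry  time.Time         // agreement is invalid if committed after this
	Duration     time.Duration     // how long the blobber must provide storage
	ClientSig    []byte
	BlobberSig   []byte
	ReadCounter  uint64 // initialized to 0 by the smart contract
	WriteCounter uint64 // initialized to 0 by the smart contract
}

// newAllocationID derives a globally unique id from the client id, blobber id,
// and a timestamp, as Section 5 describes; the exact hash input is assumed.
func newAllocationID(clientID, blobberID string, t time.Time) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s:%s:%d", clientID, blobberID, t.UnixNano())))
	return hex.EncodeToString(h[:])
}

func main() {
	a := StorageAgreement{
		ClientID:   "client-1",
		BlobberID:  "blobber-7",
		ReadPrice:  1,
		WritePrice: 2,
		Duration:   90 * 24 * time.Hour,
	}
	a.AllocationID = newAllocationID(a.ClientID, a.BlobberID, time.Now())
	fmt.Println(a.AllocationID)
}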
6 Pay for Writes

In the storage agreement transaction, the client locks tokens and pays the interest to the blobber reward pool. These tokens represent the blobber's pay for storing the client's data. (A portion of these funds is allocated for the validators, discussed in Section 8.) Blobbers are paid for every file uploaded, and they are expected to store the files until the end of the contract period negotiated with the client. (A client can elect to delete files stored with the blobber, but does not receive any refund for doing so.) Note that blobbers are not paid immediately. The funds are set aside in the challenge pool; the blobber receives tokens from this pool upon satisfying a challenge to prove that it is actually storing the data.

Figure 1 gives an overview of the process for a client uploading a file to a blobber.

Figure 1: Storage Process

The steps are as follows:

1. Before uploading data to the blobber, the client must commit tokens to the blobber reward pool.
2. The client and blobber set up a secure connection. At the time of this writing, our implementation relies on HTTPS.
3. The client transfers files and the corresponding metadata. This step may be repeated until all files have been uploaded.
4. The client uploads a signed write marker, which serves as the client's commitment to the file system contents.
5. The blobber calls the storage smart contract to claim its reward. This call commits the blobber to the file system contents, and also stakes some of the blobber's tokens to guarantee its performance.
6. A portion of the funds in the blobber reward pool is transferred to the challenge pool, according to the write price specified in the agreement.
7. Some time later, the blobber is challenged to prove that the files are stored correctly. If the challenge is successful, the blobber receives a portion of the tokens in the challenge pool.

This process is repeated until the end of the storage agreement. Section 8 reviews the challenge protocol in detail.

A key property of the BOSS design is that the blobber's mechanism for getting paid (writing a transaction in step 5) also commits the blobber to the contents of the file system; the challenge protocol can then detect if the files are not actually stored on the system.

The write marker is another important part of our design. It contains a hash of the root of the file system, and hence the client's signature serves as the client's endorsement of the file system contents. Therefore, even though the blobber and client do not necessarily trust each other, their agreement on the file system's expected contents is publicly verifiable. The format of the write marker is discussed in more detail in Section 6.3.
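A small sketch may clarify step 6: the tokens moved from the blobber reward pool to the challenge pool follow the amount of data covered by the write marker and the negotiated write price. The byte-to-gigabyte arithmetic and the function name below are our own assumptions; the protocol specifies only that the transfer is dictated by the write counter delta and the write price.

// Sketch of step 6: tokens moved from the blobber reward pool to the challenge
// pool for one write marker.
package main

import "fmt"

const gigabyte = 1 << 30 // bytes

// challengePoolTransfer returns the tokens owed for data written since the last
// redeemed marker: (newCounter - lastCounter) bytes at writePrice tokens per GB.
func challengePoolTransfer(lastCounter, newCounter uint64, writePrice float64) float64 {
	deltaBytes := newCounter - lastCounter
	return float64(deltaBytes) / gigabyte * writePrice
}

func main() {
	// Example: 5 GB uploaded since the last redeemed marker at 2 tokens per GB.
	fmt.Println(challengePoolTransfer(0, 5*gigabyte, 2.0)) // 10 tokens
}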
6.1 Blobber Reward Pool

Step 1 in Figure 1 allows the client to allocate funds for storing or reading data from the blobber. Typically, this step is done during the contract negotiation outlined in Section 5, though the client may decide to lock additional tokens to replenish this pool when it has been depleted. The blobber can verify the funds in this pool, and may refuse to accept data if the client does not have sufficient tokens locked in it.

Note that the blobber is not directly paid for writes from this pool. Instead, the funds are payable to the challenge pool, discussed in Section 6.2. The blobber will receive any funds sent to that pool, but must periodically prove possession of the file system contents to earn these tokens.

It is possible that the tokens in the blobber reward pool are not used by the client before the negotiated end of the terms of service. Once the terms of service have expired, any tokens remaining in this pool are transferred to the client. The client may also write a special cancellation transaction to end the terms of the contract early; the blobber reward pool is then emptied, with all tokens transferred to the client. This mechanism gives the client an escape in case the blobber fails to live up to the terms of service.

We note that the client could abuse this cancellation mechanism in order to cheat the blobber, though the attack is unlikely to be very profitable for storage. A blobber does not commit the client's uploaded file contents until payment has been accepted by the blockchain. On reads, cancellation would mean that the data could no longer be read, making the benefit of this attack very questionable. Nonetheless, we anticipate that our design may be useful for other services. We address the timing issues that arise by delaying the unlocking of funds for 24 hours. In that time, the blobber may redeem any outstanding markers for its rewards as per normal operation.

6.2 Challenge Pool

When the blobber writes a transaction (in step 5 of Figure 1), funds are transferred from the blobber reward pool to the challenge pool. The amount of tokens transferred is dictated by the write marker, which includes the amount of data uploaded, and the negotiated write price stored in the agreement. Funds in this pool are paid out to the blobber during the challenge protocol, detailed in Section 8. Once the terms of service have elapsed and the blobber is no longer required to store the data, any remaining funds are paid to the blobber. (Note, however, that any funds connected to unsatisfied challenges are not released; this design prevents a blobber from "running out the clock" to avoid passing a challenge.)

A small portion of the tokens in the challenge pool is set aside for the validators, a separate group of machines who verify that the blobber is providing the storage that it claims to be providing.

The transfer to the challenge pool also stakes some of the blobber's own tokens. The blobber earns the interest on the staked tokens, but runs the risk that the staked tokens could be burned if any challenges are failed.

6.3 Write Markers

Write markers serve as the client's commitment to the file system contents, the blobber's proof of storing the data required by the client, and a "cashier's check" which the blobber can redeem to get paid. A write marker contains:

• The client id, blobber id, and allocation id.
• A timestamp, to ensure freshness.
• file root, the hash of the root directory on the file system.
• prev file root, used for versioning.
• write counter, which gives the total size to date of the data that the client has uploaded to the blobber for this allocation.
• The signature of the client owner verifying the other fields in this marker.

The client will not send any additional write markers until the latest write marker has been committed to the blockchain. We note that the client might refuse to upload the write marker after uploading data. In this case, the blobber will simply discard the data after a timeout period.
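The following Go sketch shows one plausible shape for a write marker and how a client might sign it. The field names, JSON serialization, and Ed25519 signature scheme are assumptions, since the paper specifies only which fields the marker must carry and that the client signs them.

// Illustrative write marker from Section 6.3: the client's signed commitment
// to the file system root, which the blobber later presents to the storage
// smart contract.
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/json"
	"fmt"
	"time"
)

type WriteMarker struct {
	ClientID     string `json:"client_id"`
	BlobberID    string `json:"blobber_id"`
	AllocationID string `json:"allocation_id"`
	Timestamp    int64  `json:"timestamp"`      // ensures freshness
	FileRoot     string `json:"file_root"`      // hash of the root directory
	PrevFileRoot string `json:"prev_file_root"` // previous root, for versioning
	WriteCounter uint64 `json:"write_counter"`  // total bytes uploaded to date
	Signature    []byte `json:"signature,omitempty"`
}

// sign serializes every field except the signature and signs it with the
// client's key.
func (wm *WriteMarker) sign(priv ed25519.PrivateKey) error {
	unsigned := *wm
	unsigned.Signature = nil
	payload, err := json.Marshal(unsigned)
	if err != nil {
		return err
	}
	wm.Signature = ed25519.Sign(priv, payload)
	return nil
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)
	wm := &WriteMarker{
		ClientID: "client-1", BlobberID: "blobber-7", AllocationID: "alloc-42",
		Timestamp: time.Now().Unix(), FileRoot: "abc123", PrevFileRoot: "000000",
		WriteCounter: 5 << 30,
	}
	_ = wm.sign(priv)
	// Anyone (blobber, miner, validator) can verify the client's endorsement.
	unsigned := *wm
	unsigned.Signature = nil
	payload, _ := json.Marshal(unsigned)
	fmt.Println(ed25519.Verify(pub, payload, wm.Signature)) // true
}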
6.4 Levels of Trust

The protocol that we have detailed assumes no trust between the client and blobber. However, if the client and blobber have an established relationship, they might elect to use a more relaxed trust model.

Normally, a client will not send additional write markers until the blobber has redeemed the previous write marker. However, the client can relax this requirement and send write markers before older markers have been redeemed, which allows for more graceful recovery should the client crash while uploading data. The risk of this approach is that the blobber may then delay redeeming markers until near the agreement's expiration, in the hope of avoiding being identified as a cheater. Gradations of this trust might be used as well; for instance, a client might send multiple write markers during an active session, but not start a new session until all markers from the previous session have been redeemed.

In a yet more relaxed model, the client might forgo validation of storage altogether. With this approach, the funds in the blobber reward pool are made directly payable to the blobber, and the challenge protocol is not performed. The advantage of this setup is that no funds need to be allocated for the validators (discussed in Section 8), and hence the blobber might offer the service more cheaply. However, if the blobber fails to live up to the terms, the client has no recourse. The blobber's cheating is still publicly verifiable, but with no incentive for the validators and no staked funds from the blobber, the blobber will suffer no consequences. This trust model is only appropriate if the client and blobber have a well-established relationship.

7 Pay for Reads

The storage agreement transaction includes a negotiated read price. However, unlike with writes, the client reading the data might not be the owner of the data.

Figure 2 shows the process for a client reading previously uploaded data from the blobber.

Figure 2: Reading from Blobbers

The steps are as follows:

1. Before any data can be read, the owner must commit tokens to the blobber reward pool.
2. The owner of the data gives the client reading the data an authorization ticket permitting the read.
3. The client and blobber set up a secure connection.
4. The client requests a file and sends a signed read marker. If the client is not the owner of the data, the client must also include a "ticket" authorizing the read.
5. The blobber responds with the requested file. (Steps 4 and 5 may be repeated for different files.)
6. The blobber calls the smart contract once the reads are completed, including the read marker and the ticket (if the client is not the owner). The smart contract validates the read marker and checks the authorization ticket (if required); if the read is valid, the smart contract authorizes the release of the owner's funds stored in the blobber reward pool.
7. The smart contract releases tokens from the blobber reward pool and rewards the blobber for the work performed.

In the request for data, the client must send a signed read marker, authorizing the blobber to draw rewards from the blobber reward pool. This process is similar to how writes are rewarded, except that the blobber is paid directly, rather than having those tokens transferred to a challenge pool. The format of read markers is detailed in Section 7.1. We note that a blobber might redeem the read markers without providing the data. However, the client will stop providing read markers if the blobber is not providing service.

Our design permits data owners to pay for other clients to read their data. The authorization ticket (signed by the owner) grants the client the authority to read the data, but also specifies any desired restrictions. The authorization tickets prevent a blobber from reading the data itself as a mechanism to drain the blobber reward pool.
The design of these tickets is discussed in Section 7.2.

7.1 Read Markers

Similar to write markers, read markers allow clients to pay blobbers for their work off-chain. The blobbers can redeem these markers on the blockchain. A read marker contains:

• The client id of the client requesting the data. This field ensures that the client is authorized to read the data.
• The client id of the owner of the data.
• The blobber id and allocation id.
• A timestamp, to ensure freshness.
• The path of the file being read, used to verify that the client is authorized to read the file.
• The data block range of the blocks of the file that the client requests.
• read counter, giving the size of all data read to date.
• The client's signature over the other contents of the read marker.

Unlike with write markers, the blobber is not expected to store the read markers after redeeming them. Also, many read markers may be redeemed by cashing in only the latest read marker. The smart contract will release funds corresponding to the difference between the read counter in the marker and the latest read counter redeemed on the blockchain.

7.2 Authorization Tickets

Authorization tickets allow the owner of data to share it with other clients, and to pay for their reads. A ticket also specifies the restrictions on the client's ability to read data. The ticket includes:

• The owner ID, blobber ID, client ID, and allocation ID.
• The expiration time (expiration).
• The amount of data that can be downloaded (max data).
• The path of files that the client can download (optional).
• The owner's signature.

The storage smart contract only accepts read markers that match the terms given by the authorization ticket. The expiration field ensures that the client does not use the ticket indefinitely; the smart contract will not accept read markers after this point. The max data field specifies the maximum amount of data that can be downloaded during the specified period of time. Lastly, the owner can limit the client's access to a whitelisted subset of files. However, this last restriction depends on the blobber and client not colluding. Otherwise, the client can generate a read marker for a different file as a way of paying the blobber to provide restricted content.
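The sketch below illustrates how a storage smart contract might redeem a read marker under an authorization ticket: it pays the blobber for the difference between the marker's read counter and the last redeemed counter, and enforces the ticket's expiration, max data, and path restrictions when the reader is not the owner. Field names, the check order, and the token arithmetic are assumptions for exposition, not the actual contract logic.

// Sketch of read-marker redemption from Sections 7.1 and 7.2.
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

type ReadMarker struct {
	ClientID    string // reader (may differ from the owner)
	OwnerID     string
	BlobberID   string
	AllocID     string
	Timestamp   int64
	Path        string
	BlockRange  [2]int64
	ReadCounter uint64 // total bytes read to date
	// signature omitted for brevity
}

type AuthTicket struct {
	OwnerID, BlobberID, ClientID, AllocID string
	Expiration                            time.Time
	MaxData                               uint64 // max bytes the reader may download
	PathPrefix                            string // optional whitelist
	// owner signature omitted for brevity
}

const gib = 1 << 30

// redeemRead validates the marker against the ticket (when the reader is not
// the owner) and returns the tokens released from the blobber reward pool.
func redeemRead(rm ReadMarker, ticket *AuthTicket, lastRedeemed uint64, readPricePerGB float64, now time.Time) (float64, error) {
	if rm.ClientID != rm.OwnerID {
		if ticket == nil {
			return 0, errors.New("reader is not the owner and no authorization ticket given")
		}
		if now.After(ticket.Expiration) {
			return 0, errors.New("authorization ticket expired")
		}
		if rm.ReadCounter > ticket.MaxData {
			return 0, errors.New("read exceeds ticket's max data")
		}
		if ticket.PathPrefix != "" && !strings.HasPrefix(rm.Path, ticket.PathPrefix) {
			return 0, errors.New("path not permitted by ticket")
		}
	}
	if rm.ReadCounter <= lastRedeemed {
		return 0, errors.New("marker already redeemed")
	}
	delta := rm.ReadCounter - lastRedeemed
	return float64(delta) / gib * readPricePerGB, nil
}

func main() {
	ticket := &AuthTicket{OwnerID: "alice", ClientID: "bob", Expiration: time.Now().Add(24 * time.Hour), MaxData: 10 * gib, PathPrefix: "/shared/"}
	rm := ReadMarker{ClientID: "bob", OwnerID: "alice", Path: "/shared/report.pdf", ReadCounter: 3 * gib}
	tokens, err := redeemRead(rm, ticket, 1*gib, 1.5, time.Now())
	fmt.Println(tokens, err) // 3 tokens for the 2 GiB read since the last redemption
}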
8 Challenge Protocol

In order to verify that a blobber is actually storing the data that it claims, our protocol relies on the miners periodically issuing challenge requests to the blobbers. This mechanism is also how blobbers are rewarded for storing files, even if the files are not accessed by any clients: when the blobber passes the challenge, it receives its token reward.

The actual verification is done by a group of machines called the validators. The validators can be any group of machines, depending on what makes sense in the blockchain ecosystem. Validators are mutually untrusting. In this discussion, we will assume that the validators are a distinct group of machines from the miners and blobbers.

At a high level, the challenge protocol involves three phases:

1. The mining network randomly selects the blobber data allocation to be challenged. This process also specifies the validators who will verify the challenge and provides a random seed to be used for the challenges. We refer to this stage as the challenge issuance.
2. In the justification phase, the blobber broadcasts the data to the validators along with the metadata needed to verify the challenge.
3. Finally, in the judgment phase, the validators share their results. Once the validators have agreed on the results of the challenge, they write a transaction to the blockchain indicating whether the test passed. This transaction also pays the validators and rewards the blobber.

Selecting validators is a particular challenge. In a cronyism attack, a blobber sends the data to a friendly validator who approves all challenges without validating the data. In an extortion attack, a validator demands additional compensation from the blobber in exchange for passing the challenge. We defend against these attacks by having the mining network randomly select a set of V validators (footnote 3). For a challenge to pass, at least N validators must approve the results of the challenge. The gap between these values must be narrow enough to make a successful cronyism attack unlikely, but wide enough to prevent an extortion attack. An additional concern is ensuring that the validators actually do the verification work. We refer to a validator who does not do the work but who attempts to collect the reward as a freeloader.

(Footnote 3: Specifically, the mining network uses a random beacon process [HMW18] to ensure that no single miner can unduly influence the random number selection.)

8.1 Challenge Issuance Phase

The mining network must initially post a transaction to the network randomly challenging a blobber providing storage. This transaction must include:

• The allocation of data challenged, identified by allocation id. Note that this implicitly identifies which blobber is challenged.
• The list of eligible validators.
• A random seed, which determines the indices of the data blocks in that allocation that the blobber must provide.
• The latest write marker at the time of the challenge.

Conceptually, we can envision this challenge as a roulette wheel, where the number of tokens currently due to the blobber from its challenge pool dictates its number of slices on the wheel. (An alternate approach would be to use the size of the data allocation instead, but this can lead to a subtle attack. A blobber could post an agreement for a negligible price with itself as the client, and then commit to storing large amounts of easily regenerated data. With a commitment to a large enough amount of data, other blobbers would be challenged only with low probability. By tying the probability of being challenged to the amount of tokens in the challenge pool, we make this attack prohibitively expensive to carry out.)

The initial transaction essentially locks a portion of the blobber's stake and reward in a sub-pool specific to this challenge. We follow a "guilty until proven innocent" approach, which prevents a blobber from attempting a denial-of-service attack against a validator in order to avoid punishment. If the blobber never satisfies the challenge, the tokens are effectively burned.
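The "roulette wheel" selection and the seed-driven block choice can be sketched as follows. The weighted-selection code and the way block indices are derived from the seed are our own assumptions, shown only to make the intent concrete; in the actual protocol the randomness comes from the mining network's random beacon.

// Sketch of the challenge issuance from Section 8.1: an allocation's chance of
// being challenged is proportional to the tokens in its challenge pool, and the
// random seed fixes which block indices the blobber must produce.
package main

import (
	"fmt"
	"math/rand"
)

type allocation struct {
	ID                  string
	ChallengePoolTokens float64
}

// pickAllocation spins the roulette wheel: each allocation gets a slice of the
// wheel proportional to its challenge-pool balance.
func pickAllocation(allocs []allocation, rng *rand.Rand) allocation {
	total := 0.0
	for _, a := range allocs {
		total += a.ChallengePoolTokens
	}
	spin := rng.Float64() * total
	for _, a := range allocs {
		spin -= a.ChallengePoolTokens
		if spin <= 0 {
			return a
		}
	}
	return allocs[len(allocs)-1]
}

// challengedBlocks derives k pseudorandom block indices from the seed, so every
// stored block is equally likely to be challenged.
func challengedBlocks(seed int64, totalBlocks, k int) []int {
	rng := rand.New(rand.NewSource(seed))
	idx := make([]int, k)
	for i := range idx {
		idx[i] = rng.Intn(totalBlocks)
	}
	return idx
}

func main() {
	rng := rand.New(rand.NewSource(42)) // stand-in for the random beacon output
	allocs := []allocation{{"alloc-1", 12}, {"alloc-2", 3}, {"alloc-3", 0.5}}
	chosen := pickAllocation(allocs, rng)
	fmt.Println(chosen.ID, challengedBlocks(7, 1_000_000, 5))
}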
8.2 Justification Phase

When the blobber observes the challenge issuance on the blockchain, it broadcasts its proof of storage to the validators, including:

• The file system metadata.
• The write marker proving that the file system contents match what is stored on the blockchain.
• The challenged blocks of data, chosen pseudorandomly using the miners' random seed.
• The Merkle paths of those data blocks.

Once the validators receive the blobber's data, they each verify the data that they have been sent.

1. The validator verifies that the file system metadata is valid, that the file system root hash matches the write marker, and that the write marker matches the most recent commitment to the blockchain. At this point, the validator has established that the blobber's metadata is valid and matches the blockchain.
2. The validator then calculates the number of blocks on the system from the allocation size.
3. Using the random seed, the validator verifies that the blobber's blocks correspond to the pseudorandom sequence. (This serves to make every block of data on the system equally likely to be challenged, and ensures that the blobber did not try to game the results.)
4. For each data block, the validator verifies that the Merkle path matches up to the file metadata. As part of this process, the validator stores the two penultimate hashes of the Merkle tree; that is, it stores the two hashes that can be hashed together to give the file's Merkle root. We refer to these hashes as the validation proof. At most one of the hashes in the validation proof should have been provided by the blobber. (To ensure this behavior, the inclusion of additional hashes on the Merkle path is an automatic failure.) Therefore, the validator must have done the work to calculate at least one of the two hashes. This validation proof can be verified easily by the other validators. These proofs are an important part of our defense against freeloaders.

8.3 Judgment Phase

After the validator has completed its work, it broadcasts the signed hash of its results. We refer to this signed hash as the pre-commit. The hash format is:

H = hash(validationProofList, R)

where validationProofList is a list of the hash pairs serving as validation proofs for each file, and R is a randomly chosen nonce selected by the validator.

The validator then waits to collect the pre-commits from the other validators. Once the timeout period has been reached, it broadcasts its validationProofList and its R value to publish its results. No additional pre-commits are accepted at this point. (If fewer than the minimum number of required signatures have been received, it will rebroadcast and wait again.) The validator accepts the signatures of all other validators with valid proofs, provided that those validators submitted valid pre-commits. Since the results are not publicly observable until they are revealed, a freeloader is not able to produce a valid pre-commit.

Each validator submits a transaction to the blockchain with its results. The smart contract accepts the first transaction it receives, and only the first. At this point, the blobber receives its reward and the validators receive payment for their work.

The payout amount is pro-rated to match the total payout and the length of the contract. For instance, if blobber Bob's challenge pool contains 12 tokens from Alice for her storage paid over a contract period of 90 days, and the first challenge happens at day 45, Bob will receive 6 tokens for passing the challenge. If Bob is again challenged at day 60, he will receive an additional 2 tokens. On day 90, he will receive the remaining balance of 4 tokens.

The validators are paid in a pro-rated manner similar to how the blobber is rewarded. An equal portion of the reward is set aside for every validator, even those that did not participate in the validation. However, the rewards are only distributed to validators who participated in the validation process; the reward for non-participating validators is burned. This design ensures that validators have no incentive to exclude each other, but have a strong incentive to perform the validation work.
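The judgment-phase accounting can be made concrete with a short sketch: the pre-commit hash H = hash(validationProofList, R), and a pro-rated payout function that reproduces the worked example above (12 tokens over 90 days paying 6, 2, and 4 tokens at days 45, 60, and 90). The serialization of the proof list and the exact form of the payout formula are assumptions consistent with the text, not a specification.

// Sketch of the judgment-phase accounting from Section 8.3.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// preCommit hashes the validator's proof list together with a private nonce R,
// so freeloaders cannot copy results before the reveal.
func preCommit(validationProofs [][]byte, nonce []byte) string {
	h := sha256.New()
	for _, p := range validationProofs {
		h.Write(p)
	}
	h.Write(nonce)
	return hex.EncodeToString(h.Sum(nil))
}

// proRatedPayout returns the tokens released for a challenge passed at
// elapsedDays into a totalDays contract, given the pool's total tokens and
// what has already been paid out for earlier challenges.
func proRatedPayout(poolTokens float64, elapsedDays, totalDays int, alreadyPaid float64) float64 {
	earnedToDate := poolTokens * float64(elapsedDays) / float64(totalDays)
	return earnedToDate - alreadyPaid
}

func main() {
	fmt.Println(preCommit([][]byte{[]byte("proof-a"), []byte("proof-b")}, []byte("nonce-R")))

	paid := 0.0
	for _, day := range []int{45, 60, 90} {
		p := proRatedPayout(12, day, 90, paid)
		paid += p
		fmt.Printf("day %d: %.0f tokens\n", day, p) // 6, 2, 4
	}
}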
8.4 Challenge Failures

With the challenge protocol, we are concerned that blobbers could be the victims of a denial-of-service attack when they are challenged. An attacker interested in damaging the network could target challenged blobbers, potentially destroying the blobbers' rewards and staked tokens, until no blobbers are willing to provide the service. As a result, there is no time-out period for challenges. The blobber may contact the validators to complete the challenge at any time.

This choice creates a misalignment of incentives, since validators are only paid on successful challenges, and hence might be motivated to collude with a cheating blobber. To address this concern, we allow blobbers to broadcast a signed confession indicating that they are unable to satisfy the challenge. The validators can then write this message to the blockchain and receive the rewards for their validation work. This transaction releases a portion of the blobber's stake back to the blobber. The client owning the data then receives back the token rewards and the rest of the challenged blobber's stake. With this design, a blobber caught cheating that acts in its own best interests will reward the other parties in the system.

9 Conclusion and Future Work

Our current design for BOSS assumes that there is only a single owner of data. In future work, we intend to expand our design to include support for shared ownership of data, an area of particular interest for enterprise customers. We are also in the process of developing a TLA+ specification to formally verify safety and liveness invariants of our storage system. Finally, we intend to explore how our model of locking tokens can be used to incentivize service providers in other areas.

References

[ABC+07] Giuseppe Ateniese, Randal C. Burns, Reza Curtmola, Joseph Herring, Lea Kissner, Zachary N. J. Peterson, and Dawn Xiaodong Song. Provable data possession at untrusted stores. In Conference on Computer and Communications Security, CCS 2007, Alexandria, Virginia, USA, October 28-31, 2007. ACM, 2007.

[AMDB18] Thomas H. Austin, Paul Merrill, Siva Dirisala, and Saswata Basu. 0Chain storage protocol, 2018.

[ANSF16] Muneeb Ali, Jude C. Nelson, Ryan Shea, and Michael J. Freedman. Blockstack: A global naming and storage system secured by blockchains. In USENIX Annual Technical Conference. USENIX Association, 2016.

[APMT08] Giuseppe Ateniese, Roberto Di Pietro, Luigi V. Mancini, and Gene Tsudik. Scalable and efficient provable data possession. IACR Cryptology ePrint Archive, 2008, 2008.

[BDG17] Juan Benet, David Dalrymple, and Nicola Greco. Proof of replication. Technical report, Protocol Labs, July 2017.

[Cha08] Scott Chacon. Git Internals: Source code control and beyond. PeepCode Press, 2008.

[EKPT15] C. Christopher Erway, Alptekin Küpçü, Charalampos Papamanthou, and Roberto Tamassia. Dynamic provable data possession. ACM Trans. Inf. Syst. Secur., 17(4), 2015.

[Fil17] Filecoin: A decentralized storage network. Technical report, Protocol Labs, August 2017.

[Git] Git – fast version control. https://git-scm.com/, accessed August 2018.

[GvAS17] Seán Gauld, Franz von Ancoina, and Robert Stadler. The Burst Dymaxion: An arbitrary scalable, energy efficient and anonymous transaction network based on colored tangle. https://dymaxion.burst.cryptoguru.org/The-Burst-Dymaxion-1.00.pdf, 2017.

[HMW18] Timo Hanke, Mahnush Movahedi, and Dominic Williams. DFINITY technology overview series, consensus system. CoRR, abs/1805.04548, 2018.
[JJ07] Ari Juels and Burton S. Kaliski Jr. PORs: proofs of retrievability for large files. In Conference on Computer and Communications Security, CCS 2007, Alexandria, Virginia, USA, October 28-31, 2007. ACM, 2007.

[LMI15] Nick Lambert, Qi Ma, and David Irvine. Safecoin: The decentralised network token. https://docs.maidsafe.net/Whitepapers/pdf/Safecoin.pdf, 2015.

[MA18] Paul Merrill and Thomas H. Austin. 0Chain token reward protocol, 2018.

[Mer87] Ralph C. Merkle. A digital signature based on a conventional encryption function. In Advances in Cryptology - CRYPTO, A Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1987.

[MJS+14] Andrew Miller, Ari Juels, Elaine Shi, Bryan Parno, and Jonathan Katz. Permacoin: Repurposing Bitcoin work for data preservation. In Symposium on Security and Privacy. IEEE Computer Society, 2014.

[MO16] Tal Moran and Ilan Orlov. Proofs of space-time and rational proofs of storage. IACR Cryptology ePrint Archive, 2016, 2016.

[Nak09] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2009.

[Nam] Namecoin homepage. https://namecoin.org/, accessed August 2018.

[PPK+15] Sunoo Park, Krzysztof Pietrzak, Albert Kwon, Joël Alwen, Georg Fuchsbauer, and Peter Gazi. SpaceMint: A cryptocurrency based on proofs of space. IACR Cryptology ePrint Archive, 2015, 2015.

[RB18] Justin Rietz and Saswata Basu. 0Chain economic protocol, 2018.

[SSP13] Elaine Shi, Emil Stefanov, and Charalampos Papamanthou. Practical dynamic proofs of retrievability. In Conference on Computer and Communications Security, CCS'13, Berlin, Germany, November 4-8, 2013. ACM, 2013.

[SW13] Hovav Shacham and Brent Waters. Compact proofs of retrievability. J. Cryptology, 26(3), 2013.

[VC14] David Vorick and Luke Champine. Sia: Simple decentralized storage. Technical report, Nebulous Inc., November 2014.

[WBB+16] Shawn Wilkinson, Tome Boshevski, Josh Brandoff, James Prestwich, Gordon Hall, Patrick Gerbes, Philip Hutchins, and Chris Pollard. Storj: A peer-to-peer cloud storage network. Technical report, Storj Labs Inc., December 2016.