What is Filecoin, How it Works? Understanding Technical & Economic Aspects of Filecoin.
Written by: Vasa (Vaibhav Saini)
An In-depth Study on Filecoin Protocols
Filecoin introduces the concept of a decentralized storage network (DSN). DSN is a scheme that describes a network of independent clients and storage providers. DSN aggregates storage offered by multiple independent storage providers and self-coordinate to provide data storage and data retrieval to clients. Coordination is decentralized and does not require trusted parties: the secure operation of these systems is achieved through protocols that coordinate and verify operations carried out by individual parties. DSNs can employ different strategies for coordination, including Byzantine Agreement, gossip protocols, or CRDTs, depending on the requirements of the system.
DSN involves the implementation of 3 functions: put, get, and manage. Put allows clients to store data under unique identifiers. Get allows clients to retrieve data using the identifier. Manage orchestrates the network by measuring space available for rent, auditing providers, and repairing possible data faults. The Manage protocol is run by storage providers often in conjunction with clients or a network of auditors(this involves Byzantine Faults which are discussed below).
DSN has several properties. The first 2 are essentially required.
Data integrity means that the client always receives the same data that he puts to storage, and storage providers can’t convince a client to take faulty data.
Retrievability simply means that the client will be able to retrieve his data over time.
Optional Properties of a DSN:
Public verifiability allows everyone on the network to verify that the data is being stored without knowing the data itself.
Auditability allows verifying that the data was stored for the right duration of time.
Incentive-compatibility strives to reward good service providers and punish bad ones.
Achieving Confidentiality: Clients that desire for their data to be stored privately, must encrypt their data before submitting them to the network.
There are 2 types of possible faults which a DSN should tolerate:
Management faults: These faults are byzantine faults caused by participants(storage providers, clients & auditors) in the Manage protocol. A DSN scheme relies on the fault tolerance of its underlining Manage protocol. Violations on the faults tolerance assumptions for management faults can compromise liveness and safety of the system. For example, consider a DSN scheme, where the Manage protocol requires Byzantine Agreement(as the nodes can lie about their audits) to audit storage providers(if they are storing all the data that they are supposed to store according to the agreed conditions). In such protocol, the network receives proofs of storage from storage providers and runs Byzantine Agreement(BA) to agree on the validity of these proofs. If the BA tolerates up to f faults out of n total nodes, then our DSN can tolerate f < n/2 faulty nodes. On violations of these assumptions, audits can be compromised, thus rendering the whole system useless.
Storage Faults: Storage faults are byzantine faults that prevent clients from retrieving the data: i.e. Storage Miners lose their pieces, Retrieval Miners stop serving pieces. A successful Put execution is (f,m)-tolerant if it results in its input data being stored in m independent storage providers (out of n total) and it can tolerate up to f byzantine providers. The parameters f and m depend on protocol implementation; protocol designers can fix f and m or leave the choice to the user, extending Put(data) into Put(data,f,m). A Get execution on stored data is successful if there are fewer than f faulty storage providers. For example, consider a simple scheme, where thePutprotocol is designed such that each storage provider stores all of the data. In this scheme m=n and f=m−1. Is it always f=m−1? No, some schemes can be designed using erasure coding, where each storage providers store a special portion of the data, such that x out of m storage providers are required to retrieve the data; in this case f=m−x.
The Filecoin DSN protocol can be implemented on top of any consensus protocol that allows for verification of the Filecoin’s proofs. Proof-of-Work schemes often require solving puzzles whose solutions are not reusable or require a substantial amount wasteful computation to find.
Non-reusable Work: Most permissionless blockchains require miners to solve a hard computational puzzle, such as inverting a hash function. Often the solutions to these puzzles are useless and do not have any inherent value beyond securing the network. Some blockchains like Ethereum(executing smart contract logic) and Primecoin(finding new prime numbers) have tried to use some of the computational power to do useful work.
Wasteful Work: Solving hard puzzles can be really expensive in terms of the cost of machinery and energy consumed, especially if these puzzles solely rely on computational power. When the mining algorithm is embarrassingly parallel, then the prevalent factor to solve the puzzle is computational power.
Attempts to reduce waste: Ideally, the majority of a network’s resources should be spent on useful work. Some efforts require miners to use more energy-efficient solutions. For example, Spacemint requires miners to dedicate disk space rather than computation; while more energy efficient, these disks are still “wasted” since they are filled with random data. Other efforts replace hard to solve puzzles with a traditional byzantine agreement based on Proof-of-Stake, where stakeholders vote on the next block proportional to their share of currency in the system.
So, instead of wasteful Proof-of-Work computation, the work Filecoin miners do generating Proof-of-Spacetime is what allows them to participate in the consensus.
Useful work: We consider the work done by the miners in a consensus protocol to be useful, if the outcome of the computation is valuable to the network, beyond securing the blockchain.
Filecoin proposes a useful work consensus protocol, where the probability that the network elects a miner to create a new block (we refer to this as the voting power of the miner) is proportional to their storage currently in use in relation to the rest of the network. Filecoin protocol is designed such that miners would rather invest in storage than in computing power to parallelize the mining computation. Miners offer storage and re-use the computation for proof that data is being stored to participate in the consensus.
Power Fault Tolerance: In this technical report, Power Fault Tolerance is presented as an abstraction that re-frames byzantine faults in terms of participants’ influence over the outcome of the protocol. Every participant controls some power of which n is the total power in the network, and f is the fraction of power controlled by faulty or adversarial participants.
Power in Filecoin: In Filecoin, the power p of miner M at time t is the sum of the M’s storage assignments. The influence I of M is the fraction of M’s power over the total power in the network. In Filecoin, power has the following properties:
Public: The total amount of storage currently in use in the network is public. By reading the blockchain, anyone can calculate the storage assignments of each miner — hence anyone can calculate the power of each miner and the total amount of power at any point in time.
Publicly Verifiable: For each storage assignment, miners are required to generate Proofs-of-Spacetime, proving that the service is being provided. By reading the blockchain, anyone can verify if the power claimed by a miner is correct.
Variable: At any point in time, miners can add new storage to the network by pledging with a new sector and filling the sector. In this way, miners can change their amount of power they have through time.
We also need mechanisms to prevent three types of attacks that malicious miners could exploit to get rewarded for storage they are not providing: Sybil attack, Outsourcing attacks, Generation attacks.
Sybil Attacks: Malicious miners could pretend to store (and get paid for) more copies than the ones physically stored by creating multiple Sybil identities, but storing the data only once.
Outsourcing Attacks: Malicious miners could commit to store more data than the amount they can physically store, relying on quickly fetching data from other storage providers.
Generation Attacks: Malicious miners could claim to be storing a large amount of data which they are instead efficiently generating on-demand using a small program. If the program is smaller than the allegedly stored data, this inflates the malicious miner’s likelihood of winning a block reward in Filecoin, which is proportional to the miner’s storage currently in use.
Storage providers must convince their clients that they stored the data they were paid to store. In practice, storage providers will generate Proofs-of-Storage(PoS) that the blockchain network(or the clients themselves) verifies.
To make the act of storage publicly verifiable, Filecoin introduces two consensus algorithms: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt). If you are new to consensus algorithms or want to know more about them, then head here:
Proof-of-Replication(PoRep) is a novel Proof-of-Storage which allows a server (i.e. the prover P) to convince a user (i.e. the verifier V) that some data D has been replicated to its own uniquely dedicated physical storage. Our scheme is an interactive protocol, where the prover P: (a) commits to store n distinct replicas(physically independent copies) of some data D, and then (b) convinces the verifier V, that P is indeed storing each of the replicas via a challenge/response protocol. PoRep improves on PoR and PDP schemes, preventing Sybil Attacks, Outsourcing Attacks, and Generation Attacks.
Proof-of-Spacetime: Proof-of-Storage schemes allow a user to check if a storage provider is storing the outsourced data at the time of the challenge. How can we use PoS schemes to prove that some data was being stored throughout a period of time? A natural answer to this question is to require the user to repeatedly (e.g. every minute) send challenges to the storage provider. However, the communication complexity required in each interaction can be the bottleneck in systems such as Filecoin, where storage providers are required to submit their proofs to the blockchain network.
To address this question, we introduce a new proof, Proof-of-Spacetime, where a verifier can check if a prover is storing her/his outsourced data for a range of time. The intuition is to require the prover to
generate sequential Proofs-of-Storage(in our case Proof-of-Replication), as a way to determine time.
recursively compose the executions to generate short proof
The prover receives a random challenge(c) from the verifier and generates Proofs-of-Replication in sequence, using the output of proof as an input of the other for a specified amount of iterations t. Thus ensuring that all of the work done is reusable(as discussed above).
PoSt & PoRep uses zk-SNARKS, making proofs are very short and easy to verify.
To know more about Practical implementation of PoSt refer to the Whitepaper.
Smart Contracts enable users of Filecoin to write stateful programs that can spend tokens, request storage/retrieval of data in the markets and validate storage proofs. Users can interact with the smart contracts by sending transactions to the ledger that trigger function calls in the contract. We extend the Smart Contract system to support Filecoin specific operations (e.g. market operations, proof verification).
Filecoin supports contracts specific to data storage, as well as more generic smart contracts:
File Contracts: We allow users to program the conditions for which they are offering or providing storage services. There are several examples worth mentioning: (1) contracting miners: clients can specify in advance the miners offering the service without participating in the market, (2) payment strategies: clients can design different reward strategies for the miners, for example a contract can pay the miner increasingly higher through time, another contract can set the price of storage informed by a trusted oracle, (3) ticketing services: a contract could allow a miner to deposit tokens and to pay for storage/retrieval on behalf of their users, (4) more complex operations: clients can create contracts that allow for data update.
Smart Contracts: Users can associate programs to their transactions like in other systems (as in Ethereum) which do not directly depend on the use of storage. We foresee applications such as: decentralized naming systems, asset tracking, and crowdsale platforms.
Bridges are tools that aim at connecting different blockchains; while still work in progress, we plan to support cross chain interaction in order to bring the Filecoin storage in other blockchain-based platforms as well as bringing functionalities from other platforms into Filecoin.
Filecoin in other platforms: Other blockchain systems such as Bitcoin, Zcash and in particular Ethereum and Tezos, allow developers to write smart contracts; however, these platforms provide very little storage capability and at a very high cost. We plan to provide a bridge to bring storage and retrieval support to these platforms. We note that IPFS is already in use by several smart contracts (and protocol tokens) as a way to reference and distribute content. Adding support to Filecoin would allow these systems to guarantee storage of IPFS content in exchange of Filecoin tokens.
Other platforms in Filecoin: We plan to provide bridges to connect other blockchain services with Filecoin. For example, integration with Zcash would allow support for sending requests for storing data in privacy.
Here we list a few potential issues which are not well discussed in the whitepaper.
Retrieval market scalability: The micropayments system(retrieval market) creates a lot of overhead on the retrieval protocol. In order to achieve retrieval speeds that match today's centralized infrastructure, filecoin and hence IPFS needs a lot of adoption so as to create a dense state channel network.
Censorship(Illegal Content): As we have seen in the past with Napster and the pirate bay, lack of censorship will eventually lead to Illegal content over the network, effectively bringing the dark web to the surface. Possible solutions could be AI-powered protocols that learn over time and automatically detect Illegal content and take necessary actions. But in order for the network to be a democratic one, the protocol needs to governed by the users itself(thus introducing byzantine behavior) to decide if the content needs some action or not.
So, summing up Censorship is an issue that is different for different people, and it needs a more personalized approach rather than a central open. Filecoin’s job is to create a market place for data management, rather than coming up with censorship governing policy. So, this “personalized” censorship layer can be shifted to the applications, building on the filecoin.
Here we list some possible improvements in the filecoin protocol.
Tahor-LAFS Encryption Scheme: When adding value, the client first encrypts it (with a symmetric key), then splits it into segments of manageable sizes, and then erasure-encodes these for redundancy. So, for example, a “2-of-3” erasure-encoding means that the segment is split into a total of 3 pieces, but any 2 of them are enough to reconstruct the original (read more about ZFEC). These segments then become shares, which are stored on particular Storage nodes. Storage nodes are a data repository for shares; users do not rely on them for integrity or confidentiality of the data.
Ultimately, the encryption-key and some information to help find the right Storage nodes become part of the “capability string” (read more about the encoding process). The important point is that a capability string is both necessary and sufficient to retrieve a value from the Grid — the case where this will fail is when too many nodes have become unavailable (or gone offline) and you can no longer retrieve enough shares.
There are write-capabilities, read-capabilities and verify capabilities; one can be diminished into the “less authoritative” capabilities offline. That is, someone with a write-capability can turn it into a read-capability (without interacting with a server). A verify-capability can confirm the existence and integrity of value, but not decrypt the contents. It is possible to put both mutable and immutable values into the Grid; naturally, immutable values don’t have a write-capability at all.