What is the InterPlanetary File System (IPFS)?

2022-09-22, 03:45

TL;DR

🔹 IPFS is a peer-to-peer hypermedia protocol used to share and store data.

🔹 IPFS stands for InterPlanetary File System. It uses content addressing rather than location-based addressing to access data and files.

🔹 Content is identified by a cryptographic hash, so the same content can be obtained from various sources.

🔹 Every time content is updated, a new file is created with its own new hash, while IPFS retains the previous version.

Introduction

The internet is composed of tons upon tons of data. From TikTok videos to YouTube streams, Instagram photos, Facebook posts, Wikipedia pages, and the quintillions of bytes of data shared daily on the internet, it raises the question: where do we store all of this data?

The internet mainly stores data on servers, which can be either physical or virtual.
Elaborate facilities called cloud platforms or server farms are used. These facilities house thousands of storage and computation machines arranged and connected to a central server.

An internet user needing information on those servers makes an HTTPS connection from their browser to the relevant server, which then handles the request, retrieves the appropriate data, and loads it in the browser.

This process of accessing files by connecting to the server where they are located is termed "location-based addressing." However, there are several shortcomings to the central-server method of storing and accessing data.

The innovation of trustless systems couldn't have come at a better time. These systems eliminate the need for a significant third party, and one such system is the InterPlanetary File System (IPFS).

What is IPFS?

IPFS, short for InterPlanetary File System, is a storage system that lets you store files and keep track of their versions over time.

IPFS uses the distributed storage system model and does everything central servers do but without the reliance on a central system. This makes it safer and more resistant to attacks, downtime, and censorship while allowing for a more decentralized internet.

Created by Juan Benet and first released in 2015, IPFS has since seen several improvements. Individuals and organizations have adopted it to share files and information without barriers.

How does IPFS work?
There are three basic principles IPFS operates on:

Content addressing
Content linking using directed acyclic graphs (DAGs)
Content discovery using distributed hash tables (DHTs)

Together, these three principles make the IPFS ecosystem work. Let us briefly explain them one after the other:

Content Addressing
IPFS uses content addressing: the ability to identify data by what it contains rather than by where it is located.

For instance, suppose your friend is at the convenience store and you ask him to pick up a pack of well-known mints for you (which, coincidentally, is usually placed on the left side closest to the cashier). That is an example of content addressing because you're explicitly asking for what it is.

On the other hand, if you were to ask for your mints using location, you'd say, "Please pick up what's usually closest to the cashier on the left, a few inches from her arm."

If the mints got replaced that day with, say, dental floss, your friend would not be at fault for whatever he returns with.

The same thing can happen between your computer and the internet. Right now, content is mostly found by location.

On the other hand, every piece of content on IPFS has a content identifier (CID), which is derived from its cryptographic hash. Every hash is unique to the content it comes from, and every time new data is added, a new file is created with a new CID while the previous version is retained on IPFS. This allows for the immutable storage of a file's entire history on IPFS.
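The idea of addressing data by its hash can be sketched in a few lines of Python. This is a simplified illustration, not how IPFS computes real CIDs: it uses a bare SHA-256 hex digest as the identifier, and the names `toy_cid`, `put`, `get`, and `store` are made up for this example.

```python
import hashlib


def toy_cid(content: bytes) -> str:
    """Return a simplified identifier: the SHA-256 hex digest of the bytes.

    Assumption: a real CID carries more than a bare digest; this is a stand-in.
    """
    return hashlib.sha256(content).hexdigest()


# Toy content-addressed store: the key is derived from the content itself.
store = {}


def put(content: bytes) -> str:
    """Store content under its own hash and return that hash."""
    cid = toy_cid(content)
    store[cid] = content
    return cid


def get(cid: str) -> bytes:
    """Fetch content by hash and confirm it matches what was asked for."""
    content = store[cid]
    if toy_cid(content) != cid:
        raise ValueError("content does not match its identifier")
    return content


v1 = put(b"hello ipfs")
v2 = put(b"hello ipfs, updated")  # an edit yields a different identifier
```

Both versions stay in `store` under different keys, which mirrors the point above: updating content creates a new identifier instead of overwriting the old one.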

Many distributed systems use content addressing through hashes to identify content and link it together. It is worth noting that the basic data structures in these systems are not necessarily interoperable.

This is where the InterPlanetary Linked Data (IPLD) project saves the day. IPLD provides a common way to work with hash-linked data, and users can also define their own links using fundamental data structures that can be held on IPFS. Because IPLD translates between hash-linked data structures, data can be unified across distributed systems.

Directed acyclic graphs (DAGs)

Distributed systems like IPFS use a data structure called a directed acyclic graph (DAG). Specifically, they use Merkle DAGs, where each node has a unique identifier that is a hash of the node's content.

While Merkle DAGs can be structured differently, IPFS uses one optimized for representing directories and files.

To build a Merkle DAG representation of your content, IPFS often first splits it into blocks. Different parts of the file can then come from different sources and be authenticated quickly, since each block is checked against its own hash.
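A minimal sketch of this block-splitting idea, assuming a two-level structure (one root node linking to leaf blocks), a tiny block size, and a JSON-encoded root. None of these choices reflect the actual IPFS block format; `build_dag`, `read_dag`, and `BLOCK_SIZE` are hypothetical names.

```python
import hashlib
import json

BLOCK_SIZE = 4  # tiny on purpose, so the example produces several blocks


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_dag(content: bytes, blocks: dict) -> str:
    """Split content into blocks, store each under its hash, return the root hash."""
    links = []
    for i in range(0, len(content), BLOCK_SIZE):
        chunk = content[i:i + BLOCK_SIZE]
        blocks[digest(chunk)] = chunk
        links.append(digest(chunk))
    # The root node holds only the ordered list of child hashes.
    root = json.dumps({"links": links}).encode()
    blocks[digest(root)] = root
    return digest(root)


def read_dag(root_hash: str, blocks: dict) -> bytes:
    """Reassemble the content, checking every block against its hash."""
    root = blocks[root_hash]
    if digest(root) != root_hash:
        raise ValueError("root failed verification")
    out = b""
    for link in json.loads(root)["links"]:
        chunk = blocks[link]
        if digest(chunk) != link:
            raise ValueError("block failed verification")
        out += chunk
    return out


blocks = {}
root = build_dag(b"interplanetary", blocks)
restored = read_dag(root, blocks)
```

Because the root hash depends on every child hash, changing any block changes the root, and a block that arrives from an untrusted source can be rejected as soon as its hash does not match the link.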

Distributed hash tables (DHTs)

IPFS uses a distributed hash table (DHT) to find which peers possess the content you seek. A hash table is simply a database that maps keys to values. A distributed hash table is a hash table split across all participating peers in a distributed network. To find content, you ask those peers.

Once you've received confirmation on which peers store the blocks that make up the content you seek, you again use the DHT to discover the current location of those peers through a process called routing.
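The lookup described above can be approximated with a toy, in-memory DHT. This sketch assumes a Kademlia-style XOR distance to decide which peer is responsible for a key, and it skips networking and routing entirely; `ToyDHT`, `provide`, and `find_providers` are illustrative names, not an IPFS API.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a peer name or content hash into one shared numeric key space."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")


class ToyDHT:
    def __init__(self, peer_names):
        # Each peer keeps only its own slice: {content key: set of providers}.
        self.tables = {name: {} for name in peer_names}

    def closest_peer(self, content_key: str) -> str:
        # Assumption: the responsible peer is the one with the smallest XOR distance.
        target = key_id(content_key)
        return min(self.tables, key=lambda name: key_id(name) ^ target)

    def provide(self, content_key: str, provider: str) -> None:
        """Record that `provider` holds the content identified by `content_key`."""
        table = self.tables[self.closest_peer(content_key)]
        table.setdefault(content_key, set()).add(provider)

    def find_providers(self, content_key: str) -> set:
        """Ask the responsible peer which peers hold the content."""
        table = self.tables[self.closest_peer(content_key)]
        return set(table.get(content_key, set()))


dht = ToyDHT(["alice", "bob", "carol", "dave"])
dht.provide("hash-of-some-block", "bob")
dht.provide("hash-of-some-block", "dave")
providers = dht.find_providers("hash-of-some-block")
```

Note that the table stores who has the content, not the content itself: the record for a key lives on exactly one peer here, while the data stays with the providers.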

After discovering where your content is located through the DHT, you are ready to connect to those peers and get it.
When you obtain the content, it is cached by your computer, and you also become a provider of that content until you decide to clear your cache.

If you choose to, you can store a copy of the file and become a permanent provider for it. You can keep it for as long as you like, and do this for as much content as you choose.
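The difference between a cached copy and a copy you choose to keep (IPFS calls the latter pinning) can be sketched as follows. `ToyNode` and its methods are hypothetical and only model the behavior described above, not any real IPFS client.

```python
class ToyNode:
    """A hypothetical node that caches fetched blocks and can pin some of them."""

    def __init__(self):
        self.blocks = {}     # content hash -> bytes held locally
        self.pinned = set()  # hashes the user has chosen to keep

    def fetch(self, content_hash, remote):
        """Copy a block from another node; the local copy acts as a cache."""
        data = remote.blocks[content_hash]
        self.blocks[content_hash] = data
        return data

    def provides(self, content_hash):
        """A node can serve any block it currently holds."""
        return content_hash in self.blocks

    def pin(self, content_hash):
        """Mark a locally held block as one to keep."""
        if content_hash not in self.blocks:
            raise KeyError(content_hash)
        self.pinned.add(content_hash)

    def clear_cache(self):
        """Drop every block that is not pinned."""
        self.blocks = {h: d for h, d in self.blocks.items() if h in self.pinned}


origin = ToyNode()
origin.blocks = {"hash-a": b"first file", "hash-b": b"second file"}

me = ToyNode()
me.fetch("hash-a", origin)
me.fetch("hash-b", origin)
me.pin("hash-a")
me.clear_cache()
```

After `clear_cache()`, `me` still provides `hash-a` but no longer provides `hash-b`, while `origin` continues to hold both.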

Merits of IPFS

1 Peer-to-peer simplicity - IPFS uses a DHT, or distributed hash table, to locate data. When a user has a hash, he asks the peer network which node holds the content behind that hash and downloads it directly from that node without recourse to a third party.

2 Improved safety - Due to the decentralized nature of nodes, it is tough to guess what data is stored on which node on IPFS.

Central servers can easily be targeted by hackers to steal or corrupt data, an occurrence preventable by IPFS. Governments can also censor information and internet platforms easily, something that is already happening all over the world. Not long ago, Turkey censored Wikipedia and Nigeria banned the social media platform Twitter. All of this was possible because they knew exactly where the data was and where they needed to target.

3 Immutable - The full history of changes to a piece of content can be traced thanks to the immutable nature of IPFS. Because no action on a piece of content can be deleted, this gives a great level of transparency to content and assurance to users.

Demerits of IPFS

1 Hard to set up - Setting up IPFS is very technical and requires a certain level of technical knowledge. This can discourage most laypeople and keep the technology restricted to techies, limiting its potential and popularity.

2 Expensive to maintain - Running IPFS processes on your computer consumes a huge amount of bandwidth and storage space. Storing copies of content that will serve other seekers also requires a lot of bandwidth. Without a strong economic incentive, this might not be appealing to, or affordable for, everyone.

3 Data reliability - Storing private data is not one of the strong points of IPFS. Such data is hard to de-duplicate efficiently or cache intelligently. A peer's claim to possess a piece of content is also not verified, leading to concerns over the reliability of the data when it is obtained.

Conclusion

With our lives becoming increasingly digitized, having a few big companies centrally control most of the world's data is no longer a viable option.

A more reliable and secure alternative for data storage is needed. As we move from the web2 to the more decentralized web3 atmosphere, technologies like IPFS are a necessary part of the move.

While it still needs a few upgrades here and there, especially in its economics, its ability to provide an immutable, decentralized, and reliable system that protects you from censorship, loss of access to needed data, and data manipulation makes it a winner.

The reduction and distribution of the control tech giants have over the internet and data today, leading to a more user-focused and democratic internet, puts the icing on the cake.

Author: M. Olatunji, Gate.io Researcher
* This article represents only the views of the observers and does not constitute any investment suggestions.
*Gate.io reserves all rights to this article. Reposting of the article will be permitted provided Gate.io is referenced. In all other cases, legal action will be taken due to copyright infringement.
