– Sharding in blockchain attempts to improve decentralized network throughput and many blockchain protocols’ scaling potential.
– As development continues and sharding is tested and implemented, it could become the first approach in the technology’s history to resolve the blockchain trilemma.
– Sharding seems like an ideal solution to the scaling problem, but there are currently several issues in the path to its success, like operational complexities and latency.
I have really enjoyed digging into NEAR’s Nightshade documentation and its skeleton. During Q4 2021, NEAR officially announced that the protocol was entering Phase 0 of its sharding implementation (Simple Nightshade), splitting the network into four shards with the primary goals of increasing performance and achieving genuine network decentralization.
Until now, scaling a network has run into three distinct demands: i) more bandwidth and ii) more computational power, both necessary to process transactions faster and complete blocks; and iii) more storage to keep the accumulated data, which keeps growing even if the number of transactions processed remains constant.
But, could sharding be the holy grail to solve the blockchain trilemma, once and for all?
The blockchain trilemma encapsulates the challenge of creating i) secure, ii) scalable, and iii) decentralized networks without sacrificing any of these attributes.
Among the core elements of the NEAR protocol that put this technology at the head of the list on scalability and network efficiency:
- Carbon neutral emissions
- Entering this “Nightshade – Phase 0”, which officially positions NEAR among truly decentralized protocols.
Nevertheless, sharding represents a true challenge in many ways. Defining what this process aims to achieve may help to understand the complexities of the task:
Sharding involves creating parallel partitions alongside the network’s main chain where computation is off-loaded, promoting speed and scalability without making changes to the base chain (the Beacon Chain in Ethereum, the Relay Chain on Polkadot).
Therefore, the information contained in each sharded block in the new branch of the chain is only partially validated by any single participant, making the process faster. The validators in a shard only need to store their local part of the global state and execute the transactions that touch that part of the state.
Contrary to other blockchain systems, where nodes download and validate a full copy of the information contained on the network, sharded nodes only have to process “chunks” of information. This dramatically improves the performance of the whole network, since each node carries only a fraction of the total load.
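The idea of each node processing only its own chunk can be sketched in a few lines. This is a hedged illustration, not NEAR’s actual mechanism: NEAR assigns accounts to shards by account-ID ranges rather than by hash, and the account names and shard count below are assumptions chosen for the example.

```python
import hashlib

NUM_SHARDS = 4  # Simple Nightshade Phase 0 launched with four shards

def shard_for_account(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map an account to a shard by hashing its ID.

    Illustrative only: the point is that every node can compute
    locally which shard owns which account, without coordination.
    """
    digest = hashlib.sha256(account_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# A validator assigned to shard 2 only processes transactions
# whose receiver account maps to shard 2.
tx_receivers = ["alice.near", "bob.near", "treasury.near"]
my_shard = 2
local_txs = [a for a in tx_receivers if shard_for_account(a) == my_shard]
```

Because the mapping is deterministic, every participant agrees on who is responsible for each account even though nobody holds the full state.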
Data processing and availability:
Ensuring the availability and integrity of information is a challenge when sharding a blockchain into “branches.” On a classic DLT system, a single “mother blockchain,” it is possible to rely on the integrity of the network via the full nodes (those that download every block of the chain and validate every transaction).
The sharding process implies that the information will be processed in sharded blocks that do not contain a full copy of the “raw/complete data” stored initially on the beacon chain.
The main idea is that in a sharded blockchain, most participants will not be able to download a full copy of the network’s state and history.
We may say, then, that the availability and integrity of the information might be somewhat compromised: without the ability to download and validate the complete transaction history and the information related to a shard, a participant cannot verify on their own whether a piece of “chunked” information corresponds to the data recorded and stored on the main chain.
Integrity, availability, and confidentiality are the three main attributes that must be guaranteed to ensure that a system can be considered secure.
Another concern arises around data privacy: from a data protection standpoint (the EU’s GDPR or the CCPA in the US), the data subject, the natural person to whom the personal data belongs, has the right to access all the information that has been collected about them.
What would happen if an interested party exercised this right of access in a sharded environment? Since the information stored in a sharded block is not a full copy of what was recorded on the original block in the beacon chain, how could the data controller or processor answer this request?
Only in some specific cases allowed by the privacy framework can the controller deny the data subject the right of access, subject to a reasoned and well-justified situation (a legal obligation, a case where disclosing the information could materially damage a third party, etc.).
Furthermore, the information must be delivered in an accessible format that does not limit its understanding. To guarantee the data subject’s rights, in particular the right of access to personal information pointed out earlier, an effective technique for retrieving the relevant data would have to be implemented.
An interesting approach taken by Nightshade consists in separating the nodes into two main categories:
- Light nodes: those that only download block headers and use Merkle proofs against the information committed in those headers to validate the transactions they are interested in.
- Full nodes: those capable of downloading all the information contained in every entire block and are in charge of validating every transaction.
Figure 1: Merkle tree model. Source – Nightshade documentation
A different approach to processing data
Something worth mentioning is Machina, a solution created to provide immediately retrievable packages of information on-chain, using Nightshade as a base to scale storage proportionally to the number of validators joining the network.
Machina uses file-based storage to make on-chain information accessible and available online while respecting the blockchain’s security guarantees, such as consensus and decentralization.
The solution assigns a unique identifier to each data set. This means that a validator only has to approve or reject the integrity of the data by checking a specific short text string.
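The identifier-checking step can be sketched as content addressing. This is a hedged illustration: the source does not specify Machina’s actual identifier scheme, so a cryptographic hash, the standard technique for this purpose, stands in for it, and the function names and sample data are invented.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Derive a short, unique identifier from the data itself.

    Assumption for this sketch: the identifier is a SHA-256 digest,
    so any change to the data changes the identifier.
    """
    return hashlib.sha256(data).hexdigest()

def validator_check(stored_data: bytes, claimed_id: str) -> bool:
    # The validator never replays the full history behind the file;
    # it only recomputes and compares the short string.
    return content_id(stored_data) == claimed_id

dataset = b"large off-chain file contents..."
cid = content_id(dataset)
assert validator_check(dataset, cid)
assert not validator_check(b"tampered contents", cid)
```

The point is that integrity approval reduces to a constant-size comparison, regardless of how large the underlying data set is.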
A brilliant solution to a difficult new challenge
What is needed to be solved?
One of the main concerns in a sharded blockchain is the possibility of suffering malicious attacks that could compromise the validity of the information contained in those new sharded blocks. The possibility that block validators could approve invalid blocks containing false information by mistake is no longer a negligible scenario.
In a classic blockchain, this problem is very difficult to materialize: the odds are minuscule and the difficulty of executing such an attack is incredibly high, since every node verifies the information, which in turn increases trust in the network.
The rules of the game in a sharded chain are entirely different: the sharded blocks must inter-operate and communicate with each other. The NEAR blockchain works asynchronously, meaning that transactions affecting multiple shards are executed across blocks via cross-shard messages. By contrast, a synchronous system would require the information sent to the affected shards to be processed simultaneously.
Since, somehow, the integrity and availability of the information might be “compromised” due to the impossibility of accessing the whole dataset and its underlying original information… How could a valid process detect incorrect or corrupted data? And how should it be implemented?
The problem in a sharded environment lies in the possibility of a malicious attack materializing on a specific block: it is theoretically possible to perform one by corrupting the participants that maintain a single shard, including its validators.
Nevertheless, no malicious attack in these terms has been performed yet, considering that no sharded environment has been live and running long enough.
Again, a potential solution to this problem, which could undermine the stability of the whole network, lies in the process of “anonymization.”
The General Data Protection Regulation (GDPR) defines anonymous information as: “…information that does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”. The GDPR does not apply to anonymized information.
By anonymizing the underlying information that shows which validators are assigned to which sharded block, the probability of adaptive corruption is reduced: an attacker attempting to corrupt a shard would have to operate without knowing which participants created or validated a given block or chunk.
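The idea of hiding validator-to-shard assignments can be sketched with keyed randomness. This is a simplified assumption-laden illustration: real protocols typically derive assignments from verifiable random functions (VRFs) over per-epoch randomness, and HMAC stands in for that keyed source here; the seed, validator name, and epoch are invented.

```python
import hashlib
import hmac

def hidden_assignment(secret_seed: bytes, validator_id: str,
                      epoch: int, num_shards: int = 4) -> int:
    """Map a validator to a shard using a secret per-epoch seed.

    Without the seed, an adaptive attacker cannot predict which
    validators maintain which shard, so it cannot concentrate its
    corruption budget on a single shard in advance.
    """
    msg = f"{validator_id}:{epoch}".encode()
    digest = hmac.new(secret_seed, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# The seed would only be revealed (or proven via a VRF) after the fact.
seed = b"per-epoch randomness, kept hidden until assignments are final"
shard = hidden_assignment(seed, "validator-7", epoch=42)
```

Re-deriving the assignment each epoch also means that even a successful guess about one epoch tells the attacker nothing about the next.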
Much as it does in the data protection field, where it governs how sensitive information may be stored on a blockchain, anonymization presents itself as a viable option for addressing data corruption in sharded blocks.
It is fair to say that Nightshade positions the NEAR protocol as a viable way to tackle the exponential growth and expansion of blockchain technology. It may improve both the speed and the scalability of the network, and it proposes an interesting approach to working with sensitive data in DLT environments.
Disclaimer: This article contains the writer’s opinions and should not be considered investment or legal advice. Readers should do their own research.