This article will decode this technology using mathematics, illustrating how it can prove the truth of knowledge without revealing any information.
Compiled by: DODO Research
Author: 0xAlpha Co-founder @DeriProtocol
Zk-SNARK, or zero-knowledge succinct non-interactive argument of knowledge, enables a verifier to confirm that a prover possesses certain knowledge (called witnesses) that satisfies specific relationships without revealing any information about the knowledge itself.
Generating a zk-SNARK for a specific problem involves the following four stages:
The first three steps merely transform the representation of the original statement. The last step projects the statement from the third step into an encrypted space using homomorphic encryption, allowing the prover to verify the mirrorred statements therein. Given that this projection utilizes asymmetric encryption, it is not feasible to backtrack from the statement in step 3 to the original statement, ensuring zero-knowledge exposure.
The mathematical background required to read this article is comparable to the algebra level expected of first-year STEM college students. The only concept that might be challenging might be elliptic curve cryptography. For those unfamiliar with this, you can think of it as an exponential function with a special base, keeping in mind that its inverse remains unsolved.
This article will follow the following notation rules:
Matrices: Bold uppercase letters, e.g. A; written in explicit forms \
Vectors: Lowercase letter with arrows, written in explicit forms [ ]; all vectors are column vectors by default \
Scalars: Represented by regular letters
Let’s take the following question as an example:
f(x)=x^3+x^2+5 (1)
Suppose Alice wants to prove that she knows a truth. We’ll walk through the entire procedure Alice needs to take to generate a zk-SNARK for her proof.
This question may seem too simple because it doesn’t involve control flow (e.g., if-then-else). However, it is not difficult to convert functions with control flow into arithmetic expressions. For example, consider the following problem (or “function” in programming languages):
f(x, z):
if(z==1):
return xxx+xx+5
else:
return xx+5
Rewriting this problem into the following arithmetic expression (assuming z belongs to (0,1)) is no more complex than equation (1) .
f(x,z)=(z-1)(x^2+5)+z(x^3+x^2+5)
In this article, we will continue to use equation (1) as the basis for our discussion.
Equation (1) can be broken down into the following basic arithmetic operations:
This corresponds to the following arithmetic circuit:
We also refer to equation (2) as a set of 4 “primary constraints”, each constraint describing the relationship of an arithmetic gate in the circuit. In general, a set of n primary constraints can be summarized as a quadratic arithmetic program (QAP), which will be explained next.
First, let’s define a “witness” vector as follows:
where y, s1, s2, and s3 are defined as in equation (2).
Typically, for a problem with m inputs and n arithmetic gates (i.e. n-1 intermediate steps and the final output), this witness vector will be (m+n+1) dimensional. In real-world cases, the number n can be extremely large. For example, for hash functions, n is usually in the thousands.
Next, let us construct three n(m+n+1) matrices A,B,C so that we can rewrite equation (2) as follows:
Equation (3) is just another representation of equation (2): each set (ai, bi, ci) corresponds to a gate in the circuit. We can see this by explicitly expanding the matrix equation:
So LHS=RHS is exactly the same as equation (2), and each row of the matrix equation corresponds to a primary constraint (i.e., an arithmetic gate in the circuit).
We skipped the steps of building (A, B, C), which are actually relatively straightforward. We only need to convert them into matrix according to the corresponding primary constraints to determine the content of each row of (A, B, C), that is, (ai, bi, ci). Taking the first row as an example: we can easily see that to make the first primary constraint x multiplied by x equal to s1, we need to multiply [0,1,0,0,0], [0, 1,0,0,0] and [0,0,0,1,0,0] by the vector s.
Thus, we have successfully converted the arithmetic circuit into a matrix equation, namely equation (3). At the same time, we also changed the object that needs to be proved to be mastered to the witness vector s.
Step 3: Polynomials
Now, let’s construct an n(n+m+*1) matrix PA, which defines a set of polynomials regarding z:
Ensuring that the functiontakes on the following values at {1, 2, 3, 4} satisfies the conditions:
Then we can rewrite A as:
These are four sets of linear equations involving six variables that can be solved using traditional methods. However, a more efficient way to solve PA in this case is Lagrangian interpolation, which exploits the peculiarities of the problem. Here we skip the process of solving PA, which is a bit tedious but straightforward.
Similarly, we construct PB and PC separately for B and C. Then we can rewrite the matrix equationas follows:
Observing each row separately, it’s evident that these four rows correspond to the same expression evaluated at four different points. Therefore, the above matrix equation is equivalent to:
Rewrite equation (4) as:
where i(z)=1 is just a trivial zeroth degree polynomial in z used to unify the equations - all terms are quadratic. The necessity of this will become clear shortly. Note that this equation only contains the addition and multiplication of polynomials in z.
Please note that arithmetic operations, addition (+) and multiplication (X), are also mapped to corresponding operations in the finite field of elliptic curve points. The symbols and are used to avoid confusion and indicate that these are redefined field operations.
Next, we’ll explain practical steps in more detail.
The general theory of elliptic curves goes far beyond the scope of this article. For the purposes herein, an elliptic curve is defined as mappings from a prime field Fp to the function E, where E includes points such that y^2=x^3+ax+b (called elliptic curve points, or ECPs for short).
Please note that while we’ve been discussing regular arithmetic thus far, when transitioning to a prime field, arithmetic operations on numbers are performed using modular arithmetic. For example, a+b=a+b(mod p), where p is the order of Fp.
On the other hand, the addition of two elliptic curve points is defined as shown below:
Specifically, the addition of P and Q of two ECPs:
Find the intersection of line PQ and curve R and define it as -(P+Q)
Flip to the “mirror” point R’ of R on the curve.
Therefore we define the addition of elliptic curve points P+Q=R’:
Note that this rule also applies to the special case where the addition of one ECP is used, in which case the tangent to that point will be used.
Now suppose we randomly select a point G and map the number 1 to it. (This “initial mapping” sounds a bit arbitrary. We will be discussing it later). And for any n belonging to Fp, we define:
E(N)=G+G+G+…..+G=G multiplied by n
There is an operation expression. Define the operation +G as “generator”, denoted as g. Then the above definition is equivalent to:
After defining addition, the following linear relationship becomes evident:
Therefore, any linear relationship (or constraint) in Fp will be preserved in the encrypted space B through this mapping. For instance, the equation l + m = n in Fp will lead to
However, as equation (5) that Alice wants to prove is in quadratic form, it is not linear enough. In order to handle quadratic terms, we need to define multiplication in the encrypted space. This is called a pairing, or bilinear map, which I’ll explain in the following section.
Suppose G1, G2 and GT are groups of prime order g. A pairing function, or bilinear map, is defined as follows:
This whole is termed the “proving key” (PK). Note that the representation of vectors within E() should be understood as vectors of elliptic curve points, where each point is mapped from the corresponding vector element. Thus, these 11 vectors actually comprise 62 (=42+63+63+63) elliptic curve points. These 62 ECPs will be provided to Alice, the prover. In a general scenario, for a problem with m inputs and n primary constraints, the PK will contain 2n + 3(m+n+1)*3 = 11n + 9m + 9 ECPs.
Simultaneously, the following computations will be performed:
The entire process is called “verification key” (VK). Only 7 elliptic curve points (ECPs) are involved here, which need to be provided to the verifier. It should be noted that no matter how many inputs and primary constraints are involved in the problem, VK is always composed of 7 ECPs.
In addition, it is worth mentioning that the “trusted setup” and the process of generating PK and VK only need to be done once for a specific problem.
Now possessing the public key (PK), Alice will compute the following elliptic curve points (ECPs):
These 9 elliptic curve points are crucial for a zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK)!
Note that Alice is essentially just performing linear combinations on the elliptic curve points in the public key. This is particularly critical and will be rigorously checked during verification.
Now, Alice presents the zk-SNARK proof, finally leading us to the verification process, which happens in three steps.
First, it is necessary to confirm that the first 8 elliptic curve points are indeed the linear combinations of those points in the common reference string.
This article will decode this technology using mathematics, illustrating how it can prove the truth of knowledge without revealing any information.
Compiled by: DODO Research
Author: 0xAlpha Co-founder @DeriProtocol
Zk-SNARK, or zero-knowledge succinct non-interactive argument of knowledge, enables a verifier to confirm that a prover possesses certain knowledge (called witnesses) that satisfies specific relationships without revealing any information about the knowledge itself.
Generating a zk-SNARK for a specific problem involves the following four stages:
The first three steps merely transform the representation of the original statement. The last step projects the statement from the third step into an encrypted space using homomorphic encryption, allowing the prover to verify the mirrorred statements therein. Given that this projection utilizes asymmetric encryption, it is not feasible to backtrack from the statement in step 3 to the original statement, ensuring zero-knowledge exposure.
The mathematical background required to read this article is comparable to the algebra level expected of first-year STEM college students. The only concept that might be challenging might be elliptic curve cryptography. For those unfamiliar with this, you can think of it as an exponential function with a special base, keeping in mind that its inverse remains unsolved.
This article will follow the following notation rules:
Matrices: Bold uppercase letters, e.g. A; written in explicit forms \
Vectors: Lowercase letter with arrows, written in explicit forms [ ]; all vectors are column vectors by default \
Scalars: Represented by regular letters
Let’s take the following question as an example:
f(x)=x^3+x^2+5 (1)
Suppose Alice wants to prove that she knows a truth. We’ll walk through the entire procedure Alice needs to take to generate a zk-SNARK for her proof.
This question may seem too simple because it doesn’t involve control flow (e.g., if-then-else). However, it is not difficult to convert functions with control flow into arithmetic expressions. For example, consider the following problem (or “function” in programming languages):
f(x, z):
if(z==1):
return xxx+xx+5
else:
return xx+5
Rewriting this problem into the following arithmetic expression (assuming z belongs to (0,1)) is no more complex than equation (1) .
f(x,z)=(z-1)(x^2+5)+z(x^3+x^2+5)
In this article, we will continue to use equation (1) as the basis for our discussion.
Equation (1) can be broken down into the following basic arithmetic operations:
This corresponds to the following arithmetic circuit:
We also refer to equation (2) as a set of 4 “primary constraints”, each constraint describing the relationship of an arithmetic gate in the circuit. In general, a set of n primary constraints can be summarized as a quadratic arithmetic program (QAP), which will be explained next.
First, let’s define a “witness” vector as follows:
where y, s1, s2, and s3 are defined as in equation (2).
Typically, for a problem with m inputs and n arithmetic gates (i.e. n-1 intermediate steps and the final output), this witness vector will be (m+n+1) dimensional. In real-world cases, the number n can be extremely large. For example, for hash functions, n is usually in the thousands.
Next, let us construct three n(m+n+1) matrices A,B,C so that we can rewrite equation (2) as follows:
Equation (3) is just another representation of equation (2): each set (ai, bi, ci) corresponds to a gate in the circuit. We can see this by explicitly expanding the matrix equation:
So LHS=RHS is exactly the same as equation (2), and each row of the matrix equation corresponds to a primary constraint (i.e., an arithmetic gate in the circuit).
We skipped the steps of building (A, B, C), which are actually relatively straightforward. We only need to convert them into matrix according to the corresponding primary constraints to determine the content of each row of (A, B, C), that is, (ai, bi, ci). Taking the first row as an example: we can easily see that to make the first primary constraint x multiplied by x equal to s1, we need to multiply [0,1,0,0,0], [0, 1,0,0,0] and [0,0,0,1,0,0] by the vector s.
Thus, we have successfully converted the arithmetic circuit into a matrix equation, namely equation (3). At the same time, we also changed the object that needs to be proved to be mastered to the witness vector s.
Step 3: Polynomials
Now, let’s construct an n(n+m+*1) matrix PA, which defines a set of polynomials regarding z:
Ensuring that the functiontakes on the following values at {1, 2, 3, 4} satisfies the conditions:
Then we can rewrite A as:
These are four sets of linear equations involving six variables that can be solved using traditional methods. However, a more efficient way to solve PA in this case is Lagrangian interpolation, which exploits the peculiarities of the problem. Here we skip the process of solving PA, which is a bit tedious but straightforward.
Similarly, we construct PB and PC separately for B and C. Then we can rewrite the matrix equationas follows:
Observing each row separately, it’s evident that these four rows correspond to the same expression evaluated at four different points. Therefore, the above matrix equation is equivalent to:
Rewrite equation (4) as:
where i(z)=1 is just a trivial zeroth degree polynomial in z used to unify the equations - all terms are quadratic. The necessity of this will become clear shortly. Note that this equation only contains the addition and multiplication of polynomials in z.
Please note that arithmetic operations, addition (+) and multiplication (X), are also mapped to corresponding operations in the finite field of elliptic curve points. The symbols and are used to avoid confusion and indicate that these are redefined field operations.
Next, we’ll explain practical steps in more detail.
The general theory of elliptic curves goes far beyond the scope of this article. For the purposes herein, an elliptic curve is defined as mappings from a prime field Fp to the function E, where E includes points such that y^2=x^3+ax+b (called elliptic curve points, or ECPs for short).
Please note that while we’ve been discussing regular arithmetic thus far, when transitioning to a prime field, arithmetic operations on numbers are performed using modular arithmetic. For example, a+b=a+b(mod p), where p is the order of Fp.
On the other hand, the addition of two elliptic curve points is defined as shown below:
Specifically, the addition of P and Q of two ECPs:
Find the intersection of line PQ and curve R and define it as -(P+Q)
Flip to the “mirror” point R’ of R on the curve.
Therefore we define the addition of elliptic curve points P+Q=R’:
Note that this rule also applies to the special case where the addition of one ECP is used, in which case the tangent to that point will be used.
Now suppose we randomly select a point G and map the number 1 to it. (This “initial mapping” sounds a bit arbitrary. We will be discussing it later). And for any n belonging to Fp, we define:
E(N)=G+G+G+…..+G=G multiplied by n
There is an operation expression. Define the operation +G as “generator”, denoted as g. Then the above definition is equivalent to:
After defining addition, the following linear relationship becomes evident:
Therefore, any linear relationship (or constraint) in Fp will be preserved in the encrypted space B through this mapping. For instance, the equation l + m = n in Fp will lead to
However, as equation (5) that Alice wants to prove is in quadratic form, it is not linear enough. In order to handle quadratic terms, we need to define multiplication in the encrypted space. This is called a pairing, or bilinear map, which I’ll explain in the following section.
Suppose G1, G2 and GT are groups of prime order g. A pairing function, or bilinear map, is defined as follows:
This whole is termed the “proving key” (PK). Note that the representation of vectors within E() should be understood as vectors of elliptic curve points, where each point is mapped from the corresponding vector element. Thus, these 11 vectors actually comprise 62 (=42+63+63+63) elliptic curve points. These 62 ECPs will be provided to Alice, the prover. In a general scenario, for a problem with m inputs and n primary constraints, the PK will contain 2n + 3(m+n+1)*3 = 11n + 9m + 9 ECPs.
Simultaneously, the following computations will be performed:
The entire process is called “verification key” (VK). Only 7 elliptic curve points (ECPs) are involved here, which need to be provided to the verifier. It should be noted that no matter how many inputs and primary constraints are involved in the problem, VK is always composed of 7 ECPs.
In addition, it is worth mentioning that the “trusted setup” and the process of generating PK and VK only need to be done once for a specific problem.
Now possessing the public key (PK), Alice will compute the following elliptic curve points (ECPs):
These 9 elliptic curve points are crucial for a zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK)!
Note that Alice is essentially just performing linear combinations on the elliptic curve points in the public key. This is particularly critical and will be rigorously checked during verification.
Now, Alice presents the zk-SNARK proof, finally leading us to the verification process, which happens in three steps.
First, it is necessary to confirm that the first 8 elliptic curve points are indeed the linear combinations of those points in the common reference string.