Matthew Wampler-Doty
Abstract: A “Proof of Work” (abbreviated PoW) system for a cryptocurrency based on providing logical formulae and variable assignments is proposed. Miners are required both to mine the logical formulae and to provide satisfying variable assignments, as per Stephen Cook’s celebrated Satisfaction Problem (abbreviated SAT) from his paper "The Complexity of Theorem-Proving Procedures" (1971). To ensure the problem is non-trivial, formulae must conform to a large number of constraints, among them: (1) they must have a fixed number of literals in each clause, (2) the graph of clauses must be connected, and a spanning tree traversal must be provided to prove this, (3) a minimal number of variables is necessary, and finally (4) there must be exactly N clauses. A naïve miner is proposed, and it is shown to follow the same statistical behavior as the conventional BitCoin mining system. Given the nature of the problems this system is based on, we anticipate occasional algorithmic breakthroughs will result in speedups that will be difficult to introduce into ASICs, given their long development cycle. Verification is linear in the number of clauses in the mined solution.
One of the problems facing BitCoin is the rise of ASIC hardware specially designed for mining. These systems are designed to solve problems of the form:

sha256(sha256(block_header + nonce)) < target

Where nonce is a free variable which may be incremented from some minimal value. Solutions to such problems constitute the Proof of Work that is used as the basis for BitCoin. Since sha256 is straightforward to implement, ASICs can readily compute it in very few clock cycles, as compared to conventional CPU-based architectures. As of November 2014, it is no longer feasible to mine BitCoin using a conventional CPU-based architecture.
Alternative systems are currently being investigated in the cryptocurrency community to avoid this issue. One proposal is Hashimoto by Thaddeus Dryja (2014), who suggests using a two stage hash function. Another system, due to Vitalik Buterin, uses randomly generated logic circuits.
An independent thread of research involves mining protocols based on traditional Nondeterministic Polynomial (NP) problems from the mathematics literature. One such system is PrimeCoin, which is based on finding numbers that pass the popular test for primality using Fermat's Little Theorem. This was done to allow for reasonable control of inflation. As such, numbers used in its blockchain are only probable primes rather than proven primes, and some are not primes at all but rather composites known as Carmichael numbers (see A002997 in the Online Encyclopedia of Integer Sequences). A similar cryptocurrency, RieCoin, is also based on finding prime numbers, but it uses the Miller-Rabin test to gain stronger assurance of primality.
Traditional NP problems from computer science theory present an intriguing possibility for avoiding ASIC blockchain mining. For algorithmically rich problems, speedups can come from researcher insight, rather than simple hardware improvements. Since ASICs have a long development cycle, this allows savvy researchers to compete with ASIC designers and allows CPUs to remain a competitive means of mining. In Boolean Satisfaction (SAT) in particular, researchers routinely report performance increases of orders of magnitude (see, for instance, Duraira (2004), Aloul (2006), Velev (2009), etc.). It is noteworthy that all commercial SAT solvers are CPU based.
To this end, we propose SATCoin, a Proof of Work specification based on SAT problems intended to be difficult to implement in an ASIC. We first go over how proof of work is verified, and then go on to present an algorithm suitable for mining.
In this section, we present the verification scheme for SATCoin, which exhibits linear time complexity.
A satisfaction problem is the problem of finding an assignment of logical variables to either True or False such that a logical formula with a particular form is true. Formulae for SAT problems are presented in Conjunctive Normal Form (CNF), which is comprised of conjuncts of disjuncts of literals. A literal is a logical variable or its negation. Here is an example of a CNF formula:

(A ∨ ¬B ∨ ¬D) ∧ (¬B ∨ C ∨ ¬D ∨ E) ∧ (C ∨ ¬C ∨ A)
SAT problems are solved by some assignment α of variables to either True or False that make each disjunct of the conjunct true. For instance, as long as B is assigned to False, E is assigned to True, and C is assigned to False, all of the disjuncts of the above formula are made true.
In Python, a literal can be modeled as an (immutable) object with two attributes: a sign and a variable. A sign is either True or False, while a variable is the name of the underlying variable.
A CNF formula can then be modeled as a collection of collections of literals. Here is how we'd express the example above:
example_sat_problem = \
[[{"sign": True, "variable": "A"},
{"sign": False, "variable": "B"},
{"sign": False, "variable": "D"}],
[{"sign": False, "variable": "B"},
{"sign": True, "variable": "C"},
{"sign": False, "variable": "D"},
{"sign": True, "variable": "E"}],
[{"sign": True, "variable": "C"},
{"sign": False, "variable": "C"},
{"sign": True, "variable": "A"}]]
An assignment can be modeled as a dictionary, with the keys being the variable names and the values being True or False.
example_assignment = {"B": False, "E": True, "C": False}
Checking if a SAT problem is satisfied by a variable assignment is straightforward. We make the assumption that if a variable's truth value is not specified then it is False.
def satisfied(SAT_problem, variable_assignment):
    from collections import defaultdict
    # Unassigned variables default to False
    variable_assignment = defaultdict(lambda: False, **variable_assignment)
    # Every clause must have at least one literal made true by the assignment
    return all(any(variable_assignment[literal["variable"]] == literal["sign"]
                   for literal in clause)
               for clause in SAT_problem)
We can verify that satisfied is implemented correctly by checking that our example evaluates to True:
satisfied(example_sat_problem, example_assignment)
True
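Conversely, an assignment that falsifies some clause must be rejected. The following self-contained sketch (restating the definitions above, with the assignment {"B": True, "D": True} chosen purely for illustration) checks that falsifying every literal of the first clause makes the whole formula unsatisfied:

```python
from collections import defaultdict

def satisfied(SAT_problem, variable_assignment):
    # Unassigned variables default to False; every clause must
    # contain at least one literal made true by the assignment.
    variable_assignment = defaultdict(lambda: False, **variable_assignment)
    return all(any(variable_assignment[literal["variable"]] == literal["sign"]
                   for literal in clause)
               for clause in SAT_problem)

example_sat_problem = \
  [[{"sign": True,  "variable": "A"},
    {"sign": False, "variable": "B"},
    {"sign": False, "variable": "D"}],
   [{"sign": False, "variable": "B"},
    {"sign": True,  "variable": "C"},
    {"sign": False, "variable": "D"},
    {"sign": True,  "variable": "E"}],
   [{"sign": True,  "variable": "C"},
    {"sign": False, "variable": "C"},
    {"sign": True,  "variable": "A"}]]

# B = True and D = True falsify every literal of the first clause
# (A defaults to False), so the whole formula is falsified.
print(satisfied(example_sat_problem, {"B": True, "D": True}))  # False
```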
We next present how SAT problems are generated. We opt to allow SAT problems to be generated non-deterministically, allowing the miner to construct the problem themselves.
To begin, we define a linear congruential random number generator whose modulus is the field prime of the secp256k1 elliptic curve used in BitCoin.
def next_random(n):
return (abs(n) * 3**160 + 7**80) % (2**256 - 4294968273)
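As a sanity check, the modulus used here is indeed the secp256k1 base field prime, 2^256 − 2^32 − 977 (note 4294968273 = 2^32 + 977), and outputs always fall within the field:

```python
# The secp256k1 base field prime, written two equivalent ways.
secp256k1_prime = 2**256 - 2**32 - 977
assert 2**256 - 4294968273 == secp256k1_prime

def next_random(n):
    return (abs(n) * 3**160 + 7**80) % (2**256 - 4294968273)

# Every output of next_random lies in the field [0, p).
assert 0 <= next_random(3681122926843) < secp256k1_prime
```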
Next, we define the initial conditions for blockchain generation. A header is provided by the cryptocurrency network, which is a sha256 hash of the current blockchain. At the beginning of SAT problem generation, the miner selects a set of generators, which are all positive integers.
At each stage of SAT problem generation, the generators initially chosen are used to compute one literal per generator, using the latest random number n supplied by next_random. For the ith generator gi, the variable for its literal is n mod gi, while the sign is True if the ith lowest bit of n is 1 and False if the ith lowest bit is 0. This is implemented as follows:
def generate_literals(n, generators):
return [{"variable": n % g, "sign": ((n >> idx) % 2) == 1, "generator": g}
for idx,g in enumerate(generators)]
The miner then picks some number of literals from the generated literals to form a clause. The chosen generators are added to the current random value, and next_random is called on the result.
To give an example, suppose that the current random value is 3681122926843, and the user chose [9,13,14,21,33,35] as their initial generators. Then they could choose from among the following possibilities of literals to generate a clause:
generate_literals(3681122926843, [9,13,14,21,33,35])
[{'generator': 9, 'sign': True, 'variable': 1}, {'generator': 13, 'sign': True, 'variable': 11}, {'generator': 14, 'sign': False, 'variable': 5}, {'generator': 21, 'sign': True, 'variable': 19}, {'generator': 33, 'sign': True, 'variable': 10}, {'generator': 35, 'sign': True, 'variable': 33}]
Suppose the miner wanted to make a clause with 3 literals. Given the 6 possible generators, they would have (6 choose 3) = 20 possibilities to choose from. Let's suppose they chose the literals generated by [9,21,33]; then the next random value would be:
next_random(3681122926843+9+21+33)
72951700441152273488650601127460710619839498214931026171753573659363821185246L
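The count of 20 possible clauses can be verified directly by enumerating the 3-element subsets of the 6 generated literals:

```python
from itertools import combinations

# 6 generators, clauses of 3 distinct literals: C(6, 3) = 20 choices.
generators = [9, 13, 14, 21, 33, 35]
clause_choices = list(combinations(generators, 3))
print(len(clause_choices))  # 20
```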
Putting everything together, we can verify if a miner correctly generated a SAT problem with the following check:
def sat_problem_generated(header, generators, sat_problem):
n = header
for clause in sat_problem:
possible_literals = generate_literals(n, generators)
if not all(literal in possible_literals for literal in clause):
return False
n = next_random(n + sum(literal["generator"] for literal in clause))
return True
Here is an example with 8 generators:
header = 5751887518028651109
example_generators = [5,7,8,9,11,13,17,19]
example_generated_sat_problem = \
[({'generator': 5, 'sign': True, 'variable': 4},
{'generator': 7, 'sign': False, 'variable': 5},
{'generator': 9, 'sign': False, 'variable': 6}),
({'generator': 5, 'sign': False, 'variable': 3L},
{'generator': 7, 'sign': True, 'variable': 4L},
{'generator': 8, 'sign': False, 'variable': 2L}),
({'generator': 5, 'sign': False, 'variable': 0L},
{'generator': 7, 'sign': True, 'variable': 6L},
{'generator': 8, 'sign': True, 'variable': 6L})]
sat_problem_generated(5751887518028651109, example_generators, example_generated_sat_problem)
True
So far, we have presented SAT problems and shown how a miner can non-deterministically construct one. As it stands, however, the problem has no constraints. In order to make the problem non-trivial, we introduce a series of structural constraints.
One simple criterion is to have a fixed number of clauses. More clauses put more constraints on the possible SAT solutions.
def number_of_clauses(n, clauses):
return n == len(clauses)
We next demand that there be a specified number of unique variables in each clause; this is the clause's width.
Since it will come up somewhat often, here is a simple function for extracting the set of variables from a clause:
def clause_variables(clause):
"Returns a frozenset of variables in a clause"
return frozenset(literal["variable"] for literal in clause)
We can easily check that there are n variables in a clause by checking the length of the result of clause_variables:
def clause_width(n, clause):
return n == len(clause_variables(clause))
def all_clauses_width(n, clauses):
return all(clause_width(n, clause) for clause in clauses)
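A clause with a repeated variable has fewer unique variables than literals, and so fails the width check. A small self-contained example (restating the helpers above, with the clause invented for illustration):

```python
def clause_variables(clause):
    "Returns a frozenset of variables in a clause"
    return frozenset(literal["variable"] for literal in clause)

def clause_width(n, clause):
    return n == len(clause_variables(clause))

# Three literals but only two distinct variables, so the width is 2, not 3.
repeated = [{"sign": True,  "variable": 1},
            {"sign": False, "variable": 1},
            {"sign": True,  "variable": 2}]
assert not clause_width(3, repeated)
assert clause_width(2, repeated)
```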
SAT problems with repetitious clauses are easier to solve, so we will want to constrain problems so that no two clauses have the same set of variables.
def no_repeated(clauses):
visited = set([])
for clause in clauses:
variables = clause_variables(clause)
if variables in visited:
return False
visited.add(variables)
return True
Here's an example of a set of clauses where everything is unique:
example_unique = \
[({'sign': True, 'variable': 4L},
{'sign': False, 'variable': 6L},
{'sign': True, 'variable': 5L}),
({'sign': True, 'variable': 3L},
{'sign': False, 'variable': 3L},
{'sign': True, 'variable': 5L}),
({'sign': True, 'variable': 3L},
{'sign': True, 'variable': 2L},
{'sign': False, 'variable': 3L}),
({'sign': False, 'variable': 4L},
{'sign': False, 'variable': 6L},
{'sign': True, 'variable': 4L})]
no_repeated(example_unique)
True
Here's an example with repeated clauses that we do not want to permit:
example_not_unique = \
[({'sign': True, 'variable': 0L},
{'sign': False, 'variable': 0L},
{'sign': False, 'variable': 7L}),
({'sign': False, 'variable': 0L},
{'sign': False, 'variable': 1L},
{'sign': False, 'variable': 0L}),
({'sign': True, 'variable': 3L},
{'sign': True, 'variable': 2L},
{'sign': True, 'variable': 7L}),
({'sign': True, 'variable': 4L},
{'sign': False, 'variable': 3L},
{'sign': True, 'variable': 5L}),
({'sign': True, 'variable': 4L},
{'sign': True, 'variable': 1L},
{'sign': False, 'variable': 3L}),
({'sign': True, 'variable': 0L},
{'sign': True, 'variable': 0L},
{'sign': True, 'variable': 7L})]
no_repeated(example_not_unique)
False
Another property we wish to enforce is that the graph of clauses must be connected. This enforces a certain level of interaction among all of the clauses; setting one literal can potentially force the value of any other literal.
To ensure that the graph of clauses is connected, we make use of the following observation:
Fact. A graph is connected if and only if there is a traversal of a spanning tree that visits every node.
Checking that a given traversal of a spanning tree visits every node can be carried out in linear time. To start, we define a binary predicate which asserts whether two clauses are connected:
def connected(c1, c2):
ls1 = clause_variables(c1)
ls2 = clause_variables(c2)
return any(l in ls2 for l in ls1)
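Two clauses are connected exactly when they share a variable, regardless of sign. A quick self-contained check (restating clause_variables from above, with clauses chosen for illustration):

```python
def clause_variables(clause):
    "Returns a frozenset of variables in a clause"
    return frozenset(literal["variable"] for literal in clause)

def connected(c1, c2):
    # Clauses are connected when they share at least one variable.
    ls1 = clause_variables(c1)
    ls2 = clause_variables(c2)
    return any(l in ls2 for l in ls1)

c1 = [{"sign": True,  "variable": 4}, {"sign": False, "variable": 5}]
c2 = [{"sign": False, "variable": 4}, {"sign": True,  "variable": 6}]  # shares 4
c3 = [{"sign": True,  "variable": 7}, {"sign": True,  "variable": 8}]  # disjoint
assert connected(c1, c2)
assert not connected(c1, c3)
```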
A spanning tree traversal is represented as a list of data structures, each with two fields: a child clause index ('child clause index') and a parent clause index ('parent clause index'). In the case of the root, there is no parent clause, but all subsequent entries must have a parent clause. Moreover, the parent clause must always be a clause that has been previously listed, and the parent and child must be connected according to the connected predicate.
Given that the spanning tree traversal never repeats and has as many entries as clauses, we know that the graph of clauses is indeed connected.
def clauses_connected_by_traversal(clauses, traversal):
if len(traversal) != len(clauses) or \
not (0 <= traversal[0]['child clause index'] < len(clauses)):
return False
visited = set([traversal[0]['child clause index']])
for i in range(1,len(traversal)):
child_clause_index = traversal[i]['child clause index']
parent_clause_index = traversal[i]['parent clause index']
if child_clause_index in visited or \
not (parent_clause_index in visited) or \
not (0 <= child_clause_index < len(clauses)) or \
not connected(clauses[child_clause_index],
clauses[parent_clause_index]):
return False
visited.add(child_clause_index)
return True
It's of course helpful to see an example:
connected_example_sat_problem = \
[({'sign': True, 'variable': 4L},
{'sign': False, 'variable': 4L},
{'sign': True, 'variable': 5L}),
({'sign': True, 'variable': 3L},
{'sign': False, 'variable': 4L},
{'sign': False, 'variable': 1L}),
({'sign': True, 'variable': 2L},
{'sign': True, 'variable': 6L},
{'sign': False, 'variable': 3L}),
({'sign': True, 'variable': 3L},
{'sign': False, 'variable': 0L},
{'sign': True, 'variable': 5L}),
({'sign': True, 'variable': 4L},
{'sign': False, 'variable': 0L},
{'sign': False, 'variable': 1L}),
({'sign': False, 'variable': 2L},
{'sign': False, 'variable': 1L},
{'sign': True, 'variable': 4L})]
connected_example_spanning_tree_traversal = \
[{'child clause index': 0},
{'child clause index': 1, 'parent clause index': 0},
{'child clause index': 2, 'parent clause index': 1},
{'child clause index': 3, 'parent clause index': 2},
{'child clause index': 4, 'parent clause index': 1},
{'child clause index': 5, 'parent clause index': 2}]
clauses_connected_by_traversal(connected_example_sat_problem, connected_example_spanning_tree_traversal)
True
The resulting spanning tree can of course be easily recovered from the traversal. Here's the spanning tree used in the above example:
%load_ext hierarchymagic
%%dot -f svg
graph G {
0--1; 1--2; 2--3; 1--4; 2--5;
}
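The recovery of the spanning tree from a traversal is a one-line comprehension: every entry after the root contributes one parent-child edge. A sketch using the example traversal above:

```python
# The example spanning tree traversal from the text.
traversal = \
  [{'child clause index': 0},
   {'child clause index': 1, 'parent clause index': 0},
   {'child clause index': 2, 'parent clause index': 1},
   {'child clause index': 3, 'parent clause index': 2},
   {'child clause index': 4, 'parent clause index': 1},
   {'child clause index': 5, 'parent clause index': 2}]

# Every entry after the root contributes one parent-child edge.
edges = [(entry['parent clause index'], entry['child clause index'])
         for entry in traversal[1:]]
print(edges)  # [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]
```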
Similar to clauses, we also demand that there be a fixed number of generators:
def number_of_generators(n, generators):
return n == len(generators)
Some criteria for clauses are easy to satisfy with a choice of small generators, so we constrain the problem by imposing a minimal value for the generators.
def min_generator(m, generators):
return m <= min(*generators)
So far, we've presented a number of criteria for enforcing that the SAT problems generated by miners be non-trivial. Ideally, we'd like to be able to set a hard statistical probability that a proposed solution will pass.
We adopt the approach taken by BitCoin, but with a modification. We introduce the criterion that:

sha256(block_header ∥ variable_assignment) < target

Like BitCoin, this effectively means that a solution will be rejected with probability p = 1 − target/2^256. Unlike BitCoin, where solutions can be found via linear search (simply incrementing a nonce value), the miner must instead enumerate possible satisfying variable assignments for a SAT problem they have constructed. Enumerating every satisfying variable assignment for a given SAT problem is #P-complete, which is NP-Hard. Of course, a miner could also try to compose other SAT problems with varying solutions, which is in NP.
Below we give an implementation of this criterion; we make use of the JSON representation of the variable assignment, although there are many possible representations.
def sha256_less_than_target(block_header, variable_assignment, target):
import hashlib
import json
s = hashlib.sha256()
s.update(str(block_header))
s.update(json.dumps(variable_assignment, sort_keys=True))
return int('0x' + s.hexdigest(), 16) < target
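Note that under Python 3, hashlib.sha256 only accepts bytes, so a portable variant of this check must encode its inputs first. A minimal sketch of that adjustment (the name sha256_less_than_target_py3 is ours, not part of the specification):

```python
import hashlib
import json

def sha256_less_than_target_py3(block_header, variable_assignment, target):
    # Same check as above, but explicitly encoding strings to bytes
    # so it also runs under Python 3.
    s = hashlib.sha256()
    s.update(str(block_header).encode('utf-8'))
    s.update(json.dumps(variable_assignment, sort_keys=True).encode('utf-8'))
    return int(s.hexdigest(), 16) < target

# Any 256-bit hash is below 2**256, and none is below 0.
assert sha256_less_than_target_py3(42, {"1": True}, 2**256)
assert not sha256_less_than_target_py3(42, {"1": True}, 0)
```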
Here is an example; as we can see, this variable_assignment and block_header will pass if we are rejecting approximately 66.6% of solutions.
example_block_header = 28850183455121774801176625632015427996404687904360779760892949139292104258030L
example_variable_assignment = {
1L: True,
3L: True,
6L: False,
8L: False,
9L: True,
15L: True,
17L: True,
18L: False,
19L: True,
23L: True,
28L: False,
30L: False,
35L: True,
36L: False,
37L: True }
sha256_less_than_target(example_block_header, example_variable_assignment, 2**256 / 3)
True
If we instead demand that approximately 75% of solutions be rejected, this example does not pass.
sha256_less_than_target(example_block_header, example_variable_assignment, 2**256 / 4)
False
We now present all of our criteria together as a comprehensive proof of work validation scheme.
def proof_of_work_validate(solution, parameters):
return (satisfied(solution["clauses"], solution["variable assignment"]) and
sat_problem_generated(parameters["block header"], solution["generators"], solution["clauses"]) and
number_of_clauses(parameters["number of clauses"], solution["clauses"]) and
all_clauses_width(parameters["clause width"], solution["clauses"]) and
no_repeated(solution["clauses"]) and
clauses_connected_by_traversal(solution["clauses"], solution["spanning tree traversal"]) and
number_of_generators(parameters["number of generators"], solution["generators"]) and
min_generator(parameters["minimal generator"], solution["generators"]) and
sha256_less_than_target(parameters["block header"], solution["variable assignment"], parameters["target"]))
In this section, we present a mining algorithm. Since the acceptance criteria are open ended, there are many possible mining algorithms.
Mining, like ordinary SAT solving, can be carried out via heuristic backtrack search. Below is a simple depth-first search algorithm making use of Python generators. By construction, this algorithm runs in time exponential in the number of clauses.
def mine(params):
    i = 0
    while True:
        start = params["minimal generator"] + i
        generators = range(start, start + params["number of generators"])
        for state in initial_states(generators, params):
            for solution in search(state, params):
                if sha256_less_than_target(params["block header"], solution["variable assignment"], params["target"]):
                    yield solution
                else:
                    break
        i += 1
def search(state, params):
if (len(state["clauses"]) == params["number of clauses"]):
yield {"clauses": state["clauses"],
"spanning tree traversal": state["spanning tree traversal"],
"variable assignment": state["variable assignment"],
"generators": state["generators"]}
else:
for new_state in possible_new_states(state, params):
for solution in search(new_state, params):
yield solution
Our solution makes use of a search state data structure; for simplicity we model it as immutable and copy-on-write. At each stage of the search, a pool of possible state transitions is lazily generated, and each one is searched recursively.
def possible_new_states(state, params):
from itertools import combinations
for clause in combinations(generate_literals(state["current random"],
state["generators"]),
params["clause width"]):
parent_clause_index = None
for literal in clause:
if literal["variable"] in state["variable spanning tree indexes"]:
parent_clause_index = state["variable spanning tree indexes"][literal["variable"]]
break
if (parent_clause_index == None or
not clause_width(params["clause width"], clause) or
clause_variables(clause) in state["visited clause variables"]):
continue
for pivot in clause:
if not (pivot["variable"] in state["variable assignment"] and
state["variable assignment"][pivot["variable"]] != pivot["sign"]):
yield update_state(state, clause, parent_clause_index, pivot)
Below, we show how initial states are generated. Bookkeeping variables include: which clauses have been visited, the state of the constructed spanning tree, and the state of the variable assignment under construction.
def initial_states(generators, params):
from itertools import combinations
for clause in combinations(generate_literals(params["block header"], generators),
params["clause width"]):
if not clause_width(params["clause width"], clause):
continue
for pivot in clause:
yield {"current random": next_random(params["block header"] +
sum(literal["generator"]
for literal in clause)),
"clauses": [clause],
"variable assignment": {pivot["variable"]: pivot["sign"]},
"spanning tree traversal": [{ "child clause index": 0 }],
"visited clause variables": set([clause_variables(clause)]),
"variable spanning tree indexes": { literal["variable"]: 0
for literal in clause },
"generators": generators }
Finally, we show how states are updated when new information arrives.
def update_state(state, clause, parent_clause_index, pivot):
from copy import deepcopy
index = len(state["clauses"])
new_state = deepcopy(state)
new_state["clauses"].append(clause)
new_state["visited clause variables"].add(clause_variables(clause))
new_state["variable assignment"][pivot["variable"]] = pivot["sign"]
for l in clause:
new_state["variable spanning tree indexes"][l["variable"]] = index
new_state["spanning tree traversal"].append({"parent clause index": parent_clause_index,
"child clause index": index})
new_state["current random"] = \
next_random(state["current random"] +
sum(literal["generator"] for literal in clause))
return new_state
Putting it together, we can verify that our mining algorithm indeed satisfies our proof of work validation scheme:
block_header = 59807446017497444054206621583131621044750062436043876690313232144928752617878L
parameters = {
"block header": 59807446017497444054206621583131621044750062436043876690313232144928752617878L,
"number of clauses": 50,
"clause width": 3,
"target": 2**256 / 2,
"minimal generator": 40,
"number of generators": 8
}
solution = mine(parameters).next()
proof_of_work_validate(solution, parameters)
True
In this section, we look at the performance of the mining algorithm under different difficulties. A model for expected performance at a given difficulty is provided. We first show empirically that BitCoin mining conforms to this model, and then go on to show that SATCoin mining also conforms.
In this section, we give a simple reference model for BitCoin mining: mining time follows a geometric distribution with respect to difficulty.
We begin by assuming that sha256 is an unbiased, uniformly random value. For a set difficulty D, the probability that a given nonce will return a hash with value less than 2^256/D is 1/D. This means that the number of times X the miner must compute a sha256 hash to mine a coin is governed by a geometric distribution:

P(X = k) = (1 − 1/D)^(k−1) · (1/D)
Next, we model computing the sha256 hash of a block header and nonce, as in conventional BitCoin mining, as taking constant time Ĉ. We can think of Ĉ as the time for a single hash evaluation, i.e. the mining time at the lowest possible difficulty. Let T be a random variable describing the time to mine a coin. We can derive a closed form for the expected time to mine a bitcoin as follows:

E[T] = Ĉ · E[X] = Ĉ · D

This model predicts that the expected time for mining should increase linearly with difficulty D.
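The linear growth of E[T] in D can be checked with a small simulation, drawing geometric trial counts directly (a sketch; the difficulty D = 20 and the sample size are chosen arbitrarily):

```python
import random

def trials_until_success(D, rng):
    # Number of hashes until one falls below 2**256 / D,
    # i.e. a geometric random variable with success probability 1/D.
    count = 1
    while rng.random() >= 1.0 / D:
        count += 1
    return count

rng = random.Random(0)  # fixed seed for reproducibility
D = 20
samples = [trials_until_success(D, rng) for _ in range(100000)]
mean = sum(samples) / float(len(samples))

# E[X] = D for a geometric distribution with p = 1/D;
# the sample mean should land very close to D.
assert abs(mean - D) < 1.0
```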
We can test this model by writing a simple miner, as follows:
def bitcoin_mine(block_header, target):
import hashlib
nonce = 0
while int('0x' + hashlib.sha256(hashlib.sha256(str(block_header + nonce)).digest()).hexdigest(), 16) >= target:
nonce += 1
return nonce
Timing for mining can be carried out using the standard Python benchmarking library timeit:
from timeit import timeit
max_val = 2**256
def time_bitcoin_difficulty(d):
return timeit('bitcoin_mine(randint(0,max_val), %d)' % (max_val / d),
'from random import randint ; from __main__ import bitcoin_mine, max_val',
number=1)
bitcoin_difficulties = range(100,1000,10)
bitcoin_sample_times = [[time_bitcoin_difficulty(d) for _ in range(500)] for d in bitcoin_difficulties]
We can now carry out Monte Carlo experiments involving our expected-time-to-mine model. Since our hypothesis is that mining time should increase linearly with difficulty, it should fit nicely with a linear regression:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import numpy as np
from pylab import polyfit
import matplotlib.pyplot as plt
average_times = map(np.mean, bitcoin_sample_times)
C,k = polyfit(bitcoin_difficulties, average_times, 1)
plt.plot(bitcoin_difficulties, average_times)
plt.plot(bitcoin_difficulties, [C*d + k for d in bitcoin_difficulties], '-r')
plt.title("Average Mining Time (Linear)")
plt.show()
print "Estimated Ĉ:", C, "secs"
print "Overhead:", k, "secs"
Estimated Ĉ: 4.00361967002e-06 secs Overhead: 9.21835800638e-05 secs
We can see that mining occasionally has some outliers which create spikes in the average mining time. This is a consequence of the geometric distribution; while the average time increases linearly with difficulty, variance increases quadratically. We can derive a closed form for the variance as follows:

Var[T] = Ĉ² · Var[X] = Ĉ² · (1 − 1/D) / (1/D)² = Ĉ² · (D² − D)

Empirically, we should expect variance to roughly follow a quadratic formula k + C₁D + C₂D², which takes into account computational overhead and other sources of noise. This model should follow the idealized model closely; we expect C₂ ≈ Ĉ².
variances = map(np.var, bitcoin_sample_times)
C2,C1,k = polyfit(bitcoin_difficulties, variances, 2)
plt.plot(bitcoin_difficulties, variances)
plt.plot(bitcoin_difficulties, [C2*d*d + C1*d + k for d in bitcoin_difficulties], '-r')
plt.title("Mining Time Variance (Quadratic)")
plt.show()
print "C₁:", C1, "C₂:", C2
print "Estimated Ĉ²:", C**2
C₁: -3.4142328846e-10 C₂: 1.61648134516e-11 Estimated Ĉ²: 1.60289704622e-11
As we can see, our model's predictions are quite accurate for the projected variance of mining times.
Finally, we can infer a Cumulative Distribution Function (CDF) for mining time based on the geometric distribution; it is approximated as follows:

F(t) ≈ 1 − (1 − 1/D)^(t/Ĉ)

Based on the CDF, we can perform hypothesis testing using the Kolmogorov-Smirnov (KS) Test. The default in scipy.stats is two-sided.
import scipy.stats
import pandas
data = []
for d, samples in zip(bitcoin_difficulties[::5], bitcoin_sample_times[::5]):
cdf = lambda t: 1 - (1 - (1. / d))**(t / C)
ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
data.append({"Difficulty D": d,
"KS Statistic": ks_statistic,
"P-Value": p_value})
pandas.DataFrame.from_records(data)
| | Difficulty D | KS Statistic | P-Value |
|---|---|---|---|
0 | 100 | 0.108416 | 1.424640e-05 |
1 | 150 | 0.154282 | 7.376921e-11 |
2 | 200 | 0.066713 | 2.227100e-02 |
3 | 250 | 0.030847 | 7.282142e-01 |
4 | 300 | 0.072371 | 1.008654e-02 |
5 | 350 | 0.037753 | 4.690235e-01 |
6 | 400 | 0.052314 | 1.250579e-01 |
7 | 450 | 0.041803 | 3.388827e-01 |
8 | 500 | 0.021436 | 9.756494e-01 |
9 | 550 | 0.040351 | 3.821977e-01 |
10 | 600 | 0.047582 | 2.013377e-01 |
11 | 650 | 0.038995 | 4.260011e-01 |
12 | 700 | 0.059341 | 5.675518e-02 |
13 | 750 | 0.044131 | 2.769786e-01 |
14 | 800 | 0.038618 | 4.387742e-01 |
15 | 850 | 0.065284 | 2.692867e-02 |
16 | 900 | 0.026932 | 8.613362e-01 |
17 | 950 | 0.039018 | 4.252390e-01 |
Recall that the null hypothesis in the KS test is that the two distributions are the same, and that we reject the null hypothesis when the p-value is low (equivalently, when the KS statistic is high). As we can see, the KS statistic itself is always low, while the p-value is only low at the lower difficulties D. As D increases, the underlying geometric distribution approaches an exponential distribution, where statistics can be dominated by outlier behavior even in Monte Carlo experiments (a point Nassim Taleb makes at length in his writings, most notably The Black Swan (2007)).
Unlike BitCoin, the SATCoin protocol outlined here has several difficulty parameters. The parameters we investigate are: the hash target, the number of clauses, and the minimal generator.
We begin by looking at adjusting the target difficulty. We observe that SATCoin follows the same geometric distribution statistics that BitCoin follows:
def random_parameters_with_difficulty(d, number_of_clauses, min_generator):
from random import randint
return {
"block header": randint(0,2**256),
"number of clauses": number_of_clauses,
"clause width": 3,
"target": 2**256 / d,
"minimal generator": min_generator,
"number of generators": 8
}
def time_sat_difficulty(d,n,m):
return timeit('next(mine(random_parameters_with_difficulty(%d,%d,%d)))' % (d,n,m),
'from __main__ import random_parameters_with_difficulty, mine',number=1)
sat_difficulties = range(1,50)
sat_sample_times = [[time_sat_difficulty(d,10,5) for _ in range(100)] for d in sat_difficulties]
The statistical behavior of mining is the same as BitCoin's; however, there is much more variance and computational overhead. As we can see, the average mining time increases linearly:
plt.plot(sat_difficulties, map(np.mean, sat_sample_times))
C,k = polyfit(sat_difficulties, map(np.mean, sat_sample_times), 1)
plt.plot(sat_difficulties, [C*d + k for d in sat_difficulties], '-r')
plt.title("Average Mining Time (Linear)")
plt.show()
print "Estimated Ĉ:", C, "secs"
print "Overhead:", k, "secs"
Estimated Ĉ: 0.00395911297141 secs Overhead: -0.00852927855691 secs
Similarly, variance increases quadratically; however, we can see that it is much higher than for BitCoin:
variances = map(np.var, sat_sample_times)
C2,C1,k = polyfit(sat_difficulties, variances, 2)
plt.plot(sat_difficulties, variances)
plt.plot(sat_difficulties, [C2*d*d + C1*d + k for d in sat_difficulties], '-r')
plt.title("Mining Time Variance (Quadratic)")
plt.show()
print "C₁:", C1, "C₂:", C2
print "Estimated Ĉ²:", C**2
C₁: 7.84860076556e-05 C₂: 1.35902456088e-05 Estimated Ĉ²: 1.56745755204e-05
As a result, our statistical hypothesis testing is also weaker:
data = []
for d, samples in zip(sat_difficulties[4::5], sat_sample_times[4::5]):
cdf = lambda t: 1 - (1 - (1. / d))**(t / C)
ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
data.append({"Difficulty D": d,
"KS Statistic": ks_statistic,
"P-Value": p_value})
pandas.DataFrame.from_records(data)
| | Difficulty D | KS Statistic | P-Value |
|---|---|---|---|
0 | 5 | 0.126027 | 0.076387 |
1 | 10 | 0.094064 | 0.320197 |
2 | 15 | 0.091186 | 0.356983 |
3 | 20 | 0.093278 | 0.329957 |
4 | 25 | 0.120947 | 0.098627 |
5 | 30 | 0.124660 | 0.081910 |
6 | 35 | 0.153303 | 0.016174 |
7 | 40 | 0.072371 | 0.669294 |
8 | 45 | 0.093874 | 0.322536 |
This poor behavior may just be an artifact of our sample miner being written in Python rather than in C. However, below we see some other statistics that lead us to believe that our miner is in general less statistically stable than BitCoin mining.
Our miner, like all SAT solvers, performs backtrack tree search. We can thus expect that its worst-case performance is exponential in the number of clauses. Since we are giving it random input, Murphy's law dictates we should expect it to hit its worst-case performance, especially because it is written quite naïvely.
clause_lengths = range(10,100,10)
sat_clause_sample_times = [[time_sat_difficulty(1,l,5) for _ in range(100)] for l in clause_lengths]
plt.plot(clause_lengths, map(np.mean, sat_clause_sample_times))
C,k = polyfit(clause_lengths, map(lambda x: np.log(np.mean(x)), sat_clause_sample_times), 1)
plt.plot(clause_lengths, [np.exp(C*d + k) for d in clause_lengths], '-r')
plt.yscale('log')
plt.title("Average Mining Time (Exponential)")
plt.show()
Variance, too, scales exponentially with the number of clauses:
plt.plot(clause_lengths, map(np.var, sat_clause_sample_times))
C,k = polyfit(clause_lengths, map(lambda x: np.log(np.var(x)), sat_clause_sample_times), 1)
plt.plot(clause_lengths, [np.exp(C*d + k) for d in clause_lengths], '-r')
plt.yscale('log')
plt.title("Mining Time Variance (Exponential)")
plt.show()
While our miner is naïve, we have clearly defined mining as a kind of exponential search. As a consequence, we can expect that using the number of clauses as a measure of difficulty for SATCoin will lead to greater variance in coin issuance than BitCoin. On the other hand, scaling the number of clauses is where SAT research shines the most, since theoretical breakthroughs such as conflict clause learning are arguably what have enabled advances in SAT over the last 10 years.
We finally look at restrictions on the minimal generator as a difficulty measure. Monte Carlo statistics once again suggest an exponential distribution.
minimal_generators = range(100,1000,100)
sat_min_gen_sample_times = [[time_sat_difficulty(1,5,g) for _ in range(100)] for g in minimal_generators]
plt.plot(minimal_generators, map(np.mean, sat_min_gen_sample_times))
C,k = polyfit(minimal_generators, map(lambda x: np.log(np.mean(x)), sat_min_gen_sample_times), 1)
plt.plot(minimal_generators, [np.exp(C*d + k) for d in minimal_generators], '-r')
plt.yscale('log')
plt.title("Average Mining Time (Exponential)")
plt.show()
plt.plot(minimal_generators, map(np.var, sat_min_gen_sample_times))
C,k = polyfit(minimal_generators, map(lambda x: np.log(np.var(x)), sat_min_gen_sample_times), 1)
plt.plot(minimal_generators, [np.exp(C*d + k) for d in minimal_generators], '-r')
plt.yscale('log')
plt.title("Mining Time Variance (Exponential)")
plt.show()
We have presented a viable alternative to BitCoin, albeit with less simple statistical behavior. Even so, by design it can take advantage of BitCoin's conventional issuance protocol, since it has the same statistical behavior with respect to difficulty as BitCoin.
It is likely that, given sufficient market incentive, special hardware will be devised to tackle any programming challenge, no matter how formidable. However, an engineering reality is that challenging computer science problems, such as SAT and Linear Programming, have resisted such innovations to date, despite wide use of their solvers in industry.
There are many refinements that can be made to the SAT solver given here, which is presented solely as a proof of concept. Primarily, the innovations found in MiniSat, most notably conflict clause management, should be introduced.