In the ideal mathematical world, operations like $1+2=3$, $4\times 3 = 12$ and $(\sqrt{2})^2 = 2$ are unambiguously defined. When numbers are represented in a computer, however, this is no longer true. The main reason is the so-called finite arithmetic, which is the way a computer performs basic operations. Some features of finite arithmetic are stated below:
In spite of this, once the set of elements on which the computer operates is adequately defined, round-off errors can usually be neglected, yielding correct results within reasonable error margins. In some pathological cases, for example when a massive number of iterations is required, these errors must be taken into account more seriously.
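As a quick illustration of the opening remark, the sketch below checks the identity $(\sqrt{2})^2 = 2$ with Python's built-in double-precision floats. The variable names are only illustrative.

```python
import math

root = math.sqrt(2.0)      # closest double to the irrational number sqrt(2)
square = root * root       # rounded once more after the multiplication

print(square)              # slightly different from 2
print(square == 2.0)       # False
print(abs(square - 2.0))   # round-off error, of the order of 1e-16
```

The error is tiny, which is why it can usually be neglected, but it is not zero.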
```python
import numpy as np
```
Modern computation is based on binary numbers. The binary, or base-2, numeral system is the simplest of the existing numeral systems. Since electronic devices are built from logic circuits (circuits operating with logic gates), implementing a binary base is straightforward. Moreover, any other numeral system can be reduced to a binary representation.
According to the IEEE 754-2008 standard, real numbers can be represented in several formats; single precision and double precision are the most widely used.
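Single-precision numbers are used when very accurate results are not needed and/or memory must be saved. These numbers are represented by a binary string 32 bits (BInary digiTs) long, in which the real number is stored according to the following rules: the most significant bit $b_{31}$ holds the sign $s$, the next 8 bits $b_{30},\dots,b_{23}$ hold the exponent $e$, and the remaining 23 bits $b_{22},\dots,b_0$ hold the fraction.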
The formula for recovering the real number is then given by:
$$r = (-1)^s\times \left( 1 + \sum_{i=1}^{23}b_{23-i}2^{-i} \right)\times 2^{e-127}$$
where $s$ is the sign, $b_{23-i}$ are the fraction bits and $e$ is given by:
$$e = \sum_{i=0}^7 b_{23+i}2^i$$
The following short routine computes the value represented by a 32-bit binary string:
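```python
def number32(binary):
    """Return the real number encoded by a string of 32 '0'/'1' characters."""
    # Invert the string so that index k corresponds to bit b_k
    binary = binary[::-1]
    # Fraction part, including the implicit leading 1
    dec = 1
    for i in range(1, 24):
        dec += int(binary[23 - i]) * 2**-i
    # Exponent part
    exp = 0
    for i in range(0, 8):
        exp += int(binary[23 + i]) * 2**i
    # Total number
    number = (-1)**int(binary[31]) * 2**(exp - 127) * dec
    return number

print(number32("00111110001000000000000000000000"))
```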
0.15625
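One way to cross-check a hand-written decoder such as `number32` is to let the machine reinterpret the same 32 bits as a float. This is a minimal sketch, assuming Python 3 and its standard `struct` module; the helper name `bits_to_float32` is hypothetical and not part of the original notes.

```python
import struct

def bits_to_float32(bits):
    """Reinterpret a string of 32 '0'/'1' characters as an IEEE 754 single."""
    raw = int(bits, 2).to_bytes(4, byteorder="big")   # 32 bits -> 4 bytes
    return struct.unpack(">f", raw)[0]                # read them as a big-endian float32

value = bits_to_float32("00111110001000000000000000000000")
print(value)   # 0.15625
```

Both approaches agree for this bit pattern: sign $0$, exponent $124$ and fraction $0.25$ give $1.25\times 2^{-3}$.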
The single-precision format can represent real numbers with magnitudes roughly between $10^{-38}$ and $10^{38}$, with $7$ to $8$ significant decimal digits.
```python
# Decimal digits: contribution of individual fraction bits
print("\n")
print("Decimal digits contributions for single precision number")
print(2**-23., 2**-15., 2**-5., "\n")

# Largest and smallest exponent (all exponent bits set, and all cleared)
suma = 0
for i in range(0, 8):
    suma += 2**i
print("Largest and smallest exponent for single precision number")
print(2**(suma - 127.), 2**(-127.), "\n")
```
```
Decimal digits contributions for single precision number
1.19209289551e-07 3.0517578125e-05 0.03125

Largest and smallest exponent for single precision number
3.40282366921e+38 5.87747175411e-39
```
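The bounds printed above come from setting all exponent bits to one or to zero. In IEEE 754 those two exponent patterns are reserved (for infinities and NaN, and for zero and subnormal numbers), so the extreme normalized values use $e = 254$ and $e = 1$ instead. The sketch below applies the recovery formula under that assumption; the variable names are illustrative.

```python
# Largest finite single-precision value: e = 254 and all 23 fraction bits set,
# so the bracket in the formula equals 1 + (1 - 2**-23) = 2 - 2**-23
largest = (2.0 - 2.0**-23) * 2.0**(254 - 127)

# Smallest positive normalized value: e = 1 and all fraction bits cleared
smallest_normal = 1.0 * 2.0**(1 - 127)

print(largest)           # about 3.4028e+38
print(smallest_normal)   # about 1.1755e-38
```

The orders of magnitude, $10^{38}$ and $10^{-38}$, are the same as those quoted in the text.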
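Double-precision numbers are used when high accuracy is required. These numbers are represented by a binary string 64 bits long, in which the real number is stored according to the following rules: the most significant bit $b_{63}$ holds the sign $s$, the next 11 bits $b_{62},\dots,b_{52}$ hold the exponent $e$, and the remaining 52 bits $b_{51},\dots,b_0$ hold the fraction.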
The formula for recovering the real number is then given by:
$$r = (-1)^s\times \left( 1 + \sum_{i=1}^{52}b_{52-i}2^{-i} \right)\times 2^{e-1023}$$
where $s$ is the sign, $b_{52-i}$ are the fraction bits and $e$ is given by:
$$e = \sum_{i=0}^{10} b_{52+i}2^i$$
The double-precision format can represent real numbers with magnitudes roughly between $10^{-308}$ and $10^{308}$, with $16$ to $17$ significant decimal digits.
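Python exposes the parameters of its own float type, which on essentially all current platforms is an IEEE 754 double. The sketch below, using only the standard `sys` module, compares them with the values that follow from the formula, taking into account that the all-ones and all-zeros exponent patterns are reserved.

```python
import sys

info = sys.float_info
print(info.mant_dig)   # 53 significant bits: 52 stored fraction bits plus the implicit leading 1
print(info.epsilon)    # weight of the last fraction bit, 2**-52
print(info.max)        # largest finite double, about 1.8e+308
print(info.min)        # smallest positive normalized double, about 2.2e-308

# The same quantities written in the notation of the formula
assert info.epsilon == 2.0**-52
assert info.max == (2.0 - 2.0**-52) * 2.0**(2046 - 1023)
assert info.min == 2.0**(1 - 1023)
```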
1. Write a Python script that calculates the double-precision number represented by a 64-bit binary string.
2. What is the number represented by:
0 10000000011 1011100100001111111111111111111111111111111111111111
**ANSWER:** 27.56640625
The most basic arithmetic operations are addition and multiplication. Other operations, such as subtraction, division and exponentiation, are secondary, as they can be obtained by iteratively applying these two.
As mentioned before, arithmetic operations are not exact in a computer because of the inherent limitations of number representation. Even when adding two numbers that are already approximate, say a pair of single-precision numbers, the result may not be representable, so rounding rules must be applied.
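To make this concrete, the sketch below adds two numbers that are both exactly representable in single precision but whose exact sum is not. Single-precision rounding is emulated with the standard `struct` module; the helper `to_float32` is a hypothetical name introduced only for this illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = to_float32(1.0)          # exactly representable
b = to_float32(2.0**-25)     # exactly representable
exact = a + b                # 1 + 2**-25 is still exact in double precision
stored = to_float32(exact)   # needs 26 significant bits, single precision keeps 24

print(exact == 1.0)    # False: the double keeps the small term
print(stored == 1.0)   # True: the small term is lost when rounding to single precision
```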
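```python
N = 9
x = 0.0
for i in range(N):
    # Each addend is rounded to half precision; the running sum stays in double precision
    x += float(np.float16(1.0 / N))
print(x)
```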
0.999755859375
Note that repeatedly adding rounded-off numbers produces a less precise final result.
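The result above can be reproduced without NumPy, which also makes it easy to see where the error comes from. In this sketch the half-precision rounding is emulated with the `"e"` format of the standard `struct` module (available since Python 3.6); `to_float16` is a hypothetical helper name.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

N = 9
term = to_float16(1.0 / N)   # 1/9 rounded to half precision
total = 0.0
for _ in range(N):
    total += term            # the sum itself is accumulated in double precision

print(total)                 # 0.999755859375
print(1.0 / N - term)        # rounding error of a single term
print(1.0 - total)           # total error, about N times the single-term error
```

Each term carries the same small rounding error, and the $N$ additions simply accumulate it.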
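```python
print("5/7", np.float32(5/7.))
print("1/3", np.float32(1/3.))
print(np.float32(5/7. + 1/3.), 22/21.)
# Convert back to a Python float so that the difference is evaluated in double precision
print("Error:", float(np.float32(5/7. + 1/3.)) - 22/21.)
```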
```
5/7 0.714286
1/3 0.333333
1.04762 1.04761904762
Error: 5.67663283046e-08
```
Although half precision (float16) is part of the IEEE 754-2008 standard, many devices do not support it well.
Multiplication follows the same round-off rules as addition. Be aware, however, that multiplicative errors propagate more quickly than additive errors.
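```python
N = 20
x = 1.0
for i in range(N):
    # Each factor is rounded to half precision; the running product stays in double precision
    x *= float(np.float16(2.0**(1.0 / N)))
print(x, np.float16(5/7.))
```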
1.99580530418 0.71436
The final result is already wrong in the third decimal digit, one digit earlier than in the addition example.
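A rough first-order estimate, which is not part of the original notes, explains this loss. If every factor carries the same relative error $\delta$, so that the stored value is $\tilde{a} = a(1+\delta)$, then after $N$ multiplications:

$$\prod_{k=1}^{N}\tilde{a} = a^N(1+\delta)^N \approx a^N(1+N\delta)$$

In the example, $a = 2^{1/20}\approx 1.0352649$ is stored in half precision as $1.03515625$, so $\delta \approx -1.05\times 10^{-4}$ and $N\delta \approx -2.1\times 10^{-3}$. This gives $2(1+N\delta)\approx 1.9958$, in agreement with the printed value.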
ACTIVITY
Find the error associated with the finite representation in the following operations
$$ x-u, \quad \frac{x-u}{w}, \quad (x-u)\times v, \quad u+v $$
considering the values
$$ x = \frac{5}{7}, \quad y = \frac{1}{3}, \quad u = 0.71425 $$
$$ v = 0.98765\times 10^5, \quad w = 0.111111\times 10^{-4} $$