Maximum/minimum representable integers.
The maximum representable integer is the largest integer
i for which i+1>i holds true.
Using a while loop, determine your maximum
integer and compare it with "int.MaxValue".
Something like
int i=1; while(i+1>i) {i++;}
Write("my max int = {0}\n",i);
It can take some seconds to calculate.
The minimum representable integer is the most negative
integer i for which i-1<i holds true.
Using a while loop, determine your minimum
integer and compare it with "int.MinValue".
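The downward search mirrors the upward one. The following is only a minimal sketch: the class name MinIntSketch and the method name FindMinInt are made up for illustration, and the "unchecked" block is there on the assumption that overflow checking might otherwise be switched on by a compiler option.

```csharp
using System;

static class MinIntSketch {
    // Hypothetical helper: step downward until i-1 is no longer smaller than i,
    // which happens when i-1 wraps around to a large positive value.
    public static int FindMinInt() {
        int i = -1;
        unchecked { // make sure the overflow wraps instead of throwing
            while (i - 1 < i) { i--; }
        }
        return i;
    }

    static void Main() {
        int myMin = FindMinInt(); // about 2^31 iterations, can take some seconds
        Console.WriteLine($"my min int = {myMin}");
        Console.WriteLine($"int.MinValue = {int.MinValue}");
    }
}
```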
Using a while loop, calculate the machine epsilon (the difference
between 1 and the next larger representable number) for the types float and double.
Something like
double x=1; while(1+x!=1){x/=2;} x*=2;
float y=1F; while((float)(1F+y) != 1F){y/=2F;} y*=2F;
There seem to be no predefined values for these numbers in csharp (I
couldn't find any, in any case). Note that "double.Epsilon" is something
else: it is the smallest positive double, not the machine epsilon.
However, in an IEEE 64-bit floating-point
number (double), where 1 bit is reserved for the sign and 11 bits for the
exponent, there are 52 bits remaining for the fraction, therefore the
double machine epsilon must be about System.Math.Pow(2,-52).
For single precision (float), where 23 bits remain for the fraction,
the machine epsilon should be about System.Math.Pow(2,-23).
Check this.
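One possible way to do the check is to wrap the two loops in functions and print their results next to the powers of two. This is a sketch only: the names EpsilonCheck, DoubleEpsilon and FloatEpsilon are hypothetical, and the explicit cast to float is kept on the assumption that some runtimes may evaluate float expressions in higher precision.

```csharp
using System;

static class EpsilonCheck {
    // Halve x until adding it to 1 no longer changes 1, then step back once.
    public static double DoubleEpsilon() {
        double x = 1;
        while (1 + x != 1) { x /= 2; }
        return 2 * x;
    }

    // Same idea in single precision; the cast forces rounding to float.
    public static float FloatEpsilon() {
        float y = 1F;
        while ((float)(1F + y) != 1F) { y /= 2F; }
        return 2F * y;
    }

    static void Main() {
        Console.WriteLine($"double: loop = {DoubleEpsilon():e}, Pow(2,-52) = {Math.Pow(2, -52):e}");
        Console.WriteLine($"float:  loop = {FloatEpsilon():e}, Pow(2,-23) = {Math.Pow(2, -23):e}");
    }
}
```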
Suppose tiny=epsilon/2. Calculate the two sums,
sumA=1+tiny+tiny+...+tiny;
sumB=tiny+tiny+...+tiny+1;
which should seemingly be the same, and print out the values
sumA-1 and sumB-1. Something like
int n=(int)1e6;
double epsilon=Pow(2,-52);
double tiny=epsilon/2;
double sumA=0,sumB=0;
sumA+=1; for(int i=0;i<n;i++){sumA+=tiny;}
for(int i=0;i<n;i++){sumB+=tiny;} sumB+=1;
WriteLine($"sumA-1 = {sumA-1:e} should be {n*tiny:e}");
WriteLine($"sumB-1 = {sumB-1:e} should be {n*tiny:e}");
Explain why there is a difference.
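As a possible starting point for the explanation, one can look at a single addition instead of a million of them. The sketch below assumes IEEE double arithmetic; the names RoundingProbe and TinyIsLost are made up for illustration.

```csharp
using System;

static class RoundingProbe {
    // Hypothetical helper: true if adding tiny to 1 leaves 1 unchanged.
    public static bool TinyIsLost(double tiny) {
        double s = 1 + tiny;
        return s == 1;
    }

    static void Main() {
        double epsilon = Math.Pow(2, -52);
        double tiny = epsilon / 2;
        Console.WriteLine($"1+tiny==1 ? => {TinyIsLost(tiny)}");
        // The same three numbers added in two different orders.
        double left = (1 + tiny) + tiny;
        double right = 1 + (tiny + tiny);
        Console.WriteLine($"(1+tiny)+tiny-1 = {left - 1:e}");
        Console.WriteLine($"1+(tiny+tiny)-1 = {right - 1:e}");
    }
}
```

Comparing the two printed differences may help in seeing how the order of the additions matters.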
Consider
double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
double d2 = 8*0.1;
Both doubles "d1" and "d2" should be equal to 0.8, and then the "==" operator should produce the "true" result. However, try
WriteLine($"d1={d1:e15}");
WriteLine($"d2={d2:e15}");
WriteLine($"d1==d2 ? => {d1==d2}");
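and see that this is not the case (not on my box in any case). That
is because the decimal number 0.1 has no finite binary expansion and
therefore cannot be represented exactly with the 52-bit binary fraction of a double,
so each addition may introduce its own rounding error.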
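To see the stored value, one can print the raw 64 bits of a double in hexadecimal. This is only a sketch; the names TenthBits and HexBits are hypothetical, while BitConverter.DoubleToInt64Bits is the standard .NET call.

```csharp
using System;

static class TenthBits {
    // Hypothetical helper: the 64 bits of a double as 16 hexadecimal digits.
    public static string HexBits(double x) {
        return BitConverter.DoubleToInt64Bits(x).ToString("X16");
    }

    static void Main() {
        Console.WriteLine($"0.5 -> {HexBits(0.5)}"); // 3FE0000000000000
        Console.WriteLine($"0.1 -> {HexBits(0.1)}"); // 3FB999999999999A
    }
}
```

The repeating 9s in the fraction of 0.1 reflect its periodic binary expansion, which is cut off and rounded after 52 fraction bits, whereas 0.5 is a power of two and is stored exactly.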
For this reason, one needs a more complex comparison algorithm. Two doubles in a finite-digit representation can only be compared within a given absolute and/or relative precision (where the values for the precision actually depend on the task at hand and generally must be supplied by the user).
Therefore, implement a function with the signature
bool approx(double a, double b, double acc=1e-9, double eps=1e-9)
that returns "true" if the numbers 'a' and 'b' are equal
either with absolute precision "acc",
|a-b| < acc
or with relative precision "eps",
|a-b|/Max(|a|,|b|) < eps
and returns "false" otherwise.
Something like
public static bool approx
(double a, double b, double acc=1e-9, double eps=1e-9){
if(Abs(b-a) < acc) return true;
else if(Abs(b-a) < Max(Abs(a),Abs(b))*eps) return true;
else return false;
}
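A short usage sketch, applying the function to the doubles "d1" and "d2" from above. The class name ApproxDemo and the extra test values are made up for illustration; the function body is the one suggested above, repeated so that the example is self-contained.

```csharp
using System;
using static System.Math;

static class ApproxDemo {
    public static bool approx
    (double a, double b, double acc = 1e-9, double eps = 1e-9) {
        if (Abs(b - a) < acc) return true;                       // absolute precision
        if (Abs(b - a) < Max(Abs(a), Abs(b)) * eps) return true; // relative precision
        return false;
    }

    static void Main() {
        double d1 = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        double d2 = 8 * 0.1;
        Console.WriteLine($"d1==d2 ? => {d1 == d2}");
        Console.WriteLine($"approx(d1,d2) ? => {approx(d1, d2)}");
        // Large numbers: the absolute test fails, the relative test decides.
        Console.WriteLine($"approx(1e20,1.000000000001e20) ? => {approx(1e20, 1.000000000001e20)}");
        // Clearly different numbers.
        Console.WriteLine($"approx(1.0,1.1) ? => {approx(1.0, 1.1)}");
    }
}
```

Having both tests means that numbers close to zero are handled by "acc" and large numbers by "eps"; which values are sensible depends on the task at hand.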