We know that a fixed-precision representation of real numbers works by mapping them from the real space R into an integer space I. However, the underlying integer representation of 32-bit floats and 64-bit doubles is sign-magnitude, not two's complement (which is why we see -0.00 in the output of some numerical algorithms). People seem to add a constant to convert the underlying sign-magnitude form to two's complement. The following conversion is more logical: we know that two's complement is one's complement + 1, and that the motivation for two's complement is to avoid -0. So for a negative float we clear the sign bit to get the magnitude, then negate that magnitude as ~x + 1. The following comparison routine compares two floating-point numbers this way. To make it an approximate comparison, add a tolerance expressed as the number of representable floats allowed between the two values.
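#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Less than, or greater than or equal. Returns -1, 0 or 1. */
int CompareFloats(float a, float b)
{
    int32_t aint, bint;
    /* memcpy avoids the aliasing problem of *(int *)&a */
    memcpy(&aint, &a, sizeof aint);
    memcpy(&bint, &b, sizeof bint);
    /* negative: clear the sign bit, then negate the magnitude (~x + 1) */
    aint = (aint < 0) ? ~(aint ^ INT32_MIN) + 1 : aint;
    bint = (bint < 0) ? ~(bint ^ INT32_MIN) + 1 : bint;
    /* 64-bit difference, since aint - bint can overflow 32 bits */
    int64_t diff = (int64_t)aint - bint;
    printf("%f %s %f \n", a, (diff < 0) ? "<" : ">=", b);
    printf("There are %lld representable steps \n", (long long)(diff < 0 ? -diff : diff));
    printf("between %f and %f (in 32 bit float representation)\n", a, b);
    return (diff < 0) ? -1 : (diff > 0);
}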
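The tolerance idea mentioned above can be sketched roughly as follows. This is only a minimal sketch, assuming IEEE 754 single-precision floats. The names float_to_ordered_int, ulp_distance and nearly_equal are hypothetical helpers introduced here for illustration, and treating NaN as never equal is also an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: map the sign-magnitude bit pattern of a float to a
   two's-complement integer with the same ordering as the floats.
   Both +0.0 and -0.0 map to 0. */
static int32_t float_to_ordered_int(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);              /* read the raw bits safely */
    uint32_t magnitude = bits & 0x7FFFFFFFu;     /* drop the sign bit */
    return (bits & 0x80000000u) ? -(int32_t)magnitude : (int32_t)magnitude;
}

/* Hypothetical helper: number of representable steps separating a and b.
   The subtraction is done in 64 bits so it cannot overflow. */
static int64_t ulp_distance(float a, float b)
{
    return llabs((int64_t)float_to_ordered_int(a) -
                 (int64_t)float_to_ordered_int(b));
}

/* Hypothetical helper: approximate equality with the tolerance given as a
   count of representable steps. NaN never compares equal (an assumption). */
static int nearly_equal(float a, float b, int64_t max_ulps)
{
    if (a != a || b != b)                        /* true only for NaN */
        return 0;
    return ulp_distance(a, b) <= max_ulps;
}
```

For example, nearly_equal(x, y, 4) accepts x and y when they are at most four representable steps apart. What tolerance is appropriate depends on how much rounding error the surrounding computation accumulates.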