Floating Point Arithmetic - how it's done:
-----------------------------------------
CS 205 Introducing Computational Programming          2/23/97
Kris Stewart
San Diego State University

NOTE - this is a simplified model of floating point representation
of numbers and arithmetic. It is hopefully going to convey a
"feeling" of how floating point works without laboring over details.

The word size on the computer is a fixed number of bits. For
simplicity, let's work with digits (decimal number system) rather
than the binary number system used in a computer, and work only with
positive numbers, thereby avoiding using up one of the digits to
"store" the sign of the number.

Assume our computer word holds 8 digits. So the largest integer
would be 99999999 (or 99,999,999, roughly 100 million).

We also want to examine "real numbers", which are represented in
floating point (or scientific) notation, e.g.

     .15679 * 10^2   (= 15.679)

The two parts of this representation are the "fraction" (15679
above), which controls precision or accuracy, and the "exponent"
(2 above). It was assumed that we are working with decimal values,
so the 10 need not be stored. It is also assumed that values are
"normalized": no digit appears to the left of the decimal point and
the first digit to its right is nonzero. This picks one
representation out of the many equivalent ones, e.g.

     15.679 = 1.5679 * 10^1 = .015679 * 10^3 = .15679 * 10^2

all denote the same value, but only the last form is normalized.

In order to represent values on a computer we must decide how many
of our 8 decimal digits to allocate to the mantissa, or fractional,
part and how many to allocate to the exponent.

     ---------------------------------
     |   |   |   |   |   |   |   |   |
     ---------------------------------

We'll make an arbitrary decision to allow 5 digits for the mantissa,
which leaves 3 digits for the exponent. We'll make the assumption
that this is how the hardware has been designed.
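The normalization described above can be tried out with a short
sketch. It is written in Python (our choice, not part of the course
materials), the helper name normalize is made up for illustration,
and we assume that digits beyond the 5-digit mantissa are simply
dropped.

```python
from decimal import Decimal

MANTISSA_DIGITS = 5  # the 5-digit mantissa chosen in the notes


def normalize(text, digits=MANTISSA_DIGITS):
    """Return (mantissa, exponent) so that the value is .mantissa * 10**exponent.

    Hypothetical helper: digits beyond the mantissa width are chopped.
    """
    value = Decimal(text)
    if value <= 0:
        raise ValueError("this toy model handles positive numbers only")
    _, coefficient, exponent = value.as_tuple()
    # The coefficient digits carry no leading zeros, so placing the decimal
    # point in front of them gives the normalized fraction.
    mantissa = "".join(str(d) for d in coefficient)[:digits].ljust(digits, "0")
    return mantissa, len(coefficient) + exponent


# Three spellings of the same value normalize to the same pair.
for text in ("15.679", "1.5679e1", ".015679e3"):
    print(text, "->", normalize(text))  # each line ends with ('15679', 2)
```

The decimal module is used so that the digits typed in are the
digits examined; Python's built-in float is stored in binary.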
Although we are excluding negative numbers, so that the mantissa
portion does not have to use a digit to keep track of the sign, we
will allow exponents to be negative, so that the "fascinating"
considerations of underflow and overflow can be addressed. We'll let
the + be represented by 0 and the - by 1.

     ---------------------------------
     | 1 | 5 | 6 | 7 | 9 | 0 | 0 | 2 |
     ---------------------------------
                           ^
                           |
                           -- denotes sign of exponent

First, let's examine adding two floating point values

     .15679e2 + .72831e4

The first step is to adjust the exponents so they match. There will
be a specific algorithm coded into the "microcode" of the hardware
deciding which exponent is adjusted to which, but for this example,
we'll align to the smaller exponent.

       .15679e2 + .72831e4        (= 15.679 + 7283.1)
     = .15679e2 + 72.831e2
     = 72.98779e2
     = .7298779e4

We can only store 5 digits in the mantissa, so we must round (or
chop) the result to five digits. We'll choose chopping since this is
typically a faster operation.

     = .72987e4

(Rounding would have yielded the answer .72988e4)

What is the largest number that this system can store? One of the
three exponent digits is used for the sign, so the largest exponent
is 99.

     .99999e99

What is the smallest (positive) number? (We chose to have negative
exponents, so one of the digits must be used to store that
information.)

     .10000e-99

What is the "unit roundoff"? It is the smallest value u that still
changes the stored result when it is added to 1.

     .10000e1 + u > .10000e1
     u = .1e-3

We can verify this by aligning exponents (to the smaller exponent as
we did above), performing the add and then chopping the result to
fit in the 5 digits of the mantissa:

     representable                                      final
     values                                             value
     -------------                                      -----
     .1e1 + .1e-3 = 1000.0e-3 + .1e-3 = 1000.1e-3  =  .10001e1
                    (align exponents)       (5 digits in mantissa)

Any u smaller than .1e-3 lands past the fifth digit and is chopped
away, leaving .10000e1 unchanged.

There were specific choices made above
   - chopping instead of rounding
   - aligning to smaller exponent instead of larger exponent
but the overall results would be roughly the same with different
choices.
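The addition steps above (align to the smaller exponent, add, then
chop to 5 digits) can be sketched in Python. This is only a model of
the toy decimal machine, not of real hardware. The name fl_add and
the (mantissa, exponent) pair used to stand for a stored word are
made up for this illustration, and exponent overflow is not checked.

```python
DIGITS = 5  # mantissa width of the toy machine


def fl_add(a, b, rounding=False):
    """Add two toy floats given as (mantissa, exponent) pairs.

    A pair (m, e) stands for .m * 10**e with m a normalized 5-digit
    integer, e.g. (15679, 2) is .15679e2. Hypothetical helper; the
    exponent range of the toy machine is not enforced.
    """
    (ma, ea), (mb, eb) = a, b
    e = min(ea, eb)                                    # align to the smaller exponent
    total = ma * 10 ** (ea - e) + mb * 10 ** (eb - e)  # exact sum of the aligned digits
    extra = len(str(total)) - DIGITS                   # digits that do not fit
    kept, dropped = divmod(total, 10 ** extra)         # chop: keep the leading 5 digits
    exponent = e + extra
    if rounding and 2 * dropped >= 10 ** extra:        # round half up instead of chopping
        kept += 1
        if kept == 10 ** DIGITS:                       # 99999 rounded up to 100000
            kept, exponent = kept // 10, exponent + 1
    return kept, exponent


print(fl_add((15679, 2), (72831, 4)))                 # (72987, 4), i.e. .72987e4
print(fl_add((15679, 2), (72831, 4), rounding=True))  # (72988, 4), i.e. .72988e4
print(fl_add((10000, 1), (10000, -3)))                # (10001, 1): u = .1e-3 changes 1
print(fl_add((10000, 1), (10000, -4)))                # (10000, 1): .1e-4 is chopped away
```

Integers are used for the digits so that the sum before chopping is
exact, which mirrors the hand calculation.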
The "unit roundoff" is a fundamental property of floating point
arithmetic on any computer. You will want to know its value for the
platform you use, and it can be easily computed. For lab today, you
have examples that try to find the level of this "rounding" on
rohan.
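As one illustration of how such a computation might look, here is a
minimal sketch in Python. It is not the lab code, and the function
name unit_roundoff is made up. It keeps halving u for as long as the
halved value still changes 1 when added to it. We assume the platform
uses binary floating point, so halving (rather than dividing by 10)
is the natural step. Some texts define unit roundoff as half of the
value found this way.

```python
import sys


def unit_roundoff():
    """Return the smallest power of two u for which 1.0 + u > 1.0."""
    u = 1.0
    while 1.0 + u / 2 > 1.0:  # would half of u still change 1.0?
        u /= 2
    return u


print(unit_roundoff())         # value found by the halving loop
print(sys.float_info.epsilon)  # value reported by Python for comparison
```

On a platform with IEEE double precision the two printed values
should agree.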