IEEE Arithmetic

Chivers, Ian; Sleightholme, Jane

doi:10.1007/978-3-319-75502-1_36

Abstract

The aims of this chapter are to look in more depth at arithmetic and in particular at the support that Fortran provides for the IEEE 754 and later standards. The chapter covers:

hardware support for arithmetic.
integer formats.
floating point formats: single and double.
special values: denormal, infinity and not a number (NaN).
exceptions and flags: divide by zero, inexact, invalid, overflow, underflow.

Any effectively generated theory capable of expressing elementary arithmetic cannot be both consistent and complete. In particular, for any consistent, effectively generated formal theory that proves certain basic arithmetic truths, there is an arithmetical statement that is true, but not provable in the theory.

Gödel, First incompleteness theorem

36.1 Introduction

The literature contains details of the IEEE arithmetic standards. The bibliography contains details of a number of printed and on-line sources.

36.2 History

When we use programming languages to do arithmetic, two major concerns are the reliability and the portability of the numerical software we develop. Arithmetic is done in hardware and there are a number of things to consider:

the range of hardware available both now and in the past.
the evolution of hardware.

There has been a very considerable change in arithmetic units since the first computers. Table 36.1 is a list of hardware and computing systems that the authors have used or have heard of. It is not exhaustive or definitive, but rather reflects the authors’ age and experience.

Table 36.1 Computer hardware and manufacturers

Table 36.2 lists some of the operating systems.

Table 36.2 Operating systems

Again the list is not exhaustive or definitive. The intention is simply to provide some idea of the wide range of hardware, computer manufacturers and operating systems that have been around in the past 50 years.

To cope with the anarchy in this area Doctor Robert Stewart (acting on behalf of the IEEE) convened a meeting which led to the birth of IEEE 754.

The first draft, which was prepared by William Kahan, Jerome Coonen and Harold Stone, was called the KCS draft and eventually adopted as IEEE 754. A fascinating account of the development of this standard can be found in An Interview with the Old Man of Floating Point, and the bibliography provides a web address for this interview. Kahan went on to get the ACM Turing Award in 1989 for his work in this area.

IEEE 754 has become a de facto standard amongst arithmetic units in modern hardware. Note that the standard does not make it possible to describe precisely the answers a program will give, and the authors of the standard knew this. That goal is virtually impossible to achieve when one considers floating point arithmetic. Reasons for this include:

the conversions of numbers between decimal and binary formats.
the use of elementary library functions.
results of calculations may be in hardware inaccessible to the programmer.
intermediate results in subexpressions or arguments to procedures.
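
The first of these reasons is easy to demonstrate. The following is a minimal sketch, not taken from the standard, and the program and variable names are our own. It assumes a processor whose double precision is IEEE binary64. The variables are declared volatile only to discourage the compiler from evaluating the expressions at compile time.

```fortran
program decimal_binary
  ! Sketch: 0.1, 0.2 and 0.3 have no exact binary representation,
  ! so the sum of the first two need not equal the third.
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(dp), volatile :: a, b, c

  a = 0.1_dp
  b = 0.2_dp
  c = 0.3_dp
  print '(a, es24.17)', ' a + b        = ', a + b
  print '(a, es24.17)', ' c            = ', c
  print '(a, l1)',      ' a + b == c ?   ', a + b == c
end program decimal_binary
```

With IEEE binary64 arithmetic the comparison is expected to be false, because each decimal constant is rounded to the nearest binary value before the addition takes place.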

The bibliography contains details of a paper that addresses this issue in much greater depth — Differences Among IEEE 754 Implementations.

Fortran is one of a small number of languages that provides access to IEEE arithmetic. It achieves this via the facilities of ISO TR 15580, which were integrated into Fortran 2003. The C standard (C99, known as C9X during its development) addresses this issue and Java offers limited IEEE arithmetic support. More information can be found in the references at the end of the chapter.

36.3 IEEE Specifications

There have been several IEEE arithmetic standards. The following information is taken from the ISO site.

ISO/IEC/IEEE 60559:2011(E) specifies formats and methods for floating-point arithmetic in computer systems - standard and extended functions with single, double, extended, and extendable precision and recommends formats for data interchange. Exception conditions are defined and standard handling of these conditions is specified. It provides a method for computation with floating-point numbers that will yield the same result whether the processing is done in hardware, software, or a combination of the two. The results of the computation will be identical, independent of implementation, given the same input data. Errors, and error conditions, in the mathematical processing will be reported in a consistent manner regardless of implementation. This first edition, published as ISO/IEC/IEEE 60559, replaces the second edition of IEC 60559.

Here is the standard history.

ISO/IEC/IEEE 60559:2011(E)
IEC 559:1989
IEC 559:1982

The standard provides coverage of the following areas, which is taken from the table of contents.

Floating-point formats
- Overview
- Specification levels
- Sets of floating-point data
- Binary interchange format encodings
- Decimal interchange format encodings
- Interchange format parameters
- Extended and extendable precisions
Attributes and rounding
- Attribute specification
- Dynamic modes for attributes
- Rounding-direction attributes
Operations
- Overview
- Decimal exponent calculation
- Homogeneous general-computational operations
- Format of general-computational operations
- Quiet-computational operations
- Signaling-computational operations
- Non-computational operations
- Details of conversions from floating-point to integer formats
- Details of operations to round a floating-point datum to integral value
- Details of totalorder predicate
- Details of comparison predicates
- Details of conversion between floating-point data and external character sequences
Infinity, NaNs, and sign bit
- Infinity arithmetic
- Operations with NaNs
- The sign bit
Default exception handling
- Overview: exceptions and flags
- Invalid operation
- Division by zero
- Overflow
- Underflow
- Inexact
Alternate exception handling attributes
- Overview
- Resuming alternate exception handling attributes
- Immediate and delayed alternate exception handling attributes
Recommended operations
- Conforming language- and implementation-defined functions
- Recommended correctly rounded functions
- Operations on dynamic modes for attributes
- Reduction operations
Expression evaluation
- Expression evaluation rules
- Assignments, parameters, and function values
- preferred width attributes for expression evaluation
- Literal meaning and value-changing optimizations
Reproducible floating-point results

36.4 Floating Point Formats

Table 36.3 summarises the formats specified in the IEEE 754-2008 standard.

Table 36.3 IEEE formats
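
The parameters of the formats that a particular compiler provides can be checked with the standard numeric inquiry functions. The following is a sketch only. It assumes that default real and double precision correspond to the IEEE single and double formats, which is common but not guaranteed.

```fortran
program format_inquiry
  ! Sketch: report the model parameters of the default real and
  ! double precision kinds using standard inquiry functions.
  implicit none
  real             :: s
  double precision :: d

  print '(a)',            ' quantity           single        double'
  print '(a, 2i14)',      ' storage bits', storage_size(s), storage_size(d)
  print '(a, 2i14)',      ' digits      ', digits(s), digits(d)
  print '(a, 2i14)',      ' precision   ', precision(s), precision(d)
  print '(a, 2i14)',      ' minexponent ', minexponent(s), minexponent(d)
  print '(a, 2i14)',      ' maxexponent ', maxexponent(s), maxexponent(d)
  print '(a, 2es14.4e3)', ' tiny        ', tiny(s), tiny(d)
  print '(a, 2es14.4e3)', ' huge        ', huge(s), huge(d)
end program format_inquiry
```

For the IEEE single and double formats we would expect digits to be 24 and 53, maxexponent to be 128 and 1024, and minexponent to be -125 and -1021. The exponent values follow the Fortran model, in which the fraction lies between 1/2 and 1.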

36.5 Procedure Summary

Tables 36.4 and 36.5 summarise the procedures.

Table 36.4 IEEE Arithmetic module procedure summary

Table 36.5 IEEE Exceptions module procedure summary

36.6 General Comments About the Standard

The special bit patterns provide the following:

\( +0 \)
\( -0 \)
subnormal numbers, which in single precision have magnitudes from 1.17549421E-38 down to 1.40129846E-45
\( + \infty \)
\( - \infty \)
quiet NaN (Not a Number)
signalling NaN
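
The values in this list can be generated and inspected from Fortran. The following is a minimal sketch using the ieee_arithmetic module, and the program and variable names are our own. Each feature is tested before it is used. The signalling NaN is left out, since merely copying one may raise an exception on some systems.

```fortran
program special_values
  ! Sketch: build the special values with ieee_value and inspect them.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: x

  if (.not. ieee_support_datatype(x)) stop 'IEEE single precision not supported'

  print *, 'smallest normal    ', tiny(x)
  if (ieee_support_denormal(x)) then
    ! The neighbours of tiny(x) and of zero are the extreme subnormals.
    print *, 'largest subnormal  ', ieee_next_after(tiny(x), 0.0)
    print *, 'smallest subnormal ', ieee_next_after(0.0, 1.0)
  end if
  if (ieee_support_inf(x)) then
    print *, '+infinity          ', ieee_value(x, ieee_positive_inf)
    print *, '-infinity          ', ieee_value(x, ieee_negative_inf)
  end if
  if (ieee_support_nan(x)) then
    x = ieee_value(x, ieee_quiet_nan)
    print *, 'quiet NaN          ', x, ' is NaN? ', ieee_is_nan(x)
  end if
  print *, '-0 is negative?    ', &
           ieee_is_negative(ieee_value(x, ieee_negative_zero))
end program special_values
```

Note that ieee_next_after may itself signal underflow and inexact when it returns a subnormal result.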

One of the first systems that the authors worked with that had special bit patterns set aside was the CDC 6000 range of computers that had negative indefinite and infinity. Thus the ideas are not new, as this was in the late 1970s.

The support of positive and negative zero means that certain problems can be handled correctly including:

The evaluation of the log function which has a discontinuity at zero.
The equation \( \sqrt{1/z} = 1/\sqrt{z} \) can be solved when \( z = -1 \)

See also the Kahan paper Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing’s Sign Bit for more details.
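
The two zeros compare equal, but they can be told apart by operations that depend on the sign. The following sketch is our own illustration rather than an example from the Kahan paper. It assumes that the processor supports infinities and distinguishes the sign of zero in the sign and atan2 intrinsics.

```fortran
program signed_zero
  ! Sketch: +0 and -0 compare equal but behave differently as divisors
  ! and as arguments to sign and atan2.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: pz, nz, one

  if (.not. ieee_support_inf(one)) stop 'infinities not supported'
  ! Keep running after a division by zero so the results can be printed.
  if (ieee_support_halting(ieee_divide_by_zero)) &
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)

  one = 1.0
  pz  = ieee_value(one, ieee_positive_zero)
  nz  = ieee_value(one, ieee_negative_zero)

  print *, 'pz == nz         ', pz == nz
  print *, 'sign(1.0, nz)    ', sign(one, nz)
  print *, '1/pz, 1/nz       ', one/pz, one/nz
  print *, 'atan2(pz, -1.0)  ', atan2(pz, -one)
  print *, 'atan2(nz, -1.0)  ', atan2(nz, -one)
end program signed_zero
```

On such a system the two quotients are infinities of opposite sign, and the two atan2 results lie on opposite sides of the branch cut along the negative real axis.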

Subnormals, which permit gradual underflow, fill the gap between 0 and the smallest normal number.

Simply stated, underflow occurs when the result of an arithmetic operation is so small that it is subject to a larger than normal rounding error when stored. The existence of subnormals means that greater precision is available for these small results than on a system that replaces them with zero. The key features of gradual underflow are:

When underflow does occur there should never be a loss of accuracy any greater than that from ordinary roundoff.
The operations of addition, subtraction, comparison and remainder are always exact when the result is very small.
Algorithms written to take advantage of subnormal numbers have smaller error bounds than on other systems.
If x and y are within a factor of 2 of each other then x-y is error free, which is used in a number of algorithms that increase the precision at critical regions.
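
The last point can be illustrated with two numbers close to the smallest normal value. This is a sketch under the assumption that subnormals are supported and have not been disabled, for example by a compiler option that flushes them to zero.

```fortran
program gradual_underflow
  ! Sketch: with subnormals, x /= y implies x - y /= 0, even when the
  ! difference is below the smallest normal number.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: x, y, d

  if (.not. ieee_support_denormal(x)) stop 'subnormals not supported'

  y = tiny(y)          ! smallest normal number
  x = 1.5 * y          ! within a factor of 2 of y
  d = x - y            ! half of tiny(y), representable only as a subnormal

  print *, 'x - y              = ', d
  print *, 'x - y is nonzero   ? ', d /= 0.0
  print *, 'x - y is subnormal ? ', ieee_class(d) == ieee_positive_denormal
  print *, '(x - y) + y == x   ? ', d + y == x
end program gradual_underflow
```

On a system that flushes small results to zero, d would be zero even though x and y differ, and a later division by d would fail.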

The combination of positive and negative zero and subnormal numbers means that when x and y are small and x-y has been flushed to zero the evaluation of \( 1 / (x-y) \) can be flagged and located.

Certain arithmetic operations cause problems including:

\( 0 * \infty \)
0 / 0
\( \sqrt{x} \) when \( x < 0 \)

and the support for NaN handles these cases.
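
The following sketch evaluates the three operations and checks both the results and the invalid flag. It assumes that NaNs, infinities and IEEE square root are supported and that halting on invalid can be switched off. Strictly, the Fortran intrinsic sqrt requires a non-negative real argument, so the last case relies on the processor providing the IEEE behaviour.

```fortran
program nan_cases
  ! Sketch: each invalid operation returns a NaN and raises ieee_invalid.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: zero, inf, minus_one, r
  logical :: raised

  if (.not. (ieee_support_nan(zero) .and. ieee_support_inf(zero) .and. &
             ieee_support_sqrt(zero))) stop 'required IEEE features missing'
  ! Ask for a NaN result rather than program termination.
  if (ieee_support_halting(ieee_invalid)) &
    call ieee_set_halting_mode(ieee_invalid, .false.)

  zero      = 0.0
  minus_one = -1.0
  inf       = ieee_value(zero, ieee_positive_inf)

  call ieee_set_flag(ieee_invalid, .false.)
  r = zero * inf
  print *, '0 * infinity ', r, ieee_is_nan(r)
  r = zero / zero
  print *, '0 / 0        ', r, ieee_is_nan(r)
  r = sqrt(minus_one)
  print *, 'sqrt(-1)     ', r, ieee_is_nan(r)

  call ieee_get_flag(ieee_invalid, raised)
  print *, 'ieee_invalid raised? ', raised
end program nan_cases
```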

The support for positive and negative infinity allows the handling of x / 0 when x is nonzero and of either sign. This means that we can write our programs to detect the condition and take the appropriate action. In some cases this would mean recalculating using another approach.
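
A sketch of this style of programming follows. The names are our own, and the recovery step is only indicated by a comment, since the appropriate action depends on the problem being solved. It assumes that infinities are supported and that halting on divide by zero can be switched off.

```fortran
program divide_by_zero
  ! Sketch: detect a division by zero after the event and take action.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, volatile :: x, y, q
  logical :: div_flag

  if (.not. ieee_support_inf(x)) stop 'infinities not supported'
  if (ieee_support_halting(ieee_divide_by_zero)) &
    call ieee_set_halting_mode(ieee_divide_by_zero, .false.)

  x = -3.0
  y = 0.0
  call ieee_set_flag(ieee_divide_by_zero, .false.)
  q = x / y
  call ieee_get_flag(ieee_divide_by_zero, div_flag)

  if (div_flag .or. .not. ieee_is_finite(q)) then
    print *, 'division by zero detected, quotient = ', q
    ! A real program would switch to an alternative formulation here.
  else
    print *, 'quotient = ', q
  end if
end program divide_by_zero
```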

For more information see the references in the bibliography.

36.7 Résumé

The above has provided a quick tour of the IEEE standard. We’ll now look at what Fortran has to offer to support it.

36.8 Fortran Support for IEEE Arithmetic

Fortran first introduced support for IEEE arithmetic in ISO TR 15580. The Fortran 2003 standard integrated support into the main standard. Fortran 2018 offers more support, and for more details one should consult Chap. 17 of that document.

The intrinsic modules

ieee_features
ieee_exceptions
ieee_arithmetic

provide support for exceptions and IEEE arithmetic. Whether the modules are provided is processor dependent. If the module ieee_features is provided, which of the named constants defined in the Fortran standard are included is processor dependent. The module ieee_arithmetic behaves as if it contained a use statement for ieee_exceptions; everything that is public in ieee_exceptions is public in ieee_arithmetic.

The first thing to consider is the degree of conformance to the IEEE standard. It is possible that not all of the features are supported. Thus the first thing to do is to run one or more test programs to determine the degree of support for a particular system.
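
A test program of this kind might look like the following sketch. It uses only inquiry functions from ieee_arithmetic. The program name, the helper subroutine report and the layout of the output are our own.

```fortran
program ieee_support_report
  ! Sketch: report which IEEE features the processor claims to support
  ! for default real and double precision.
  use, intrinsic :: ieee_arithmetic
  implicit none
  real             :: s
  double precision :: d

  print '(1x, a, t24, a, t32, a)', 'feature', 'single', 'double'
  call report('datatype', ieee_support_datatype(s), ieee_support_datatype(d))
  call report('denormal', ieee_support_denormal(s), ieee_support_denormal(d))
  call report('divide',   ieee_support_divide(s),   ieee_support_divide(d))
  call report('inf',      ieee_support_inf(s),      ieee_support_inf(d))
  call report('nan',      ieee_support_nan(s),      ieee_support_nan(d))
  call report('sqrt',     ieee_support_sqrt(s),     ieee_support_sqrt(d))
  call report('standard', ieee_support_standard(s), ieee_support_standard(d))
  call report('rounding (up)', ieee_support_rounding(ieee_up, s), &
                               ieee_support_rounding(ieee_up, d))
  print '(1x, a, t26, l1)', 'halting (invalid)', &
                            ieee_support_halting(ieee_invalid)
contains
  subroutine report(name, s_ok, d_ok)
    ! Print one line of the table.
    character(len=*), intent(in) :: name
    logical, intent(in)          :: s_ok, d_ok
    print '(1x, a, t26, l1, t34, l1)', name, s_ok, d_ok
  end subroutine report
end program ieee_support_report
```

The results depend on the compiler, the compiler options and the hardware, so the program is worth running on each system of interest.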

36.9 Derived Types and Constants Defined in the Modules

The modules

ieee_exceptions
ieee_arithmetic
ieee_features

define five derived types, whose components are all private.

36.9.1 ieee_exceptions

This module defines ieee_flag_type, for identifying a particular exception flag.

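Possible values are

ieee_invalid
ieee_overflow
ieee_divide_by_zero
ieee_underflow
ieee_inexact

The module also defines the array named constants

ieee_usual = [ieee_overflow, ieee_divide_by_zero, ieee_invalid]
ieee_all = [ieee_usual, ieee_underflow, ieee_inexact]

and the type ieee_status_type. The last is for saving the current floating point status.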
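
The following sketch shows one way the flags and the saved status might be used together. It saves the floating point status in a variable of type ieee_status_type, clears the five flags, performs a calculation that overflows, reports which flags were raised and then restores the status. The array flags and the labels in names are our own.

```fortran
program flag_status
  ! Sketch: save the status, run a calculation with the flags cleared,
  ! report what was raised, then restore the status.
  use, intrinsic :: ieee_exceptions
  implicit none
  type(ieee_status_type) :: saved
  type(ieee_flag_type)   :: flags(5)
  logical                :: raised(5)
  character(len=14), parameter :: names(5) = [character(len=14) :: &
    'overflow', 'divide_by_zero', 'invalid', 'underflow', 'inexact']
  real, volatile :: x, y
  integer :: i

  flags = [ieee_overflow, ieee_divide_by_zero, ieee_invalid, &
           ieee_underflow, ieee_inexact]

  call ieee_get_status(saved)
  do i = 1, size(flags)
    if (ieee_support_halting(flags(i))) &
      call ieee_set_halting_mode(flags(i), .false.)
  end do
  call ieee_set_flag(flags, .false.)

  x = huge(x)
  y = x * x            ! too large for the format

  call ieee_get_flag(flags, raised)
  do i = 1, size(flags)
    print '(1x, a, l2)', names(i), raised(i)
  end do

  call ieee_set_status(saved)
end program flag_status
```

We would expect the overflow and inexact flags to be reported as raised. Restoring the status at the end means that the calling code sees the flags and modes as they were on entry.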

36.9.2 ieee_arithmetic

This module defines ieee_class_type, for identifying a class of floating-point values.

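Possible values are:

ieee_signaling_nan
ieee_quiet_nan
ieee_negative_inf
ieee_negative_normal
ieee_negative_denormal
ieee_negative_zero
ieee_positive_zero
ieee_positive_denormal
ieee_positive_normal
ieee_positive_inf
ieee_other_value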

The module defines ieee_round_type, for identifying a particular rounding mode. Its only possible values are those of named constants defined in the module: ieee_nearest, ieee_to_zero, ieee_up, and ieee_down for the IEEE modes; and ieee_other for any other mode.

The module also defines the elemental operator == for two values of one of these types, which returns true if the values are the same and false otherwise.

It likewise defines the elemental operator /=, which returns true if the values differ and false otherwise.
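
The rounding type and the == operator can be seen at work in the following sketch, which evaluates 1/3 under two directed rounding modes. It assumes that both modes are supported for default real. The original mode is restored before printing, since the rounding mode may also affect formatted output.

```fortran
program rounding_modes
  ! Sketch: evaluate 1/3 under two directed rounding modes and compare.
  use, intrinsic :: ieee_arithmetic
  implicit none
  type(ieee_round_type) :: original, current
  real, volatile :: one, three, lo, hi
  logical :: mode_is_up

  if (.not. (ieee_support_rounding(ieee_up, one) .and. &
             ieee_support_rounding(ieee_down, one))) &
    stop 'directed rounding not supported'

  one   = 1.0
  three = 3.0
  call ieee_get_rounding_mode(original)

  call ieee_set_rounding_mode(ieee_down)
  lo = one / three
  call ieee_set_rounding_mode(ieee_up)
  hi = one / three

  call ieee_get_rounding_mode(current)
  mode_is_up = current == ieee_up      ! == as defined for ieee_round_type
  call ieee_set_rounding_mode(original)

  print '(a, es16.9)', ' rounded down     ', lo
  print '(a, es16.9)', ' rounded up       ', hi
  print '(a, l1)',     ' mode was ieee_up ', mode_is_up
  print '(a, l1)',     ' lo < hi          ', lo < hi
end program rounding_modes
```

Because 1/3 cannot be represented exactly, the two results should differ by one unit in the last place.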

36.9.3 ieee_features

This module defines ieee_features_type, for expressing the need for particular IEEE features. Its only possible values are those of named constants defined in the module:

ieee_datatype
ieee_denormal
ieee_divide
ieee_halting
ieee_inexact_flag
ieee_inf
ieee_invalid_flag
ieee_nan
ieee_rounding
ieee_sqrt
ieee_underflow_flag
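
A program states its requirements by making the relevant named constants accessible. The following is a sketch of this usage, with names of our own choosing. A processor that cannot provide the named features is expected to reject the program at compile time. The features are required only for those kinds for which ieee_support_datatype is true, so that test is still made.

```fortran
program need_features
  ! Sketch: naming constants from ieee_features asks the processor to
  ! support those features, or to reject the program at compile time.
  use, intrinsic :: ieee_features, only: ieee_datatype, ieee_inf, ieee_nan
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: x

  if (.not. ieee_support_datatype(x)) stop 'default real is not an IEEE type'

  x = ieee_value(x, ieee_positive_inf)
  print *, 'infinity is finite? ', ieee_is_finite(x)
  x = ieee_value(x, ieee_quiet_nan)
  print *, 'NaN is a NaN?       ', ieee_is_nan(x)
end program need_features
```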

36.9.4 Further Information

There are a number of additional sources of information.

the Fortran standard.
documentation that comes with your compiler.

The latter has the benefit of describing what is supported in that compiler.