Representaciones numéricas

Presenter Notes

Resumen:

Representaciones numéricas:
- Racionales.
- Enteros.

Nicolás Wolovick, 20200316

Presenter Notes

Representaciones de racionales

Presenter Notes

Representaciones posibles

"La computación es la ciencia de la abstracción".

¿Mapeo no inyectivo R -> cantidad finita de bits?

BCD: 123.45 -> 0001 0010 0011 0100 0101
Racionales: 123.45 -> 12345/100
Punto fijo: 123.45 -> 11000000111001
Mantisa/Exponente: 123.45 -> 0.12345 * 10^3

Si hacemos arqueología computacional, vamos a encontrar de todo.

Ferranti Mercury.
IBM 1620 Computer with ADvanced Economic Technology (crítica)
IBM 360.
MS Basic (MBF).

Hay cosas curiosas, la ZX81 sólo manejaba flotantes cuando su µP sumaba enteros de hasta 16 bits.

Presenter Notes

Representaciones de punto flotante

(-1)^s * (1+mantisa) * 2^{exponente-offset}

Normalized Floating Point Representation

Verdades de Pedro Grullo:

No podemos capturar todos los racionales.
Hay más precisión cerca del 0.
Hay un máximo y mínimo positivo (negativo).
Estaría bueno tener representaciones denormalizadas.

Presenter Notes

Problemitas

a+b = a, b!=0

1 REAL*4 X,Y
2 X = 1.25E8
3 Y = X + 7.5E-3
4 IF (X .EQ. Y) THEN
5     PRINT *,'Am I nuts or what?'
6 ENDIF
7 PRINT *,X,Y
8 END

0.1*10 != 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1

 1 REAL*4 X,Y
 2 X = 0.1
 3 Y = 0
 4 DO I=1,10
 5     Y = Y+X
 6 ENDDO
 7 IF (Y .EQ. 1.0) THEN
 8     PRINT *,'Algebra is truth'
 9 ELSE
10     PRINT *,'Not here'
11 ENDIF
12 PRINT *,1.0-Y
13 END

Presenter Notes

Problemitas

x^-1 * x != 1

1 INTEGER X
2 DO X=1, 1000
3     IF ((1.0/X)*X .NE. 1.0) THEN
4         PRINT *, "X^-1*X != 1 for X=", X
5     ENDIF
6 ENDDO
7 END

Imprime 135 números.

Presenter Notes

Explicación

Loss of accuracy while aligning the decimal point

La suma no es asociativa.
No hay inverso multiplicativo.

Ejemplo

1 (X+Y)+Z = (.00005 + .00005) + 1.0000
2         = .0001 + 1.0000
3         = 1.0001
4 X+(Y+Z) = .00005 + (.00005 + 1.0000)
5         = .00005 + 1.0000
6         = 1.0000

Presenter Notes

La suma no es asociativa. Consecuencias

El no-determinismo en el orden de la suma se pierde.
No es un conjunto es una secuencia.

Secuencialidad => Corrección

¿Adiós paralelismo!

Recetas para sumar bien.
Recetas para sumar mal, pero rápido ... muy rápido.

Shhh, no le digan a nadie ...

El estándar de "C" dice que (x+y)+z se puede evaluar como x+(y+z), pero en el C Standard 5.1.2.3 Program execution previene diferentes evaluaciones si dan diferentes resultados.

¿Y del inverso multiplicativo?

Transformar divisiones en multiplicaciones siempre que se pueda.

Presenter Notes

Kahan summation algorithm

 1 float kahan(void) {
 2         float s = x[0];
 3         float c = 0.0f;
 4         for (int i = 1; i < N; i++) {
 5             float y = x[i] - c;
 6             float t = s + y;
 7             c = (t - s) - y;
 8             s = t;
 9         }
10         return s;
11 }

Toda la precisión de fp64 usando fp32.
Ojo con las optimizaciones, -O3 -ffast-math: caga la fruta.
No es paralelizable ... puaj!

John D. Cook, Summing an array of floating point numbers, 20191105.

Presenter Notes

El estándar: IEEE754-1985

IEEE754

Presenter Notes

IEEE754-1985

Presenter Notes

IEEE754-1985

También define el "como" de las computaciones:

Suma, resta.
Multiplicación, división.
Raíz cuadrada.
Conversion de y hacia enteros.

Hay no-determinismo en esta especificación
Esto permite diferentes implementaciones.

Implementaciones: ARM, Intel, Cray, NVIDIA, AMD, Xilnix, IBM, ...

Presenter Notes

Guard, round and sticky bits (`grs`)

Mejora la precisión

Computation using guards and sticky bits

Presenter Notes

Como truncar según `grs`

Todas las implementaciones permiten modificar el rounding mode.

Karen Miller, Floating Point Representation, 2006.

Presenter Notes

Valores especiales

Notar

El cero es un ristra de bits en 000...0.
Números denormalizados para un gradual underflow.
Como operan los valores especiales.
- NaN es una constante que se propaga infinitamente.

Presenter Notes

En la Práctica

Presenter Notes

`fp32` vs. `fp64`

Los costos (velocidad) dependen de la plataforma (Intel, AMD, NVIDIA).
fp64 ocupa el doble de memoria.
- Doble de transferencia y la memoria es LENTA.

Notas

Usualmente hay mayores costos asociados a fp64.
- La diferencia entre una GTX 780 y una Tesla K20c.
Lo ideal es armar algoritmos mixed precision.
Antes era fp16, fp24 y no IEEE754 compatible.
- Gaming!
- La GTX 280 no era IEEE754, recién CC2.0 (Fermi) fue IEEE754.
Ahora está de moda fp16 para DNN y están encastrados en los Tensor Cores de Volta.
Intel siempre fue internamente fp80 y por fuera fp32 o fp64.

Presenter Notes

Ejemplo: `fp16`, `fp32`, `fp64` en NVIDIA

Ryan Smith, The NVIDIA GeForce GTX 1080 & GTX 1070 Founders Editions Review: Kicking Off the FinFET Generation, Anandtech, July 2016.

Presenter Notes

¿Cuánta precisión necesito?

R. Munroe, Coordinate Precision, XKCD 2170.

Presenter Notes

`fp64`, la jactancia de los ricos

Matsuoka et.al, Double-precision FPUs in High-PerformanceComputing: an Embarrassment of Riches?, 2019.

A snail in the coffin for Linpack and its stunt machines.

By studying a large number of HPC proxy application, we found no significant performance difference between these two processors, despite one having more double-precision compute than the other.

Mixed precision algorithms FTW!
¡Todos los paquetes de MD!

Presenter Notes

ML cambia todo

`bfloat16`

Facebook floating point math

We have developed an alternate approach to making AI models run efficiently. Building on a lineage of ideas reaching back to the early days of computer science more than 70 years ago, our method optimizes floating point itself.

Presenter Notes

Mocos típicos

Daniel Lemire

How many floating-point numbers are in the interval [0,1]?, 2017.

-- no me convence, no pude hacer el experimento que yo quería, lo dejo para que alguien lo mire --

Presenter Notes

Switches en los compiladores

nvcc

--use_fast_math                                    (-use_fast_math)             
        Make use of fast math library. -use_fast_math implies -ftz=true 
        -prec-div=false -prec-sqrt=false.

--ftz [true,false]                                 (-ftz)                       
        When performing single-precision floating-point operations, flush 
        denormal values to zero or preserve denormal values. -use_fast_math 
        implies --ftz=true.
        Default value:  0.

--prec-div [true,false]                            (-prec-div)                  
        For single-precision floating-point division and reciprocals, use IEEE 
        round-to-nearest mode or use a faster approximation. -use_fast_math 
        implies --prec-div=false.
        Default value:  1.

--prec-sqrt [true,false]                           (-prec-sqrt)                 
        For single-precision floating-point square root, use IEEE 
        round-to-nearest mode or use a faster approximation. -use_fast_math 
        implies --prec-sqrt=false.
        Default value:  1.

--fmad [true,false]                                (-fmad)                      
        Enables (disables) the contraction of floating-point multiplies and 
        adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, 
        or DFMA). This option is supported only when '--gpu-architecture' is 
        set with compute_20, sm_20, or higher. For other architecture classes, 
        the contraction is always enabled. -use_fast_math implies --fmad=true.
        Default value:  1.

Presenter Notes

Switches en los compiladores

gcc

-ffast-math
   Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and
   -fcx-limited-range.

-funsafe-math-optimizations
   Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE
   or ANSI standards.  When used at link-time, it may include libraries or startup files that change the default FPU control word
   or other similar optimizations.

   This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact
   implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do
   not require the guarantees of these specifications.  Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and
   -freciprocal-math.

   The default is -fno-unsafe-math-optimizations.

Presenter Notes

Rounding Mode

Intel SSE3

Modificar el registro de estado MXCSR.

FZ  bit 15  Flush To Zero
R+  bit 14  Round Positive
R-  bit 13  Round Negative
RZ  bits 13 and 14  Round To Zero
RN  bits 13 and 14 are 0    Round To Nearest

NVIDIA Kepler

Modificadores de las instrucciones de assembler:

.rn  Round to nearest even.
.rz  Round towards zero.
.rm  Round towards -infty.
.rp  Round towards +infty.

Presenter Notes

Consejos

Buscar switches para relajar o reforzar compatibilidad IEEE.
Usar fp64 de manera juiciosa (ancho de banda, memoria).
- Ej: LAMMPS computa dinámica en fp64 y almacena los vectores en fp32.
Sumar ordenadamente.
Multiplicar antes que las divisiones cuando se pueda.
Multiplicar en vez de dividir.
Comparar por valores cercanos, no iguales.
MUCHÍSIMO CUIDADO con las conversiones de tipo.

Presenter Notes

Unum ¿El futuro?

John Gustafson, A Radical Approach to Computation with Real Numbers, JSFI, 2016.

Presenter Notes

Enteros

Presenter Notes

Enteros

Naturales: notación binaria

Enteros: representación complemento a 2s.

Tamaños en "C" de 32 bits

1  8: char
2 16: short
3 32: int, long
4 64: long long

¡Depende de la arquitectura!

Tamaños en "C" de `64 bits`

1  8: char
2 16: short
3 32: int
4 64: long

Tamaños fijos

1 #include <inttypes.h>
2 
3 int8_t   spin;
4 uint64_t total_sum;

Presenter Notes

Cosas a tener en cuenta

Overflow
- Más performance, mayor tamaño (Gustafson's Law) y con N grande es fácil hacer overflow.
¿Con o sin signo?
- Compilación más estricta.
- Mas bits de alcance.
Tipos ajustados en general.
- Más posibilidades de optimización.
- Más información para que el compilador detecte errores en compile time.
Órden en las operaciones.

Presenter Notes

Cosas a tener en cuenta (cont'd)

Conversiones de tipo!

R. Munroe, INT(PI), XKCD 1275.

Presenter Notes

Bibliografía

Charles Severance, Kevin Dowd, "High Performance Computing", Connexions, 2010, CC3.0.
David Goldberg, "What Every Programmer should know about floating point numbers", ACM Computer Surveys, 1991.
Daniel Lemire, How many floating-point numbers are in the interval [0,1]?, 2017.
Matsuoka et.al, Double-precision FPUs in High-PerformanceComputing: an Embarrassment of Riches?, 2019.

Table of Contents	t
Exposé	ESC
Full screen slides	e
Presenter View	p
Source Files	s
Slide Numbers	n
Toggle screen blanking	b
Show/hide slide context	c
Notes	2
Help	h

Ejemplo

Shhh, no le digan a nadie ...

¿Y del inverso multiplicativo?

Mejora la precisión

Notar

Notas

bfloat16

Facebook floating point math

nvcc

gcc

Intel SSE3

NVIDIA Kepler

Tamaños en "C" de 32 bits

Tamaños en "C" de 64 bits

Tamaños fijos

Conversiones de tipo!

La otra

La otra²

Table of Contents

Help

`bfloat16`

Tamaños en "C" de `64 bits`