Numeric representations

Summary:

We finish Class 1: computer architecture.
Numeric representations:
- Rationals.
- Integers.

Nicolás Wolovick, 20160310

Representations of rationals

Possible representations

"Computing is the science of abstraction".

A non-injective mapping R -> finite number of bits?

BCD: 123.45 -> 0001 0010 0011 0100 0101
Rationals: 123.45 -> 12345/100
Fixed point: 123.45 -> 11000000111001 (12345 in binary, scaled by 1/100)
Mantissa/Exponent: 123.45 -> 0.12345 * 10^3

If we do some computational archaeology, we will find a bit of everything.

Ferranti Mercury.
IBM 360.
MS Basic (MBF).

There are curiosities. For example, the ZX81 handled only floating-point numbers, even though its µP could add integers of at most 16 bits.

Floating-point representations

(-1)^s * (1+mantissa) * 2^{exponent-offset}

Normalized Floating Point Representation

Truisms.

We cannot capture all the rationals.
There is more precision near 0.
There is a largest and a smallest positive (negative) value.
It would be nice to have denormalized representations.

Small problems

a+b = a, b!=0

      REAL*4 X,Y
      X = 1.25E8
!     7.5E-3 is far below the spacing between REAL*4 values near 1.25E8
      Y = X + 7.5E-3
      IF (X .EQ. Y) THEN
          PRINT *,'Am I nuts or what?'
      ENDIF
      PRINT *,X,Y
      END

0.1*10 != 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1

      REAL*4 X,Y
!     0.1 has no exact binary representation
      X = 0.1
      Y = 0
      DO I=1,10
          Y = Y+X
      ENDDO
      IF (Y .EQ. 1.0) THEN
          PRINT *,'Algebra is truth'
      ELSE
          PRINT *,'Not here'
      ENDIF
      PRINT *,1.0-Y
      END

Small problems

x^-1 * x != 1

      INTEGER X
      DO X=1, 1000
          IF ((1.0/X)*X .NE. 1.0) THEN
              PRINT *, "X^-1*X != 1 for X=", X
          ENDIF
      ENDDO
      END

Prints 135 numbers.

Explanation

Loss of accuracy while aligning the decimal point

Addition is not associative.
There is no multiplicative inverse.

Example (five significant decimal digits)

(X+Y)+Z = (.00005 + .00005) + 1.0000
        = .0001 + 1.0000
        = 1.0001
X+(Y+Z) = .00005 + (.00005 + 1.0000)
        = .00005 + 1.0000
        = 1.0000

Addition is not associative. Consequences

The freedom to choose the order of the additions is lost.
The operands are not a set but a sequence.

Sequentiality => Correctness

Goodbye, parallelism?

Recipes for summing well.
Recipes for summing badly, but fast ... very fast.

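Shhh, don't tell anyone ...

K&R "C" allowed the compiler to evaluate (x+y)+z as x+(y+z); ISO C does not allow it for floating point, unless a switch such as gcc's -fassociative-math (enabled by -ffast-math) permits it.
(look at the GIMPLE)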

And what about the multiplicative inverse?

Turn divisions into multiplications whenever possible.

The standard: IEEE754-1985

IEEE754

IEEE754-1985

It also defines the "how" of the computations:

Addition, Subtraction.
Multiplication, Division.
Square Root.
Conversion to and from integers.

The specification leaves some non-determinism, which allows different implementations.

Implementations: Intel, Cray, NVIDIA, AMD, Xilinx, IBM, ...

Guard digits + Sticky bit

Improves precision

Computation using guards and sticky bits

How to truncate according to the `ggs` bits

All implementations allow the rounding mode to be changed.

Special values

Note

Zero is the all-zeros bit string 000...0.
Denormalized numbers provide gradual underflow.
How the special values behave in operations.
- NaN is a constant that propagates indefinitely.

In practice

`fp32` vs. `fp64`

Costs (speed) depend on the platform (Intel, AMD, NVIDIA).
fp64 takes twice the memory.
- Twice the data transfer, and memory is SLOW.

Notes

There are usually higher costs associated with fp64.
- The difference between a GTX 780 and a Tesla K20c.
The ideal is to build mixed-precision algorithms.
Earlier GPUs used fp16 and were not IEEE754-compatible.
- Gaming!
- The GTX 280 was not IEEE754; only CC2.0 (Fermi) became IEEE754.
fp16 is now fashionable for CV and DNN.
(see Compute Capability 5.3 for the NVIDIA Jetson TX1.)
Intel's x87 FPU has always been fp80 internally and fp32 or fp64 externally.

Compiler switches

nvcc

--use_fast_math                                    (-use_fast_math)             
        Make use of fast math library. -use_fast_math implies -ftz=true 
        -prec-div=false -prec-sqrt=false.

--ftz [true,false]                                 (-ftz)                       
        When performing single-precision floating-point operations, flush 
        denormal values to zero or preserve denormal values. -use_fast_math 
        implies --ftz=true.
        Default value:  0.

--prec-div [true,false]                            (-prec-div)                  
        For single-precision floating-point division and reciprocals, use IEEE 
        round-to-nearest mode or use a faster approximation. -use_fast_math 
        implies --prec-div=false.
        Default value:  1.

--prec-sqrt [true,false]                           (-prec-sqrt)                 
        For single-precision floating-point square root, use IEEE 
        round-to-nearest mode or use a faster approximation. -use_fast_math 
        implies --prec-sqrt=false.
        Default value:  1.

--fmad [true,false]                                (-fmad)                      
        Enables (disables) the contraction of floating-point multiplies and 
        adds/subtracts into floating-point multiply-add operations (FMAD, FFMA, 
        or DFMA). This option is supported only when '--gpu-architecture' is 
        set with compute_20, sm_20, or higher. For other architecture classes, 
        the contraction is always enabled. -use_fast_math implies --fmad=true.
        Default value:  1.

Compiler switches

gcc

-ffast-math
   Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and
   -fcx-limited-range.

-funsafe-math-optimizations
   Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE
   or ANSI standards.  When used at link-time, it may include libraries or startup files that change the default FPU control word
   or other similar optimizations.

   This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact
   implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do
   not require the guarantees of these specifications.  Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and
   -freciprocal-math.

   The default is -fno-unsafe-math-optimizations.

Rounding Mode

Intel SSE3

Modify the MXCSR control and status register.

FZ  bit 15                    Flush To Zero
R+  bit 14                    Round Positive
R-  bit 13                    Round Negative
RZ  bits 13 and 14 both set   Round To Zero
RN  bits 13 and 14 both 0     Round To Nearest

NVIDIA Kepler

Assembly instruction modifiers:

.rn  Round to nearest even.
.rz  Round towards zero.
.rm  Round towards -infty.
.rp  Round towards +infty.

Advice

Look for switches that relax or enforce IEEE compliance.
Use fp64 judiciously (bandwidth, memory).
- E.g.: LAMMPS computes the dynamics in fp64 and stores the vectors in fp32.
Add in a controlled order.
Do multiplications before divisions when possible.
Multiply instead of dividing.
Compare for closeness, not for equality.
Be VERY CAREFUL with type conversions.

Integers

Integers

Naturals: binary notation

Integers: two's complement representation.

Sizes in 32-bit "C"

bits: name
8: char
16: short
32: int, long
64: long long

It depends on the architecture!

Fixed sizes

#include <inttypes.h>

int8_t   spin;
uint64_t total_sum;

Things to keep in mind

Overflow
- More performance means larger problem sizes (Gustafson's Law), and with a large N it is easy to overflow.
Signed or unsigned?
- Stricter compilation.
- More bits of range.
Tight-fitting types in general.
- More optimization opportunities.
- More information for the compiler to detect errors at compile time.
Order of operations.
Type conversions!

Bibliography

Charles Severance, Kevin Dowd, "High Performance Computing", Connexions, 2010, CC3.0.
David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 1991.
