Integers
- Unsigned
- Signed
Fixed point
Floating point

Binary representation of integers

Important

In case of overflow the most significant bits are lost
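A minimal sketch of this behaviour, assuming an 8-bit unsigned register. The helper name `wrap_unsigned` and the sample operands are illustrative and do not come from the lecture.

```python
def wrap_unsigned(value: int, n_bits: int) -> int:
    """Keep only the n_bits least significant bits of value."""
    return value & ((1 << n_bits) - 1)


# 200 + 100 = 300 = 1_0010_1100 in binary, which needs 9 bits.
# On 8 bits the leading 1 (worth 256) is lost and 44 remains.
total = wrap_unsigned(200 + 100, 8)
print(total, format(total, "08b"))  # 44 00101100
```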

Formulas

$$
\begin{gathered}
1\underbrace{0\ldots0}_{n}{}_{(2)} = 2^n \\
\underbrace{1\ldots1}_{n} = 1\underbrace{0\ldots0}_{n} - 1 = 2^n - 1 \\
0,\underbrace{0\ldots0}_{n-1}1 = 2^{-n} \\
0,\underbrace{1\ldots1}_{n} = 1,\underbrace{0\ldots0}_{n} - 0,\underbrace{0\ldots0}_{n-1}1 = 1 - 2^{-n}
\end{gathered}
$$
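The four identities can be checked numerically for a particular width. The sketch below assumes `n = 5` (an arbitrary choice) and uses exact fractions so that no rounding is involved. `binary_fraction` is a hypothetical helper.

```python
from fractions import Fraction

n = 5  # arbitrary width chosen for the check


def binary_fraction(bits: str) -> Fraction:
    """Exact value of 0,bits read as a binary fraction."""
    return Fraction(int(bits, 2), 2 ** len(bits))


# 1 followed by n zeros equals 2^n
assert int("1" + "0" * n, 2) == 2**n
# n ones equal 2^n - 1
assert int("1" * n, 2) == 2**n - 1
# 0, then n-1 zeros and a final 1, equals 2^-n
assert binary_fraction("0" * (n - 1) + "1") == Fraction(1, 2**n)
# 0, then n ones, equals 1 - 2^-n
assert binary_fraction("1" * n) == 1 - Fraction(1, 2**n)
```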

Important

For positive numbers, the direct, inverse and complementary codes are identical; they differ only for negative numbers.

The bits in the leftmost square are the sign bits

$[80]_{dir} = [80]_{inv} = [80]_{compl} =$

$[-80]_{dir} =$

$[-80]_{inv} =$

$[-80]_{compl} =$

Important

The formula for the inverse code of a negative number on $n$ bits is $2^{n} - 1 - |x|$, or equivalently flip every bit of the direct code of $|x|$ (that is, every bit of $[x]_{dir}$ except the sign bit).

The formula for the complementary code of a negative number is $2^{n} - |x_{(2)}|$, or $[x]_{inv} + 1$.
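The two formulas can be tried on the $\pm 80$ example. This is a sketch that assumes a width of $n = 8$ bits (one sign bit plus seven magnitude bits) and $|x| < 2^{n-1}$. The notes do not state the width, and the function names are illustrative.

```python
def direct_code(x: int, n_bits: int = 8) -> str:
    """Sign bit followed by |x| on n_bits - 1 bits."""
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{n_bits - 1}b")


def inverse_code(x: int, n_bits: int = 8) -> str:
    """2^n - 1 - |x| for negative x, the direct code otherwise."""
    if x >= 0:
        return direct_code(x, n_bits)
    return format(2**n_bits - 1 - abs(x), f"0{n_bits}b")


def complementary_code(x: int, n_bits: int = 8) -> str:
    """2^n - |x| for negative x, the direct code otherwise."""
    if x >= 0:
        return direct_code(x, n_bits)
    return format(2**n_bits - abs(x), f"0{n_bits}b")


for x in (80, -80):
    print(x, direct_code(x), inverse_code(x), complementary_code(x))
# 80 01010000 01010000 01010000
# -80 11010000 10101111 10110000
```

With these definitions the complementary code of a negative number is always the inverse code plus 1, which matches the second formula.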

Important

You cannot read the absolute value directly from the complementary code. To convert from complementary form to direct form, subtract 1 and flip all the bits except the sign bit (that is, convert to inverse form and then to direct form), or just flip every bit between the sign bit and the least significant 1 bit, leaving that 1 and the zeros to its right unchanged.
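Both conversion routes can be sketched on bit strings. The code below assumes the input is a complementary code whose first character is the sign bit, and it rejects the pattern `10...0`, which has no direct-code counterpart on the same width. The function names are hypothetical.

```python
def _flip(bits: str) -> str:
    """Flip every bit of a bit string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def complement_to_direct(code: str) -> str:
    """Route 1: subtract 1 (inverse code), then flip the magnitude bits."""
    if code[0] == "0":
        return code  # positive numbers: all three codes coincide
    if "1" not in code[1:]:
        raise ValueError("no direct code on this width")
    inverse = format(int(code, 2) - 1, f"0{len(code)}b")
    return "1" + _flip(inverse[1:])


def complement_to_direct_shortcut(code: str) -> str:
    """Route 2: flip only the bits between the sign bit and the last 1."""
    if code[0] == "0":
        return code
    if "1" not in code[1:]:
        raise ValueError("no direct code on this width")
    last_one = code.rfind("1")
    return "1" + _flip(code[1:last_one]) + code[last_one:]


print(complement_to_direct("10110000"))           # 11010000
print(complement_to_direct_shortcut("10110000"))  # 11010000
```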

Subunitary convention

$x = \frac{11}{16} = 11 \times 16^{-1} = 0,B_{(16)} = 0,1011_{(2)}$

$\left[\frac{11}{16}\right]_{dir} = \left[\frac{11}{16}\right]_{inv} = \left[\frac{11}{16}\right]_{compl} =$

$\left[-\frac{11}{16}\right]_{dir} =$

(same rules as for integers apply for inverse and complementary form)
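A sketch of the same computation in code, assuming one sign bit followed by four fractional bits. That width is an assumption, because the notes do not state it. `fraction_bits` and `subunitary_codes` are hypothetical helpers, and the integer formulas are applied to the whole sign-plus-fraction bit pattern.

```python
from fractions import Fraction


def fraction_bits(x: Fraction, n_frac: int) -> str:
    """First n_frac binary digits after the point, for 0 <= x < 1."""
    bits = []
    for _ in range(n_frac):
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def subunitary_codes(x: Fraction, n_frac: int) -> dict:
    """Direct, inverse and complementary codes: sign bit + n_frac bits."""
    magnitude = fraction_bits(abs(x), n_frac)
    if x >= 0:
        code = "0" + magnitude
        return {"dir": code, "inv": code, "compl": code}
    width = n_frac + 1
    as_int = int(magnitude, 2)  # |x| scaled by 2**n_frac
    return {
        "dir": "1" + magnitude,
        "inv": format(2**width - 1 - as_int, f"0{width}b"),
        "compl": format(2**width - as_int, f"0{width}b"),
    }


print(fraction_bits(Fraction(11, 16), 4))      # 1011
print(subunitary_codes(Fraction(11, 16), 4))   # all three are 01011
print(subunitary_codes(Fraction(-11, 16), 4))  # dir 11011, inv 10100, compl 10101
```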

Fixed point representation

$I$ is the number of integer bits and $F$ is the number of fractional bits; there is also a sign bit. The minimum nonzero absolute value is $2^{-F}$ and the maximum absolute value is $2^{I} - 2^{-F}$.
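The two bounds can be confirmed by brute force for a small format. The sketch assumes $I = 3$ and $F = 4$ (arbitrary values) and enumerates every nonzero magnitude. `fixed_point_range` is a hypothetical helper.

```python
from fractions import Fraction


def fixed_point_range(i_bits: int, f_bits: int) -> tuple:
    """Smallest nonzero and largest magnitude with I integer and F fractional bits."""
    smallest = Fraction(1, 2**f_bits)  # 2^-F
    largest = 2**i_bits - smallest     # 2^I - 2^-F
    return smallest, largest


i_bits, f_bits = 3, 4  # arbitrary example format
smallest, largest = fixed_point_range(i_bits, f_bits)

# Every magnitude is an (I + F)-bit integer k scaled by 2^-F.
magnitudes = [Fraction(k, 2**f_bits) for k in range(1, 2 ** (i_bits + f_bits))]
assert min(magnitudes) == smallest
assert max(magnitudes) == largest

print(float(smallest), float(largest))  # 0.0625 7.9375
```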

Floating point representation of real numbers

Important

Any real number can be written as $x = \pm 0,m \times b^{e}$, where $m$ is the mantissa, $b$ is the numeration base and $e$ is the exponent.

IEEE 754 standard

$e$ is the number of digits in the whole part (in binary)
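Python's `math.frexp` happens to use the same $0,m \times 2^{e}$ normalization, so it can illustrate the statement. The sample value $-11{,}5$ is an arbitrary choice, and the relation is shown only for a number with $|x| \geq 1$.

```python
import math

x = -11.5  # |x| = 1011,1 in binary: four digits in the whole part

# frexp returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1
m, e = math.frexp(x)
print(m, e)  # -0.71875 4

assert m * 2**e == x
# e matches the number of binary digits in the whole part of |x|
assert e == int(abs(x)).bit_length()
```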

Exercise

$x = ?$, given that its floating point (FP), single precision (SP) representation is $C1DE0000_{(16)}$, with $m > 1$
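One way to check a hand computation is to decode the word field by field. The sketch below assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits biased by 127, 23 fraction bits) and reads "$m > 1$" as the $1,f$ mantissa convention, which is an interpretation of the exercise. It handles only normal numbers.

```python
import struct

word = 0xC1DE0000

sign = word >> 31                       # 1 bit
biased_exponent = (word >> 23) & 0xFF   # 8 bits
fraction = word & 0x7FFFFF              # 23 bits

mantissa = 1 + fraction / 2**23         # implicit leading 1
value = (-1) ** sign * mantissa * 2 ** (biased_exponent - 127)

# Cross-check against the standard library's own decoding.
(reference,) = struct.unpack(">f", word.to_bytes(4, "big"))
assert value == reference

print(sign, biased_exponent, mantissa, value)  # 1 131 1.734375 -27.75
```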

Lecture 2 - Internal representations of numbers
