Building Zerocalc, part IV - precision

Michal Ciesielski - Jul 7 - - Dev Community

Performing arithmetic calculations using computers results in limited precision. Operations on integers are accurate up to minimum and maximum values allowed by the selected integer representation. Using the i128 type we can get any number up to 38 digits long which is enough assuming you will not try to count atoms in the known universe (~10^79). Floating point operations are more challenging as not all values can be represented in the standard used by Rust (IEEE 754). For example:

1.1 * 2.2 = 2.4200000000000004

There are libraries, such as rug, that can help us deal with this problem, offering extended precision and big integer operations. For Zerocalc I am going to do something simpler and still get some benefits of improved precision: as long as we operate on integers, we will stick to integers and only switch to floats when necessary.

For example, when adding or multiplying integers, the result is an integer. When dividing, if the division has no reminder, we will use integers. I also want to make the operations easy to write just as we used normal primitives, for example, I want to write:

x = 1
y = 2
result = x * y
Enter fullscreen mode Exit fullscreen mode

and don't think about what primitive types to operate on.

To achieve this we will use two Rust features: enums and std::ops traits which are similar to C++'s operator overloading. Let's start with defining our numbers. We will derive from PartialEq to allow comparisons of Number instances:

#[derive(Debug, PartialEq)]
pub enum Number {
    Int(i128),
    Float(f64),
    NaN
}
Enter fullscreen mode Exit fullscreen mode

The extra NaN or Not a Number option will help us handle edge cases such as division by 0. Rust's floating point numbers already have similar values defined as constants such as std::f64::NAN. To make it easy to create Number instances in tests, we can add a helper conversion implementations:

impl From<f64> for Number {
    fn from(value: f64) -> Self {
        if value == std::f64::NAN || value == std::f64::INFINITY || value == std::f64::NEG_INFINITY {
            Number::NaN
        } else {
            Number::Float(value)
        }

    }
}

impl From<Number> for f64 {
    fn from(value: Number) -> Self {
        match value {
            Number::Int(i) => i as f64,
            Number::Float(f) => f,
            Number::NaN => f64::NAN
        }
    }
}

impl From<i128> for Number {
    fn from(value: i128) -> Self {
        Number::Int(value)
    }
}

impl From<Option<i128>> for Number {
    fn from(value: Option<i128>) -> Self {
        match value {
            Some(i) => Number::Int(i),
            None => Number::NaN
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

With this, we can create a Number from actual values easily:

let x: Number = 1.into(); // creates Number::Int(1)
let y: Number = 1.0.into(); // creates Number::Float(1.0)
Enter fullscreen mode Exit fullscreen mode

You may be wondering why we have two conversions for i128 - one for the actual integer and one for the option. This will help us to handle errors as we will see in the next step.

Now we can define the first operator - addition. The rules for addition will be:

  • If any operand is NaN, the result is NaN.
  • If both operands are integers, the result is an integer.
  • In all other cases (one or both operands are float) the result is float.
  • If the result cannot be represented as i128 or f64 - return NaN.
impl Add for Number {
    type Output = Number;
    fn add(self, rhs: Self) -> Self::Output {
        match (self, rhs) {
            (Number::NaN, _) | (_, Number::NaN) => Number::NaN,
            (Number::Int(l), Number::Int(r)) => l.checked_add(r).into(),
            _ => {
                let l: f64 = self.into();
                let r: f64 = rhs.into();
                (l + r).into()
             }
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

The checked_add method returns an option - Some if the result can be safely represented as i128, None in other cases. This is where our From<Option<i281>> comes in handy. In the case of floats the errors will be returned as predefined constants such as NAN. Let's test it!

#[test]
fn test_add() {
    let int1: Number = 1.into();
    let int2: Number = 2.into();
    assert_eq!(Number::Int(3), int1 + int2);

    let fl2: Number = 2.0.into();

    if let Number::Float(fl_res) = int1 + fl2 {
        assert!((fl_res - 3.0).abs() < std::f64::EPSILON);
    } else {
        panic!("Float expected");
    }
}
Enter fullscreen mode Exit fullscreen mode

Subtraction and multiplication are very similar, in the case of division we need to check if the reminder of dividing two integrals is 0 - if yes, the result will also be 0. We also have to watch for division by 0 operations. checked_rem will handle both for us:

impl Div for Number {
    type Output = Number;
    fn div(self, rhs: Self) -> Self::Output {
        match (self, rhs) {
            (Number::NaN, _) | (_, Number::NaN) => Number::NaN,
            (Number::Int(l), Number::Int(r)) => {
                let rem = l.checked_rem(r);
                match rem {
                    Some(r) if r > 0 => Number::Float(self.into()) / Number::Float(rhs.into()),
                    Some(_) => Number::Int(l / r),
                    None => Number::NaN
                }
            },
            _ => {
                let l: f64 = self.into();
                let r: f64 = rhs.into();
                (l / r).into()
            }
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

With this simple approach, we can make easy calculations that will preserve integers as long as possible and will gracefully handle errors.

Bonus

A cautious reader may have noticed the impl From<Number> for f64 implementation does not do any checks when converting 128 into f64. How can we safely convert a 128-bit integer into an 64-bit float? According to IEEE-754, the 64-bit floating numbers are stored as:

  • 11 bit exponent
  • 53 bit precision number

Quoting after Wikipedia:

The 11 bit width of the exponent allows the representation of numbers between 10^−308 and 10^308

So there will be rounding errors, but we can store a maximum i128 number; we can even count atoms in the universe! Let's check what happens if we convert 128::MAX:

fn main() {
    println!("i128::MAX: {},\ni128::MAX as f64: {}", std::i128::MAX, std::i128::MAX as f64) 
}
Enter fullscreen mode Exit fullscreen mode

Output:

i128::MAX: 170141183460469231731687303715884105727,
i128::MAX as f64: 170141183460469230000000000000000000000
Enter fullscreen mode Exit fullscreen mode

Sources:

https://rtfeldman-rust-workshop.netlify.app/1.2
https://crates.io/crates/rug
https://www.wolframalpha.com/input?i=number+of+atoms+in+universe
https://en.wikipedia.org/wiki/Double-precision_floating-point_format

. . . . . . . .
Terabox Video Player