A simple representation of units of measure in Rust

Back in the 1990s, there was a space probe called the Mars Climate Orbiter. The objective of its mission, in a nutshell, was to perform measurements related to the atmosphere and climate of the red planet. The mission failed, and one of the errors that led to the failure sounds quite unbelievable:

"The peer review preliminary findings indicate that one team used English units (e.g., inches, feet and pounds) while the other used metric units for a key spacecraft operation."

Inspired by this notorious episode in the history of science and engineering, I started to think about how to create custom types to represent units of measure (which is one part of handling such challenges) in a statically typed programming language such as Rust, an exciting language I’m currently learning.

If you want to reproduce what’s happening here, I recommend using a recent version of stable Rust (I’m using version 1.30). I have a repository on GitHub; feel free to clone it and do some hacking!

The simplest solution is to define a tuple struct with a single unnamed field, such as

pub struct Celsius(pub f64);

It’s simple, it has information about the unit, it can be easily constructed

let water_temperature = Celsius(42.0);

and the value can be easily extracted (using .0 because Celsius is a tuple struct).

let water_temperature_value = water_temperature.0;

Plus it can be used to define constants:

pub const ABSOLUTE_ZERO_TEMPERATURE: Celsius = Celsius(-273.15);

But most importantly, making units of measure explicit keeps both developers and the compiler aware of them.
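To make that last point concrete, here is a minimal sketch that needs only the standard library. The helper is_freezing and the second unit type Fahrenheit are names I made up for illustration; the point is that the function signature says which unit it expects.

```rust
/// Temperature in degrees Celsius.
pub struct Celsius(pub f64);

/// Temperature in degrees Fahrenheit (hypothetical second unit, for contrast).
pub struct Fahrenheit(pub f64);

/// Hypothetical helper: the signature states which unit it expects.
pub fn is_freezing(temperature: &Celsius) -> bool {
    temperature.0 <= 0.0
}

/// Hypothetical helper for the other unit, with its own threshold.
pub fn is_freezing_fahrenheit(temperature: &Fahrenheit) -> bool {
    temperature.0 <= 32.0
}
```

Calling is_freezing(&Fahrenheit(20.0)) is rejected at compile time with a mismatched types error, so a value in the wrong unit cannot slip through unnoticed.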

Let’s improve the data structure a bit. Making it generic would allow you to use other types and save you from boilerplate. However, a constraint to use numeric types is required. For this purpose, you can use the Num trait from the num-traits crate.

pub struct Celsius<Value: Num>(pub Value);

Now you can use custom numeric types, and choose the one that is best for your use case.

For further use, let’s rewrite the type of the constant defined above:

pub const ABSOLUTE_ZERO_TEMPERATURE: Celsius<f64> = Celsius(-273.15);

This simple format has another important feature: it’s inexpensive in terms of memory. If you use the std::mem::size_of function and print out the size of plain numeric types vs wrapped ones, you will get the following values:

Type(s)                         Size (bytes)
------------------------------  ------------
u8, i8                          1
Celsius<u8>, Celsius<i8>        1
u16, i16                        2
Celsius<u16>, Celsius<i16>      2
u32, i32                        4
Celsius<u32>, Celsius<i32>      4
u64, i64                        8
Celsius<u64>, Celsius<i64>      8
u128, i128                      16
Celsius<u128>, Celsius<i128>    16
f32                             4
Celsius<f32>                    4
f64                             8
Celsius<f64>                    8
BigRational                     64
Celsius<BigRational>            64

The table shows that wrapping numeric types in simple structs such as Celsius will not increase your memory usage.
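If you want to check a few of these numbers yourself, the following is a small sketch of how the measurement could be done. To keep it free of external crates I left out the Num bound, and the helper name sizes is my own invention.

```rust
use std::mem::size_of;

/// Generic wrapper; the `Num` bound is omitted here so that the sketch
/// compiles with the standard library alone (an assumption of this example).
pub struct Celsius<Value>(pub Value);

/// Hypothetical helper: returns (size of the plain type, size of the
/// wrapped type), both in bytes.
pub fn sizes<Value>() -> (usize, usize) {
    (size_of::<Value>(), size_of::<Celsius<Value>>())
}
```

Printing the pair with println!("{:?}", sizes::<f64>()) shows the same number twice for the primitive types in the table.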

Similarly to Celsius, you can define other units of temperature, such as Kelvin and Fahrenheit:

pub struct Kelvin<Value: Num>(pub Value);

pub struct Fahrenheit<Value: Num>(pub Value);

and so on.

In contrast to primitives, these types cannot be cast to one another using the as keyword. Instead, you can use Rust’s standard Into trait for safe conversion, which allows you to place the actual conversion formulas in the implementation. A side note: be careful with the implementation if you are using floating point types.

impl Into<Fahrenheit<f64>> for Celsius<f64> {

fn into(self) -> Fahrenheit<f64> {

  Fahrenheit(((self.0 * 1.8 + 32.0) * 1e6).round() / 1e6)

 }

}

Of course, one can make errors in the implementation, or do ad hoc conversions. The compiler won’t be able to stop one from doing that. That’s where code review and thorough testing come into the picture (which are another part of handling such challenges).
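As an alternative sketch, the same conversion can be written against the From trait. The standard library provides a blanket implementation of Into for every From, so .into() keeps working. For brevity I use non-generic f64 wrappers here, and the helper to_fahrenheit is a made-up name.

```rust
#[derive(Debug, PartialEq)]
pub struct Celsius(pub f64);

#[derive(Debug, PartialEq)]
pub struct Fahrenheit(pub f64);

impl From<Celsius> for Fahrenheit {
    fn from(celsius: Celsius) -> Self {
        // Same formula and rounding to six decimals as the `Into` version.
        Fahrenheit(((celsius.0 * 1.8 + 32.0) * 1e6).round() / 1e6)
    }
}

/// Hypothetical helper: `.into()` is available through the blanket
/// `impl<T, U: From<T>> Into<U> for T` in the standard library.
pub fn to_fahrenheit(celsius: Celsius) -> Fahrenheit {
    celsius.into()
}
```

Whether you implement From or Into directly is mostly a matter of convention; the conversion formula lives in exactly one place either way.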

Speaking of testing, it is worth implementing the PartialEq trait for your types if you want to assert equality. PartialOrd is recommended if you want to compare your values. You can save a lot of writing by using the #[derive] annotation.

#[derive(PartialEq, PartialOrd, Debug)]

pub struct Celsius<Value: Num>(pub Value);

Debug is useful if you want to print your values (when your unit tests fail, for instance). Now you can do things such as

assert_eq!(

Fahrenheit(-459.67),

 ABSOLUTE_ZERO_TEMPERATURE.into()

);

and

assert!(

Celsius(2) > Celsius(1)

);

You can do a lot more with derive. Procedural macros let you define custom derives, which is really useful for automating the implementation of simple traits. (Custom derives have been part of stable Rust since version 1.15; the release of version 1.30 added attribute-like and function-like procedural macros.) Take the derive_more crate for example: it has macros that automatically implement traits of operator overloading for simple structs, such as the types defined here.

#[derive(Add, Div, Mul, Sub, PartialEq, PartialOrd, Debug)]

pub struct Celsius<Value: Num>(pub Value);

Now you can perform arithmetic on units of measure without unwrapping. You can add or subtract the same units of measure (attempting to add Celsius to Fahrenheit for example will result in a compilation error), and multiply or divide them by numbers of their Value type.
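To give an idea of what such derives save you from writing, here is a hand-written sketch of two of the operators for a non-generic f64 wrapper. It is only an approximation of the behavior described above, not the code that derive_more actually generates.

```rust
use std::ops::{Add, Div};

#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Celsius(pub f64);

// Adding two values of the same unit yields that unit again.
impl Add for Celsius {
    type Output = Celsius;

    fn add(self, other: Celsius) -> Celsius {
        Celsius(self.0 + other.0)
    }
}

// Dividing by a plain number keeps the unit.
impl Div<f64> for Celsius {
    type Output = Celsius;

    fn div(self, divisor: f64) -> Celsius {
        Celsius(self.0 / divisor)
    }
}
```

Because Add is implemented only for Celsius on both sides, an expression that mixes in a different unit type has no matching implementation and fails to compile.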

As a practical example for arithmetic, here’s a function that calculates the mean temperature based on a collection of values in Celsius:

pub fn mean_temperature(values: Vec<Celsius<f64>>) -> Celsius<f64> {

let len = values.len();

if len == 0 {

  Celsius(0.0)

 } else {

  values.into_iter().fold(Celsius(0.0), |acc, temp| acc + temp) / (len as f64)

 }

}

A quick check:

assert_eq!(

Celsius(11.5),

mean_temperature(vec![Celsius(11.3), Celsius(12.3), Celsius(10.9)])

);

If you want to make the mean_temperature function more readable, you can implement the Sum trait for Celsius (see the full code on GitHub). Or, to save some energy, you can create a derive macro that does the job for all units-of-measure-kind-of types.

Then you can write mean_temperature as

pub fn mean_temperature(values: Vec<Celsius<f64>>) -> Celsius<f64> {

let len = values.len();

if len == 0 { Celsius(0.0) } else {

  let sum: Celsius<f64> = values.into_iter().sum();

  sum / (len as f64)

 }

}
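For reference, a Sum implementation could look roughly like the sketch below. It is my own minimal version for a non-generic f64 wrapper with a hand-written Div, so it may differ from the code in the repository.

```rust
use std::iter::Sum;
use std::ops::Div;

#[derive(Debug, PartialEq)]
pub struct Celsius(pub f64);

// Lets iterators of `Celsius` be collapsed with `.sum()`.
impl Sum for Celsius {
    fn sum<I: Iterator<Item = Celsius>>(iter: I) -> Celsius {
        Celsius(iter.map(|temp| temp.0).sum::<f64>())
    }
}

// Hand-written stand-in for the derived `Div` used in the text.
impl Div<f64> for Celsius {
    type Output = Celsius;

    fn div(self, divisor: f64) -> Celsius {
        Celsius(self.0 / divisor)
    }
}

pub fn mean_temperature(values: Vec<Celsius>) -> Celsius {
    let len = values.len();
    if len == 0 {
        // Same convention as in the text: an empty input yields zero.
        Celsius(0.0)
    } else {
        let sum: Celsius = values.into_iter().sum();
        sum / (len as f64)
    }
}
```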

Wrapping it up, using Rust you can find a simple and flexible generic solution to represent units of measure, which allows you to add extra information to numeric types without memory overhead. Adding implementations of the From/Into trait provides safe conversion between the types. With the derive annotation, you can automate trait implementations quite easily. Furthermore, procedural macros can save you a lot of work and add useful features to your data structures.

You should keep in mind that this is not the only solution to the problem. You might want to consider other options when you encounter such a challenge, and I encourage you to explore the opportunities provided by your language of choice. Since units of measure are - implicitly - part of everyday life (and therefore part of everyday engineering), I’m sure such challenges will come really soon.

Function composition in Rust using a custom smart pointer

Still drunk with the power of function composition, I started to play around with the technique in Rust, a language I've been experimenting with. Rust is a low-level language with a strict compiler that saves you from doing dangerous things. Furthermore, Rust supports a functional style: it has several concepts and features inspired by functional languages such as Haskell and Scala. The design of Rust makes it highly expressive and attractive.

I have found some intro posts on the functional nature of Rust, and I also recommend the Rust book. To reproduce what’s happening here, first you will need a recent version of stable Rust. I’m using Rust version 1.29.0 here. Then clone the repo fbox_example, and compile and run the program on your local machine. In this post, I am going to show (sometimes simplified) snippets of the code.

If you want to return a function in Rust, one way to do it is to wrap it in a Box. Box is a smart pointer in Rust that allows you to store data on the heap. In order to have composable functions with a user-friendly syntax, I have decided to define a custom type, a smart pointer called FBox.

pub struct FBox<FIn, FOut> {

f: Box<dyn Fn(FIn) -> FOut>

}

FBox has a function f. The generic type parameters FIn and FOut represent the input and output type of f. The type signature of f shows that f can be any kind of function that a) has one argument; and b) takes ownership of the argument. The handling of values is governed by the ownership rules of Rust, and safety is guaranteed by the compiler. (EDIT: due to the characteristics of the Fn trait, FBox is not as strict as written in point b), it's actually more flexible. However, you can take advantage of the ownership rules if your functions and types allow.)

The behavior of FBox is defined by its methods. Its factory method FBox::new takes a function as argument and wraps it in an FBox.

But before going any further with FBox's behavior, let's see what I'm going to use it for. The Rust book has a really cool example of an HTTP server. One of its features is that it reads HTML files from the file system and converts them into collections of bytes (u8, unsigned 8-bit integer) before writing them to the stream. I'm going to create similar functionality here using the and_then method of FBox: read a file based on a path, and return its contents as an immutable collection of bytes.

I need building blocks first. I import a function from the standard library as read_from_path. It will read a file and return a Vec of bytes wrapped in a Result. Then I want to safely unwrap the result of the file IO via extract_bytes.

fn extract_bytes(bytes_in_a_result: Res) -> Vec<u8> {

 bytes_in_a_result.unwrap_or_else(|error| { println!("{:?}", error); vec![] })

}

Here, Res is a type alias for the Result<_, _> type (check the repo to see what it actually stands for). The unwrap_or_else method of the Result type is lazy: it only prints the error and creates an empty Vec if it actually finds an error; otherwise it returns the file contents.
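The laziness can be observed with a small experiment. In the sketch below I assume that Res stands for std::io::Result<Vec<u8>> (the actual alias is in the repo), and extract_bytes_counting is a made-up variant of extract_bytes that counts how often the fallback closure runs.

```rust
use std::cell::Cell;
use std::io;

/// Assumed stand-in for the `Res` alias; the real definition is in the repo.
pub type Res = io::Result<Vec<u8>>;

/// Hypothetical variant of `extract_bytes` that records every run of the
/// fallback closure in `fallback_calls`.
pub fn extract_bytes_counting(bytes_in_a_result: Res, fallback_calls: &Cell<u32>) -> Vec<u8> {
    bytes_in_a_result.unwrap_or_else(|error| {
        // This closure body only executes for the `Err` variant.
        fallback_calls.set(fallback_calls.get() + 1);
        eprintln!("{:?}", error);
        Vec::new()
    })
}
```

Passing an Ok value leaves the counter untouched, while an Err value increments it and yields an empty Vec.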

Then I want to convert the collection to an immutable one, such as Vector from the immutable Rust collection library im, because I prefer immutability and the Vec of the standard library is mutable. So, the third building block is Vector::from.

Now I can call FBox::new with read_from_path as argument, and use the and_then method to create a composite function. What and_then does is that it takes the f from FBox and an argument g, and returns

FBox::new(move |x| g((self.f)(x)))

You wrap a function g(f(x)) in an FBox. The move keyword here is to "make a closure take ownership of all its captures".

The other method compose gives you f(g(x)):

FBox::new(move |x| (self.f)(g(x)))

I prefer and_then over compose because to me it's easier to read and to reason about (it is called 'and then' for a reason).
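Putting the pieces together, a stripped-down FBox could look like the sketch below. This is my simplified reconstruction and not necessarily the code in the repo: the 'static bounds and the apply method (which simply calls the wrapped function) are assumptions I made to get a self-contained example.

```rust
/// A boxed single-argument function from `FIn` to `FOut`.
pub struct FBox<FIn, FOut> {
    f: Box<dyn Fn(FIn) -> FOut>,
}

impl<FIn: 'static, FOut: 'static> FBox<FIn, FOut> {
    /// Wraps a function or closure in an `FBox`.
    pub fn new(f: impl Fn(FIn) -> FOut + 'static) -> Self {
        FBox { f: Box::new(f) }
    }

    /// Assumed helper: calls the wrapped function with `x`.
    pub fn apply(&self, x: FIn) -> FOut {
        (self.f)(x)
    }

    /// Builds g(f(x)): first `self.f`, then `g`.
    pub fn and_then<GOut: 'static>(
        self,
        g: impl Fn(FOut) -> GOut + 'static,
    ) -> FBox<FIn, GOut> {
        FBox::new(move |x: FIn| g((self.f)(x)))
    }

    /// Builds f(g(x)): first `g`, then `self.f`.
    pub fn compose<GIn: 'static>(
        self,
        g: impl Fn(GIn) -> FIn + 'static,
    ) -> FBox<GIn, FOut> {
        FBox::new(move |x: GIn| (self.f)(g(x)))
    }
}
```

With f adding one and g doubling, and_then gives (3 + 1) * 2 for the input 3, while compose gives (3 * 2) + 1, which shows the difference in ordering.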

When using and_then, the input type of g must match the output type of f. If there is any mismatch, the compiler will report an error, which means the composition is type safe.

let bytes_from_path =

FBox::new(read_from_path)

  .and_then(extract_bytes)

  .and_then(Vector::from);

This definition tells you what you can expect if you call bytes_from_path. It reads a file from the path, then extracts the bytes from the result, then creates an immutable Vector of bytes. Nothing has happened yet, though, because FBox is lazy. You need to tell it explicitly to apply its function by calling the apply method.

let bytes = bytes_from_path.apply(path);

To see the results, let's use println!.

println!("{:?}", bytes);

When using the path of the example file index.html, you get something like this:

[60, 33, 68, 79, 67, 84, 89, 80, 69 // Rest is omitted

Wrapping it up, I have defined a smart pointer FBox. FBox has a function, and allows you to use functions as building blocks via methods compose and and_then. The compiler guarantees the type safety of composition, and safe memory management by the ownership rules of Rust. FBox is lazy, it only applies its function when you explicitly tell it to do so by calling the apply method with an input value. You can use functions from different libraries, and easily define your own. And with only a few lines of readable and simple code, you can create heavy machinery.

A functional hello to R

I study R because I am looking for an effective data analysis tool. I am a post-hello-world R student interested in functional programming (FP). You might have heard about FP, be learning it, or have already mastered it. If you are interested in a brief summary of the concept, take a look at the Wikipedia article on FP. My motivation here is that I prefer data analysis without changing the data. Furthermore, if one is familiar with FP, and the language one is learning supports FP, the way of calculating things can be very easy to find. Especially when learning a high-level language with a lightweight syntax.

If you want to reproduce what's happening in this post, you can do it using the R interpreter. My current R version is 3.5.1 (2018-07-02) -- "Feather Spray". If you want to have answers about a function immediately, use the interpreter. Simply type '?' and the name of the function. As an example, you can do it like this for the sum function:

> ?sum

It opens the corresponding page for sum in your browser. As an alternative, you can take a look at the R documentation.

So if you're ready, let's go!

Let's define a function first! In R, you can use the function keyword to do that.

sq <- function(n) n ^ 2

The '<-' is an assignment operator in R. So I have created a function and I call it sq.

But what does it do? The R interpreter will tell us.

First, I create a vector of numbers in the range of 1 to 10.

> one.to.ten <- 1:10; one.to.ten

[1] 1 2 3 4 5 6 7 8 9 10

Which I pass directly to sq.

> sq(one.to.ten)

[1] 1 4 9 16 25 36 49 64 81 100

Wow! What happened here?

> ?'^'

The Description of Arithmetic Operators looks like this:

These unary and binary operators perform arithmetic on numeric or complex vectors (or objects which can be coerced to them)

You can try the operator directly:

> one.to.ten ^ 2

[1] 1 4 9 16 25 36 49 64 81 100

You can even do

> one.to.ten ^ one.to.ten

[1] 1 4 27 256 3125 46656

[7] 823543 16777216 387420489 10000000000

Which is doing something like [1 ^ 1, 2 ^ 2, ..., 10 ^ 10].

This is a consequence of the functional and array language nature of R. It's like an undercover map function (as in map, filter, and reduce).

You can use other operators such as '<' to filter your data using the square bracket notation.

> one.to.ten[one.to.ten < 2]

[1] 1

As you can see, R supports higher-order functions (HOFs). You can do mapping, filtering, aggregation (R has built-in functions Map, Filter, and Reduce), and even more.

Now, I'm going to define a sum of squares function (by which I mean Σx² here). I have the building blocks, R's sum and my very own sq, so I don't have to do it from scratch. I only need to use sum and sq to build larger blocks via function composition.

I came across an R library called purrr, which - among other awesome FP tools - has a function called compose for this purpose. I'm going to use one of purrr's features in this post, but first I'm going to show you how you can define your own compose function in R.

Based on the definition, you can do it like this:

my.compose <- function(f, g) function(x) f(g(x))

I have defined a higher-order function my.compose. It takes two functions f and g as arguments, and returns a function, which applies g to a given argument, then applies f to the output of g.

I can define my sum of squares writing

sum.of.squares <- my.compose(sum, sq)

Let's say you have a vector of numbers. When you pass it to sum.of.squares, first the squares are calculated for all numbers in the vector, then aggregated by sum.

Let's check how it works for the test input!

> sum.of.squares(one.to.ten)

[1] 385

I can think of another example. Let's say I have some data with outliers.

> data.with.outliers <- c(1, 4, 5, 4, 5, 4, 5, 9)

I'd like to calculate the mean and variance, but I want to filter the outliers first. There is a post on outlier handling by the way if you want to dig deeper. Here, I'm going to use a solution found on StackOverflow.

filter.outliers <- function(x) x[!x %in% boxplot.stats(x)$out]

It works like this:

> filter.outliers(data.with.outliers)

[1] 4 5 4 5 4 5

I can define equivalent functions of mean and var, with automatic outlier filtering.

mean.wo <- my.compose(mean, filter.outliers)

var.wo <- my.compose(var, filter.outliers)

The 'wo' in the name stands for without outliers.

You can check how it affects the mean and variance:

> mean(data.with.outliers)

[1] 4.625

> mean.wo(data.with.outliers)

[1] 4.5

> var(data.with.outliers)

[1] 4.839286

> var.wo(data.with.outliers)

[1] 0.3

If you want to try your own outlier handling, you can do it by function composition.

There's another FP feature I'd like to show, namely partial application. I'm going to need purrr for this.

> library(purrr)

I am going to tell my.compose to always apply filter.outliers as its g argument. It will behave like a "function factory". It is going to create functions that automatically filter outliers of a vector, and apply another function of choice to the vector.

filter.then <- partial(my.compose, g=filter.outliers)

Now I can define my sum (and sum of squares) without outliers.

sum.wo <- filter.then(sum)

sum.of.squares.wo <- filter.then(sum.of.squares)

So let's try the above in the interpreter!

> sum(data.with.outliers)

[1] 37

> sum.wo(data.with.outliers)

[1] 27

> sum.of.squares(data.with.outliers)

[1] 205

> sum.of.squares.wo(data.with.outliers)

[1] 123

The differences you see between the 'wo' and non-'wo' numbers mean that the automatic outlier filtering works! On top of that, the functions sum.wo and sum.of.squares.wo can be further composed if needed.

Doing this exercise made me experience the advantages of FP in R. Using R in a functional style can help me build powerful tools quite fast, and keep my code concise. Using my new tools, I can analyze data without changing the data.

R and functional programming are friends, so R and I are going to be friends. So hello R!