Building a User-Defined Function implementation for Apache Arrow and WebAssembly - Part 3: demo and some issues along the way
We're almost home
Introduction
This is a three-part blog post about a small venture of mine into building a UDF implementation using Apache Arrow and Wasmtime.
In the last post, we implemented some of the building blocks for our project. Now, we’re going to put it all together and see how it functions!
First, our host
use arrow::array::{Array, Int32Array};
use wasmtime::{Config, Engine, Linker, Module, OptLevel, Store};
// copy_array, clone_array, clone_schema and copy_schema are the helpers
// built in the previous post.

fn main() {
    // Get the wasm bytes from the filesystem or elsewhere
    // (this path is only illustrative).
    let wasm = std::fs::read("example_wasm.wasm").unwrap();
    let engine = Engine::new(
        Config::new()
            .debug_info(true)
            .coredump_on_trap(true)
            .cranelift_opt_level(OptLevel::None),
    )
    .unwrap();
    let module = Module::new(&engine, wasm).unwrap();
    let linker = Linker::new(&engine);
    let mut store: Store<u32> = Store::new(&engine, 4);
    let instance = linker.instantiate(&mut store, &module).unwrap();
    // We retrieve our malloc function.
    let wasm_malloc_fn = instance
        .get_typed_func::<u32, u32>(&mut store, "_malloc")
        .unwrap();
    // We retrieve the memory instance we were looking for.
    let mut memory = instance.get_memory(&mut store, "memory").unwrap();
    // Here, we instantiate a simple int32 arrow array and copy it to memory.
    let array = Int32Array::from(vec![Some(1), None, Some(3)]);
    // Copy the array contents.
    let result = copy_array(
        &array.to_data(),
        &wasm_malloc_fn,
        &mut store,
        &mut memory,
    );
    // Clone the actual array to memory.
    let ptr = clone_array(
        result,
        &wasm_malloc_fn,
        &mut store,
        &mut memory,
    );
    // Clone the schema contents.
    let ffi_schema = clone_schema(
        array.data_type(),
        &wasm_malloc_fn,
        &mut store,
        &mut memory,
    )
    .unwrap();
    // Clone the schema itself to memory.
    let ffi_schema_ptr = copy_schema(
        ffi_schema,
        &wasm_malloc_fn,
        &mut store,
        &mut memory,
    );
    // Retrieve the run function - this is considered the main function
    // in our wasm module.
    let wasm_run_fn = instance
        .get_typed_func::<(u32, u32), u32>(&mut store, "run")
        .unwrap();
    // Call the function.
    let result = wasm_run_fn.call(&mut store, (ptr, ffi_schema_ptr));
    // Assert the summation of the array is 4 (1 + 3, the null is skipped).
    assert_eq!(result.unwrap(), 4);
}
Second, the module.
use arrow::array::{Array, Int32Array};
use arrow::ffi::{from_ffi, FFI_ArrowArray, FFI_ArrowSchema};

// Our malloc function: allocate `len` bytes and leak them so the host
// can write into that region of linear memory.
#[unsafe(no_mangle)]
pub unsafe fn _malloc(len: u32) -> *mut u8 {
    let mut buf = Vec::with_capacity(len.try_into().unwrap());
    let ptr = buf.as_mut_ptr();
    std::mem::forget(buf);
    ptr
}

// Our run function implementation.
#[unsafe(no_mangle)]
pub unsafe fn run(
    array_ptr: *const FFI_ArrowArray,
    schema_ptr: *const FFI_ArrowSchema,
) -> u32 {
    // Here we use the standard arrow::ffi logic to retrieve our arrays.
    let array = unsafe {
        FFI_ArrowArray::from_raw(array_ptr as *mut FFI_ArrowArray)
    };
    let schema = unsafe {
        FFI_ArrowSchema::from_raw(schema_ptr as *mut FFI_ArrowSchema)
    };
    let array = unsafe { from_ffi(array, &schema) }.unwrap();
    let array = Int32Array::try_from(array).unwrap();
    // Summation of the array, skipping nulls.
    let result = array.iter().fold(0_i32, |mut acc, curr| match curr {
        Some(i) => {
            acc += i;
            acc
        }
        None => acc,
    });
    // Keep Drop from running on these values, as it would otherwise
    // try to call their release callbacks.
    std::mem::forget(array);
    std::mem::forget(schema);
    result.try_into().unwrap()
}
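As a side note on the _malloc export above: it leaks its buffer on purpose and nothing in this post ever gives that memory back. If you wanted a matching free, one possible shape is sketched below using only the standard library. The names alloc_bytes and free_bytes are hypothetical and not part of the project. I have also swapped Vec::with_capacity for a boxed slice here, because a boxed slice has exactly the requested length, which makes it straightforward to rebuild and drop later.

```rust
/// Hypothetical stand-in for `_malloc`: allocates `len` zeroed bytes and
/// leaks them, returning a raw pointer to the start of the region.
pub fn alloc_bytes(len: usize) -> *mut u8 {
    // A boxed slice owns exactly `len` bytes, no spare capacity.
    let boxed: Box<[u8]> = vec![0u8; len].into_boxed_slice();
    // `into_raw` hands ownership to the caller; nothing is freed here.
    Box::into_raw(boxed) as *mut u8
}

/// Hypothetical counterpart that releases a region from `alloc_bytes`.
///
/// Safety: `ptr` must come from `alloc_bytes(len)` with the same `len`,
/// and must not be freed more than once.
pub unsafe fn free_bytes(ptr: *mut u8, len: usize) {
    // Rebuild the fat pointer to the original slice, then let Box drop it.
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

In a wasm module both functions would be exported the same way _malloc is, and the host would need to remember the length it asked for so it can pass it back when freeing.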
Some unexpected problems
Attempt to release the array using RAII
backtrace:
error while executing at wasm backtrace:
0: 0x1c834 - example_wasm.wasm!<arrow_schema::ffi::FFI_ArrowSchema as core::ops::drop::Drop>::drop::h9670aa25ed7d180b
1: 0x195a - example_wasm.wasm!run
Caused by:
0: error while executing at wasm backtrace:
0: 0x1c834 - example_wasm.wasm!<arrow_schema::ffi::FFI_ArrowSchema as core::ops::drop::Drop>::drop::h9670aa25ed7d180b
1: 0x195a - example_wasm.wasm!run
1: wasm trap: indirect call type mismatch
The message above pointed to a problem with Rust's Drop trait. Two things combined to cause it:
I had set a null pointer as the release function of the schema/array in the previous post, and
Rust’s RAII model calls drop() on every value as soon as it goes out of scope.
The values therefore must not be dropped inside the module, which is why I added the two std::mem::forget calls at the bottom of the run function.
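To make the RAII behaviour concrete, here is a minimal standard-library sketch. Guarded is a hypothetical stand-in for an FFI struct whose Drop would call a release callback; it only counts how often drop() runs. It is not the arrow code itself, just an illustration of what std::mem::forget (and the alternative, ManuallyDrop) changes.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;

/// Hypothetical stand-in for an FFI struct with a release callback.
pub struct Guarded<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Guarded<'a> {
    fn drop(&mut self) {
        // The real FFI structs would invoke `release` at this point.
        self.drops.set(self.drops.get() + 1);
    }
}

/// The value goes out of scope normally, so `drop` runs once.
pub fn drop_normally(drops: &Cell<u32>) {
    let _guard = Guarded { drops };
}

/// `std::mem::forget` consumes the value without running `drop`.
pub fn forget_instead(drops: &Cell<u32>) {
    let guard = Guarded { drops };
    std::mem::forget(guard);
}

/// `ManuallyDrop` has the same effect, but states the intent up front.
pub fn manually_drop_instead(drops: &Cell<u32>) {
    let _guard = ManuallyDrop::new(Guarded { drops });
}
```

Only drop_normally increments the counter. The other two leave the value's destructor unrun, which is exactly the effect the run function relies on.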
Misalignment of pointers
The host uses whatever pointer width its architecture dictates. In my case, that is 64 bits. So whenever I transmuted a struct into a byte array and expected the wasm module to read that struct through a pointer, the host wrote each pointer field as a u64 while the module read it as a u32 (the pointer width of standard 32-bit WebAssembly).
This meant that I had to force the host to write pointers as u32 rather than u64. I am still not convinced this is ideal, as we don’t have a definitive way to know whether a given WebAssembly module uses 32-bit or 64-bit pointers.
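The size difference is easy to see in isolation. The sketch below uses a hypothetical two-pointer struct (not one of the real Arrow FFI structs) and compares the layout the host compiler produces with the layout a 32-bit guest expects. The type names, to_bytes and to_guest_ptr are all made up for illustration.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Hypothetical struct as the host compiler lays it out: each pointer
/// field is as wide as the host's address size.
#[repr(C)]
pub struct HostLayout {
    pub data: *const u8,
    pub next: *const u8,
}

/// The same struct as a 32-bit wasm guest expects it: each pointer is a
/// 32-bit offset into linear memory.
#[repr(C)]
pub struct Wasm32Layout {
    pub data: u32,
    pub next: u32,
}

impl Wasm32Layout {
    /// Serialize field by field; wasm linear memory is little-endian.
    pub fn to_bytes(&self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.data.to_le_bytes());
        out[4..].copy_from_slice(&self.next.to_le_bytes());
        out
    }
}

/// Returns (host size, guest size) in bytes for the two layouts.
pub fn layout_sizes() -> (usize, usize) {
    (size_of::<HostLayout>(), size_of::<Wasm32Layout>())
}

/// Narrow a host-side offset to a guest pointer, refusing values that
/// do not fit in 32 bits instead of silently truncating them.
pub fn to_guest_ptr(offset: u64) -> Option<u32> {
    u32::try_from(offset).ok()
}
```

On a 64-bit host layout_sizes() reports 16 bytes for the host layout against 8 for the guest layout, so transmuting the host struct directly puts every field after the first pointer at the wrong offset from the guest's point of view. Writing the fields explicitly, as to_bytes does, avoids depending on the host's pointer width.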
Conclusion
And there we have it! A way for us to communicate our Arrow structs to WebAssembly. If you’d like a general idea of at least some of this code (I can’t guarantee that this repository won’t change quite a lot), you can take a look here: https://github.com/ilikepi63/arrow_wasmtime.