# zerovec [![crates.io](https://img.shields.io/crates/v/zerovec)](https://crates.io/crates/zerovec)
<!-- cargo-rdme start -->
Zero-copy vector abstractions for arbitrary types, backed by byte slices.

`zerovec` enables a far wider range of types (beyond just `&[u8]` and `&str`) to participate in zero-copy deserialization from byte slices. It is `serde` compatible and comes equipped with proc macros.

Clients upgrading to `zerovec` benefit from zero heap allocations when deserializing read-only data.

This crate has four main types:

- [`ZeroVec<'a, T>`] (and [`ZeroSlice<T>`](ZeroSlice)) for fixed-width types like `u32`
- [`VarZeroVec<'a, T>`] (and [`VarZeroSlice<T>`](VarZeroSlice)) for variable-width types like `str`
- [`ZeroMap<'a, K, V>`] to map from `K` to `V`
- [`ZeroMap2d<'a, K0, K1, V>`] to map from the pair `(K0, K1)` to `V`

The first two are intended as close-to-drop-in replacements for `Vec<T>` in Serde structs. The third and fourth are intended as replacements for `HashMap` or [`LiteMap`](https://docs.rs/litemap). When used with Serde derives, **be sure to apply `#[serde(borrow)]` to these types**, same as one would for [`Cow<'a, T>`].

[`ZeroVec<'a, T>`], [`VarZeroVec<'a, T>`], [`ZeroMap<'a, K, V>`], and [`ZeroMap2d<'a, K0, K1, V>`] all behave like [`Cow<'a, T>`] in that they abstract over either borrowed or owned data. When deserializing from human-readable formats (like `json` and `xml`), these types will typically allocate and fully own their data. When deserializing from binary formats like `bincode` and `postcard`, they instead borrow data directly from the buffer being deserialized from, avoiding allocations and performing only validity checks. As a result, deserialization with this crate can be very fast (see [below](#performance) for more information).

See [the design doc](https://github.com/unicode-org/icu4x/blob/main/utils/zerovec/design_doc.md) for details on how this crate works under the hood.
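The "backed by byte slices" and "behave like `Cow`" ideas above can be illustrated with a small std-only sketch. `LeU32Vec`, `parse_bytes`, and `from_values` are hypothetical names, not `zerovec` APIs, and the little-endian layout is an assumption made for this illustration; it is not the crate's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for `ZeroVec<'a, u32>` (not the crate's code):
/// values live in a byte buffer as little-endian `u32`s, and that buffer is
/// either borrowed from the input or owned after an allocation, like `Cow`.
struct LeU32Vec<'a> {
    bytes: Cow<'a, [u8]>,
}

impl<'a> LeU32Vec<'a> {
    /// Zero-copy path: check that the length is a whole number of `u32`s,
    /// then borrow the buffer as-is without copying.
    fn parse_bytes(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 4 != 0 {
            return None;
        }
        Some(LeU32Vec { bytes: Cow::Borrowed(bytes) })
    }

    fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Elements are decoded from bytes on each access.
    fn get(&self, index: usize) -> Option<u32> {
        let start = index.checked_mul(4)?;
        let b = self.bytes.get(start..start.checked_add(4)?)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }

    fn is_owned(&self) -> bool {
        matches!(self.bytes, Cow::Owned(_))
    }
}

impl LeU32Vec<'static> {
    /// Allocating path: encode native values into an owned buffer.
    fn from_values(values: &[u32]) -> Self {
        let mut bytes = Vec::with_capacity(values.len() * 4);
        for v in values {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        LeU32Vec { bytes: Cow::Owned(bytes) }
    }
}

fn main() {
    // Stand-in for bytes handed over by a binary deserializer.
    let buffer: Vec<u8> = vec![211, 0, 0, 0, 25, 1, 0, 0];
    let borrowed = LeU32Vec::parse_bytes(&buffer).expect("whole u32 values");
    assert!(!borrowed.is_owned());
    assert_eq!(borrowed.len(), 2);
    assert_eq!(borrowed.get(1), Some(281));

    let owned = LeU32Vec::from_values(&[211, 281]);
    assert!(owned.is_owned());
    assert_eq!(owned.get(0), Some(211));
    assert_eq!(owned.get(2), None);
}
```

The borrowed path only checks the length, which mirrors the "validity checks only" behavior described above; the trade-off is that every `get` decodes bytes instead of reading a native `u32`.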
## Cargo features
This crate has several optional Cargo features:

- `serde`: Allows serializing and deserializing `zerovec`'s abstractions via [`serde`](https://docs.rs/serde).
- `yoke`: Enables implementations of `Yokeable` from the [`yoke`](https://docs.rs/yoke/) crate, which is also useful in situations involving a lot of zero-copy deserialization.
- `derive`: Makes it easier to use custom types in these collections by providing the `#[make_ule]` and `#[make_varule]` proc macros, which generate appropriate [`ULE`](https://docs.rs/zerovec/latest/zerovec/ule/trait.ULE.html) and [`VarULE`](https://docs.rs/zerovec/latest/zerovec/ule/trait.VarULE.html)-conformant types for a given "normal" type.
- `std`: Enables `std::error::Error` implementations for error types. By default, this crate is `no_std` with a dependency on `alloc`.

[`ZeroVec<'a, T>`]: ZeroVec
[`VarZeroVec<'a, T>`]: VarZeroVec
[`ZeroMap<'a, K, V>`]: ZeroMap
[`ZeroMap2d<'a, K0, K1, V>`]: ZeroMap2d
[`Cow<'a, T>`]: alloc::borrow::Cow
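The `std` feature follows a common pattern for `no_std` + `alloc` crates: error types always implement `Display` (which only needs `core`), and the `std::error::Error` impl is compiled only when the feature is enabled. A minimal sketch of that pattern, using a hypothetical `ParseError` type that is not part of `zerovec`:

```rust
use core::fmt;

/// Hypothetical error type for a `no_std` + `alloc` library (not a zerovec type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    /// The byte length was not a multiple of the element width.
    InvalidLength { width: usize, len: usize },
}

// `Display` only needs `core`, so it is available with or without `std`.
impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::InvalidLength { width, len } => {
                write!(f, "length {} is not a multiple of element width {}", len, width)
            }
        }
    }
}

// Compiled only when the `std` Cargo feature is enabled
// (i.e. when built with `--cfg feature="std"`).
#[cfg(feature = "std")]
impl std::error::Error for ParseError {}

fn main() {
    let err = ParseError::InvalidLength { width: 4, len: 7 };
    assert_eq!(err.to_string(), "length 7 is not a multiple of element width 4");
}
```

A library using this pattern would typically also opt into `no_std` at its crate root and declare the `std` feature under `[features]` in `Cargo.toml`; both are omitted here so the snippet runs as an ordinary program.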
## Examples
Serialize and deserialize a struct with `ZeroVec` and `VarZeroVec` using Bincode:
```rust
use zerovec::{VarZeroVec, ZeroVec};

// This example requires the "serde" feature
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'data> {
    #[serde(borrow)]
    nums: ZeroVec<'data, u32>,
    #[serde(borrow)]
    chars: ZeroVec<'data, char>,
    #[serde(borrow)]
    strs: VarZeroVec<'data, str>,
}

let data = DataStruct {
    nums: ZeroVec::from_slice_or_alloc(&[211, 281, 421, 461]),
    chars: ZeroVec::alloc_from_slice(&['ö', '冇', 'म']),
    strs: VarZeroVec::from(&["hello", "world"]),
};
let bincode_bytes =
    bincode::serialize(&data).expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 67);

let deserialized: DataStruct = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");
assert_eq!(deserialized.nums.first(), Some(211));
assert_eq!(deserialized.chars.get(1), Some('冇'));
assert_eq!(deserialized.strs.get(1), Some("world"));
// The deserialization will not have allocated anything
assert!(!deserialized.nums.is_owned());
```
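The `strs` field above borrows each string from the Bincode buffer after a validity check. The std-only sketch below shows that idea on a hypothetical, simplified layout (a count, a table of end offsets, then concatenated UTF-8). `StrList`, `read_u32`, and `encode` are illustrative names and the layout is an assumption; see the design doc linked above for how `VarZeroVec` actually lays out its data.

```rust
/// Hypothetical, simplified layout (NOT zerovec's actual wire format):
/// [count: u32 LE][end offset of each string: u32 LE, ...][UTF-8 bytes]
struct StrList<'a> {
    offsets: &'a [u8], // `count` little-endian end offsets, 4 bytes each
    data: &'a str,     // all strings concatenated
}

fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

impl<'a> StrList<'a> {
    /// Validates the whole buffer once; afterwards every access borrows from it.
    fn parse(bytes: &'a [u8]) -> Option<Self> {
        let count = read_u32(bytes, 0)? as usize;
        let data_start = count.checked_mul(4)?.checked_add(4)?;
        let offsets = bytes.get(4..data_start)?;
        let data = std::str::from_utf8(bytes.get(data_start..)?).ok()?;
        // Offsets must be non-decreasing and on char boundaries
        // (`is_char_boundary` is false past the end), and the last
        // one must cover exactly the data section.
        let mut prev = 0;
        for i in 0..count {
            let end = read_u32(offsets, i * 4)? as usize;
            if end < prev || !data.is_char_boundary(end) {
                return None;
            }
            prev = end;
        }
        if prev != data.len() {
            return None;
        }
        Some(StrList { offsets, data })
    }

    fn len(&self) -> usize {
        self.offsets.len() / 4
    }

    /// Returns a `&str` borrowed straight from the original buffer.
    fn get(&self, index: usize) -> Option<&'a str> {
        if index >= self.len() {
            return None;
        }
        let start = match index {
            0 => 0,
            _ => read_u32(self.offsets, (index - 1) * 4)? as usize,
        };
        let end = read_u32(self.offsets, index * 4)? as usize;
        self.data.get(start..end)
    }
}

/// Hypothetical encoder for the same layout (it allocates; used to build input).
fn encode(strings: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(strings.len() as u32).to_le_bytes());
    let mut end = 0u32;
    for s in strings {
        end += s.len() as u32;
        out.extend_from_slice(&end.to_le_bytes());
    }
    for s in strings {
        out.extend_from_slice(s.as_bytes());
    }
    out
}

fn main() {
    let bytes = encode(&["hello", "world"]);
    let list = StrList::parse(&bytes).expect("valid buffer");
    assert_eq!(list.len(), 2);
    assert_eq!(list.get(1), Some("world"));
    assert_eq!(list.get(2), None);
}
```

Validating UTF-8 and offsets once in `parse` is what allows `get` to hand out `&str` slices without copying or re-checking.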
Use custom types inside `ZeroVec`, `VarZeroVec`, and `ZeroMap`:
```rust
use zerovec::{ZeroVec, VarZeroVec, ZeroMap};
use std::borrow::Cow;

// This example requires the "serde" and "derive" features

// custom fixed-size ULE type for ZeroVec
#[zerovec::make_ule(DateULE)]
#[derive(Copy, Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Date {
    y: u64,
    m: u8,
    d: u8,
}

// custom variable-sized VarULE type for VarZeroVec
#[zerovec::make_varule(PersonULE)]
#[zerovec::derive(Serialize, Deserialize)] // add Serde impls to PersonULE
#[derive(Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Person<'a> {
    birthday: Date,
    favorite_character: char,
    #[serde(borrow)]
    name: Cow<'a, str>,
}

#[derive(serde::Serialize, serde::Deserialize)]
struct Data<'a> {
    #[serde(borrow)]
    important_dates: ZeroVec<'a, Date>,
    // note: VarZeroVec must always reference the ULE type directly
    #[serde(borrow)]
    important_people: VarZeroVec<'a, PersonULE>,
    #[serde(borrow)]
    birthdays_to_people: ZeroMap<'a, Date, PersonULE>,
}

let person1 = Person {
    birthday: Date { y: 1990, m: 9, d: 7 },
    favorite_character: 'π',
    name: Cow::from("Kate"),
};
let person2 = Person {
    birthday: Date { y: 1960, m: 5, d: 25 },
    favorite_character: '冇',
    name: Cow::from("Jesse"),
};

let important_dates = ZeroVec::alloc_from_slice(&[
    Date { y: 1943, m: 3, d: 20 },
    Date { y: 1976, m: 8, d: 2 },
    Date { y: 1998, m: 2, d: 15 },
]);
let important_people = VarZeroVec::from(&[&person1, &person2]);
let mut birthdays_to_people: ZeroMap<Date, PersonULE> = ZeroMap::new();
// `.insert_var_v()` is slightly more convenient than `.insert()` for custom ULE types
birthdays_to_people.insert_var_v(&person1.birthday, &person1);
birthdays_to_people.insert_var_v(&person2.birthday, &person2);

let data = Data { important_dates, important_people, birthdays_to_people };

let bincode_bytes = bincode::serialize(&data)
    .expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 168);

let deserialized: Data = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");

assert_eq!(deserialized.important_dates.get(0).unwrap().y, 1943);
assert_eq!(&deserialized.important_people.get(1).unwrap().name, "Jesse");
assert_eq!(&deserialized.important_people.get(0).unwrap().name, "Kate");
assert_eq!(&deserialized.birthdays_to_people.get(&person1.birthday).unwrap().name, "Kate");
```
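`#[make_ule(DateULE)]` generates a fixed-width, unaligned byte representation for `Date`. As a rough, std-only illustration of the kind of encoding involved, here is a hand-written sketch. `DateBytes`, `DATE_WIDTH`, and `dates_from_bytes` are hypothetical, and the 10-byte little-endian layout is an assumption; it is not necessarily what the macro generates.

```rust
/// Same shape as the `Date` in the example above.
#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Date {
    y: u64,
    m: u8,
    d: u8,
}

/// Fixed width: 8 bytes for `y` (little-endian) + 1 for `m` + 1 for `d`.
const DATE_WIDTH: usize = 10;

/// Hypothetical unaligned representation: a plain byte array has alignment 1,
/// so it can be read from any position in a byte buffer.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct DateBytes([u8; DATE_WIDTH]);

impl From<Date> for DateBytes {
    fn from(date: Date) -> Self {
        let mut out = [0u8; DATE_WIDTH];
        out[..8].copy_from_slice(&date.y.to_le_bytes());
        out[8] = date.m;
        out[9] = date.d;
        DateBytes(out)
    }
}

impl From<DateBytes> for Date {
    fn from(bytes: DateBytes) -> Self {
        let b = bytes.0;
        let mut y = [0u8; 8];
        y.copy_from_slice(&b[..8]);
        Date { y: u64::from_le_bytes(y), m: b[8], d: b[9] }
    }
}

/// Views a byte buffer as a sequence of dates, validating only the length;
/// each element is decoded as the iterator reaches it.
fn dates_from_bytes(buf: &[u8]) -> Option<impl Iterator<Item = Date> + '_> {
    if buf.len() % DATE_WIDTH != 0 {
        return None;
    }
    Some(buf.chunks_exact(DATE_WIDTH).map(|chunk| {
        let mut arr = [0u8; DATE_WIDTH];
        arr.copy_from_slice(chunk);
        Date::from(DateBytes(arr))
    }))
}

fn main() {
    let dates = [Date { y: 1943, m: 3, d: 20 }, Date { y: 1976, m: 8, d: 2 }];
    let mut buf = Vec::new();
    for date in &dates {
        buf.extend_from_slice(&DateBytes::from(*date).0);
    }
    assert_eq!(buf.len(), 20);
    let decoded: Vec<Date> = dates_from_bytes(&buf).expect("whole dates").collect();
    assert_eq!(decoded, dates);
}
```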
## Performance
`zerovec` is designed for fast deserialization from byte buffers with zero memory allocations while minimizing performance regressions for common vector operations.

Benchmark results on x86_64:

| Operation | `Vec<T>` | `zerovec` |
|---|---|---|
| Deserialize vec of 100 `u32` | 233.18 ns | 14.120 ns |
| Compute sum of vec of 100 `u32` (read every element) | 8.7472 ns | 10.775 ns |
| Binary search vec of 1000 `u32` 50 times | 442.80 ns | 472.51 ns |
| Deserialize vec of 100 strings | 7.3740 μs\* | 1.4495 μs |
| Count chars in vec of 100 strings (read every element) | 747.50 ns | 955.28 ns |
| Binary search vec of 500 strings 10 times | 466.09 ns | 790.33 ns |

\* *This result is reported for `Vec<String>`. However, Serde also supports deserializing to the partially-zero-copy `Vec<&str>`; this gives 1.8420 μs, much faster than `Vec<String>` but a bit slower than `zerovec`.*

| Operation | `HashMap<K,V>` | `LiteMap<K,V>` | `ZeroMap<K,V>` |
|---|---|---|---|
| Deserialize a small map | 2.72 μs | 1.28 μs | 480 ns |
| Deserialize a large map | 50.5 ms | 18.3 ms | 3.74 ms |
| Look up from a small deserialized map | 49 ns | 42 ns | 54 ns |
| Look up from a large deserialized map | 51 ns | 155 ns | 213 ns |

Small = 16 elements, large = 131,072 elements. Maps contain `<String, String>`.

The benches used to generate the above tables can be found in the `benches` directory of the project repository. `zeromap` benches are named by convention, e.g. `zeromap/deserialize/small` and `zeromap/lookup/large`. The type is appended for baseline comparisons, e.g. `zeromap/lookup/small/hashmap`.
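The lookup rows show `zerovec` somewhat slower than `Vec<T>`. One plausible contributor is that elements are stored as bytes and decoded on each access rather than read as native values. The std-only sketch below shows a binary search that decodes every probed element from little-endian bytes; `encode_le` and `binary_search_le_u32` are hypothetical helpers, not taken from the crate or its benchmarks.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: encode `u32`s as consecutive little-endian bytes.
fn encode_le(values: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Hypothetical helper: binary search over little-endian `u32` bytes.
/// Each probe decodes 4 bytes into a `u32` before comparing. Returns the
/// same `Ok(index)` / `Err(insertion_point)` shape as `slice::binary_search`.
fn binary_search_le_u32(bytes: &[u8], target: u32) -> Result<usize, usize> {
    assert!(bytes.len() % 4 == 0, "buffer must hold whole u32 values");
    let decode = |i: usize| {
        let b = &bytes[i * 4..i * 4 + 4];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]])
    };
    let (mut lo, mut hi) = (0, bytes.len() / 4);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        match decode(mid).cmp(&target) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Ok(mid),
        }
    }
    Err(lo)
}

fn main() {
    let values: Vec<u32> = (0..1000).map(|i| i * 3).collect();
    let bytes = encode_le(&values);
    assert_eq!(binary_search_le_u32(&bytes, 300), Ok(100));
    assert_eq!(binary_search_le_u32(&bytes, 301), Err(101));
    // Same answer as searching the native `Vec<u32>` (values are distinct).
    assert_eq!(binary_search_le_u32(&bytes, 300), values.binary_search(&300));
}
```

On x86_64 the little-endian decode is cheap, which is consistent with the small gap for `u32`; variable-width strings need more work per probe.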
<!-- cargo-rdme end -->
## More Information
For more information on development, authorship, contributing, etc., please visit the [ICU4X home page](https://github.com/unicode-org/icu4x).