Docs/guide #1183

Merged · 6 commits · Jan 30, 2024
7 changes: 6 additions & 1 deletion burn-book/src/basic-workflow/README.md
This guide will walk you through the process of creating a custom model built with Burn. We will
train a simple convolutional neural network model on the MNIST dataset and prepare it for inference.

For clarity, we sometimes omit imports in our code snippets. For more details, please refer to the
corresponding code in the `examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide),
where this complete example is reproduced step by step, from dataset and model definition to training.

The demo example can be executed from Burn's base directory using the following command:
```bash
cargo run --example guide
```

## Key Learnings

28 changes: 20 additions & 8 deletions burn-book/src/basic-workflow/backend.md

We have effectively written most of the necessary code to train our model. However, we have not
explicitly designated the backend to be used at any point. This will be defined in the main
entrypoint of our program, namely the `main` function defined in `src/main.rs`.

```rust , ignore
use burn::optim::AdamConfig;
// ... (remaining imports elided in this diff view)

fn main() {
    // ... (body elided in this diff view)
}
```
You might be wondering why we use the `guide` prefix to bring the different modules we just
implemented into scope. Instead of including the code in the current guide in a single file, we
separated it into different files which group related code into _modules_. The `guide` is simply the
name we gave to our _crate_, which contains the different files. If you named your project crate
`my-first-burn-model`, you can equivalently replace all usages of `guide` above with
`my-first-burn-model`. Below is a brief explanation of the different parts of the Rust module system.

A **package** is a bundle of one or more crates that provides a set of functionality. A package
contains a `Cargo.toml` file that describes how to build those crates. Burn is a package.
A **crate** is a compilation unit in Rust. It could be a single file, but it is often easier to
split up crates into multiple _modules_ and possibly multiple files. A crate can come in one of two
forms: a binary crate or a library crate. When compiling a crate, the compiler first looks in the
crate root file (usually `src/lib.rs` for a library crate or `src/main.rs` for a binary crate). Any
module declared in the crate root file will be inserted in the crate for compilation. For this demo
example, we will define a library crate where all the individual modules (model, data, training,
etc.) are declared inside `src/lib.rs` as follows:

```
pub mod data;
pub mod inference;
pub mod model;
pub mod training;
```

A **module** lets us organize code within a crate for readability and easy reuse. Modules also allow
us to control the _privacy_ of items. The `pub` keyword used above, for example, makes a module
publicly available inside the crate.
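
As a quick illustration of module privacy, here is a minimal sketch with hypothetical module names;
only items reachable through a chain of `pub` declarations are visible outside the crate:

```rust , ignore
pub mod data {
    pub fn load() {}   // Reachable from outside the crate: `guide::data::load()`
    fn parse() {}      // Private to the `data` module
}

mod internal {
    pub fn helper() {} // `pub`, but `internal` itself is private,
                       // so this is only reachable within the crate
}
```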

For this guide, we defined a library crate with a single example in which the `main` function is
defined inside the `guide.rs` file, as illustrated in the structure below.

```
guide
├── Cargo.toml
├── examples
│   └── guide.rs
└── src
    ├── data.rs
    ├── inference.rs
    ├── lib.rs
    ├── model.rs
    └── training.rs
```

The source for this guide can be found in our
[GitHub repository](https://github.com/tracel-ai/burn/tree/main/examples/guide), which can be used
to run this basic workflow example end-to-end.\

</details><br>

3 changes: 3 additions & 0 deletions burn-book/src/basic-workflow/data.md
To iterate over a dataset efficiently, we will define a struct which will implement the `Batcher`
trait. The goal of a batcher is to map individual dataset items into a batched tensor that can be
used as input to our previously defined model.

Let us start by defining our dataset functionalities in a file `src/data.rs`. We shall omit some of
the imports for brevity, but the full code for following this guide can be found in the
`examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide).

```rust , ignore
use burn::{
data::{dataloader::batcher::Batcher, dataset::vision::MNISTItem},
    // ... (remaining imports elided in this diff view)
};

// ... (batcher implementation elided in this diff view)
```
4 changes: 2 additions & 2 deletions burn-book/src/basic-workflow/inference.md
For loading a model primed for inference, it is of course more efficient to directly load the
weights into the model, bypassing the need to initially set arbitrary weights or worse, weights
computed from a Xavier normal initialization only to then promptly replace them with the stored
weights. With that in mind, let's create a new initialization function receiving the record as
input. This new function can be defined alongside the `init` function for the `ModelConfig` struct in `src/model.rs`.

```rust , ignore
impl ModelConfig {
    // ... (record-based initialization function elided in this diff view)
}
```

It is important to note that the `ModelRecord` was automatically generated thanks to the `Module`
trait. It allows us to load the module state without having to deal with fetching the correct type
manually. Everything is validated when loading the model with the record.

Now let's create a simple `infer` method in a new file `src/inference.rs`, which we will use to load
our trained model.

```rust , ignore
pub fn infer<B: Backend>(artifact_dir: &str, device: B::Device, item: MNISTItem) {
    // ... (body elided in this diff view)
}
```
6 changes: 3 additions & 3 deletions burn-book/src/basic-workflow/model.md
Our goal will be to create a basic convolutional neural network used for image classification. We
will keep the model simple by using two convolution layers followed by two linear layers, some
pooling and ReLU activations. We will also use dropout to improve training performance.

Let us start by defining our model struct in a new file `src/model.rs`.

```rust , ignore
use burn::{
    // ... (imports elided in this diff view)
};

// ... (model definition elided in this diff view)
```

network modules already built with Burn use the `forward` nomenclature, simply because it is
standard in the field.

Similar to neural network modules, the [`Tensor`](../building-blocks/tensor.md) struct given as a
parameter also takes the Backend trait as a generic argument, alongside its rank (or
dimensionality). Even if it is not used in this specific example, it is possible to add the kind of
the tensor as a third generic argument. For example, a 3-dimensional tensor of different data types
(float, int, bool) would be defined as follows:

```rust , ignore
Tensor<B, 3> // Float tensor (default)
Tensor<B, 3, Float> // Explicit float tensor
Tensor<B, 3, Int>   // Int tensor
Tensor<B, 3, Bool>  // Bool tensor
```
6 changes: 4 additions & 2 deletions burn-book/src/basic-workflow/training.md
# Training

We are now ready to write the necessary code to train our model on the MNIST dataset.
We shall define the code for this training section in the file `src/training.rs`.

Instead of a simple tensor, the model should output an item that can be understood by the learner, a struct whose
responsibility is to apply an optimizer to the model. The output struct is used for all metrics
calculated during training. Therefore, it should include all the necessary information to
calculate any metric that you want for a task.
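
As a concrete illustration, here is a minimal sketch of such an output item for a classification
task. It assumes Burn's built-in `ClassificationOutput` struct and `CrossEntropyLoss` module (exact
constructor signatures may vary between Burn versions):

```rust , ignore
impl<B: Backend> Model<B> {
    pub fn forward_classification(
        &self,
        images: Tensor<B, 3>,
        targets: Tensor<B, 1, Int>,
    ) -> ClassificationOutput<B> {
        let output = self.forward(images);
        let loss = CrossEntropyLoss::new(None, &output.device())
            .forward(output.clone(), targets.clone());

        // The learner reads the loss (to optimize the model) and the
        // (output, targets) pair (to compute metrics) from this struct.
        ClassificationOutput::new(loss, output, targets)
    }
}
```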
44 changes: 44 additions & 0 deletions burn-book/src/building-blocks/dataset.md
distributions.
| `MapperDataset` | Computes a transformation lazily on the input dataset. |
| `ComposedDataset` | Composes multiple datasets together to create a larger one without copying any data. |

Let us look at the basic usage of each dataset transform and how transforms can be composed
together. The full documentation of each transform can be found in the
[API reference](https://burn.dev/docs/burn/data/dataset/transform/index.html).

* **SamplerDataset**: This transform can be used to sample items from a dataset with (default) or
without replacement. The transform is initialized with a sampling size, which can be bigger or
smaller than the input dataset size. This is particularly useful in cases where we want to
checkpoint larger datasets more often during training and smaller datasets less often, as the size
of an epoch is now controlled by the sampling size. Sample usage:
```rust, ignore
type DbPedia = SqliteDataset<DbPediaItem>;
let dataset: DbPedia = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("train")
    .unwrap();

let dataset = SamplerDataset::<DbPedia, DbPediaItem>::new(dataset, 10000);
```

* **ShuffledDataset**: This transform can be used to shuffle the items of a dataset. It is
particularly useful before splitting the raw dataset into train/test splits, and can be initialized
with a seed to ensure reproducibility.
```rust, ignore
let dataset = ShuffledDataset::<DbPedia, DbPediaItem>::with_seed(dataset, 42);
```

* **PartialDataset**: This transform returns a view of the dataset over the specified start and end
indices, and is commonly used to create train/val/test splits. In the example below, we show how to
chain `ShuffledDataset` and `PartialDataset` to create splits.
```rust, ignore
// Define the chained dataset type here for brevity
type PartialData = PartialDataset<ShuffledDataset<DbPedia, DbPediaItem>, DbPediaItem>;
let len = dataset.len();
let split = "train"; // or "test"

let data_split = match split {
    "train" => PartialData::new(dataset, 0, len * 8 / 10),  // Get first 80% of the dataset
    "test" => PartialData::new(dataset, len * 8 / 10, len), // Take remaining 20%
    _ => panic!("Invalid split type"),                      // Handle unexpected split types
};
```

* **MapperDataset**: This transform is useful to apply a transformation lazily to each item of a
dataset. It is particularly useful for normalization of image data when channel means are known, as
in the sketch below.
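
A minimal sketch, assuming the `Mapper` trait and `MapperDataset` type from the transform module;
the mapper type, its item type, and the mean value are hypothetical:

```rust, ignore
// Hypothetical mapper that subtracts a known channel mean from each pixel.
struct NormalizeMapper {
    mean: f32,
}

impl Mapper<Vec<f32>, Vec<f32>> for NormalizeMapper {
    // Applied lazily to each item when the dataset is accessed.
    fn map(&self, item: &Vec<f32>) -> Vec<f32> {
        item.iter().map(|pixel| pixel - self.mean).collect()
    }
}

let dataset = MapperDataset::new(dataset, NormalizeMapper { mean: 0.1307 });
```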

* **ComposedDataset**: This transform is useful to compose multiple datasets downloaded from
multiple sources (say, different `HuggingfaceDatasetLoader` sources) into a single bigger dataset
which can be sampled from one source, as in the sketch below.
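
A minimal sketch, assuming `ComposedDataset` is constructed from a `Vec` of datasets sharing the
same item type:

```rust, ignore
// Combine two splits of the same source into one larger dataset.
let first: SqliteDataset<DbPediaItem> = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("train")
    .unwrap();
let second: SqliteDataset<DbPediaItem> = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("test")
    .unwrap();

let dataset = ComposedDataset::new(vec![first, second]);
```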


## Storage

There are multiple dataset storage options available for you to choose from. The choice of the
85 changes: 78 additions & 7 deletions burn-book/src/building-blocks/tensor.md
# Tensor

As previously explained in the [model section](../basic-workflow/model.md), the Tensor struct has 3
generic arguments: the backend, the number of dimensions (rank) and the data type.

```rust , ignore
Tensor<B, D> // Float tensor (default)
Tensor<B, D, Float> // Explicit float tensor
Tensor<B, D, Int>   // Int tensor
Tensor<B, D, Bool>  // Bool tensor
```
Note that the specific element types used for `Float`, `Int`, and `Bool` tensors are defined by
backend implementations.

In contrast to PyTorch, where a tensor is characterized by the size of each of its dimensions, a
Burn tensor is characterized by the number of dimensions D in its declaration. The actual shape of
the tensor is inferred from its initialization. For example, a tensor of size (5,) is initialized
as below:
```rust, ignore
// Correct: the tensor is 1-dimensional with 5 elements
let tensor_1 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);

// Incorrect: let tensor_1 = Tensor::<Backend, 5>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);
// this declares a 5-dimensional tensor and will lead to an error
```

### Initialization

Burn Tensors are primarily initialized using the `from_data()` method, which takes the `Data`
struct as input. The `Data` struct has two fields: `value` and `shape`. To access the data of any
tensor, we use the method `.to_data()`. Let's look at a couple of examples for initializing a
tensor from different inputs.

```rust, ignore
// Initialization from a float array
let tensor_1 = Tensor::<Backend, 1>::from_data([1.0, 2.0, 3.0]);

// Equivalent to the above (recommended)
let tensor_2 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0]);

// Initialization from a custom type

struct BMI {
    age: i8,
    height: i16,
    weight: f32,
}

let bmi = BMI {
    age: 10,
    height: 100,
    weight: 50.0,
};
let tensor_3 = Tensor::<Backend, 1>::from_data(
    Data::from([bmi.age as f32, bmi.height as f32, bmi.weight]).convert(),
);

```
The `.convert()` method for the `Data` struct is called in the last example to ensure that the
data's primitive type is consistent across all backends. The conversion can also be done at the
element-wise level, as in
`let tensor_4 = Tensor::<B, 1, Int>::from_data(Data::from([(bmi.age as i64).elem()]));`.

## Ownership and Cloning

Almost all Burn operations take ownership of the input tensors. Therefore, reusing a tensor multiple
times will necessitate cloning it. Let's look at an example to understand the ownership rules and
cloning better.
Suppose we want to do a simple min-max normalization of an input tensor.
```rust, ignore
let input = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0, 4.0]);
let min = input.min();
let max = input.max();
let input = (input - min).div(max - min);
```
With PyTorch tensors, the above code would work as expected. However, Rust's strict ownership rules
will give an error and prevent the use of the input tensor after the first `.min()` operation. The
ownership of the input tensor is transferred to the variable `min`, and the input tensor is no
longer available for further operations. Burn Tensors, like most complex primitives, do not
implement the `Copy` trait and therefore have to be cloned explicitly. Now let's rewrite a working
example of doing min-max normalization with cloning.

```rust, ignore
let input = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0, 4.0]);
let min = input.clone().min();
let max = input.clone().max();
let input = (input.clone() - min.clone()).div(max - min);
println!("{:?}", input.to_data()); // Success: [0.0, 0.33333334, 0.6666667, 1.0]

// Notice that `max` and `min` were moved in the last operation, so printing them now would give an error.
// If we want to use them for further operations, they will need to be cloned in a similar fashion.
// println!("{:?}", min.to_data());
```

We don't need to worry about memory overhead because, with cloning, the tensor's buffer isn't
copied; only a reference to it is increased. This makes it possible to determine exactly how many
times a tensor is used, which is very convenient for reusing tensor buffers and improving
performance. For that reason, we don't provide explicit inplace operations. If a tensor is used
only one time, inplace operations will always be used when available.

## Tensor Operations

Normally with PyTorch, explicit inplace operations aren't supported during the backward pass, making
them useful only for data preprocessing or inference-only model implementations. With Burn, you can
focus more on what the model should do, rather than on how to do it.
2 changes: 1 addition & 1 deletion burn-book/src/getting-started.md
Tensor {
```

While the previous example is somewhat trivial, the upcoming
[basic workflow section](./basic-workflow/README.md) will walk you through a much more relevant example for
deep learning applications.

## Running examples