Docs/guide #1183

Merged · 6 commits · Jan 30, 2024
7 changes: 6 additions & 1 deletion burn-book/src/basic-workflow/README.md
This guide will walk you through the process of creating a custom model built with Burn. We will
train a simple convolutional neural network model on the MNIST dataset and prepare it for inference.

For clarity, we sometimes omit imports in our code snippets. For more details, please refer to the
corresponding code in the `examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide),
where this complete example is reproduced step by step, from dataset and model definition to training.

The demo example can be executed from Burn's base directory using the following command:
```bash
cargo run --example guide
```

## Key Learnings

28 changes: 20 additions & 8 deletions burn-book/src/basic-workflow/backend.md

We have effectively written most of the necessary code to train our model. However, we have not
explicitly designated the backend to be used at any point. This will be defined in the main
entrypoint of our program, namely the `main` function defined in `src/main.rs`.

```rust , ignore
use burn::optim::AdamConfig;
// ... (remaining imports elided in this diff view)

fn main() {
    // ... (body elided in this diff view)
}
```
You might be wondering why we use the `guide` prefix to bring the different modules we just
implemented into scope. Instead of including the code in the current guide in a single file, we
separated it into different files which group related code into _modules_. The `guide` is simply the
name we gave to our _crate_, which contains the different files. If you named your project crate
`my-first-burn-model`, you can equivalently replace all usages of `guide` above with
`my-first-burn-model`. Below is a brief explanation of the different parts of the Rust module system.

A **package** is a bundle of one or more crates that provides a set of functionality. A package
contains a `Cargo.toml` file that describes how to build those crates. Burn is a package.
A **crate** is a compilation unit in Rust. It could be a single file, but it is often easier to
split up crates into multiple _modules_ and possibly multiple files. A crate can come in one of two
forms: a binary crate or a library crate. When compiling a crate, the compiler first looks in the
crate root file (usually `src/lib.rs` for a library crate or `src/main.rs` for a binary crate). Any
module declared in the crate root file will be inserted in the crate for compilation. For this demo
example, we will define a library crate where all the individual modules (model, data, training,
etc.) are declared inside `src/lib.rs` as follows:

```
pub mod data;
pub mod inference;
pub mod model;
pub mod training;
```

A **module** lets us organize code within a crate for readability and easy reuse. Modules also allow
us to control the _privacy_ of items. The `pub` keyword used above, for example, makes a module
publicly available inside the crate.
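
As a quick illustration of module privacy, here is a minimal sketch with hypothetical module names;
only items reachable through a chain of `pub` declarations are visible outside the crate:

```rust , ignore
pub mod data {
    pub fn load() {}   // Reachable from outside the crate: `guide::data::load()`
    fn parse() {}      // Private to the `data` module
}

mod internal {
    pub fn helper() {} // `pub`, but `internal` itself is private,
                       // so this is only reachable within the crate
}
```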

For this guide, we defined a library crate with a single example in which the `main` function is
defined inside the `guide.rs` file, as illustrated in the structure below.

```
guide
├── Cargo.toml
├── examples
│   └── guide.rs
└── src
    ├── data.rs
    ├── inference.rs
    ├── lib.rs
    ├── model.rs
    └── training.rs
```

The source for this guide can be found in our
[GitHub repository](https://github.com/tracel-ai/burn/tree/main/examples/guide), which can be used
to run this basic workflow example end-to-end.\

</details><br>

3 changes: 3 additions & 0 deletions burn-book/src/basic-workflow/data.md
To iterate over a dataset efficiently, we will define a struct which will implement the `Batcher`
trait. The goal of a batcher is to map individual dataset items into a batched tensor that can be
used as input to our previously defined model.

Let us start by defining our dataset functionalities in a file `src/data.rs`. We shall omit some of
the imports for brevity, but the full code for following this guide can be found in the
`examples/guide` [directory](https://github.com/tracel-ai/burn/tree/main/examples/guide).

```rust , ignore
use burn::{
data::{dataloader::batcher::Batcher, dataset::vision::MNISTItem},
    // ... (remaining imports elided in this diff view)
};

// ... (batcher implementation elided in this diff view)
```
4 changes: 2 additions & 2 deletions burn-book/src/basic-workflow/inference.md
For loading a model primed for inference, it is of course more efficient to directly load the
weights into the model, bypassing the need to initially set arbitrary weights or worse, weights
computed from a Xavier normal initialization only to then promptly replace them with the stored
weights. With that in mind, let's create a new initialization function receiving the record as
input. This new function can be defined alongside the `init` function for the `ModelConfig` struct in `src/model.rs`.

```rust , ignore
impl ModelConfig {
    // ... (record-based initialization function elided in this diff view)
}
```

It is important to note that the `ModelRecord` was automatically generated thanks to the `Module`
trait. It allows us to load the module state without having to deal with fetching the correct type
manually. Everything is validated when loading the model with the record.

Now let's create a simple `infer` method in a new file `src/inference.rs`, which we will use to load
our trained model.

```rust , ignore
pub fn infer<B: Backend>(artifact_dir: &str, device: B::Device, item: MNISTItem) {
    // ... (body elided in this diff view)
}
```
6 changes: 3 additions & 3 deletions burn-book/src/basic-workflow/model.md
Our goal will be to create a basic convolutional neural network used for image classification. We
will keep the model simple by using two convolution layers followed by two linear layers, some
pooling and ReLU activations. We will also use dropout to improve training performance.

Let us start by defining our model struct in a new file `src/model.rs`.

```rust , ignore
use burn::{
    // ... (imports elided in this diff view)
};

// ... (model definition elided in this diff view)
```

network modules already built with Burn use the `forward` nomenclature, simply because it is
standard in the field.

Similar to neural network modules, the [`Tensor`](../building-blocks/tensor.md) struct given as a
parameter also takes the Backend trait as a generic argument, alongside its rank (or
dimensionality). Even if it is not used in this specific example, it is possible to add the kind of
the tensor as a third generic argument. For example, a 3-dimensional tensor of different data types
(float, int, bool) would be defined as follows:

```rust , ignore
Tensor<B, 3> // Float tensor (default)
Tensor<B, 3, Float> // Explicit float tensor
Tensor<B, 3, Int>   // Int tensor
Tensor<B, 3, Bool>  // Bool tensor
```
6 changes: 4 additions & 2 deletions burn-book/src/basic-workflow/training.md
# Training

We are now ready to write the necessary code to train our model on the MNIST dataset.
We shall define the code for this training section in the file `src/training.rs`.

Instead of a simple tensor, the model should output an item that can be understood by the learner, a struct whose
responsibility is to apply an optimizer to the model. The output struct is used for all metrics
calculated during training. Therefore, it should include all the necessary information to
calculate any metric that you want for a task.
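
As a concrete illustration, here is a minimal sketch of such an output item for a classification
task. It assumes Burn's built-in `ClassificationOutput` struct and `CrossEntropyLoss` module (exact
constructor signatures may vary between Burn versions):

```rust , ignore
impl<B: Backend> Model<B> {
    pub fn forward_classification(
        &self,
        images: Tensor<B, 3>,
        targets: Tensor<B, 1, Int>,
    ) -> ClassificationOutput<B> {
        let output = self.forward(images);
        let loss = CrossEntropyLoss::new(None, &output.device())
            .forward(output.clone(), targets.clone());

        // The learner reads the loss (to optimize the model) and the
        // (output, targets) pair (to compute metrics) from this struct.
        ClassificationOutput::new(loss, output, targets)
    }
}
```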
44 changes: 44 additions & 0 deletions burn-book/src/building-blocks/dataset.md
distributions.
| `MapperDataset` | Computes a transformation lazily on the input dataset. |
| `ComposedDataset` | Composes multiple datasets together to create a larger one without copying any data. |

Let us look at the basic usage of each dataset transform and how transforms can be composed
together. The full documentation of each transform can be found in the
[API reference](https://burn.dev/docs/burn/data/dataset/transform/index.html).

* **SamplerDataset**: This transform can be used to sample items from a dataset with (default) or
without replacement. The transform is initialized with a sampling size, which can be bigger or
smaller than the input dataset size. This is particularly useful in cases where we want to
checkpoint larger datasets more often during training and smaller datasets less often, as the size
of an epoch is now controlled by the sampling size. Sample usage:
```rust, ignore
type DbPedia = SqliteDataset<DbPediaItem>;
let dataset: DbPedia = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("train")
    .unwrap();

let dataset = SamplerDataset::<DbPedia, DbPediaItem>::new(dataset, 10000);
```

* **ShuffledDataset**: This transform can be used to shuffle the items of a dataset. It is
particularly useful before splitting the raw dataset into train/test splits, and can be initialized
with a seed to ensure reproducibility.
```rust, ignore
let dataset = ShuffledDataset::<DbPedia, DbPediaItem>::with_seed(dataset, 42);
```

* **PartialDataset**: This transform returns a view of the dataset over the specified start and end
indices, and is commonly used to create train/val/test splits. In the example below, we show how to
chain `ShuffledDataset` and `PartialDataset` to create splits.
```rust, ignore
// Define the chained dataset type here for brevity
type PartialData = PartialDataset<ShuffledDataset<DbPedia, DbPediaItem>, DbPediaItem>;
let len = dataset.len();
let split = "train"; // or "test"

let data_split = match split {
    "train" => PartialData::new(dataset, 0, len * 8 / 10),  // Get first 80% of the dataset
    "test" => PartialData::new(dataset, len * 8 / 10, len), // Take remaining 20%
    _ => panic!("Invalid split type"),                      // Handle unexpected split types
};
```

* **MapperDataset**: This transform is useful to apply a transformation lazily to each item of a
dataset. It is particularly useful for normalization of image data when channel means are known, as
in the sketch below.
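
A minimal sketch, assuming the `Mapper` trait and `MapperDataset` type from the transform module;
the mapper type, its item type, and the mean value are hypothetical:

```rust, ignore
// Hypothetical mapper that subtracts a known channel mean from each pixel.
struct NormalizeMapper {
    mean: f32,
}

impl Mapper<Vec<f32>, Vec<f32>> for NormalizeMapper {
    // Applied lazily to each item when the dataset is accessed.
    fn map(&self, item: &Vec<f32>) -> Vec<f32> {
        item.iter().map(|pixel| pixel - self.mean).collect()
    }
}

let dataset = MapperDataset::new(dataset, NormalizeMapper { mean: 0.1307 });
```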

* **ComposedDataset**: This transform is useful to compose multiple datasets downloaded from
multiple sources (say, different `HuggingfaceDatasetLoader` sources) into a single bigger dataset
which can be sampled from one source, as in the sketch below.
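
A minimal sketch, assuming `ComposedDataset` is constructed from a `Vec` of datasets sharing the
same item type:

```rust, ignore
// Combine two splits of the same source into one larger dataset.
let first: SqliteDataset<DbPediaItem> = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("train")
    .unwrap();
let second: SqliteDataset<DbPediaItem> = HuggingfaceDatasetLoader::new("dbpedia_14")
    .dataset("test")
    .unwrap();

let dataset = ComposedDataset::new(vec![first, second]);
```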


## Storage

There are multiple dataset storage options available for you to choose from. The choice of the
85 changes: 78 additions & 7 deletions burn-book/src/building-blocks/tensor.md
# Tensor

As previously explained in the [model section](../basic-workflow/model.md), the Tensor struct has 3
generic arguments: the backend, the number of dimensions (rank) and the data type.

```rust , ignore
Tensor<B, D> // Float tensor (default)
Tensor<B, D, Float> // Explicit float tensor
Tensor<B, D, Int>   // Int tensor
Tensor<B, D, Bool>  // Bool tensor
```
Note that the specific element types used for `Float`, `Int`, and `Bool` tensors are defined by
backend implementations.

In contrast to PyTorch, where a tensor is characterized by the size of each of its dimensions, a
Burn tensor is characterized by the number of dimensions D in its declaration. The actual shape of
the tensor is inferred from its initialization. For example, a tensor of size (5,) is initialized
as below:
```rust, ignore
// Correct: the tensor is 1-dimensional with 5 elements
let tensor_1 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);

// Incorrect: let tensor_1 = Tensor::<Backend, 5>::from_floats([1.0, 2.0, 3.0, 4.0, 5.0]);
// this declares a 5-dimensional tensor and will lead to an error
```

### Initialization

Burn Tensors are primarily initialized using the `from_data()` method, which takes the `Data`
struct as input. The `Data` struct has two fields: `value` and `shape`. To access the data of any
tensor, we use the method `.to_data()`. Let's look at a couple of examples for initializing a
tensor from different inputs.

```rust, ignore
// Initialization from a float array
let tensor_1 = Tensor::<Backend, 1>::from_data([1.0, 2.0, 3.0]);

// Equivalent to the above (recommended)
let tensor_2 = Tensor::<Backend, 1>::from_floats([1.0, 2.0, 3.0]);

// Initialization from a custom type

struct BMI {
    age: i8,
    height: i16,
    weight: f32,
}

let bmi = BMI {
    age: 10,
    height: 100,
    weight: 50.0,
};
let tensor_3 = Tensor::<Backend, 1>::from_data(
    Data::from([bmi.age as f32, bmi.height as f32, bmi.weight]).convert(),
);

```
The `.convert()` method for the `Data` struct is called in the last example to ensure that the
data's primitive type is consistent across all backends. The conversion can also be done at the
element-wise level, as in
`let tensor_4 = Tensor::<B, 1, Int>::from_data(Data::from([(bmi.age as i64).elem()]));`.

## Ownership and Cloning

Almost all Burn operations take ownership of the input tensors. Therefore, reusing a tensor multiple
times will necessitate cloning it. Let's look at an example to understand the ownership rules and
cloning better.
Suppose we want to do a simple min-max normalization of an input tensor.
```rust, ignore
let input = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0, 4.0]);
let min = input.min();
let max = input.max();
let input = (input - min).div(max - min);
```
With PyTorch tensors, the above code would work as expected. However, Rust's strict ownership rules
will give an error and prevent the use of the input tensor after the first `.min()` operation. The
ownership of the input tensor is transferred to the variable `min`, and the input tensor is no
longer available for further operations. Burn Tensors, like most complex primitives, do not
implement the `Copy` trait and therefore have to be cloned explicitly. Now let's rewrite a working
example of doing min-max normalization with cloning.

```rust, ignore
let input = Tensor::<Wgpu, 1>::from_floats([1.0, 2.0, 3.0, 4.0]);
let min = input.clone().min();
let max = input.clone().max();
let input = (input.clone() - min.clone()).div(max - min);
println!("{:?}", input.to_data()); // Success: [0.0, 0.33333334, 0.6666667, 1.0]

// Notice that `max` and `min` were moved in the last operation, so printing them now would give an error.
// If we want to use them for further operations, they will need to be cloned in a similar fashion.
// println!("{:?}", min.to_data());
```

We don't need to worry about memory overhead because, with cloning, the tensor's buffer isn't
copied; only a reference to it is increased. This makes it possible to determine exactly how many
times a tensor is used, which is very convenient for reusing tensor buffers and improving
performance. For that reason, we don't provide explicit inplace operations. If a tensor is used
only one time, inplace operations will always be used when available.

## Tensor Operations

Normally with PyTorch, explicit inplace operations aren't supported during the backward pass, making
them useful only for data preprocessing or inference-only model implementations. With Burn, you can
focus more on what the model should do, rather than on how to do it.
2 changes: 1 addition & 1 deletion burn-book/src/getting-started.md
Tensor {
```

While the previous example is somewhat trivial, the upcoming
[basic workflow section](./basic-workflow/README.md) will walk you through a much more relevant example for
deep learning applications.

## Running examples