Typo Fixes #66

Merged · 27 commits · Aug 8, 2024

Commits
54156ad
Lowercase loop variable
hassan-elsheikha May 9, 2024
480f346
Fix typo
hassan-elsheikha May 10, 2024
e9e2724
Clarify definition of the size of a filter
hassan-elsheikha May 13, 2024
9463611
Loop description typo fixes
hassan-elsheikha May 13, 2024
08cbc6f
Add comma for clarity
hassan-elsheikha May 13, 2024
2ee806f
Make variable name an inline code block
hassan-elsheikha May 13, 2024
defb0a6
Add C++ styling to code block
hassan-elsheikha May 13, 2024
4b1d722
Inline-code many variable names
hassan-elsheikha May 13, 2024
0caa2ce
Inline volatile keyword
hassan-elsheikha May 13, 2024
7b45ac0
Add C++ styling to code block
hassan-elsheikha May 13, 2024
43269b5
Add C++ styling to code block
hassan-elsheikha May 13, 2024
9223f06
Add C++ styling to code block
hassan-elsheikha May 13, 2024
197c6fa
Add C++ styling to code block
hassan-elsheikha May 13, 2024
a918e12
Add C++ styling to code block
hassan-elsheikha May 13, 2024
de50654
Incorrect usage of "dependent"
hassan-elsheikha May 13, 2024
dd7cbef
Clarify tradeoff on one sentence
hassan-elsheikha May 13, 2024
836c973
Reverse incorrect change
hassan-elsheikha May 13, 2024
221b09b
Update readme.md
hassan-elsheikha Jul 28, 2024
6716bb2
Update readme.md
hassan-elsheikha Jul 28, 2024
11842a5
Update readme.md
hassan-elsheikha Jul 28, 2024
4af7301
Update readme.md
hassan-elsheikha Jul 29, 2024
cc7a6d1
Update readme.md
hassan-elsheikha Jul 29, 2024
e9d2931
Update readme.md
hassan-elsheikha Jul 29, 2024
e72a0b7
Update readme.md
hassan-elsheikha Jul 30, 2024
8c45115
Update readme.md
hassan-elsheikha Jul 31, 2024
2ce5469
Update readme.md
hassan-elsheikha Jul 31, 2024
5bd61c6
Update readme.md
hassan-elsheikha Jul 31, 2024
14 changes: 6 additions & 8 deletions Training1/readme.md
@@ -1008,7 +1008,7 @@ run in the pipeline at each stage. The leftmost column indicates the
loop iteration for the instructions in the row (starting from Iteration
0). For function pipelines, Iteration 0 corresponds to the first input.

If you hold you mouse over an instruction you will see more details
If you hold your mouse over an instruction, you will see more details
about the operation type.

<p align="center"><img src=".//media/image49.png" /></br>Figure 15: SmartHLS Schedule Viewer: Pipeline Viewer</p></br>
@@ -1569,7 +1569,7 @@ SmartHLS.
Pipelining is a common HLS optimization used to increase hardware
throughput and to better utilize FPGA hardware resources. We also
covered the concept of loop pipelining in the SmartHLS Sobel Filter
Tutorial. In Figure 18a) shows a loop to be scheduled with 3
Tutorial. Figure 18a) shows a loop to be scheduled with 3
single-cycle operations: Load, Comp, Store. We show a comparison of the
cycle-by-cycle operations when hardware operations in a loop are
implemented b) sequentially (default) or c) pipelined (with SmartHLS
@@ -1598,7 +1598,7 @@ pragma or the function pipeline pragma:
```
Loop pipelining only applies to a specific loop in a C++ function.
Meanwhile, function pipelining is applied to an entire C++ function and
SmartHLS will automatically unrolls all loops in that function.
SmartHLS will automatically unroll all loops in that function.
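
A minimal sketch of the difference (hypothetical function and variable names; the function-level pragma spelling is an assumption based on the loop pragma shown later in this diff):

```c++
// Hypothetical sketch, not from the training source.
// Loop pipelining: only the marked loop is pipelined.
void scale_loop(volatile int *data) {
#pragma HLS loop pipeline
    for (int i = 0; i < 100; i++)
        data[i] = data[i] * 3;
}

// Function pipelining: the entire function is pipelined, and
// SmartHLS unrolls the loop inside it automatically.
void scale_func(volatile int *data) {
#pragma HLS function pipeline
    for (int i = 0; i < 100; i++) // unrolled by SmartHLS
        data[i] = data[i] * 3;
}
```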

## SmartHLS Pipelining Hazards: Why Initiation Interval Cannot Always Be 1

@@ -1742,7 +1742,7 @@ is presented in Figure 20.
void functional_unit_contention( volatile int array[N] ) {
#pragma HLS loop unroll factor(1)
#pragma HLS loop pipeline
for (int I = 0; i < N; i++) {
for (int i = 0; i < N; i++) {
int mult1 = coeff1 * coeff1;
int mult2 = coeff2 * coeff2;
array[i] = mult1 + mult2;
@@ -2716,14 +2716,12 @@ the user specifies an incorrect value in a SmartHLS pragma. For example,
specifying an incorrect depth on a memory interface such as the
following on line 29:
```c
#pragma HLS interface argument(input_buffer) type(memory)
num_elements(SIZE)
#pragma HLS interface argument(input_buffer) type(memory) num_elements(SIZE)
```
Next, we can try changing the array depth from the correct `SIZE` to a
wrong value like 10:
```c
#pragma HLS interface argument(input_buffer) type(memory)
num_elements(10)
#pragma HLS interface argument(input_buffer) type(memory) num_elements(10)
```

Now we rerun SmartHLS to generate the hardware
61 changes: 30 additions & 31 deletions Training2/readme.md
@@ -584,7 +584,7 @@ performance.
34 // Pipeline for extra performance.
35 #pragma HLS loop pipeline
36 for (int i = 0; i < 100; i++)
37 sum += buf[i\];
37 sum += buf[i];
38 done = false;
39 output_fifo.write(sum);
40 }
@@ -782,7 +782,7 @@ int main() {

## Verification: Co-simulation of Multi-threaded SmartHLS Code

As mentioned before the `producer_consumer` project cannot be simulated
As mentioned before, the `producer_consumer` project cannot be simulated
with co-simulation. This is because the `producer_consumer` project has
threads that run forever and do not finish before the top-level function
returns. SmartHLS co-simulation supports a single call to the top-level
@@ -1126,11 +1126,11 @@ is defined by the number of rows, columns, and the depth.
Convolution layers are good for extracting geometric features from an
input tensor. Convolution layers work in the same way as an image
processing filter (such as the Sobel filter) where a square filter
(called a **kernel**) is slid across an input image. The **size** of
filter is equal to the side length of the square filter, and the size of
the step when sliding the filter is called the **stride**. The values of
the input tensor under the kernel (called the **window**) and the values
of the kernel are multiplied and summed at each step, which is also
(called a **kernel**) is slid across an input image. The **size** of a
filter is equal to its side length, and the size of the step when sliding
the filter is called the **stride**. The values of the input tensor
under the kernel (called the **window**) and the values of the
kernel are multiplied and summed at each step, which is also
called a convolution. Figure 13 shows an example of a convolution layer
processing an input tensor with a depth of 1.
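
To make the terms concrete, here is a small standalone sketch (ours, not from the training source) of a depth-1 convolution; `IN_SIZE`, `KERNEL_SIZE`, and `STRIDE` are illustrative constants:

```c++
// Illustrative only: a depth-1 convolution with a square kernel.
const int IN_SIZE = 6;
const int KERNEL_SIZE = 3;  // the "size" of the filter: its side length
const int STRIDE = 1;       // step taken when sliding the filter
const int OUT_SIZE = (IN_SIZE - KERNEL_SIZE) / STRIDE + 1;

void conv2d(const int in[IN_SIZE][IN_SIZE],
            const int kernel[KERNEL_SIZE][KERNEL_SIZE],
            int out[OUT_SIZE][OUT_SIZE]) {
    for (int r = 0; r < OUT_SIZE; r++) {
        for (int c = 0; c < OUT_SIZE; c++) {
            int sum = 0;
            // The "window" is the input region currently under the kernel.
            for (int kr = 0; kr < KERNEL_SIZE; kr++)
                for (int kc = 0; kc < KERNEL_SIZE; kc++)
                    sum += in[r * STRIDE + kr][c * STRIDE + kc] * kernel[kr][kc];
            out[r][c] = sum;  // one multiply-and-sum per output value
        }
    }
}
```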

Expand Down Expand Up @@ -1579,16 +1579,16 @@ we show the input tensor values and convolution filters involved in the
computation of the set of colored output tensor values (see Loop 3
arrow).

Loop 1 and Loop 2 the code traverses along the row and column dimensions
For Loop 1 and Loop 2, the code traverses along the row and column dimensions
of the output tensor. Loop 3 traverses along the depth dimension of the
output tensor, each iteration computes a `PARALLEL_KERNELS` number of
output tensor, and each iteration computes a total of `PARALLEL_KERNELS`
outputs. The `accumulated_value` array will hold the partial
dot-products. Loop 4 traverses along the row and column dimensions of
the input tensor and convolution filter kernels. Then Loop 5 walks
through each of the `PARALLEL_KERNELS` number of selected convolution
the input tensor and convolution filter kernels. Then, Loop 5 walks
through each of the `PARALLEL_KERNELS` selected convolution
filters and Loop 6 traverses along the depth dimension of the input
tensor. Loop 7 and Loop 8 add up the partial sums together with biases
to produce `PARALLEL_KERNEL` number of outputs.
to produce `PARALLEL_KERNELS` outputs.

```C
const static unsigned PARALLEL_KERNELS = NUM_MACC / INPUT_DEPTH;
@@ -2202,7 +2202,7 @@ instructions that always run together with a single entry point at the
beginning and a single exit point at the end. A basic block in LLVM IR
always has a label at the beginning and a branching instruction at the
end (br, ret, etc.). An example of LLVM IR is shown below, where the
`body.0` basic block performs an addition (add) and subtraction (sub) and
`body.0` basic block performs an addition (add) and subtraction (sub), and
then branches unconditionally (br) to another basic block labeled
`body.1`. Control flow occurs between basic blocks.
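
The IR example itself is elided from this diff; a minimal reconstruction consistent with the description above (operand names are hypothetical) might look like:

```llvm
; Illustrative reconstruction; operand names are hypothetical.
body.0:                      ; label marking the start of the block
  %t0 = add i32 %x, %y       ; addition (add)
  %t1 = sub i32 %t0, %z      ; subtraction (sub)
  br label %body.1           ; unconditional branch (br) ends the block

body.1:
  ret i32 %t1                ; a block may also end with ret
```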

@@ -2230,7 +2230,7 @@ button (![](.//media/image28.png)) to build the design and generate the
schedule.

We can ignore the `printWarningMessageForGlobalArrayReset` warning message
for global variable a in this example as described in the producer
for global variable `a` in this example as described in the producer
consumer example in the [section 'Producer Consumer Example'](#producer-consumer-example).

The first example we will look at is the `no_dependency` example on line
@@ -2239,7 +2239,7 @@ The first example we will look at is the `no_dependency` example on line

<p align="center"><img src=".//media/image19.png" /></p>

```
```c++
8 void no_dependency() {
9 #pragma HLS function noinline
10 e = b + c;
@@ -2251,10 +2251,10 @@ The first example we will look at is the `no_dependency` example on line
<p align="center">Figure 28: Source code and data dependency graph for no_dependency
function.</p>

In this example, values are loaded from b, c, and d and additions happen
before storing to *e*, *f*, and *g*. None of the adds use results from
In this example, values are loaded from `b`, `c`, and `d`, and additions happen
before storing to `e`, `f`, and `g`. None of the adds use results from
the previous adds and thus all three adds can happen in parallel. The
*noinline* pragma is used to prevent SmartHLS from automatically
`noinline` pragma is used to prevent SmartHLS from automatically
inlining this small function and making it harder for us to understand
the schedule. Inlining is when the instructions in the called function
get copied into the caller, to remove the overhead of the function call
@@ -2289,17 +2289,17 @@ the store instruction highlighted in yellow depends on the result of the
add instruction as we expect.

We have declared all the variables used in this function as
**volatile**. The volatile C/C++ keyword specifies that the variable can
**volatile**. The `volatile` C/C++ keyword specifies that the variable can
be updated by something other than the program itself, making sure that
any operations on these variables do not get optimized away by the
compiler, as every operation matters. An example of where the compiler
handles this incorrectly is seen in the [section 'Producer Consumer Example'](#producer-consumer-example), where we had to
declare a synchronization signal between two threaded functions as
volatile. Using volatile is required for toy examples to make sure each
`volatile`. Using `volatile` is required for toy examples to make sure each
operation we perform with these variables will be generated in hardware
and viewable in the Schedule Viewer.

```
```c++
4 volatile int a[5] = {0};
5 volatile int b = 0, c = 0, d = 0;
6 volatile int e, f, g;
@@ -2314,7 +2314,7 @@ code and SmartHLS cannot schedule all instructions in the first cycle.

<p align="center"><img src=".//media/image68.png" /></p>

```
```c++
15 void data_dependency() {
16 #pragma HLS function noinline
17 e = b + c;
@@ -2336,8 +2336,8 @@ second add is also used in the third add. These are examples of data
dependencies as later adds use the data result of previous adds. Because
we must wait for the result `e` to be produced before we can compute `f`,
and then the result `f` must be produced before we can compute `g`, not all
instructions can be scheduled immediately. They must wait for their
dependent instructions to finish executing before they can start, or
instructions can be scheduled immediately. They must wait for the instructions
they depend on to finish executing before they can start, or
they would produce the wrong result.
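
The exact operands of the later adds are elided from this diff, but the chain described above has this shape (a hypothetical sketch):

```c++
// Sketch of the dependency chain; operand names are hypothetical.
e = b + c; // first add: no dependencies, can start right away
f = e + d; // second add: must wait for e
g = f + b; // third add: must wait for f
```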

<p align="center"><img src=".//media/image70.png" /></br>
@@ -2374,7 +2374,7 @@ memories.

<p align="center"><img src=".//media/image72.png" /></p>

```
```c++
22 void memory_dependency() {
23 #pragma HLS function noinline
24 volatile int i = 0;
@@ -2418,7 +2418,7 @@ resource cannot be scheduled in parallel due to a lack of resources.
`resource_contention` function on line 30 of
`instruction_level_parallelism.cpp`.

```
```c++
30 void resource_contention() {
31 #pragma HLS function noinline
32 e = a[0];
@@ -2451,7 +2451,7 @@ when generating the schedule for a design.
Next, we will see an example of how loops prevent operations from being
scheduled in parallel.

```
```c++
37 void no_loop_unroll() {
38 #pragma HLS function noinline
39 int h = 0;
@@ -2480,10 +2480,9 @@ has no unrolling on the loop and `loop_unroll` unrolls the loop
completely. This affects the resulting hardware by removing the control
signals needed to facilitate the loop and combining multiple loop bodies
into the same basic block, allowing more instructions to be scheduled in
parallel. The trade-off here is an unrolled loop does not reuse hardware
resources and can potentially use a lot of resources. However, the
unrolled loop would finish earlier depending on how inherently parallel
the loop body is.
parallel. The trade-off here is that an unrolled loop does not reuse hardware
resources and can potentially use a lot of resources; however, it will
finish earlier depending on how inherently parallel the loop body is.
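
A minimal sketch of the contrast (hypothetical names; `factor(1)` matches the rolled form in the `functional_unit_contention` excerpt earlier in this diff, and the bare pragma is assumed to request full unrolling):

```c++
// Hypothetical sketch, not from the training source.
void no_loop_unroll_sketch(volatile int buf[8]) {
#pragma HLS loop unroll factor(1) // keep the loop rolled: one shared body
    for (int i = 0; i < 8; i++)
        buf[i] = buf[i] + 1;
}

void loop_unroll_sketch(volatile int buf[8]) {
#pragma HLS loop unroll // fully unroll: eight parallel copies of the body
    for (int i = 0; i < 8; i++)
        buf[i] = buf[i] + 1;
}
```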

![](.//media/image3.png)To see the effects of this, open the Schedule
Viewer and first click on the `no_loop_unroll` function shown in Figure
2 changes: 1 addition & 1 deletion Training3/readme.md
@@ -1997,7 +1997,7 @@ format to write and read from the addresses that line up with addresses
in the AXI target interface in the SmartHLS core. For burst mode, the
processor will also write to and read from addresses corresponding to
the DDR memory. Note, the pointers are cast as volatile to prevent the
SoftConsole compiler from optimization away these reads and writes. The
SoftConsole compiler from optimizing away these reads and writes. The
Mi-V then asserts the run signal and waits until the accelerator
de-asserts it, signaling that the computation is done. The Mi-V then reads from
the memory to issue read requests and get the results from the
6 changes: 3 additions & 3 deletions Training4/readme.md
@@ -378,7 +378,7 @@ array.

```c
24 // The core logic of this example
25 void vector_add_sw(int a, int b, int result) {
25 void vector_add_sw(int* a, int* b, int* result) {
26 for (int i = 0; i < SIZE; i++) {
27 result[i] = a[i] + b[i];
28 }
@@ -391,7 +391,7 @@ Now we look on line 70 at the `vector_add_axi_target_memcpy` top-level
C++ function as shown in Figure 6‑12.

```c
70 void vector_add_axi_target_memcpy(int a, int b, int result) {
70 void vector_add_axi_target_memcpy(int* a, int* b, int* result) {
71 #pragma HLS function top
72 #pragma HLS interface control type(axi_target)
73 #pragma HLS interface argument(a) type(axi_target) num_elements(SIZE)
@@ -493,7 +493,7 @@ corresponding to the C++ top-level function as shown below in Figure
clock and a single AXI4 Target port. Due to the large number of AXI4
ports in the RTL, SmartHLS uses a wildcard “`axi4target_*`” to
simplify the table. The “Control AXI4 Target” indicates that
start/finish control as done using the AXI target interface. Each of the
start/finish control is done using the AXI target interface. Each of the
function’s three arguments also uses the AXI target interface. The
address map of the AXI target port is given later in the report.

Expand Down