Bump nvidia cuda docker image version #2821

Merged
merged 5 commits into from
Jul 18, 2023
1 change: 1 addition & 0 deletions .editorconfig
@@ -25,6 +25,7 @@ insert_final_newline = unset
# Makefiles/Dockerfile/golang files
[{Makefile,Dockerfile{,.debian},*.go}]
indent_style = tab
indent_size = 8

# YAML/JSON Files
[{.ecrc,*.{yml,yaml,sh,json}}]
4 changes: 2 additions & 2 deletions .github/workflows/build.yaml
@@ -211,8 +211,8 @@ jobs:
developer-certificate-password: ${{ secrets.CI_MACOS_CERTIFICATE_PASSWORD }}
app-notarization-email: ${{ secrets.CI_MACOS_NOTARIZATION_USER }}
app-notarization-password: ${{ secrets.CI_MACOS_NOTARIZATION_PASSWORD }}
app-notarization-team-id: ${{ secrets.CI_MACOS_NOTARIZATION_TEAM_ID }}
binary-path: "lp-builds/"
app-bundle-id: "org.livepeer.livepeer"

- name: Upload build
if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
@@ -288,7 +288,7 @@ jobs:
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-8 30 \
&& update-alternatives --install /usr/bin/clang clang /usr/bin/clang-8 30

LIBTENSORFLOW_VERSION=2.3.4 \
LIBTENSORFLOW_VERSION=2.12.1 \
&& curl -LO https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz \
&& tar -C /usr/local -xzf libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz \
&& ldconfig
6 changes: 3 additions & 3 deletions CHANGELOG.md
@@ -295,7 +295,7 @@ This release also includes a darwin arm64 build and darwin/linux binaries compil

*February 11th 2022*

This release supports connecting to Arbitrum Mainnet using the `-network arbitrum-one-mainnet` flag after the L1 Ethereum block 14207040 which is the block at which [LIP-73 i.e. the Confluence upgrade](https://github.com/livepeer/LIPs/blob/master/LIPs/LIP-73.md#specification) will be activated. Prior to this block, running the node wtih `-network arbitrum-one-mainnet` will result in a startup error so it is recommended to wait until after block 14207040 to run the node with the `-network arbitrum-one-mainnet` flag. **We strongly encourage all node operators to upgrade to this release so they can connect to Arbitrum Mainnet after the LIP-73 block**.
This release supports connecting to Arbitrum Mainnet using the `-network arbitrum-one-mainnet` flag after the L1 Ethereum block 14207040 which is the block at which [LIP-73 i.e. the Confluence upgrade](https://github.com/livepeer/LIPs/blob/master/LIPs/LIP-73.md#specification) will be activated. Prior to this block, running the node with `-network arbitrum-one-mainnet` will result in a startup error so it is recommended to wait until after block 14207040 to run the node with the `-network arbitrum-one-mainnet` flag. **We strongly encourage all node operators to upgrade to this release so they can connect to Arbitrum Mainnet after the LIP-73 block**.

Additional updates in this release include various improvements to compatibility with Arbitrum networks as well as the initial groundwork for enabling H.265/HEVC encoding/decoding and VP8/VP9 decoding jobs on the network.

@@ -579,7 +579,7 @@ Thanks to everyone that submitted bug reports and assisted in testing!

*August 10 2021*

This release includes another gas price monitoring fix to address additional cases where Ethereum JSON-RPC providers occassionally return really low gas prices for the `eth_gasPrice` RPC call, automatic replacements for pending transactions that timeout, fixes for broadcaster stream recording, support for downloading stream recordings as mp4 files as well as variety of other bug fixes and enhancements.
This release includes another gas price monitoring fix to address additional cases where Ethereum JSON-RPC providers occasionally return really low gas prices for the `eth_gasPrice` RPC call, automatic replacements for pending transactions that timeout, fixes for broadcaster stream recording, support for downloading stream recordings as mp4 files as well as variety of other bug fixes and enhancements.

In addition to the gas price monitoring fix and support for automatic replacements for pending transactions that timeout, a few additional configuration options are introduced to give node operators more control over gas prices and transactions:

@@ -639,7 +639,7 @@ Thanks to everyone that submitted bug reports and assisted in testing!

*May 18 2021*

This release includes an important gas price monitoring fix that addresses cases where Ethereum JSON-RPC providers occassionally return really low gas prices for the `eth_gasPrice` RPC call, reductions in the gas cost for staking actions (under certain circumstances) using `livepeer_cli` and improvements to split orchestrator and transcoder setups that help remote transcoders retain streams. We strongly recommend all orchestrator and transcoder operators to upgrade to this version as soon as possible to access this latest set of bug fixes and improvements.
This release includes an important gas price monitoring fix that addresses cases where Ethereum JSON-RPC providers occasionally return really low gas prices for the `eth_gasPrice` RPC call, reductions in the gas cost for staking actions (under certain circumstances) using `livepeer_cli` and improvements to split orchestrator and transcoder setups that help remote transcoders retain streams. We strongly recommend all orchestrator and transcoder operators to upgrade to this version as soon as possible to access this latest set of bug fixes and improvements.

Thanks to everyone that submitted bug reports and assisted in testing!

1 change: 1 addition & 0 deletions CHANGELOG_PENDING.md
@@ -3,6 +3,7 @@
## vX.X

### Breaking Changes 🚨🚨
- \#2821 Bump nvidia/cuda base version for docker builds (@stronk-dev and @hjpotter92)

### Features ⚒

9 changes: 5 additions & 4 deletions Makefile
@@ -46,18 +46,19 @@ ifeq ($(uname_s),Linux)
endif
endif

.PHONY: livepeer
.PHONY: livepeer livepeer_bench livepeer_cli livepeer_router docker

livepeer:
GO111MODULE=on CGO_ENABLED=1 CC="$(cc)" CGO_CFLAGS="$(cgo_cflags)" CGO_LDFLAGS="$(cgo_ldflags)" go build -o $(GO_BUILD_DIR) -tags "$(BUILD_TAGS)" -ldflags="$(ldflags)" cmd/livepeer/*.go

.PHONY: livepeer_cli
livepeer_cli:
GO111MODULE=on CGO_ENABLED=1 CC="$(cc)" CGO_CFLAGS="$(cgo_cflags)" CGO_LDFLAGS="$(cgo_ldflags)" go build -o $(GO_BUILD_DIR) -tags "$(BUILD_TAGS)" -ldflags="$(ldflags)" cmd/livepeer_cli/*.go

.PHONY: livepeer_bench
livepeer_bench:
GO111MODULE=on CGO_ENABLED=1 CC="$(cc)" CGO_CFLAGS="$(cgo_cflags)" CGO_LDFLAGS="$(cgo_ldflags)" go build -o $(GO_BUILD_DIR) -ldflags="$(ldflags)" cmd/livepeer_bench/*.go

.PHONY: livepeer_router
livepeer_router:
GO111MODULE=on CGO_ENABLED=1 CC="$(cc)" CGO_CFLAGS="$(cgo_cflags)" CGO_LDFLAGS="$(cgo_ldflags)" go build -o $(GO_BUILD_DIR) -ldflags="$(ldflags)" cmd/livepeer_router/*.go

docker:
docker build --build-arg='BUILD_TAGS=mainnet,experimental' -f docker/Dockerfile .
6 changes: 3 additions & 3 deletions common/readfromfile_test.go
@@ -19,7 +19,7 @@ func TestReadFromFileNoFileExists(t *testing.T) {
output, err := ReadFromFile(input)

assert.Nil(err)
// ReadFromFile should return the originaly supplied string
// ReadFromFile should return the originally supplied string
assert.Equal(expectedOutput, output)
}

@@ -33,7 +33,7 @@ func TestReadFromFileDirectoryExists(t *testing.T) {
output, err := ReadFromFile(input)

assert.NotNil(err)
// ReadFromFile should return the originaly supplied string
// ReadFromFile should return the originally supplied string
assert.Equal(expectedOutput, output)
}

@@ -57,7 +57,7 @@ func TestReadFromFileEmptyFileExists(t *testing.T) {
output, err := ReadFromFile(tmpFile)

assert.NotNil(err)
// ReadFromFile should return the originaly supplied string
// ReadFromFile should return the originally supplied string
assert.Equal(expectedOutput, output)
}

25 changes: 12 additions & 13 deletions doc/redeemer.md
@@ -4,17 +4,17 @@ The Ticket Redemption Service allows Orchestrator nodes to redeem winning ticket

It is responsible for redeeming winning tickets as well as pushing _max float_ updates about broadcasters back to its connected Orchestrators.

_Max float_ is the guaranteed value an Orchestrator will be able to claim from a Broadcaster's reserve. It accounts for the current reserve allocation from a Broadcaster to an Orchestrator as well as pending winning ticket redemptions.
_Max float_ is the guaranteed value an Orchestrator will be able to claim from a Broadcaster's reserve. It accounts for the current reserve allocation from a Broadcaster to an Orchestrator as well as pending winning ticket redemptions.

\**A more detailed description about max float and it's relation to a broadcaster's reserve can be found in the [PM protocol spec](https://github.com/livepeer/wiki/blob/master/spec/streamflow/pm.md#reserve).*

\**This document uses the term `sender`, it can be used interchangeably with Broadcaster.*

## TicketQueue

1. The `ticketQueue` is a loop that runs every time a new block is seen. It will then pop tickets off the queue starting with the oldest ticket first, and sends it to the `LocalSenderMonitor` for redemption if the `recipientRand` for the ticket has expired.
1. The `ticketQueue` is a loop that runs every time a new block is seen. It will then pop tickets off the queue starting with the oldest ticket first, and sends it to the `LocalSenderMonitor` for redemption if the `recipientRand` for the ticket has expired.

2. When the `LocalSenderMonitor` receives a ticket from the `ticketQueue` it will substract `ticket.faceValue` from the outstanding `maxFloat` as long as the ticket is in limbo.
2. When the `LocalSenderMonitor` receives a ticket from the `ticketQueue` it will subtract `ticket.faceValue` from the outstanding `maxFloat` as long as the ticket is in limbo.

           _This will trigger a `LocalSenderMonitor.SubscribeMaxFloatChange(ticket.sender)` notification_

@@ -26,35 +26,34 @@ _Max float_ is the guaranteed value an Orchestrator will be able to claim from a

## Monitoring Max Float

1. When max float for a `sender` is requested from the `RedeemerClient` but no local cache is available, an (unary) RPC call will be sent to the `Redeemer`.
1. When max float for a `sender` is requested from the `RedeemerClient` but no local cache is available, an (unary) RPC call will be sent to the `Redeemer`.

2. A second RPC call to `MonitorMaxFloat(sender)` will open up a server-side gRPC stream to receive future update.
2. A second RPC call to `MonitorMaxFloat(sender)` will open up a server-side gRPC stream to receive future update.

_If this call fails the response from step 1 is returned, but not kept in cache to prevent it becoming stale due to not being able to receive further updates_

3. The `Redeemer` goroutine started by the RPC call in step 2 will start a subscription to listen for max float changes from the `LocalSenderMonitor` for the specified `sender` using `LocalSenderMonitor.SubscribeMaxFloatChange(sender)`.

_Each open server-side stream will have its own subscription that will be closed when the client closes the stream. This means that each client will have a subscription for each sender it is interested in._
_Each open server-side stream will have its own subscription that will be closed when the client closes the stream. This means that each client will have a subscription for each sender it is interested in._

4. Once the subscription from step 3 emits an event that indicates a state change for the specified `sender`, the `Redeemer` will invoke `LocalSenderMonitor.MaxFloat(sender)` to fetch the latest value.
4. Once the subscription from step 3 emits an event that indicates a state change for the specified `sender`, the `Redeemer` will invoke `LocalSenderMonitor.MaxFloat(sender)` to fetch the latest value.

5. Upon retrieving the latest max float value for `sender` it will be sent over the server-side gRPC stream.

6. Upon receiving a `MaxFloatUpdate` over the server-side gRPC stream for `sender` it will update its local cache for that `sender` accordingly.
6. Upon receiving a `MaxFloatUpdate` over the server-side gRPC stream for `sender` it will update its local cache for that `sender` accordingly.

7. Subsequent calls to `RedeemerClient.MaxFloat(sender)` will return the locally cached value for `sender` as long as it remains available.
7. Subsequent calls to `RedeemerClient.MaxFloat(sender)` will return the locally cached value for `sender` as long as it remains available.

8. The local cache for `sender` will be cleaned up if is not requested for 5 minutes.
8. The local cache for `sender` will be cleaned up if is not requested for 5 minutes.

![Ticket Flow](./assets/redeemer/ticketflow.png)
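
The client-side caching in steps 6 to 8 can be sketched roughly as below. This is only an illustration under stated assumptions, not the `RedeemerClient` implementation: the names `maxFloatCache`, `cacheEntry`, `get`, `update` and `cleanup` are hypothetical, plain `int64` values stand in for max float amounts, and timestamps are passed in as Unix seconds to keep the example deterministic. It also assumes that only a request (not a pushed update) resets the 5 minute cleanup timer, which is one reading of step 8.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheTTLSeconds mirrors the 5 minute cleanup window from step 8.
const cacheTTLSeconds = 5 * 60

// cacheEntry is a hypothetical per-sender cache record.
type cacheEntry struct {
	maxFloat    int64 // last value received for this sender
	lastRequest int64 // Unix seconds of the last request for this sender
}

// maxFloatCache is a hypothetical stand-in for the client's local cache.
type maxFloatCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
}

func newMaxFloatCache() *maxFloatCache {
	return &maxFloatCache{entries: make(map[string]*cacheEntry)}
}

// get returns the cached value for sender, if any, and records the request time (step 7).
func (c *maxFloatCache) get(sender string, now int64) (int64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[sender]
	if !ok {
		return 0, false
	}
	e.lastRequest = now
	return e.maxFloat, true
}

// update stores a value pushed over the stream (step 6).
// Assumption: an update to an existing entry does not count as a request.
func (c *maxFloatCache) update(sender string, value, now int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[sender]; ok {
		e.maxFloat = value
		return
	}
	c.entries[sender] = &cacheEntry{maxFloat: value, lastRequest: now}
}

// cleanup drops entries that were not requested within the TTL (step 8)
// and reports how many were removed.
func (c *maxFloatCache) cleanup(now int64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	removed := 0
	for sender, e := range c.entries {
		if now-e.lastRequest >= cacheTTLSeconds {
			delete(c.entries, sender)
			removed++
		}
	}
	return removed
}

func main() {
	c := newMaxFloatCache()
	c.update("0xsender", 10, 0)      // initial value cached at t=0
	v, ok := c.get("0xsender", 100)  // request at t=100 refreshes the timer
	fmt.Println(v, ok)
	c.update("0xsender", 7, 200)     // pushed update, timer unchanged
	fmt.Println(c.cleanup(399))      // 299s since last request: kept
	fmt.Println(c.cleanup(400))      // 300s since last request: removed
	_, ok = c.get("0xsender", 401)
	fmt.Println(ok)
}
```

In a real client the cleanup would presumably also close the corresponding gRPC stream so the `Redeemer` can drop its subscription; that part is left out here.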


## Blockchain Events

So far we've discussed `LocalSenderMonitor.addFloat()` and `LocalSenderMonitor.subFloat()` being responsible for triggering `LocalSenderMonitor.SubscribeMaxFloatChange(sender)` notifications, but these can also be triggered by certain Ethereum events related to the Livepeer protocol:
So far we've discussed `LocalSenderMonitor.addFloat()` and `LocalSenderMonitor.subFloat()` being responsible for triggering `LocalSenderMonitor.SubscribeMaxFloatChange(sender)` notifications, but these can also be triggered by certain Ethereum events related to the Livepeer protocol:

- FundReserve: When a broadcaster funds its reserve the `maxFloat` allocation increases by the added reserve divided by the active Orchestrator set size.
- NewRound: If the active Orchestrator set size changes, the `maxFloat` will become the current broadcaster's reserve divided by the new active set size. Since this event impacts all participants in the protocol the `Redeemer` will have to send updates for _every_ `sender` it is keeping track of.
- NewRound: If the active Orchestrator set size changes, the `maxFloat` will become the current broadcaster's reserve divided by the new active set size. Since this event impacts all participants in the protocol the `Redeemer` will have to send updates for _every_ `sender` it is keeping track of.

![Ethereum Events](./assets/redeemer/eth-events.png)
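
As a rough worked example of the arithmetic described above, the sketch below computes max float as the per-Orchestrator reserve allocation minus pending ticket redemptions. It is a simplification under stated assumptions: `senderState` and `maxFloat` are hypothetical names, plain `int64` stands in for the arbitrary-precision amounts a node would use, and clamping the result at zero is an assumption rather than documented behavior.

```go
package main

import "fmt"

// senderState is a hypothetical, simplified view of what is tracked per sender.
type senderState struct {
	reserve       int64 // the broadcaster's total reserve
	pendingAmount int64 // summed faceValue of winning tickets still in limbo
}

// maxFloat returns the reserve allocation for one Orchestrator
// (reserve divided by the active set size) minus pending redemptions.
func maxFloat(s senderState, activeSetSize int64) int64 {
	if activeSetSize <= 0 {
		return 0
	}
	alloc := s.reserve / activeSetSize
	if s.pendingAmount > alloc {
		return 0 // assumption: never report a negative max float
	}
	return alloc - s.pendingAmount
}

func main() {
	s := senderState{reserve: 1000}
	fmt.Println(maxFloat(s, 100)) // initial allocation

	s.reserve += 500 // FundReserve: allocation grows by 500 / 100
	fmt.Println(maxFloat(s, 100))

	s.pendingAmount += 4 // a ticket with faceValue 4 enters limbo
	fmt.Println(maxFloat(s, 100))

	fmt.Println(maxFloat(s, 50)) // NewRound: active set shrinks to 50
}
```

With these numbers the allocation moves from 10 to 15 after funding, drops to 11 while the ticket is pending, and becomes 26 once the active set size changes to 50.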

16 changes: 8 additions & 8 deletions docker/Dockerfile
@@ -1,4 +1,4 @@
FROM --platform=$BUILDPLATFORM ubuntu:18.04 as build
FROM --platform=$BUILDPLATFORM ubuntu:20.04 as build

ARG TARGETARCH
ARG BUILDARCH
@@ -30,7 +30,7 @@ RUN GRPC_HEALTH_PROBE_VERSION=v0.3.6 \
&& ldconfig /usr/local/lib

# note: for runtime, Tensorflow version needs to be compatible with CUDA and CuDNN of the image
RUN LIBTENSORFLOW_VERSION=2.3.4 \
RUN LIBTENSORFLOW_VERSION=2.12.1 \
&& curl -LO https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz \
&& mkdir /tf && tar -C /tf -xzf libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz

@@ -46,32 +46,32 @@ RUN mkdir -p /go \
COPY ./install_ffmpeg.sh ./install_ffmpeg.sh

ARG BUILD_TAGS
ENV BUILD_TAGS=${BUILD_TAGS}
ENV BUILD_TAGS=${BUILD_TAGS}

COPY go.mod go.sum ./
RUN go mod download

RUN ./install_ffmpeg.sh \
RUN ./install_ffmpeg.sh \
&& GO111MODULE=on go get -v github.com/golangci/golangci-lint/cmd/[email protected] \
&& go get -v github.com/jstemmer/go-junit-report

COPY . .

RUN make livepeer livepeer_cli livepeer_bench livepeer_router

# cuda 10.2 image is 1.4 Gb smaller, which is critical for our root partition size
FROM --platform=$TARGETPLATFORM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 AS livepeer-amd64-base
FROM --platform=$TARGETPLATFORM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04 AS livepeer-amd64-base

FROM --platform=$TARGETPLATFORM nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04 AS livepeer-arm64-base
FROM --platform=$TARGETPLATFORM nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04 AS livepeer-arm64-base

FROM livepeer-${TARGETARCH}-base
FROM livepeer-${TARGETARCH}-base

ENV NVIDIA_DRIVER_CAPABILITIES=all

COPY --from=build /build/ /usr/local/bin/
COPY --from=build /usr/bin/grpc_health_probe /usr/local/bin/grpc_health_probe
COPY --from=build /src/tasmodel.pb /tasmodel.pb
COPY --from=build /usr/share/misc/pci.ids /usr/share/misc/pci.ids

# libtensorflow.so is required at runtime, because Ffmpeg DNN filter loads it dynamically
COPY --from=build /tf/ /usr/local/

2 changes: 1 addition & 1 deletion install_ffmpeg.sh
@@ -147,7 +147,7 @@ else
echo "clang detected, building with GPU and Tensorflow support"
EXTRA_FFMPEG_FLAGS="$EXTRA_FFMPEG_FLAGS --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvenc --enable-decoder=h264_cuvid,hevc_cuvid,vp8_cuvid,vp9_cuvid --enable-filter=scale_cuda,signature_cuda,hwupload_cuda --enable-encoder=h264_nvenc,hevc_nvenc"
if [[ ! -e "${ROOT}/compiled/lib/libtensorflow_framework.so" ]]; then
LIBTENSORFLOW_VERSION=2.3.4 &&
LIBTENSORFLOW_VERSION=2.12.1 &&
curl -LO https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz &&
tar -C ${ROOT}/compiled/ -xzf libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz &&
rm libtensorflow-gpu-linux-x86_64-${LIBTENSORFLOW_VERSION}.tar.gz