feat: further configuration of embedded ipfs daemon #3096

Merged
merged 5 commits into from
Dec 18, 2023


frrist
Member

@frrist frrist commented Dec 13, 2023

Why

At present, the embedded IPFS node in bacalhau listens on random ports. Users can configure the embedded IPFS node to listen on a "preferred" swarm address by setting the BACALHAU_PREFERRED_ADDRESS environment variable, but they cannot currently configure which port the node listens on for swarm connections. This lack of determinism makes it challenging to configure firewall rules correctly when deploying bacalhau to production-like settings, since the port the IPFS node listens on cannot be known at deployment time.
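As a minimal illustration of why a random port cannot be known ahead of time, the sketch below binds a plain TCP listener to port 0 using only the Go standard library. It is not bacalhau's code; `assignedPort` is a hypothetical helper name. With port 0 the operating system picks any free port, so the value printed typically differs between runs.

```go
package main

import (
	"fmt"
	"net"
)

// assignedPort binds a TCP listener on addr and returns the port the OS
// actually assigned. When addr ends in ":0", the OS picks any free port.
// (Hypothetical helper, for illustration only.)
func assignedPort(addr string) (int, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := assignedPort("127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	// The port is only known after binding, so a firewall rule for it
	// cannot be written at deployment time.
	fmt.Println("OS-assigned port:", port)
}
```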

What

This PR allows the embedded IPFS node's gateway, API, and swarm listening multiaddresses to be configured. Users can set these values via the --ipfs-gateway-listen-addresses, --ipfs-api-listen-addresses, and --ipfs-swarm-listen-addresses flags on the serve command, respectively. Alternatively, the corresponding environment variables or config file values may be set.
To preserve backwards compatibility, users may continue to use the BACALHAU_PREFERRED_ADDRESS environment variable to configure the swarm address the embedded IPFS node listens on. Note that this variable does not allow the port to be specified; the port will still be randomly assigned.
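To make the difference between a fixed and a random listen port concrete, here is a rough sketch that pulls the port out of a listen multiaddr. It uses only the standard library and handles just the simple `/<ip4|ip6>/<host>/<tcp|udp>/<port>` shape; a real implementation would more likely use a multiaddr library. `multiaddrPort` is a hypothetical helper, not part of bacalhau or this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// multiaddrPort extracts the port from a simple listen multiaddr of the form
// /<ip4|ip6>/<host>/<tcp|udp>/<port>[/...]. Other shapes are rejected.
// (Hypothetical helper; a simplification of real multiaddr parsing.)
func multiaddrPort(addr string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(addr, "/"), "/")
	if len(parts) < 4 ||
		(parts[0] != "ip4" && parts[0] != "ip6") ||
		(parts[2] != "tcp" && parts[2] != "udp") {
		return 0, fmt.Errorf("unsupported multiaddr %q", addr)
	}
	port, err := strconv.Atoi(parts[3])
	if err != nil || port < 0 || port > 65535 {
		return 0, fmt.Errorf("invalid port in %q", addr)
	}
	return port, nil
}

func main() {
	for _, addr := range []string{"/ip4/0.0.0.0/tcp/4001", "/ip4/0.0.0.0/tcp/0"} {
		port, err := multiaddrPort(addr)
		switch {
		case err != nil:
			fmt.Println(addr, "->", err)
		case port == 0:
			fmt.Println(addr, "-> port chosen by the OS at startup")
		default:
			fmt.Println(addr, "-> fixed port", port)
		}
	}
}
```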

Additionally, this PR changes the behavior of bacalhau and its embedded IPFS node when the --ipfs-serve-path flag is set. The intent of the flag is to let users define the repo location where the embedded IPFS node stores its content and configuration. The current behavior deletes this repo when the bacalhau node shuts down, so any content stored in the repo is removed and the identity of the embedded IPFS node is lost. The new behavior preserves the contents of the embedded IPFS node's repo across bacalhau restarts, maintaining any data the node stored as well as its identity. If the --ipfs-serve-path flag is not set, the behavior remains unchanged: the repo is considered ephemeral and is removed when bacalhau shuts down.

Lastly, this PR adds a flag, --ipfs-profile, to configure the embedded IPFS node's configuration profile. The default profiles remain flatfs when --private-internal-ipfs=false and test when --private-internal-ipfs=true.
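The default selection stated above could be expressed along these lines. The function names are hypothetical, and the rule that an explicitly supplied profile overrides the default is an assumption rather than something the description spells out.

```go
package main

import "fmt"

// defaultProfile mirrors the defaults in the description: "test" for a
// private internal IPFS node, "flatfs" otherwise.
func defaultProfile(privateInternalIPFS bool) string {
	if privateInternalIPFS {
		return "test"
	}
	return "flatfs"
}

// resolveProfile returns the explicitly configured profile if there is one
// (assumed precedence), otherwise the default. Hypothetical helper.
func resolveProfile(flagValue string, privateInternalIPFS bool) string {
	if flagValue != "" {
		return flagValue
	}
	return defaultProfile(privateInternalIPFS)
}

func main() {
	fmt.Println(resolveProfile("", false))
	fmt.Println(resolveProfile("", true))
	fmt.Println(resolveProfile("server", true))
}
```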

@frrist frrist marked this pull request as ready for review December 13, 2023 21:28
@frrist frrist self-assigned this Dec 13, 2023
@frrist frrist linked an issue Dec 13, 2023 that may be closed by this pull request
@frrist
Member Author

frrist commented Dec 13, 2023

A nice side effect of this change is that we can now interact with the embedded IPFS daemon via the ipfs binary. For example:

  1. Start Bacalhau Server: bacalhau serve --ipfs-api-listen-addresses=/ip4/127.0.0.1/tcp/6001
  2. Use IPFS binary: ipfs --api=/ip4/127.0.0.1/tcp/6001 id
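The same query can also be made without the ipfs binary, since Kubo exposes its RPC API over HTTP and `id` maps to a POST to `/api/v0/id`. The sketch below assumes the daemon was started with the API address from step 1, written here in host:port form; `newIDRequest` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// newIDRequest builds the HTTP request equivalent to `ipfs id`.
// hostPort is the host:port form of the API multiaddr, e.g. 127.0.0.1:6001
// for /ip4/127.0.0.1/tcp/6001. (Hypothetical helper.)
func newIDRequest(hostPort string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, "http://"+hostPort+"/api/v0/id", nil)
}

func main() {
	req, err := newIDRequest("127.0.0.1:6001")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		// Expected when no daemon is listening on the configured API address.
		fmt.Println("daemon not reachable:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response failed:", err)
		return
	}
	fmt.Println(string(body)) // JSON describing the node's identity
}
```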

@frrist frrist enabled auto-merge (squash) December 13, 2023 23:28
Contributor

@simonwo simonwo left a comment

Looks like the config will end up the same if only the existing flags are used? So fine by me.

@rossjones
Contributor

> Users can configure the embedded IPFS node to listen on a "preferred" swarm address by setting the BACALHAU_PREFERRED_ADDRESS

This is really just a workaround for how we choose addresses. We often choose to bind to 0.0.0.0, which is a valid choice but results in the service binding to all networks, whether they are 127.x loopback, link-local, private, or public addresses. One of those addresses is then chosen, and it isn't clear what the criteria for the choice are.

Ideally we should be more selective about what IP address we expect IPFS to bind to, using an internal network for private clusters, and a public one otherwise.

If we find ourselves having to set BACALHAU_PREFERRED_ADDRESS we would be better off just binding to the correct network in the first place.

@@ -67,6 +67,10 @@ var Staging = types.BacalhauConfig{
"/ip4/35.245.247.85/tcp/4001/p2p/12D3KooWEztGEJtqtzy7th2d7cTw2iR4CQCPHFUYvj66rhh9Cf7h",
"/ip4/35.245.247.85/udp/4001/quic/p2p/12D3KooWEztGEJtqtzy7th2d7cTw2iR4CQCPHFUYvj66rhh9Cf7h",
},
Profile: "flatfs",
SwarmListenAddresses: []string{"/ip4/0.0.0.0/tcp/0", "/ip6/::1/tcp/0"},
Contributor

These two addresses are not equivalent. ::1 is the IPv6 loopback address, the counterpart of 127.0.0.1. If we want to bind to all IPv6 interfaces, we should use :: instead.
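The distinction can be checked with Go's standard `net` package. In this small sketch, `describeBind` is a hypothetical helper that classifies a bind host as covering all interfaces, loopback only, or one specific address.

```go
package main

import (
	"fmt"
	"net"
)

// describeBind classifies a bind host: the unspecified address (0.0.0.0 or ::)
// means all interfaces, while 127.0.0.1 and ::1 are loopback only.
// (Hypothetical helper, for illustration.)
func describeBind(host string) string {
	ip := net.ParseIP(host)
	switch {
	case ip == nil:
		return "invalid address"
	case ip.IsUnspecified():
		return "all interfaces"
	case ip.IsLoopback():
		return "loopback only"
	default:
		return "one specific address"
	}
}

func main() {
	for _, host := range []string{"0.0.0.0", "::", "127.0.0.1", "::1"} {
		fmt.Printf("%-10s %s\n", host, describeBind(host))
	}
}
```

Here 0.0.0.0 and :: both report all interfaces, whereas ::1 reports loopback only, which is why it is not the IPv6 counterpart of 0.0.0.0.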

@frrist frrist merged commit 3719eb2 into main Dec 18, 2023
10 checks passed
@frrist frrist deleted the frrist/ipfs-set-ports branch December 18, 2023 16:09
aronchick pushed a commit that referenced this pull request Jan 2, 2024
Co-authored-by: frrist <[email protected]>
Development

Successfully merging this pull request may close these issues.

bacalhau advertise its internal IPFS node on random ports
3 participants