
docs: AWS getting started re-write #9095

Merged · 1 commit · Aug 9, 2024
Conversation

@rothgar (Member) commented Aug 1, 2024

Updated with multi-AZ subnets for the control plane group and better copy/paste ability.


Change to your desired region and CIDR block and create a VPC:

> Make sure your subnet does not overlap with `10.244.0.0/16` or `10.96.0.0/12` the [default pod and services subnets in Kubernetes]({{% ref "/v1.7/introduction/troubleshooting.md#conflict-on-kubernetes-and-host-subnets" %}}).
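For context, the step being discussed presumably reduces to something like this sketch (the region and CIDR values are illustrative; `VPC` matches the variable referenced later in the thread):

```bash
# Sketch: create a VPC and capture its ID (values are illustrative).
REGION="us-east-1"
CIDR_BLOCK="10.1.0.0/18"
VPC=$(aws ec2 create-vpc \
  --region "$REGION" \
  --cidr-block "$CIDR_BLOCK" \
  --query 'Vpc.VpcId' \
  --output text)
echo "$VPC"
```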
Member

we should use relref, not ref, and without the version, to keep docs same across versions
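To make that concrete (the relative path below is illustrative, not lifted from the PR): the shortcode would change from `{{% ref "/v1.7/introduction/troubleshooting.md#conflict-on-kubernetes-and-host-subnets" %}}` to something like `{{% relref "../introduction/troubleshooting.md#conflict-on-kubernetes-and-host-subnets" %}}`, so the link resolves inside whichever docs version the page is built for.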

@@ -7,52 +7,102 @@ aliases:

## Creating a Cluster via the AWS CLI

```diff
- In this guide we will create an HA Kubernetes cluster with 3 worker nodes.
- We assume an existing VPC, and some familiarity with AWS.
+ In this guide we will create an HA Kubernetes cluster with 3 control plane nodes across 3 availability zones.
```
Member

I was always confused by spreading the etcd across AZes:

  • increased latency
  • a chance of network partitioning

As long as you don't have 3x workers spread across AZes same way, what kind of benefit does it give to the availability of the cluster?

Member Author

AZ spread is recommended for all AWS HA architecture. Single AZs have problems more often than people realize. All AZs are supposed to be in "single digit" ms latency between locations (Azure and GCP don't guarantee that). I know this is a getting started guide, but we should still recommend some HA configurations for the components that matter.

Member

My point is a bit different: imagine I have 3 CPs + 9 workers, and the CPs are spread across AZs.

Now AZ0 goes down, which means that two other CPs have quorum, that's great.

But what about my workers? Are they spread across AZs same way? If not, what's the point of having a controlplane when workers can't reach it? If they are spread, imagine 3 workers per AZ, then I'm down to 6 "working" workers?

Member

So it's not that I'm trying to say that spreading CPs across AZs is a bad idea, but I want to make sure that users understand what gets actually protected and what does not by spreading CPs across AZs.

Member Author

This version of the doc didn't have the autoscaling group. I just updated it with the correct version. Workers are created in an ASG and if an AZ goes away the ASG will re-balance nodes or create new nodes in AZs that are available.
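For context, that rebalancing behavior comes from registering the worker group in all three subnets, roughly like this (a sketch only — the group name, launch template, and `SUBNETS` array are illustrative, not taken from the guide):

```bash
# Sketch: an ASG registered in all three subnets lets AWS replace
# capacity in the surviving AZs if one zone goes down.
aws autoscaling create-auto-scaling-group \
  --region "$REGION" \
  --auto-scaling-group-name talos-workers \
  --launch-template "LaunchTemplateName=talos-worker" \
  --min-size 3 \
  --max-size 3 \
  --vpc-zone-identifier "${SUBNETS[0]},${SUBNETS[1]},${SUBNETS[2]}"
```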

```bash
--cidr 0.0.0.0/0
--group-id $SECURITY_GROUP_ID \
--ip-permissions \
IpProtocol=tcp,FromPort=50000,ToPort=50001,IpRanges="[{CidrIp=0.0.0.0/0}]" \
```
Member

I'm confused with this rule. We should expose 50000 on the controlplanes to the world, but we should never expose 50001 to the world.

Member Author

I realized 6443 should only be exposed on the VPC too. I'll update it.

Member Author

FWIW the existing guide exposes 6443, 50000, and 50001 to 0.0.0.0/0

Member

50001 should never be exposed outside of the cluster.

50000 - we recommend only control planes.
6443 - same, only control planes (unless you use an AWS LB, in which case you don't need to expose it?)
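Applied to the rule quoted above, one conservative reading of that advice looks like this (a sketch; it assumes `$CIDR_BLOCK` is the VPC CIDR from earlier and reuses the thread's `$SECURITY_GROUP_ID`):

```bash
# Sketch: keep the Talos API (50000) and Kubernetes API (6443)
# reachable only from inside the VPC; 50001 gets no ingress rule at all.
aws ec2 authorize-security-group-ingress \
  --region "$REGION" \
  --group-id "$SECURITY_GROUP_ID" \
  --ip-permissions \
    "IpProtocol=tcp,FromPort=6443,ToPort=6443,IpRanges=[{CidrIp=$CIDR_BLOCK}]" \
    "IpProtocol=tcp,FromPort=50000,ToPort=50000,IpRanges=[{CidrIp=$CIDR_BLOCK}]"
```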


### Create the Machine Configuration Files

Using the DNS name of the load balancer created earlier, generate the base configuration files for the Talos machines.
> Note that the `port` used here is the externally accessible port configured on the load balancer - 443 - not the internal port of 6443:
We will create a [machine config patch]({{% ref "/v1.7/talos-guides/configuration/patching.md#rfc6902-json-patches" %}}) to use the AWS time servers.
Member

same about relref
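For reference, an RFC 6902 patch targeting the Amazon Time Sync Service endpoint (169.254.169.123, AWS's link-local NTP address) could look like the sketch below; the file name and exact patch shape are assumptions, not the guide's literal text:

```bash
# Sketch: an RFC 6902 patch setting the machine's NTP server to the
# Amazon Time Sync Service. Apply it when generating configs, e.g.:
#   talosctl gen config talos-aws https://<LB_DNS>:443 --config-patch @time-server-patch.yaml
cat > time-server-patch.yaml <<'EOF'
- op: add
  path: /machine/time
  value:
    servers:
      - 169.254.169.123
EOF
```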

@rothgar force-pushed the aws-docs branch 4 times, most recently from 6dcbf14 to bf1a87e on August 7, 2024.
@elreydetoda left a comment

Thanks for linking me to these updated instructions @rothgar, they were awesome!
When walking through them on the AWS cloudshell I ran into a few issues, and I've commented on the PR to let you know what I ran into. Hope this helps 😁
(P.S. I only looked at the v1.7 docs)

One other thing to consider is possibly mentioning or integrating (I haven't tested it yet) the AWS Cloud Controller Manager (CCM) that the Talos blog re-posted here: https://www.siderolabs.com/blog/deploying-talos-on-aws-with-cdk/

Next, create a subnet in each availability zone.

```bash
CIDR=1
```


I'm guessing you ran the AWS CLI commands in a zsh shell, but since zsh starts its indexing at 1, this causes a bash shell (i.e. the default in AWS CloudShell) to skip the first IPV4_CIDRS array element (i.e. index 0 in bash).

So, I don't know how you'd like to handle that, but just wanted to give you a heads up about that nuance.

Member Author

Thanks for calling that out. I ran through an early version of the doc in bash and didn't test it with the final version. I'll figure something out.
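One shell-agnostic workaround (a sketch, not from the PR — it assumes the `IPV4_CIDRS`, `REGION`, and `VPC` variables from earlier, and omits the guide's per-subnet availability-zone assignment): iterate over the array's values instead of its indices, since value expansion behaves identically in bash and zsh.

```bash
# "${IPV4_CIDRS[@]}" expands to all elements in both bash (0-based)
# and zsh (1-based), so no numeric indexing is involved.
for IPV4_CIDR in "${IPV4_CIDRS[@]}"; do
  aws ec2 create-subnet \
    --region "$REGION" \
    --vpc-id "$VPC" \
    --cidr-block "$IPV4_CIDR"
done
```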

```bash
talosctl --talosconfig talosconfig config endpoint <control plane 1 PUBLIC IP>
talosctl --talosconfig talosconfig config node <control plane 1 PUBLIC IP>
talosctl config endpoints $(aws ec2 describe-instances \
--instance-ids $(echo $CP_INSTANCES[@]) \
```


The main suggestion here is ensuring that you wrap your array accesses with {} (it completely failed for me without that), but I also wanted to mention that you can use [*] as well, which should output the array as a space-delimited string.

Suggested change:

```diff
- --instance-ids $(echo $CP_INSTANCES[@]) \
+ --instance-ids ${CP_INSTANCES[*]} \
```
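Put together, the corrected command might read as below (a sketch — the `--query` expression is my assumption about how the guide extracts the public IPs):

```bash
# Sketch: expand the whole CP_INSTANCES array (the braced syntax works
# in bash and zsh) and extract each instance's public IP.
talosctl config endpoints $(aws ec2 describe-instances \
  --instance-ids "${CP_INSTANCES[@]}" \
  --query 'Reservations[].Instances[].PublicIpAddress' \
  --output text)
```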

```bash
--region $REGION \
--vpc-id $VPC \
--cidr-block ${CIDR_BLOCK}
IPV4_CIDRS=( $(ipcalc -S 22 --no-decorate $IPV4_CIDR | head -n 3) )
```


ipcalc didn't work for me at all on Linux (AWS CloudShell), but I just did a space-separated list of CIDRs (i.e. '10.1.0.0/24' '10.1.1.0/24' '10.1.2.0/24'). I didn't dig into it much though, so it might just require another package.


IDK what kind of CIDRs this generates, but you could offer something like this as a substitution:

```bash
IPV4_CIDRS=( $(printf '10.1.0.0/24\n10.1.1.0/24\n10.1.2.0/24\n') )
```


Member Author

I think this is a bug with the ipcalc packaged with brew. I'll change it to a manual step and make a note that people need to adjust it for their CIDRs.

```bash
--output text)

talosctl config nodes $(aws ec2 describe-instances \
--instance-ids $(echo $CP_INSTANCES[1]) \
```


I didn't actually execute anything past here, since I knew how to grab this content, but again with the {} for your arrays:

Suggested change:

```diff
- --instance-ids $(echo $CP_INSTANCES[1]) \
+ --instance-ids $(echo ${CP_INSTANCES[1]}) \
```

@rothgar force-pushed the aws-docs branch 2 times, most recently from 4587b4d to c05fad3 on August 9, 2024.
@rothgar (Member Author) commented Aug 9, 2024

/m

Updated with autoscaling group for workers, better copy/paste ability, and not using default VPC

Signed-off-by: Justin Garrison <[email protected]>
@rothgar (Member Author) commented Aug 9, 2024

/m

@talos-bot merged commit 0698a49 into siderolabs:main on Aug 9, 2024
50 checks passed