postgresql-k8s-endpoints service lost #392

Open
AmberCharitos opened this issue Feb 13, 2024 · 2 comments
Labels: enhancement (New feature or request)

@AmberCharitos

Steps to reproduce

juju deploy postgresql-k8s --trust
juju scale-application postgresql-k8s 3
kubectl delete svc postgresql-k8s-endpoints
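
A quick way to confirm the effect (illustrative commands, not part of the original report; the Kubernetes namespace is assumed to match the Juju model name):

# The headless service should now be missing from the model's namespace.
kubectl -n <model-name> get svc
# The units drop out of active and sit in waiting instead of recovering.
juju status postgresql-k8s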

Expected behavior

The service is recreated and the units are active

Actual behavior

We see the following error in the postgresql container:

psql: error: connection to server at "<ip>", port 5432 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?

The units are unable to reach an active status and remain in a waiting state.

Versions

Operating system: 22.04

Juju CLI: 2.9.46-ubuntu-amd64

Juju agent: 3.1.6

Charm revision: 14/edge 198

Log output

unit-postgresql-k8s-0: 23:11:48 INFO juju.worker.uniter awaiting error resolution for "update-status" hook
unit-postgresql-k8s-1: 23:11:48 DEBUG unit.postgresql-k8s/1.juju-log Starting new HTTP connection (1): postgresql-k8s-1.postgresql-k8s-endpoints:8008
unit-postgresql-k8s-1: 23:11:50 DEBUG unit.postgresql-k8s/1.juju-log Starting new HTTP connection (1): postgresql-k8s-1.postgresql-k8s-endpoints:8008
unit-postgresql-k8s-1: 23:11:50 ERROR unit.postgresql-k8s/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connectionpool.py", line 497, in _make_request
    conn.request(
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connection.py", line 243, in connect
    self.sock = self._new_conn()
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connection.py", line 210, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7f971738e2c0>: Failed to resolve 'postgresql-k8s-1.postgresql-k8s-endpoints' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='postgresql-k8s-1.postgresql-k8s-endpoints', port=8008): Max retries exceeded with url: /cluster (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f971738e2c0>: Failed to resolve 'postgresql-k8s-1.postgresql-k8s-endpoints' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/src/patroni.py", line 148, in cluster_members
    r = requests.get(f"{self._patroni_url}/cluster", verify=self._verify)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='postgresql-k8s-1.postgresql-k8s-endpoints', port=8008): Max retries exceeded with url: /cluster (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f971738e2c0>: Failed to resolve 'postgresql-k8s-1.postgresql-k8s-endpoints' ([Errno -2] Name or service not known)"))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/./src/charm.py", line 1574, in <module>
    main(PostgresqlOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/ops/main.py", line 434, in main
    framework.reemit()
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/ops/framework.py", line 863, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/ops/framework.py", line 942, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/./src/charm.py", line 370, in _on_peer_relation_changed
    self._add_members(event)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/./src/charm.py", line 530, in _add_members
    if self._patroni.cluster_members == self._hosts:
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/juju/agents/unit-postgresql-k8s-1/charm/venv/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f971738edd0 state=finished raised ConnectionError>]
unit-postgresql-k8s-1: 23:11:50 ERROR juju.worker.uniter.operation hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

Matrix conversation

AmberCharitos added the bug (Something isn't working) label on Feb 13, 2024

taurus-forever added the enhancement (New feature or request) label and removed the bug (Something isn't working) label on Mar 5, 2024

@taurus-forever (Contributor)

I have converted this from a bug to an enhancement. At the moment Juju is responsible for the K8s resources, and the charm does not re-create them after bootstrap. I can see valid scenarios in which K8s resources (services, in this case) are lost and automated recovery would help, but it needs to be properly planned and implemented. No quick fix is expected here.
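
For illustration only (this snippet is not from the thread): until such recovery lands in the charm, an operator could restore service by re-creating the headless service by hand. The selector and publishNotReadyAddresses setting below are assumptions about what Juju creates at bootstrap; copy the exact spec from a healthy deployment of the same charm before applying anything.

# Hypothetical manual workaround -- verify the spec against a healthy model first.
# Assumes the namespace equals the Juju model name and that Juju labels the pods
# with app.kubernetes.io/name=postgresql-k8s.
kubectl -n <model-name> get svc postgresql-k8s-endpoints || kubectl -n <model-name> apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: postgresql-k8s-endpoints
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app.kubernetes.io/name: postgresql-k8s
EOF

Once the headless service exists again, per-unit DNS names such as postgresql-k8s-1.postgresql-k8s-endpoints should resolve and Patroni's REST API on port 8008 should become reachable, letting the units settle back to active.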
