Autoscaler controller got restarted due to a "404" issue when scaling in runners

Aaron.Luo May 15, 2022

Hi team,

I'm not sure whether this is a bug, but I ran into a "404: jobs.batch \"runner-9c3a***-c5d5-5686-883c-e6****\" not found" error with the runner autoscaler controller, which caused the controller to restart. However, I was able to get the job with a kubectl command. Orphaned jobs and secrets were left behind even though the related runner entries in Bitbucket Cloud had been removed.

Below are the error logs:

WARNING: [DevExp] Runners count 1 with the next UUID will be deleted: ['9c3a***-c5d5-5686-883c-e6****']
INFO: [Group1] Starting to delete runner 9c3a***-c5d5-5686-883c-e6**** from Bitbucket workspace: <my-workspace> ...
INFO: [Group1] Starting to delete job for runner 9c3a***-c5d5-5686-883c-e6**** from namespace <runners-namespace>
Traceback (most recent call last):
File "/home/bitbucket/autoscaler/start.py", line 170, in <module>
main()
File "/home/bitbucket/autoscaler/start.py", line 166, in main
poller.start()
File "/home/bitbucket/autoscaler/start.py", line 86, in start
fut.result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/bitbucket/autoscaler/strategy/pct_runners_idle.py", line 45, in process
self.run()
File "/home/bitbucket/autoscaler/strategy/pct_runners_idle.py", line 255, in run
self.delete_runners(runners_idle_to_delete)
File "/home/bitbucket/autoscaler/strategy/pct_runners_idle.py", line 134, in delete_runners
self.kubernetes_service.delete_job(runner_uuid, self.runner_data.namespace)
File "/home/bitbucket/autoscaler/services/kubernetes.py", line 92, in delete_job
kube_python_api.delete_job(runner_uuid, namespace)
File "/home/bitbucket/autoscaler/clients/kubernetes/base.py", line 61, in delete_job
batch_v1.delete_namespaced_job(
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/batch_v1_api.py", line 386, in delete_namespaced_job
return self.delete_namespaced_job_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api/batch_v1_api.py", line 493, in delete_namespaced_job_with_http_info
return self.api_client.call_api(
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 415, in request
return self.rest_client.DELETE(url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 266, in DELETE
return self.request("DELETE", url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '1fcce****-0629-4ceb-8301-d4****', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '2117****-96b5-4054-b3ff-127****', 'X-Kubernetes-Pf-Prioritylevel-Uid': '475e****-028c-4bc2-b8b4-de38****', 'Date': 'Mon, 16 May 2022 05:21:45 GMT', 'Content-Length': '276'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"jobs.batch \"runner-9c3a***-c5d5-5686-883c-e6****\" not found","reason":"NotFound","details":{"name":"runner-9c3a***-c5d5-5686-883c-e6****","group":"batch","kind":"jobs"},"code":404}

Here is the result with kubectl:

>kubectl get job -n <runners-namespace>
NAME                                                        COMPLETIONS   DURATION   AGE
...
bitbucket-pipelines-runner-9c3a***-c5d5-5686-883c-e6****    0/1           91m        91m
...
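
For reference, the call the controller makes here (per the traceback) boils down to something like the sketch below, using the Kubernetes Python client; the namespace and UUID are placeholders for my real values:

# Rough reproduction of the controller's delete call from the traceback.
# <runners-namespace> and <runner-uuid> are placeholders.
from kubernetes import client, config
from kubernetes.client.exceptions import ApiException

config.load_kube_config()
batch_v1 = client.BatchV1Api()

try:
    batch_v1.delete_namespaced_job(
        name="runner-<runner-uuid>",        # the name the autoscaler deletes by
        namespace="<runners-namespace>",
    )
except ApiException as exc:
    # The controller hit this ApiException with status 404 / "Not Found".
    print(exc.status, exc.reason)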

Could you please investigate it when you get a chance? Thanks!

Kind regards,

Aaron

2 answers

1 accepted

Answer accepted
Aaron.Luo May 17, 2022

Finally found the cause of the issue. I had accidentally set the job name to the following, while "runner-<%runnerUuid%>" is hardcoded in the code.

name"bitbucket-pipelines-runner-<%runnerUuid%>"

Though this was my mistake, I wonder if we could avoid the hardcoded job/secret names in the code, since we already have the job template.
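
In other words (a rough sketch, not the actual autoscaler code; the UUID is a placeholder), the template created the job under one prefix while the controller deletes by another, so the lookup by name can never succeed:

# Illustrative sketch of the name mismatch; <runner-uuid> is a placeholder.
runner_uuid = "<runner-uuid>"

# Name my misconfigured template actually created the job under:
created_name = f"bitbucket-pipelines-runner-{runner_uuid}"

# Name the autoscaler deletes by (fixed "runner-" prefix, per the traceback):
deleted_name = f"runner-{runner_uuid}"

# The two never match, so delete_namespaced_job(deleted_name, namespace)
# returns 404 even though the job exists, and the controller crashes.
assert created_name != deleted_name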

Igor Stoyanov
Atlassian Team
December 23, 2022

The job name is not hardcoded; it is automatically generated and should not be modified. Check the job template.

Igor Stoyanov
Atlassian Team
December 23, 2022

Hi @Aaron.Luo,

Starting from version 1.8.0 we implemented a cleaner that automatically deletes orphaned jobs and secrets. More information about the cleaner can be found in the Cleaner section of the README.
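
Conceptually, that cleanup amounts to something like the following sketch (not the actual implementation; the label selector, helper name, and secret name are illustrative assumptions): list the runner jobs in the namespace and delete any job, plus its secret, whose runner no longer exists in the Bitbucket workspace.

# Conceptual sketch only -- not the autoscaler's actual cleaner code.
# The "runner_uuid" label and the secret naming are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
batch_v1 = client.BatchV1Api()
core_v1 = client.CoreV1Api()

def cleanup_orphans(namespace, live_runner_uuids):
    """Delete runner jobs/secrets whose runner no longer exists in Bitbucket."""
    jobs = batch_v1.list_namespaced_job(namespace, label_selector="runner_uuid")
    for job in jobs.items:
        uuid = (job.metadata.labels or {}).get("runner_uuid")
        if uuid and uuid not in live_runner_uuids:
            batch_v1.delete_namespaced_job(job.metadata.name, namespace)
            core_v1.delete_namespaced_secret(f"runner-secret-{uuid}", namespace)  # illustrative name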

If you have any questions about the cleaner implementation, feel free to ask.

Regards, Igor
