Hi there,
We've been migrating our Bitbucket Runners to the AutoScaler Runners, and the majority of our pipelines run successfully. However, several times a pipeline step has failed with this error:
Status 404: {"message":"No such container: <runnerID_stepID>_system_auth-proxy"}
The runner logs don't indicate any cause for the problem; the container just seems to disappear:
[2024-03-26 22:12:57,079] Inspecting container (id: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_system_auth-proxy).
[2024-03-26 22:12:57,083] An error occurred whilst inspecting container (id: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_system_auth-proxy).
com.github.dockerjava.api.exception.NotFoundException: Status 404: {"message":"No such container: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_system_auth-proxy"}
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:97)
...
[2024-03-26 22:12:57,083] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_build)
[2024-03-26 22:12:57,084] Not uploading caches. (numberOfCaches: 0, resultOrError: ERROR)
[2024-03-26 22:12:57,085] Not uploading artifacts. (numberOfArtifacts: 1, resultOrError: ERROR)
[2024-03-26 22:12:57,085] Updating step progress to PARSING_TEST_RESULTS.
[2024-03-26 22:12:57,186] Test report processing complete.
[2024-03-26 22:12:57,186] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_clone)
[2024-03-26 22:12:57,253] Container removed (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_clone)
[2024-03-26 22:12:57,253] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_clone)
[2024-03-26 22:12:57,254] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_build)
[2024-03-26 22:12:57,255] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_system_auth-proxy)
[2024-03-26 22:12:57,256] Removing container (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_pause)
[2024-03-26 22:12:57,307] Appending log line to main log.
[2024-03-26 22:13:02,225] Container removed (name: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_pause)
[2024-03-26 22:13:02,225] Updating step progress to COMPLETING_LOGS.
[2024-03-26 22:13:02,340] Shutting down log uploader.
[2024-03-26 22:13:02,340] Tearing down directories.
[2024-03-26 22:13:02,412] Cancelling timeout
[2024-03-26 22:13:02,413] Completing step with result Result{status=ERROR, error=Some(Error{key='runner.bitbucket-pipelines.build-container-failure', message='Status 404: {"message":"No such container: 1ca2faf9-cb4b-533d-8196-85c616d6c111_42a25bbe-66e1-42d7-8dcd-7eeb2b7b2da7_system_auth-proxy"}', arguments={}})}.
We are not seeing any of the runner pods crash or get evicted, and there is no sign of resource pressure. The worker nodes each have 4 CPUs, 32 GB of RAM, and 150 GB of disk space.
Also, we have not found any way to reproduce this error.
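For anyone triaging similar logs, here is a minimal sketch of a script that lists containers with a "Removing container" line but no matching "Container removed" line. In the excerpt above that would be the build and system_auth-proxy containers. The regular expression is an assumption based only on the log format shown here, and the function name `unconfirmed_removals` and the shortened `abc_...` container names are hypothetical. A missing confirmation line does not prove anything on its own; it only narrows down which containers to look at.

```python
import re

# Assumed line shape, taken from the log excerpt in this post:
#   [timestamp] Removing container (name: <container>)
#   [timestamp] Container removed (name: <container>)
LINE_RE = re.compile(
    r"^\[[^\]]+\]\s+(?P<action>Removing container|Container removed)\s+"
    r"\(name:\s*(?P<name>[^)\s]+)\)"
)


def unconfirmed_removals(lines):
    """Return names that have a 'Removing container' line but no
    'Container removed' line anywhere in the input, in first-seen order."""
    requested = []
    confirmed = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore unrelated log lines
        name = match.group("name")
        if match.group("action") == "Removing container":
            if name not in requested:
                requested.append(name)
        else:
            confirmed.add(name)
    return [name for name in requested if name not in confirmed]


if __name__ == "__main__":
    # Hypothetical, shortened sample mirroring the order of events above.
    sample = [
        "[2024-03-26 22:12:57,083] Removing container (name: abc_build)",
        "[2024-03-26 22:12:57,186] Removing container (name: abc_clone)",
        "[2024-03-26 22:12:57,253] Container removed (name: abc_clone)",
        "[2024-03-26 22:12:57,255] Removing container (name: abc_system_auth-proxy)",
        "[2024-03-26 22:12:57,256] Removing container (name: abc_pause)",
        "[2024-03-26 22:13:02,225] Container removed (name: abc_pause)",
    ]
    for name in unconfirmed_removals(sample):
        print(name)
```

To run it against a real runner log, pass the file's lines instead of the sample, for example `unconfirmed_removals(open("runner.log"))`, where `runner.log` is a placeholder path.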
Has anyone seen this problem? Does anyone have a solution, or even tips for further troubleshooting?