# general

gifted-branch-26934

09/15/2022, 4:55 PM
Hello all, I hope you can help me with this issue. I'm doing a performance test with JMeter against an application running in a single pod. Increasing the number of requests per 60 seconds showed the expected result: CPU consumption got higher, and so on. Then I decided to scale the application up to 2 replicas, expecting performance to be much better, but unfortunately a lot of requests failed and the CPU consumption got really high! So I scaled the app back down to 1 replica, as before, and the results were good, with no failed requests at all! Note: the pods are running on different nodes with the same number of cores, the same processor, and the same RAM.
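For anyone wanting to reproduce this kind of test without JMeter, here is a rough Python sketch of the same idea; the URL, request count, and window below are placeholders rather than the original test settings:

```python
# Rough Python equivalent of the JMeter test: send N requests spread over a
# time window and count successes vs. failures.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://my-app.example.com/"   # placeholder: the service under test
TOTAL_REQUESTS = 60                  # placeholder: requests per window
WINDOW_SECONDS = 60                  # placeholder: "ramp-up" period

def hit(_):
    try:
        return requests.get(URL, timeout=5).status_code == 200
    except requests.RequestException:
        return False

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = []
    for i in range(TOTAL_REQUESTS):
        futures.append(pool.submit(hit, i))
        time.sleep(WINDOW_SECONDS / TOTAL_REQUESTS)   # spread requests over the window
    ok = sum(f.result() for f in futures)

print(f"{ok} succeeded, {TOTAL_REQUESTS - ok} failed")
```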

melodic-umbrella-19641

09/15/2022, 5:00 PM
This sounds like a generic K8s problem rather than a Rancher one, and it might depend on how your cluster is routing traffic. Is the application querying a backend that might be a bottleneck, like a database? Are you seeing network errors between the 2 nodes?

gifted-branch-26934

09/15/2022, 5:03 PM
Not at all, it's just a simple API service with a single route. How can I check for network errors between the 2 nodes?
Anyway, the CPU consumption on both pods was high, which means the traffic is being distributed to both pods.

melodic-umbrella-19641

09/15/2022, 5:07 PM
Any idea where the requests are failing? Inside your application, but only on 1 node? Or on both nodes?

gifted-branch-26934

09/15/2022, 5:10 PM
@melodic-umbrella-19641 I can't tell, to be honest, but both pods had high CPU consumption and then the requests started to fail. Like, I'm sending 60 requests, the first 30 succeed but the last 30 fail, so I think both pods are failing.

melodic-umbrella-19641

09/15/2022, 5:13 PM
My next step would probably be to enable logging in your application and compare the logs from both pods to confirm that the failures happen on both. I'm no expert, but that's where I would go next.
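A quick way to do that comparison, sketched with the official Kubernetes Python client (the namespace and the app=my-app label are placeholders for whatever the deployment actually uses; plain kubectl logs on each pod works just as well):

```python
# Tail the last log lines of every pod behind the deployment, to compare
# whether both replicas are actually seeing requests.
from kubernetes import client, config

config.load_kube_config()   # use config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# Placeholders: adjust namespace and label selector to your deployment.
pods = v1.list_namespaced_pod("default", label_selector="app=my-app")
for pod in pods.items:
    print(f"--- {pod.metadata.name} ---")
    print(v1.read_namespaced_pod_log(pod.metadata.name, "default", tail_lines=20))
```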

gifted-branch-26934

09/15/2022, 5:24 PM
Thanks for the advice. I did that, and I'm seeing logs in 1 pod only, nothing in the 2nd one!

melodic-umbrella-19641

09/15/2022, 5:28 PM
Interesting! If both pods restarted and picked up the log setting, then maybe you're not getting traffic routed to both pods. If your test sends lots of requests over the same HTTP connection, then k8s won't load-balance the traffic across both pods and it'll all go to a single one (that does depend on how your application works, though). I'd be surprised if you weren't routing traffic to both nodes, since you said the CPU went up on both.
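Worth checking at this point is whether the Service actually has both pods registered behind it. A small sketch with the Kubernetes Python client (the Service name and namespace are placeholders; kubectl get endpoints shows the same information):

```python
# List the pod addresses currently registered behind the Service.
# If only one address shows up, the second replica is never routed to.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Placeholders: use your Service's name and namespace.
endpoints = v1.read_namespaced_endpoints("my-app", "default")
for subset in endpoints.subsets or []:
    for addr in subset.addresses or []:
        target = addr.target_ref.name if addr.target_ref else "?"
        print(f"{addr.ip} -> {target}")
```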

gifted-branch-26934

09/15/2022, 5:33 PM
But I set the ramp-up on JMeter to 60 seconds, which means the traffic will be spread out over the 60 seconds, no?

melodic-umbrella-19641

09/15/2022, 5:35 PM
Yes, probably. Sorry, I don't know much about JMeter. If it makes multiple connections, then the traffic should be distributed, but if it puts all its requests through 1 connection, then it won't be load-balanced between the two pods.
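To see that effect directly, you can compare a single reused keep-alive connection against fresh connections and count which pods answer. This assumes the app's response identifies the pod (for example by returning its hostname), which is an assumption for this sketch, not something the original app is known to do:

```python
# Compare which pods answer when reusing one keep-alive connection versus
# opening a fresh connection per request. Assumes the response body
# identifies the pod (e.g. its hostname) -- an assumption for this sketch.
from collections import Counter
import requests

URL = "http://my-app.example.com/"   # placeholder service URL

with requests.Session() as session:  # keep-alive: the TCP connection is reused
    reused = Counter(session.get(URL, timeout=5).text.strip() for _ in range(50))

fresh = Counter(requests.get(URL, timeout=5).text.strip() for _ in range(50))

print("one reused connection:", dict(reused))   # typically a single pod answers
print("fresh connections:    ", dict(fresh))    # should spread across both pods
```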

gifted-branch-26934

09/15/2022, 5:53 PM
But even so, let's assume it's sending everything to a single pod over 1 connection: why are almost 50% of the requests failing? With a single replica only, all the requests are successful.

melodic-umbrella-19641

09/15/2022, 6:02 PM
Good point. That shouldn't happen, unless k8s is routing requests to your 2nd node and dropping them.

gifted-branch-26934

09/15/2022, 6:04 PM
That's probably it, like you said. Do you think it's related to the application itself? Like, should the app support load balancing or something? It's just a simple Flask app with a single route.
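For reference, a single-route Flask app like that needs nothing load-balancing-specific. A minimal sketch is below; returning the hostname is just a debugging convenience so the two pods can be told apart, not something the original app necessarily does:

```python
# Minimal single-route Flask app; no load-balancing awareness required.
# Returning the hostname makes it easy to tell which pod answered.
import socket
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return f"hello from {socket.gethostname()}\n"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```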

melodic-umbrella-19641

09/15/2022, 8:35 PM
If it's Flask and it's just returning something basic to an HTTP request, then I see no reason why it'd need to know about load balancing. The deployment examples on the Kubernetes website seem to do something similar and they work just fine. Maybe try deploying one of those hello-world apps to your cluster and see if you get responses from both pods? If that doesn't work, then it's probably some kind of issue with your cluster.
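If you try that, one way to separate "a pod is broken" from "the Service isn't routing to it" is to hit each pod directly by its pod IP, bypassing the Service. A sketch is below; the namespace, label selector, and port are placeholders, and it would need to run somewhere that can reach pod IPs (e.g. a debug pod inside the cluster):

```python
# Request each pod of the deployment directly by pod IP (bypassing the Service)
# to distinguish a failing pod from failing Service routing.
# Pod IPs are usually only reachable from inside the cluster.
import requests
from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# Placeholders: adjust namespace, label selector, and port to the test app.
pods = v1.list_namespaced_pod("default", label_selector="app=hello-world")
for pod in pods.items:
    url = f"http://{pod.status.pod_ip}:8080/"
    try:
        resp = requests.get(url, timeout=5)
        print(f"{pod.metadata.name}: {resp.status_code} {resp.text.strip()[:60]}")
    except requests.RequestException as exc:
        print(f"{pod.metadata.name}: FAILED ({exc})")
```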