# opni
e
checking the pods status, we see this
ubuntu@opni-2:~$ kubectl -n opni-agent get pods
NAME                                                   READY   STATUS    RESTARTS        AGE
opni-agent-prometheus-node-exporter-qtmnb              1/1     Running   0               15m
opni-agent-kube-prometheus-operator-6fb47b9846-n6964   1/1     Running   0               15m
opni-agent-kube-state-metrics-ddf5c9f74-9lljw          1/1     Running   0               15m
opni-agent-8496894d46-7qj76                            2/3     Running   7 (3m53s ago)   15m
and the agent’s logs
ubuntu@opni-2:~$ kubectl -n opni-agent logs opni-agent-8496894d46-7qj76
Defaulted container "agent" out of: agent, client, kube-rbac-proxy
2023-09-21T14:35:20Z INFO commands/agent_v2.go:68 using config file {"path": "/etc/opni/config.yaml"}
2023-09-21T14:35:20Z INFO commands/agent_v2.go:156 loading bootstrap tokens from config file
2023-09-21T14:35:20Z DEBUG agent v2/agent.go:130 using log level: debug
2023-09-21T14:35:20Z INFO agent v2/agent.go:234 loaded existing keyring
2023-09-21T14:35:20Z WARN agent v2/agent.go:290 error loading ephemeral keys {"error": "open /run/opni-agent/keyring: no such file or directory"}
2023-09-21T14:35:20Z DEBUG agent.agent-updater update/syncer.go:35 sending manifest sync request {"type": "agent", "entries": 2}
2023-09-21T14:35:20Z INFO agent.agent-updater update/syncer.go:43 received sync response {"type": "agent"}
2023-09-21T14:35:20Z INFO agent.agent-upgrader client/client.go:148 update not required {"urn": "urn:opni:agent:kubernetes:agent"}
2023-09-21T14:35:20Z INFO agent.agent-upgrader client/client.go:148 update not required {"urn": "urn:opni:agent:kubernetes:controller"}
2023-09-21T14:35:20Z INFO agent.agent-updater update/syncer.go:50 manifest sync complete {"type": "agent", "entries": 2}
2023-09-21T14:35:20Z DEBUG agent.plugin-upgrader patch/manifest.go:98 found 0 plugins
2023-09-21T14:35:20Z DEBUG agent.plugin-upgrader patch/manifest.go:155 loaded plugin manifest {"plugins": 0}
2023-09-21T14:35:20Z DEBUG agent.plugin-updater update/syncer.go:35 sending manifest sync request {"type": "plugin", "entries": 1}
any help is much appreciated!
b
Out of curiosity, how long has the agent been trying to bootstrap?
e
ok, just now the status changed to CrashLoopBackOff - age is around 20m
ubuntu@opni-2:~$ kubectl -n opni-agent get pods
NAME                                                   READY   STATUS             RESTARTS        AGE
opni-agent-prometheus-node-exporter-qtmnb              1/1     Running            0               20m
opni-agent-kube-prometheus-operator-6fb47b9846-n6964   1/1     Running            0               20m
opni-agent-kube-state-metrics-ddf5c9f74-9lljw          1/1     Running            0               20m
opni-agent-8496894d46-7qj76                            2/3     CrashLoopBackOff   7 (4m28s ago)   20m
the new logs
ubuntu@opni-2:~$ kubectl -n opni-agent logs opni-agent-8496894d46-7qj76
Defaulted container "agent" out of: agent, client, kube-rbac-proxy
2023-09-21T14:42:12Z INFO commands/agent_v2.go:68 using config file {"path": "/etc/opni/config.yaml"}
2023-09-21T14:42:12Z INFO commands/agent_v2.go:156 loading bootstrap tokens from config file
2023-09-21T14:42:12Z DEBUG agent v2/agent.go:130 using log level: debug
2023-09-21T14:42:12Z INFO agent v2/agent.go:234 loaded existing keyring
2023-09-21T14:42:12Z WARN agent v2/agent.go:290 error loading ephemeral keys {"error": "open /run/opni-agent/keyring: no such file or directory"}
2023-09-21T14:42:12Z DEBUG agent.agent-updater update/syncer.go:35 sending manifest sync request {"type": "agent", "entries": 2}
2023-09-21T14:42:12Z INFO agent.agent-updater update/syncer.go:43 received sync response {"type": "agent"}
2023-09-21T14:42:12Z INFO agent.agent-upgrader client/client.go:148 update not required {"urn": "urn:opni:agent:kubernetes:agent"}
2023-09-21T14:42:12Z INFO agent.agent-upgrader client/client.go:148 update not required {"urn": "urn:opni:agent:kubernetes:controller"}
2023-09-21T14:42:12Z INFO agent.agent-updater update/syncer.go:50 manifest sync complete {"type": "agent", "entries": 2}
2023-09-21T14:42:12Z DEBUG agent.plugin-upgrader patch/manifest.go:98 found 0 plugins
2023-09-21T14:42:12Z DEBUG agent.plugin-upgrader patch/manifest.go:155 loaded plugin manifest {"plugins": 0}
2023-09-21T14:42:12Z DEBUG agent.plugin-updater update/syncer.go:35 sending manifest sync request {"type": "plugin", "entries": 1}
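A minimal sketch, not from the thread, for narrowing down which container is failing. The pod is 2/3 ready and kubectl defaulted to the "agent" container, so the logs above may come from the current instance rather than the one that crashed. The helper name not_ready_pods is hypothetical; -c and --previous are standard kubectl flags.

```shell
# Print pods whose READY column shows fewer ready containers than total.
# Reads the table printed by `kubectl get pods` on stdin.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}

# Usage against a live cluster (namespace and pod name taken from the thread):
#   kubectl -n opni-agent get pods | not_ready_pods
#   kubectl -n opni-agent describe pod opni-agent-8496894d46-7qj76
#   kubectl -n opni-agent logs opni-agent-8496894d46-7qj76 -c agent --previous
```

The describe output lists each container's last state and exit code, and --previous prints the logs of the container instance that exited before the latest restart.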
b
Could you check your gateway for gateway.update-server logs? They might hold some more clues
it looks like the gateway might be sending empty/partial patches to the agent, could you chime in here @brief-jordan-43130 ?
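One way to pull out only the gateway.update-server lines, sketched under the assumption that every log line follows the layout seen in this thread (timestamp, level, logger name, source location, message). The helper name logs_for_logger is hypothetical, and the gateway pod name is left as a placeholder.

```shell
# Keep only the log lines whose third field (the logger name) matches $1.
# Reads log text on stdin.
logs_for_logger() {
  awk -v name="$1" '$3 == name'
}

# Usage against a live cluster (the opni namespace is the one used later in the thread):
#   kubectl -n opni logs <gateway-pod> | logs_for_logger gateway.update-server
```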
e
seeing this under gateway.update-server:
2023-09-21T14:52:31Z INFO gateway.update-server update/server.go:58 syncing agent manifest {"strategy": "kubernetes"}
2023-09-21T14:52:31Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T14:52:31Z INFO gateway.update-server update/server.go:58 syncing agent manifest {"strategy": "binary"}
2023-09-21T14:52:32Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
b
that looks normal, have you tried restarting the gateway pod? that fixes some classes of errors that can come up
e
let me try that
unfortunately still the same error
restarted the gateway pod
also tried re-installing the agent
b
Did you install the agent using the command-line that is generated via the Add cluster command?
e
yep
b
Can you try running opni clusters show <id or name> from within the gateway pod for that agent id?
e
@brief-jordan-43130
opni-gateway-569f95fb66-45mv6:/# opni clusters show opni-agent-1
 opni-agent-1 [e143eeec-8008-4a5f-a589-0b73407115d7]
 Created  21 Sep 23 15:09 UTC (1h29m31s ago)
 Labels   opni.io/agent-version  v2
          opni.io/name           opni-agent-1
b
What hardware are you using for your clusters?
e
it is running on an ubuntu VM
b
cores/memory?
can you find any gateway logs matching the string "generating patch" or "patch generated"?
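A small sketch of that search, matching either phrase exactly rather than any line containing "patch". The helper name find_patch_generation is hypothetical; the two phrases are the ones quoted above.

```shell
# Print only the lines containing "generating patch" or "patch generated".
# Reads log text on stdin; prints nothing (and returns non-zero) if neither phrase occurs.
find_patch_generation() {
  grep -E 'generating patch|patch generated'
}

# Usage against a live cluster (substitute the gateway pod name):
#   kubectl -n opni logs <gateway-pod> | find_patch_generation
```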
e
this for the GW
ubuntu@opni-1:~$ lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      40 bits physical, 48 bits virtual
CPU(s):                             16
On-line CPU(s) list:                0-15
Thread(s) per core:                 1
Core(s) per socket:                 1
Socket(s):                          16
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              85
Model name:                         Intel Xeon Processor (Skylake, IBRS)
Stepping:                           4
CPU MHz:                            2294.638
BogoMIPS:                           4589.27
Hypervisor vendor:                  KVM
Virtualization type:                full
L1d cache:                          512 KiB
L1i cache:                          512 KiB
L2 cache:                           64 MiB
L3 cache:                           256 MiB
NUMA node0 CPU(s):                  0-15
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:        KVM: Vulnerable
Vulnerability L1tf:                 Mitigation; PTE Inversion
Vulnerability Mds:                  Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat
Memory
ubuntu@opni-1:~$ lsmem
RANGE                                 SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x00000000bfffffff   3G online       yes   0-23
0x0000000100000000-0x000000083fffffff  29G online       yes 32-263

Memory block size:       128M
Total online memory:      32G
Total offline memory:      0B
b
ok that's fine
e
This is what I see for the logs with the word “patch”
ubuntu@opni-1:~$ kubectl -n opni logs opni-gateway-569f95fb66-45mv6 | grep -i patch
2023-09-21T16:55:07Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T16:55:10Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T16:56:47Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T16:56:51Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T17:03:40Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T17:03:42Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T17:05:17Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T17:05:19Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T17:12:02Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T17:12:07Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T17:13:38Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T17:13:40Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
2023-09-21T17:20:24Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 2}
2023-09-21T17:20:27Z INFO gateway.update-server update/server.go:77 computed updates {"patches": 6}
nothing for “generating patch” or “patch generated”
b
if you can, would you mind sharing the complete log files for both?
alternatively we can jump on a call and I can try to help you debug it
e
sure @brief-jordan-43130! if you can send me your email, I can set up a call for us to troubleshoot