Hi all,
First of all: a big thanks to Felix for this great plugin! The "magic" of having all NetBox devices automatically in Prometheus is something I wouldn't want to miss.
I updated my personal dockerized netbox at home (~20 devices) to NetBox 4.6.2.
This is with plugin version 1.3.0 according to the NetBox interface.
Prometheus started having trouble updating targets after some 10-15 minutes (it refreshes every 300s).
I searched the existing issues but didn't find a match. Then I discovered the great PR here: #255
which I manually overlaid onto the Docker image and am currently running. Now Prometheus can discover targets again, but
my netbox server continues to look like this:
load average: 4.12 4.10 4.02
with processes like these
70 3282717 55.2 2.4 486244 198756 ? Rs 13:59 2:20 postgres: netbox netbox 172.18.0.5(55726) SELECT
70 3290348 63.8 2.0 455484 169620 ? Rs 14:02 1:06 postgres: netbox netbox 172.18.0.5(34420) SELECT
popping up and eating up all the CPU. The active queries look like this:
netbox | 500119 | active | SELECT DISTINCT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated",
"dcim_device"."custom_field_data", "dcim_device"."owner_id", "dcim_device"."description", "dcim_device"."comments",
"dcim_device"."local_context_data", "dcim_device"."config_template_id", "dcim_device"."device_type_id", "dcim_device"."role_id",
"dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."serial", "dcim_device"."asset_tag",
"dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face",
"dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id",
"dcim_device"."oob_ip_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position",
"dcim_device"."vc_priority", "dcim_device"."latitude", "dcim_device"."longitude", "dcim_device"."console_port_count",
"dcim_device"."console_server_port_count", "dcim_device"."power_port_count", "dc | 00:00:04.634362
netbox | 500082 | active | SELECT DISTINCT "dcim_device"."id", "dcim_device"."created", "dcim_device"."last_updated",
"dcim_device"."custom_field_data", "dcim_device"."owner_id", "dcim_device"."description", "dcim_device"."comments",
"dcim_device"."local_context_data", "dcim_device"."config_template_id", "dcim_device"."device_type_id", "dcim_device"."role_id",
"dcim_device"."tenant_id", "dcim_device"."platform_id", "dcim_device"."name", "dcim_device"."serial", "dcim_device"."asset_tag",
"dcim_device"."site_id", "dcim_device"."location_id", "dcim_device"."rack_id", "dcim_device"."position", "dcim_device"."face",
"dcim_device"."status", "dcim_device"."airflow", "dcim_device"."primary_ip4_id", "dcim_device"."primary_ip6_id",
"dcim_device"."oob_ip_id", "dcim_device"."cluster_id", "dcim_device"."virtual_chassis_id", "dcim_device"."vc_position",
"dcim_device"."vc_priority", "dcim_device"."latitude", "dcim_device"."longitude", "dcim_device"."console_port_count",
"dcim_device"."console_server_port_count", "dcim_device"."power_port_count", "dc | 00:00:06.117863
The queries are cut off, of course. I'm not very familiar with PostgreSQL, so my debugging stopped here.
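Since the listing above is clipped, one way to capture the full statements would be a small script along the following lines. This is only a minimal sketch under stated assumptions: the names `ACTIVE_SQL`, `active_queries` and `dump` are made up for illustration, the database name `netbox` is taken from the output above, and the connection is assumed to be any Python DB-API 2.0 connection (for example one opened with psycopg2). PostgreSQL itself also limits the stored query text via `track_activity_query_size`, so very long statements may still come back shortened.

```python
# Sketch: list active statements on the netbox database in full.
# Assumption: `conn` is a DB-API 2.0 connection to the PostgreSQL server,
# e.g. psycopg2.connect(host=..., dbname="netbox", user=..., password=...).

ACTIVE_SQL = """
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE state = 'active' AND datname = 'netbox'
ORDER BY query_start
"""


def active_queries(conn):
    """Return (pid, state, query_start, query) rows, oldest statement first.

    Note: the monitoring session itself may show up in the result if it is
    connected to the same database.
    """
    cur = conn.cursor()
    try:
        cur.execute(ACTIVE_SQL)
        return cur.fetchall()
    finally:
        cur.close()


def dump(rows):
    """Print every statement on its own lines instead of a clipped table column."""
    for pid, state, started, query in rows:
        print(f"--- pid={pid} state={state} started={started}")
        print(query)
```

Calling `dump(active_queries(conn))` while the load is high should show the full `SELECT DISTINCT ... FROM "dcim_device"` statements, which could then be passed to `EXPLAIN` or attached here.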
I already tried disabling other plugins (bgp, qrcode, topology views, reorder racked devices, documents, interface synchronization, lifecycle), but without any noticeable difference. The only plugin actively used is this one (prometheus sd).
Am I the only one having this issue? I have three NetBox instances running, all with similar config but at different scale. The instances I'm using Prometheus SD with all have this issue since the upgrade to 4.6(.2).
Any hints or ideas about what my problem might be, or how I can help debug this further?
Thanks for reading, any help will be appreciated and please keep maintaining this great plugin 💯
irrwitzer