
v3.0.0: loki backend SIGSEGV if index_gateway.mode: ring #12270

Open
awoimbee opened this issue Mar 20, 2024 · 14 comments
Labels
type/bug Something is not working as expected

Comments

@awoimbee

Describe the bug
Running version grafana/loki:main-0bf894b, loki-backend (replicas: 1) crashes:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x223f470]

goroutine 1 [running]:
github.com/grafana/loki/pkg/loki.(*Loki).updateConfigForShipperStore(0xc000638be0?)
	/src/loki/pkg/loki/modules.go:709 +0xb0
github.com/grafana/loki/pkg/loki.(*Loki).initBloomStore(0xc000d3c000)
	/src/loki/pkg/loki/modules.go:663 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000c86720, {0x7ffe92a04bb1, 0x7}, 0x0?, 0x42?)
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc000ce2990, 0x1, 0x40d39a?})
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/pkg/loki.(*Loki).Run(0xc000d3c000, {0x0?, {0x4?, 0x3?, 0x4751b00?}})
	/src/loki/pkg/loki/loki.go:431 +0x9d

Workaround: edit the ConfigMap and change index_gateway.mode from ring to simple (see the sketch below).
Note that I use tsdb; whether or not a boltdb config is present in storage_config makes no difference.
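
A minimal sketch of the workaround as plain Loki configuration (assuming you apply it either by editing the rendered ConfigMap directly or via a Helm override such as loki.structuredConfig; the exact chart key may differ by chart version):

index_gateway:
  # "ring" shards the index gateway across instances and is what triggers the
  # v3.0.0 nil pointer dereference in the backend target; "simple" makes each
  # index-gateway instance serve the full index on its own.
  mode: simple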

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: Helm
@awoimbee awoimbee changed the title main branch: loki backend sigsev if index_gateway.mode: ring main branch: loki backend SIGSEGV if index_gateway.mode: ring Mar 20, 2024
@JStickler JStickler added the type/bug Something is not working as expected label Mar 25, 2024
@awoimbee
Author

Closing, since there have been a few releases since this was filed; if it still happens I'll reopen.

@Nissou31

Happened for me today while deploying simple scalable Loki 3.0.0; only the backend pod is affected.

@awoimbee awoimbee reopened this May 2, 2024
@awoimbee awoimbee changed the title main branch: loki backend SIGSEGV if index_gateway.mode: ring v3.0.0: loki backend SIGSEGV if index_gateway.mode: ring May 2, 2024
@alexandergoncharovaspecta

Same problem; the only difference is that I have 3 pods: 2 are OK, 1 is in CrashLoopBackOff.

k8 logs -n observability loki-backend-1 -c loki
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22f02b0]

goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0006d5ea0?)
/src/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc000cab500)
/src/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc0004f2f90, {0x7fffb01fda84, 0x7}, 0x1?, 0xc00096e1e0?)
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc00097ca80, 0x1, 0xc0005a9b30?})
/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc000cab500, {0x0?, {0x4?, 0x3?, 0x4912940?}})
/src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
/src/loki/cmd/loki/main.go:122 +0x113b

@chaudum
Contributor

chaudum commented May 3, 2024

@alexandergoncharovaspecta Can you provide your config?

@chaudum
Contributor

chaudum commented May 3, 2024

@alexandergoncharovaspecta Can you provide your config?

I am able to reproduce the bug on the release-3.0.x branch using

$ ./cmd/loki/loki -target=backend -index-gateway.mode=ring
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22efff0]

goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0008b8960?)
	/home/christian/sandbox/grafana/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc0007c9500)
	/home/christian/sandbox/grafana/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc00063c780, {0x7fffab192a32, 0x7}, 0x1?, 0xc000eb8d20?)
	/home/christian/sandbox/grafana/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc000a0dc20, 0x1, 0xc000eb8bd0?})
	/home/christian/sandbox/grafana/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc0007c9500, {0x0?, {0x4?, 0x3?, 0x493d3e0?}})
	/home/christian/sandbox/grafana/loki/pkg/loki/loki.go:453 +0x9d
main.main()
	/home/christian/sandbox/grafana/loki/cmd/loki/main.go:122 +0x113b

chaudum added a commit that referenced this issue May 3, 2024
The bloom store initialisation updates the shipper configuration
which in turn requires the index gateway ring to be initialized in case
`-index-gateway.mode` is set to `ring`.

Therefore the `BloomStore` module needs to depend on the
`IndexGatewayRing` module.

Fixes #12270

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
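
As a rough illustration of the dependency change described in this commit message: a minimal, self-contained sketch of declaring one module as a dependency of another with dskit's module manager. The module name strings, the nil init functions, and the overall setup are illustrative assumptions, not code copied from the actual patch.

package main

import (
	"fmt"

	"github.com/go-kit/log"
	"github.com/grafana/dskit/modules"
)

// Illustrative module names; the real constants live in Loki's pkg/loki package.
const (
	indexGatewayRing = "index-gateway-ring"
	bloomStore       = "bloom-store"
)

func main() {
	mm := modules.NewManager(log.NewNopLogger())

	// Register both modules; their real init functions are elided here.
	mm.RegisterModule(indexGatewayRing, nil)
	mm.RegisterModule(bloomStore, nil)

	// The gist of the fix: BloomStore depends on IndexGatewayRing, so the ring
	// is initialised before the shipper store configuration dereferences it.
	if err := mm.AddDependency(bloomStore, indexGatewayRing); err != nil {
		panic(err)
	}

	fmt.Println("dependency registered")
}

With such a dependency in place, the module manager initialises the ring module before the bloom store, avoiding the nil pointer dereference shown in the traces above.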
@alexandergoncharovaspecta

> @alexandergoncharovaspecta Can you provide your config?
>
> I am able to reproduce the bug on the release-3.0.x branch using ./cmd/loki/loki -target=backend -index-gateway.mode=ring […]

Yes

Source: loki/templates/config.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki
  namespace: observability
  labels:
    helm.sh/chart: loki-6.3.4
    app.kubernetes.io/name: loki
    app.kubernetes.io/instance: loki
    app.kubernetes.io/version: "3.0.0"
    app.kubernetes.io/managed-by: Helm
data:
  config.yaml: |

auth_enabled: false
chunk_store_config:
  chunk_cache_config:
    background:
      writeback_buffer: 500000
      writeback_goroutines: 1
      writeback_size_limit: 500MB
    default_validity: 0s
    memcached:
      batch_size: 4
      parallelism: 5
    memcached_client:
      addresses: dnssrvnoa+_memcached-client._tcp.loki-chunks-cache.observability.svc
      consistent_hash: true
      max_idle_conns: 72
      timeout: 2000ms
common:
  compactor_address: 'http://loki-backend:3100'
  path_prefix: /var/loki
  replication_factor: 3
  storage:
    azure:
      account_key: ${LOKI_AZURE_ACCOUNT_KEY}
      account_name: ${LOKI_AZURE_ACCOUNT_NAME}
      container_name: chunks
      use_federated_token: false
      use_managed_identity: false
frontend:
  scheduler_address: ""
  tail_proxy_url: http://loki-querier.observability.svc.cluster.local:3100
frontend_worker:
  scheduler_address: ""
index_gateway:
  mode: ring
limits_config:
  allow_structured_metadata: false
  max_cache_freshness_per_query: 10m
  max_query_parallelism: 32
  max_query_series: 100000
  query_timeout: 300s
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 720h
  split_queries_by_interval: 15m
  tsdb_max_query_parallelism: 512
  volume_enabled: true
memberlist:
  join_members:
  - loki-memberlist
pattern_ingester:
  enabled: false
querier:
  max_concurrent: 16
query_range:
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache:
      background:
        writeback_buffer: 500000
        writeback_goroutines: 1
        writeback_size_limit: 500MB
      default_validity: 12h
      memcached_client:
        addresses: dnssrvnoa+_memcached-client._tcp.loki-results-cache.observability.svc
        consistent_hash: true
        timeout: 500ms
        update_interval: 1m
query_scheduler:
  max_outstanding_requests_per_tenant: 32768
ruler:
  storage:
    azure:
      account_key: ${LOKI_AZURE_ACCOUNT_KEY}
      account_name: ${LOKI_AZURE_ACCOUNT_NAME}
      container_name: ruler
      use_federated_token: false
      use_managed_identity: false
    type: azure
runtime_config:
  file: /etc/loki/runtime-config/runtime-config.yaml
schema_config:
  configs:
  - from: "2024-02-29"
    index:
      period: 24h
      prefix: loki_index_
    object_store: azure
    schema: v13
    store: tsdb
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  http_server_read_timeout: 600s
  http_server_write_timeout: 600s
storage_config:
  boltdb_shipper:
    index_gateway_client:
      server_address: dns+loki-backend-headless.observability.svc.cluster.local:9095
  hedging:
    at: 250ms
    max_per_second: 20
    up_to: 3
  tsdb_shipper:
    index_gateway_client:
      server_address: dns+loki-backend-headless.observability.svc.cluster.local:9095
tracing:
  enabled: false

chaudum added a commit that referenced this issue May 6, 2024
The bloom store initialisation updates the shipper configuration which in turn requires the index gateway ring to be initialised in case `-index-gateway.mode` is set to `ring`.

Therefore the `BloomStore` module needs to depend on the `IndexGatewayRing` module.

Fixes #12270

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@sslny57

sslny57 commented May 23, 2024

I am experiencing the same issue:

kubectl logs loki-backend-1 -c loki

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x288 pc=0x22f02b0]

goroutine 1 [running]:
github.com/grafana/loki/v3/pkg/loki.(*Loki).updateConfigForShipperStore(0xc0009e0f00?)
        /src/loki/pkg/loki/modules.go:755 +0xb0
github.com/grafana/loki/v3/pkg/loki.(*Loki).initBloomStore(0xc00178c000)
        /src/loki/pkg/loki/modules.go:715 +0x68
github.com/grafana/dskit/modules.(*Manager).initModule(0xc000010ea0, {0x7ffde2dd827d, 0x7}, 0x1?, 0xc0017800c0?)
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:136 +0x1f7
github.com/grafana/dskit/modules.(*Manager).InitModuleServices(0x0?, {0xc000b8bef0, 0x1, 0xc000b3fa40?})
        /src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108 +0xd8
github.com/grafana/loki/v3/pkg/loki.(*Loki).Run(0xc00178c000, {0x0?, {0x4?, 0x3?, 0x4912940?}})
        /src/loki/pkg/loki/loki.go:453 +0x9d
main.main()
        /src/loki/cmd/loki/main.go:122 +0x113b

@sslny57

sslny57 commented May 23, 2024

What is the fix for this issue?

@sslny57

sslny57 commented May 23, 2024

I see changing index_gateway.mode from ring to simple was the fix,
but now I am stuck with another error in the gateway pod: #12912

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m18s                  default-scheduler  Successfully assigned vector/my-loki-gateway-66f8b59d65-jx7lw to ip-10-0-3-21.eu-west-2.compute.internal
  Normal   Pulled     7m18s                  kubelet            Container image "docker.io/nginxinc/nginx-unprivileged:1.24-alpine" already present on machine
  Normal   Created    7m18s                  kubelet            Created container nginx
  Normal   Started    7m18s                  kubelet            Started container nginx
  Warning  Unhealthy  2m8s (x33 over 6m58s)  kubelet            Readiness probe errored: strconv.Atoi: parsing "http": invalid syntax

@sslny57

sslny57 commented May 23, 2024

Fixed this by making a change to the Helm values:

https://github.com/grafana/loki/blob/main/production/helm/loki/values.yaml#L337-L345

  readinessProbe:
    httpGet:
      path: /
      port: http-metrics
    initialDelaySeconds: 15
    timeoutSeconds: 1

@sslny57

sslny57 commented May 24, 2024

The pod is coming up, but Loki is not working as expected:

A Status: 500. Message: Get "http://loki-gateway.vector.svc.cluster.local/loki/api/v1/query_range?direction=backward&end=1716517388130000000&query=sum+by%28MAC%29+%28count_over_time%28%7BSTATUS%3D%22errObj.error.status%22%7D%5B15s%5D%29%29&start=1716495780000000000&step=15000ms": dial tcp: lookup loki-gateway.vector.svc.cluster.local: no such host
In chart version 5.47.2 this used to work:

  readinessProbe:
    httpGet:
      path: /
      port: http
    initialDelaySeconds: 15
    timeoutSeconds: 1

When used, I am getting the same error:

   TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  5m7s                  default-scheduler  Successfully assigned vector/my-loki-gateway-66f8b59d65-drkgc to ip-10-0-1-223.eu-west-2.compute.internal
  Normal   Pulled     5m7s                  kubelet            Container image "docker.io/nginxinc/nginx-unprivileged:1.24-alpine" already present on machine
  Normal   Created    5m7s                  kubelet            Created container nginx
  Normal   Started    5m7s                  kubelet            Started container nginx
  Warning  Unhealthy  97s (x22 over 4m47s)  kubelet            Readiness probe errored: strconv.Atoi: parsing "http": invalid syntax

@sslny57

sslny57 commented May 24, 2024

I had to use the service IP in the Vector endpoint, along with the previous fix:

  readinessProbe:
    httpGet:
      path: /
      port: http-metrics
    initialDelaySeconds: 15
    timeoutSeconds: 1
sinks:
    loki:
      type: "loki"
      inputs:
      - "lambda_source"
      # endpoint: "http://loki-gateway.vector.svc.cluster.local"
      endpoint: "http://10.160.197.234"
      path: "/loki/api/v1/push"
      encoding:
        codec: "json"
      tenant_id: "lokiprod"
      healthcheck:
        enabled: true
      labels:

Now it's working as expected.

@abh

abh commented Jun 6, 2024

I ran into this crash too when upgrading from v2.9.x to v3.0.0. Changing the mode from ring to simple fixed the crash (but I'm still working through other problems).

@acar-ctpe

I'm hitting the same problem
