Beats misreport established Kafka connection #34177

Closed
heck-gd opened this issue Jan 4, 2023 · 3 comments
Labels: bug, Stalled, Team:Elastic-Agent-Data-Plane


heck-gd commented Jan 4, 2023

When Kafka is configured as the output, Beats log that the connection to the broker was established even when that is impossible because the host in question cannot be reached.

  • Version: 8.4.3 (it's a long-standing bug that was also present in 7.x)
  • Operating System: Windows Server 2016, Windows Server 2019 (probably all OSes though)
  • Discuss Forum URL: https://discuss.elastic.co/t/bug-bug-in-logs-related-to-kafka-connection/269625 (older, went ignored, so I didn't bother with a new forum post - it's an obvious bug and I'm not the only one who noticed it)
  • Steps to Reproduce:
    1. In a VM, set up Filebeat or Winlogbeat with a Kafka output.
    2. Disconnect the VM from the virtual network ("unplug the cable") so that it definitely cannot reach any server.
    3. Start beat.
    4. Beat will lie to your face with the following log lines:
      {"log.level":"info","@timestamp":"2023-01-03T14:06:44.119+0100","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":139},"message":"Connecting to kafka(thekafkahost:theport)","service.name":"winlogbeat","ecs.version":"1.6.0"} 
      {"log.level":"info","@timestamp":"2023-01-03T14:06:44.120+0100","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/client_worker.go","file.line":147},"message":"Connection to kafka(thekafkahost:theport) established","service.name":"winlogbeat","ecs.version":"1.6.0"}

In a real-world scenario, a firewall between the two hosts that accidentally blocks the traffic has the exact same effect.

Example output configuration (as used in production, but with fake values for host/port/auth):

output.kafka:
  enabled: true

  hosts: ["thekafkahost:theport"]

  topic: mytopic

  partition.round_robin:
    reachable_only: true

  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

  ssl.enabled: true
  ssl.certificate_authorities: ["path/to/ca.pem"]
  ssl.supported_protocols: [TLSv1.3]
  ssl.verification_mode: full

  username: 'theuser'
  password: 'thepassword'
botelastic bot added the needs_team label Jan 4, 2023
belimawr added the Team:Elastic-Agent-Data-Plane label Jan 13, 2023
elasticmachine (Collaborator) commented

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

botelastic bot removed the needs_team label Jan 13, 2023
belimawr added the bug and needs_team labels Jan 13, 2023
botelastic bot removed the needs_team label Jan 13, 2023
belimawr (Contributor) commented

The real connection to Kafka is only created when the first event is sent. If the Kafka host is not reachable, you will see errors in the logs like this:

{"log.level":"error","@timestamp":"2023-01-13T16:02:45.758+0100","log.logger":"kafka","log.origin":{"file.name":"kafka/client.go","file.line":337},"message":"Kafka (topic=tiago): kafka: client has run out of available brokers to talk to (Is your cluster reachable?)","service.name":"filebeat","ecs.version":"1.6.0"}

The log message "Connection to kafka(thekafkahost:theport) established" means only that the client worker has its connection configured properly; it does not mean the broker was reached.
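To illustrate why an "established" message can appear without any network traffic, here is a minimal sketch of a lazily connecting client. It is not the Beats or Kafka client code: `lazyClient`, its `dial` field, and the `unreachable` helper are hypothetical names invented for this example, and the dial function is injected so the sketch needs no network.

```go
package main

import (
	"errors"
	"fmt"
)

// lazyClient is a hypothetical stand-in for an output client whose Connect
// only validates configuration and defers network I/O to the first Publish.
type lazyClient struct {
	addr   string
	dial   func(addr string) error // injected so the sketch needs no network
	dialed bool
}

// Connect performs no network I/O, so success here says nothing about
// whether the broker can actually be reached.
func (c *lazyClient) Connect() error {
	if c.addr == "" {
		return errors.New("no broker address configured")
	}
	return nil
}

// Publish dials on first use; an unreachable broker is only detected here.
func (c *lazyClient) Publish(event string) error {
	if !c.dialed {
		if err := c.dial(c.addr); err != nil {
			return fmt.Errorf("publish %q: %w", event, err)
		}
		c.dialed = true
	}
	return nil
}

// unreachable simulates a broker that cannot be dialed.
func unreachable(addr string) error {
	return fmt.Errorf("dial tcp %s: no route to host", addr)
}
```

With `dial` set to `unreachable`, `Connect` returns nil while the first `Publish` returns an error, which matches the order of log messages described in this thread.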

The best way to test the connection to an output is to use ./filebeat test output, which correctly shows whether the connection is possible. Here are two examples, one with an error and one successful:

$ ./filebeat test output
Kafka: localhost:9092...
  parse host... OK
  dns lookup... OK
  addresses: ::1, 127.0.0.1
  dial up... ERROR dial tcp 127.0.0.1:9092: connect: connection refused


$ ./filebeat test output
Kafka: localhost:9092...
  parse host... OK
  dns lookup... OK
  addresses: ::1, 127.0.0.1
  dial up... OK

Anyway, I agree the log message is misleading.
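For hosts where running the Beat binary is inconvenient, the same three stages shown in the output above (parse host, DNS lookup, dial) can be approximated with the Go standard library. This is a rough sketch and not how `filebeat test output` is implemented; `checkBroker` and `defaultTimeout` are names made up for this example, and the check covers plain TCP reachability only, not TLS or authentication.

```go
package main

import (
	"net"
	"time"
)

// defaultTimeout is an arbitrary value chosen for this sketch.
const defaultTimeout = 2 * time.Second

// checkBroker walks through parsing, DNS lookup, and a TCP dial for a
// "host:port" address. It returns the name of the failing stage together
// with the error, or an empty stage and nil if the TCP dial succeeds.
func checkBroker(addr string, timeout time.Duration) (string, error) {
	// Stage 1: the address must split into host and port.
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return "parse host", err
	}
	// Stage 2: the host must resolve (IP literals resolve to themselves).
	if _, err := net.LookupHost(host); err != nil {
		return "dns lookup", err
	}
	// Stage 3: a TCP connection must be accepted within the timeout.
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return "dial up", err
	}
	conn.Close()
	return "", nil
}
```

Calling `checkBroker("localhost:9092", defaultTimeout)` against a stopped broker should report the "dial up" stage, as in the first example above. A firewall that silently drops packets would surface as a timeout at the same stage.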


botelastic bot commented Jan 13, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

botelastic bot added the Stalled label Jan 13, 2024
botelastic bot closed this as completed Jul 11, 2024