Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodically getting FunctionClauseError from Mariaex.Messages.decode_msg(<<5>>, :running) #151

Open
myronmarston opened this issue Nov 16, 2016 · 15 comments

Comments

@myronmarston
Copy link

I periodically get this in my logs:

** (FunctionClauseError) no function clause matching in Mariaex.Messages.decode_msg/2
    (mariaex) lib/mariaex/messages.ex:215: Mariaex.Messages.decode_msg(<<5>>, :running)
    (mariaex) lib/mariaex/messages.ex:209: Mariaex.Messages.decode/2
    (mariaex) lib/mariaex/protocol.ex:610: Mariaex.Protocol.msg_recv/3
    (mariaex) lib/mariaex/protocol.ex:217: Mariaex.Protocol.disconnect/2
    (db_connection) lib/db_connection/connection.ex:178: DBConnection.Connection.disconnect/2
    (connection) lib/connection.ex:767: Connection.disconnect/3
    (stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:681: :gen_server.handle_msg/5

=CRASH REPORT==== 16-Nov-2016::21:51:15 ===
  crasher:
    initial call: Elixir.DBConnection.Connection:init/1
    pid: <0.1844.0>
    registered_name: []
    exception exit: {function_clause,
                        [{'Elixir.Mariaex.Messages',decode_msg,
                             [<<5>>,running],
                             [{file,"lib/mariaex/messages.ex"},{line,215}]},
                         {'Elixir.Mariaex.Messages',decode,2,
                             [{file,"lib/mariaex/messages.ex"},{line,209}]},
                         {'Elixir.Mariaex.Protocol',msg_recv,3,
                             [{file,"lib/mariaex/protocol.ex"},{line,610}]},
                         {'Elixir.Mariaex.Protocol',disconnect,2,
                             [{file,"lib/mariaex/protocol.ex"},{line,217}]},
                         {'Elixir.DBConnection.Connection',disconnect,2,
                             [{file,"lib/db_connection/connection.ex"},
                              {line,178}]},
                         {'Elixir.Connection',disconnect,3,
                             [{file,"lib/connection.ex"},{line,767}]},
                         {gen_server,try_dispatch,4,
                             [{file,"gen_server.erl"},{line,615}]},
                         {gen_server,handle_msg,5,
                             [{file,"gen_server.erl"},{line,681}]}]}
      in function  gen_server:terminate/7 (gen_server.erl, line 814)
    ancestors: [<0.1841.0>,'Elixir.Delorean.Legion.Shard1Repo.Pool',
                  'Elixir.Delorean.Legion.Shard1Repo',
                  'Elixir.Delorean.Legion.Supervisor',<0.1822.0>]
    messages: []
    links: [<0.1840.0>,<0.1841.0>,#Port<0.5966>,#Port<0.1321426>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 9491627
  neighbours:

=SUPERVISOR REPORT==== 16-Nov-2016::21:51:15 ===
     Supervisor: {<0.1841.0>,poolboy_sup}
     Context:    child_terminated
     Reason:     {function_clause,
                     [{'Elixir.Mariaex.Messages',decode_msg,
                          [<<5>>,running],
                          [{file,"lib/mariaex/messages.ex"},{line,215}]},
                      {'Elixir.Mariaex.Messages',decode,2,
                          [{file,"lib/mariaex/messages.ex"},{line,209}]},
                      {'Elixir.Mariaex.Protocol',msg_recv,3,
                          [{file,"lib/mariaex/protocol.ex"},{line,610}]},
                      {'Elixir.Mariaex.Protocol',disconnect,2,
                          [{file,"lib/mariaex/protocol.ex"},{line,217}]},
                      {'Elixir.DBConnection.Connection',disconnect,2,
                          [{file,"lib/db_connection/connection.ex"},
                           {line,178}]},
                      {'Elixir.Connection',disconnect,3,
                          [{file,"lib/connection.ex"},{line,767}]},
                      {gen_server,try_dispatch,4,
                          [{file,"gen_server.erl"},{line,615}]},
                      {gen_server,handle_msg,5,
                          [{file,"gen_server.erl"},{line,681}]}]}
     Offender:   [{pid,<0.1844.0>},
                  {id,'Elixir.DBConnection.Poolboy.Worker'},
                  {mfargs,
                      {'Elixir.DBConnection.Poolboy.Worker',start_link,
                          undefined}},
                  {restart_type,temporary},
                  {shutdown,5000},
                  {child_type,worker}]

Looks like there's some message mariaex isn't able to handle?

@myronmarston
Copy link
Author

We're also getting similar function clause errors with <<6>>:

Mariaex.Messages.decode_msg(<<6>>, :running)

These errors are happening fairly often (a few times per hour) and I'd love to see this fixed. I can pitch in, if guidance could be provided. I don't really know the MySQL protocol so I'm not sure what to do with this kind of message. Is it safe to add a decode_msg definition that ignores (or logs) unrecognized messages instead of crashing?

@liveforeverx
Copy link
Member

Hi! @myronmarston Do you have any test case with a mock data, how to reproduce it or it really happen somewhere spontaneously during long run?

@myronmarston
Copy link
Author

Unfortunately, I have no idea how to reproduce this. I've only ever seen it in our production logs, and it's not clear what queries were running when this happens.

@liveforeverx
Copy link
Member

@myronmarston Can you check the logs, please and check, if the stacktraces always points to some disconnect routine ( Mariaex.Protocol.disconnect ) or there is something else involved?

@myronmarston
Copy link
Author

Do you mean check to see if (mariaex) lib/mariaex/protocol.ex:217: Mariaex.Protocol.disconnect/2 is always in the stacktrace when this error occurs?

@liveforeverx
Copy link
Member

liveforeverx commented Dec 5, 2016

@myronmarston For example, yes.

@myronmarston
Copy link
Author

I looked through our prod logs and it looks like that line is always part of the stack trace.

@fishcakez
Copy link
Contributor

fishcakez commented Dec 15, 2016

Its not safe to do a socket recv in the disconnect callback of DBConnection unless there's a DBConnection.SojournError as the first argument. The client process might be using the socket or have data in the buffer that is part of a query.

@driv3r
Copy link

driv3r commented Jan 3, 2017

We have similar issues, @fishcakez any hints how to pin point the issue?

@driv3r
Copy link

driv3r commented Jan 3, 2017

After digging in a bit more I found out that we actually get one more error 2-5 seconds earlier always before this decode issue, something like:

Elixir.Mariaex.Error [tcp] `recv` failed with: :timeout 
    lib/ecto/adapters/sql.ex:444 Ecto.Adapters.SQL.execute_or_reset/7
    lib/ecto/repo/queryable.ex:130 Ecto.Repo.Queryable.execute/5
    lib/ecto/repo/queryable.ex:35 Ecto.Repo.Queryable.all/4
    timer.erl:166 :timer.tc/1
    lib/ex_statsd.ex:184 anonymous fn/4 in ExStatsD.timing/3
    lib/preload.ex:24 anonymous fn/2 in Preload.preload_store/2
    timer.erl:166 :timer.tc/1
    lib/ex_statsd.ex:184 anonymous fn/4 in ExStatsD.timing/3

So probably resolving this should resolve the decode issue right?

@fishcakez
Copy link
Contributor

@driv3r the timeout issue is described in #162 and disconnect recv is at

case msg_recv(state) do

@myronmarston
Copy link
Author

We are still getting this periodically (usually a time or two a day) and it would be great to see this get fixed. I often jump into open source projects we use to fix issues we face but I don't really understand how to fix this one. Anything I can do to help?

@MonkeyIsNull
Copy link

On our servers we get the timeout issue and the get decode_msg called with "*" and :running

logdump

** (FunctionClauseError) no function clause matching in Mariaex.Messages.decode_msg/2
    (mariaex) lib/mariaex/messages.ex:261: Mariaex.Messages.decode_msg("*", :running)
    (mariaex) lib/mariaex/messages.ex:225: Mariaex.Messages.decode/2
    (mariaex) lib/mariaex/protocol.ex:1097: Mariaex.Protocol.msg_decode/2
    (mariaex) lib/mariaex/protocol.ex:1065: Mariaex.Protocol.msg_recv/3
    (mariaex) lib/mariaex/protocol.ex:254: Mariaex.Protocol.disconnect/2
    (db_connection) lib/db_connection/connection.ex:178: DBConnection.Connection.disconnect/2
    (connection) lib/connection.ex:767: Connection.disconnect/3
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
Last message: {:timeout, #Reference<0.1092615215.1916010497.47981>, {DBConnection.Connection, #PID<0.13935.9>, 15000}}

I have no idea what that message means nor what the library is supposed to do at that point otherwise I would submit a pull request right this second, however, I will go look it up and see if I can contribute anything - especially since this problem dings us pretty hard every few days

@lmarlow
Copy link

lmarlow commented Jan 17, 2018

We are also seeing this after a timeout talking to MySQL.

2018-01-17 21:40:52 =ERROR REPORT====
** Generic server <0.2091.0> terminating 
** Last message in was {timeout,#Ref<0.1184361718.207355906.33227>,{'Elixir.DBConnection.Connection',<0.2539.0>,120000}}
** When Server state == {'Elixir.Mariaex.Protocol',#{'__struct__' => 'Elixir.Mariaex.Protocol',binary_as => field_type_var_string,buffer => <<>>,cache => #Ref<0.1184361718.204341254.50865>,catch_eof => false,connection_id => <0.2091.0>,cursors => #{},deprecated_eof => false,lru_cache => {100,#Ref<0.1184361718.204341254.50866>},opts => [{socket_options,[]},{sock_type,tcp},{cache_size,100},{name,'Elixir.Sample.Repo.Pool'},{otp_app,sample_app},{repo,'Elixir.Sample.Repo'},{adapter,'Elixir.Ecto.Adapters.MySQL'},{username,<<"user">>},{password,<<"password">>},{database,<<"sample_db">>},{hostname,<<"127.0.0.1">>},{timeout,120000},{pool_size,320},{port,3306},{loggers,[{'Elixir.Ecto.LogEntry',log,[]},{'Elixir.Sample.Repo.Metrics',record_metric,[]}]},{pool_timeout,5000},{adapter,'Elixir.Ecto.Adapters.MySQL'},{username,{system,<<"DB_USER">>}},{password,{system,<<"DB_PASS">>}},{database,{system,<<"DB_NAME">>}},{hostname,{system,<<"DB_HOST">>}},{timeout,120000},{pool_size,{system,integer,<<"DB_POOL_SIZE">>,20}},{loggers,[{'Elixir.Ecto.LogEntry',log,[]},{'Elixir.Sample.Repo.Metrics',record_metric,[]}]},{pool,'Elixir.DBConnection.Poolboy'}],seqnum => 0,sock => {'Elixir.Mariaex.Connection.Tcp',#Port<0.4032>},ssl_conn_state => not_used,state => running,state_data => nil,timeout => 120000}}
** Reason for termination == 
** {function_clause,[{'Elixir.Mariaex.Messages',decode_msg,[<<18>>,running],[{file,"lib/mariaex/messages.ex"},{line,261}]},{'Elixir.Mariaex.Messages',decode,2,[{file,"lib/mariaex/messages.ex"},{line,225}]},{'Elixir.Mariaex.Protocol',msg_decode,2,[{file,"lib/mariaex/protocol.ex"},{line,1097}]},{'Elixir.Mariaex.Protocol',msg_recv,3,[{file,"lib/mariaex/protocol.ex"},{line,1065}]},{'Elixir.Mariaex.Protocol',disconnect,2,[{file,"lib/mariaex/protocol.ex"},{line,254}]},{'Elixir.DBConnection.Connection',disconnect,2,[{file,"lib/db_connection/connection.ex"},{line,178}]},{'Elixir.Connection',disconnect,3,[{file,"lib/connection.ex"},{line,767}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]}]}
2018-01-17 21:40:52 =CRASH REPORT====
  crasher:
    initial call: Elixir.DBConnection.Connection:init/1
    pid: <0.2091.0>
    registered_name: []
    exception error: {function_clause,[{'Elixir.Mariaex.Messages',decode_msg,[<<18>>,running],[{file,"lib/mariaex/messages.ex"},{line,261}]},{'Elixir.Mariaex.Messages',decode,2,[{file,"lib/mariaex/messages.ex"},{line,225}]},{'Elixir.Mariaex.Protocol',msg_decode,2,[{file,"lib/mariaex/protocol.ex"},{line,1097}]},{'Elixir.Mariaex.Protocol',msg_recv,3,[{file,"lib/mariaex/protocol.ex"},{line,1065}]},{'Elixir.Mariaex.Protocol',disconnect,2,[{file,"lib/mariaex/protocol.ex"},{line,254}]},{'Elixir.DBConnection.Connection',disconnect,2,[{file,"lib/db_connection/connection.ex"},{line,178}]},{'Elixir.Connection',disconnect,3,[{file,"lib/connection.ex"},{line,767}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]}]}
    ancestors: [<0.2037.0>,'Elixir.Sample.Repo.Pool','Elixir.Sample.Repo','Elixir.Sample.Supervisor',<0.2033.0>]
    message_queue_len: 1
    messages: [{'$gen_cast',{disconnect,#Ref<0.1184361718.206831620.161378>,#{'__exception__' => true,'__struct__' => 'Elixir.Mariaex.Error',action => <<"recv">>,connection_id => <0.2091.0>,mariadb => nil,message => nil,reason => timeout,tag => tcp},#{'__struct__' => 'Elixir.Mariaex.Protocol',binary_as => field_type_var_string,buffer => <<>>,cache => #Ref<0.1184361718.204341254.50865>,catch_eof => false,connection_id => <0.2091.0>,cursors => #{},deprecated_eof => false,lru_cache => {100,#Ref<0.1184361718.204341254.50866>},opts => [{socket_options,[]},{sock_type,tcp},{cache_size,100},{name,'Elixir.Sample.Repo.Pool'},{otp_app,sample_app},{repo,'Elixir.Sample.Repo'},{adapter,'Elixir.Ecto.Adapters.MySQL'},{username,<<"user">>},{password,<<"password">>},{database,<<"sample_db">>},{hostname,<<"127.0.0.1">>},{timeout,120000},{pool_size,320},{port,3306},{loggers,[{'Elixir.Ecto.LogEntry',log,[]},{'Elixir.Sample.Repo.Metrics',record_metric,[]}]},{pool_timeout,5000},{adapter,'Elixir.Ecto.Adapters.MySQL'},{username,{system,<<"DB_USER">>}},{password,{system,<<"DB_PASS">>}},{database,{system,<<"DB_NAME">>}},{hostname,{system,<<"DB_HOST">>}},{timeout,120000},{pool_size,{system,integer,<<"DB_POOL_SIZE">>,20}},{loggers,[{'Elixir.Ecto.LogEntry',log,[]},{'Elixir.Sample.Repo.Metrics',record_metric,[]}]},{pool,'Elixir.DBConnection.Poolboy'}],seqnum => 0,sock => {'Elixir.Mariaex.Connection.Tcp',#Port<0.4032>},ssl_conn_state => not_used,state => running,state_data => nil,timeout => 120000}}}]
    links: [<0.2036.0>,<0.2037.0>,#Port<0.4032>]
    dictionary: [{lru_cache,{100,#Ref<0.1184361718.204341254.50866>}},{cache,#Ref<0.1184361718.204341254.50865>}]
    trap_exit: false
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 38528
  neighbours:
2018-01-17 21:40:52 =SUPERVISOR REPORT====
     Supervisor: {<0.2037.0>,poolboy_sup}
     Context:    child_terminated
     Reason:     {function_clause,[{'Elixir.Mariaex.Messages',decode_msg,[<<18>>,running],[{file,"lib/mariaex/messages.ex"},{line,261}]},{'Elixir.Mariaex.Messages',decode,2,[{file,"lib/mariaex/messages.ex"},{line,225}]},{'Elixir.Mariaex.Protocol',msg_decode,2,[{file,"lib/mariaex/protocol.ex"},{line,1097}]},{'Elixir.Mariaex.Protocol',msg_recv,3,[{file,"lib/mariaex/protocol.ex"},{line,1065}]},{'Elixir.Mariaex.Protocol',disconnect,2,[{file,"lib/mariaex/protocol.ex"},{line,254}]},{'Elixir.DBConnection.Connection',disconnect,2,[{file,"lib/db_connection/connection.ex"},{line,178}]},{'Elixir.Connection',disconnect,3,[{file,"lib/connection.ex"},{line,767}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]}]}
     Offender:   [{pid,<0.2091.0>},{id,'Elixir.DBConnection.Poolboy.Worker'},{mfargs,{'Elixir.DBConnection.Poolboy.Worker',start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

@michaelst
Copy link

This error has been happening very frequently for us, 186 times in the last 24 hours. This is the stack trace we get:

eloraburns added a commit to eloraburns/mariaex that referenced this issue Nov 16, 2018
This should allow us to remove show_sensitive_data_on_connection_error
as an option, as the sensitive data won't be in the main process' state
(just a pid). There are also cases where passwords are still dumped,
e.g. when we hit xerions#151 .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants