Skip to content
This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

NTCP: multiple core issues #187

Closed
1 task done
anonimal opened this issue May 29, 2016 · 9 comments
Closed
1 task done

NTCP: multiple core issues #187

anonimal opened this issue May 29, 2016 · 9 comments

Comments

@anonimal
Copy link
Collaborator

anonimal commented May 29, 2016

By submitting this issue, I confirm the following:

  • I have read and understood the contributor guide.
  • I have checked that the issue I am reporting can be replicated or that the feature I am suggesting is not present.
  • I have checked opened or recently closed pull requests for existing solutions/implementations to my issue/suggestion.

Place an X inside the bracket to confirm

  • I confirm.

Up until very recently, with full knowledge of the quality of this code-base and the many security issues/issues at-large (many of which have been resolved), I had never ran Kovri over clearnet - not once - since the first Kovri commit. That's over 6+ months of non-clearnet development! Instead, I had been using a Tor transproxy to do the myriad of other other critical/major/misc development that was needed - but did not require clearnet access.

Now that I am regularly developing Kovri over clearnet, core issues have been exposed of which I would have caught earlier had I been able to safely develop over clearnet (I had assumed some of these issues were transproxy-related and I also did not have proper debugging output until fe9654a). One of these core issues is an apparently very broken NTCP implementation.

An example, as seen in every NTCP session (Edit: i.e., every session that successfully connects and completes the 4-phase message sequence):

2016:05:29|06:44:40.300303 default:default              NFO             NTCPServer: connected to 184.58.98.218:24234
2016:05:29|06:44:40.300510 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase1, preparing
2016:05:29|06:44:40.300686 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- Phase1, sending
2016:05:29|06:44:40.300866 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** scheduling termination
2016:05:29|06:44:40.301042 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 --> Phase1 sent, receiving
2016:05:29|06:44:40.554746 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase2 received, processing
2016:05:29|06:44:40.554968 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** creating shared key
2016:05:29|06:44:40.556551 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase2 successful, proceeding to Phase3
2016:05:29|06:44:40.556774 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase3, preparing
2016:05:29|06:44:40.557208 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- Phase3, sending
2016:05:29|06:44:40.557440 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 --> Phase3 sent, receiving Phase4
2016:05:29|06:44:40.741753 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase4 received, processing 64 bytes
2016:05:29|06:44:40.742591 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** Phase4, session connected
2016:05:29|06:44:40.742823 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** processing connected session
2016:05:29|06:44:40.743005 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending TimeSyncMessage
2016:05:29|06:44:40.743187 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP message
2016:05:29|06:44:40.743658 default:default              DBG             Transports: [oT4e] connecting
2016:05:29|06:44:40.743860 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 --> receiving payload
2016:05:29|06:44:40.744006 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- 16 bytes transferred, 16 total bytes sent
2016:05:29|06:44:40.744208 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP messages
2016:05:29|06:44:40.744437 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- 704 bytes transferred, 720 total bytes sent
2016:05:29|06:44:40.744616 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- sending I2NP messages
2016:05:29|06:44:40.744876 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 <-- 2672 bytes transferred, 3392 total bytes sent
2016:05:29|06:44:40.745062 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** scheduling termination
2016:05:29|06:44:40.745229 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** handling termination timer
2016:05:29|06:44:40.877248 default:default              ERR             NTCPSession: [oT4e] 184.58.98.218:24234 !!! HandleReceived(): 'End of file'
2016:05:29|06:44:40.877482 default:default              NFO             NTCPSession: [oT4e] 184.58.98.218:24234 !!! banning
2016:05:29|06:44:40.877663 default:default              NFO             NTCPServer: [oT4e] 184.58.98.218:24234 has been banned for 70 seconds
2016:05:29|06:44:40.877807 default:default              DBG             NTCPSession: [oT4e] 184.58.98.218:24234 *** terminating session
2016:05:29|06:44:40.878077 default:default              DBG             Transports: disconnecting peer
2016:05:29|06:44:40.878301 default:default              ERR             NTCPServer: ioservice error: 'remote_endpoint: Bad file descriptor'
2016:05:29|06:44:40.878404 default:default              DBG             NTCPServer: running ioservice

So, at first glance, the issues being:
1. Peer closes connection for currently unknown reasons Edit: fixed
2. We ban every NTCP peer because of said reason (or simply because they closed the connection). Why do they close the connection and why exactly are we banning? Once we find the reason for 1, we should review why we ban. Edit: fixed

Also,
3. When stopping the router:
2016:05:29|07:08:59.634290 default:default DBG NTCPSession: [] 0.0.0.0:0 *** destroying session
At times this can appear a multiple of times, I've seen upwards of 20 times. Why are sessions being established to 0.0.0.0:0 and why so many times?
Edit: fixed

And lastly,
4. NTCPServer appears to be incomplete, and also does not process any incoming TCP connections. This is a big. The whole point of P2P is to accept and relay traffic. Kovri is literally running on one leg by only using SSU. Edit: outbound NTCP is now functional. Also, our improper NTCP handling may watermark our router to observant peers or ISPs.

I need more time to review. Fixes may be simple and/or they may come along with a completed NTCP implementation. I'm labeling this issue as critical because of the potential security implications. Otherwise, it's simply a major bug.

@anonimal
Copy link
Collaborator Author

To clarify: these issues existed before our first Kovri commit; so we have always had these issues.

EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 4, 2016
@EinMByte EinMByte self-assigned this Sep 5, 2016
EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 5, 2016
See monero-project#187.
* Correctly deal with short messages.
* Read asynchronously.
* Stop reading when an EOF error occurs.
EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 5, 2016
See issue monero-project#187 and commit bead4f6.
In NTCPServer::HandleReceivedPayload.
EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 5, 2016
See monero-project#187
NTCPSession::ReceivedPayload, NTCPSession::HandleReceivedPayload
@EinMByte
Copy link
Contributor

EinMByte commented Sep 5, 2016

I can confirm that (at commit 65f8ee1) in principle it is possible to build tunnels using only NTCP.
However, it seems to take a very long time compared to SSU.

Some unresolved errors: (ticked if resolved by now)

  • NTCPServer: ioservice error: 'remote_endpoint: Bad file descriptor'
  • NTCPSession: [] 0.0.0.0:0 *** destroying session (Happens always, not only when stopping the router.)
  • We do not get any inbound NTCP traffic (why?), so we can't really know whether or not inbound NTCP is working.
  • NTCPSession: [...] !!! Phase4 read error 'End of file', check your clock (Looks similar to the by now resolved ban issue.)
  • NTCPSession: [...] !!! Phase2 read error 'End of file' (Looks similar to the by now resolved ban issue.)

The first error seems to be the most common one. Even if it is really not possible to connect to the remote endpoint, we should still avoid having to restart the I/O service.
Note on the last two errors: we really are not expecting EOF in phase 2/4. But these errors seem to be too frequent to be network-related. Will discuss with zzz.

EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 5, 2016
See monero-project#187.
NTCPSession::HandleReceivedPayload, check whether m_NextMessage ==
nullptr
EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 5, 2016
See monero-project#187.
NTCPServer: Add NTCP session after connecting.
anonimal added a commit that referenced this issue Sep 6, 2016
2c66a3f Avoid signed-unsigned comparison (EinMByte)
9e9770c Fix typo in TODO. (EinMByte)
d6d952c Fix #334. (EinMByte)
6b99292 Fix bad file descriptor error (EinMByte)
1b4dc54 Reschedule and flush after receiving full message (EinMByte)
65f8ee1 Do not require all bytes to be transferred (EinMByte)
40408b7 Add TODO about flushing m_Handler. (EinMByte)
bead4f6 Improvements to NTCPSession::HandleReceivedPayload (EinMByte)
3753198 Do not ban peers for sending short messages (#187) (EinMByte)
351a527 Do not throw when connection fails (EinMByte)
4b0249f Add options to disable SSU/NTCP (fixes #184) (EinMByte)
EinMByte added a commit to EinMByte/kovri that referenced this issue Sep 9, 2016
@anonimal anonimal added this to the 0.1.0 milestone Sep 9, 2016
@anonimal
Copy link
Collaborator Author

anonimal commented Oct 2, 2016

All that remains is point 4. in OP and the remaining two box options in @EinMByte's comment above.

Removing critical label.

@anonimal anonimal added major and removed critical labels Oct 2, 2016
@anonimal
Copy link
Collaborator Author

@EinMByte something I noticed earlier today when dealing with another issue: if you look inside router.info file after running --enable-ssu 0, you'll notice that the host= value appears malformed (unless whitespace is ignored?) and, unlike its SSU-enabled counterpart, has host as loopback. Passing --host [public facing ip] resolves both issues though I still see no regular inbound TCP traffic to the port specified in router.info (this was an off-handed observation whilst working on other milestone issues and I haven't looked deeper into the inbound issue).

Side-note: an RI unit-test and/or an RI validator as mentioned in #399 would be of use. I'll write the unit-test or both when the time comes (I don't think it will help much for NTCP right now though).

@anonimal
Copy link
Collaborator Author

anonimal commented Nov 13, 2016

I thought this issue would have a log snippet by now (we've done most of our debugging on IRC) so I've attached a tcpdump and kovri log snippet with scrubbed addresses (built against 9d44c0d).

@EinMByte to follow up on: notice the abbreviated ident hash in line 4 as it relates to the others (this is seen throughout the log) and our outgoing port (SSU has uses an inbound-as-outbound port but NTCP does not): fixed in anonimal@c2f6920

08:56:41.249528|DBG   NTCPServer: [SVWt] 123.123.123.123:20539 *** removing NTCP session
08:56:41.249600|NFO   NTCPSession: [SVWt] 123.123.123.123:20539 *** session terminated
08:56:41.249677|DBG   NTCPSession: [SVWt] 123.123.123.123:20539 *** handling termination timer
08:56:56.516813|NFO   NTCPServer: connecting to [UdUE] 123.123.123.123:20539
08:56:56.549686|NFO   NTCPServer: connected to 123.123.123.123:20539
08:56:56.549844|DBG   NTCPSession: [SVWt] 123.123.123.123:20539 *** Phase1, preparing
08:56:56.550025|DBG   NTCPSession: [SVWt] 123.123.123.123:20539 <-- Phase1, sending

@anonimal
Copy link
Collaborator Author

anonimal commented Feb 6, 2017

JFTR: when working on #545, I noticed that tunnel building with --enable-ssu 0 is problematic on current master 3d4a3f7 (before merging #545); e.g., IRC client tunnel is not successful.

I don't remember the last time I tried to run an NTCP-only router (no earlier that November apparently) but even then, I don't remember testing tunnel usability (though I do remember it being reliable at some point).

Note: @EinMByte's 7e32943 was the most recent commit in this area (though this issue could predate that).

@EinMByte
Copy link
Contributor

EinMByte commented Feb 7, 2017

@anonimal We should probably discuss this issue on IRC. I do remember that at some point, NTCP only was at least partially working (at this time SSU was not working). I've always used --host so that could explain something.

@anonimal anonimal self-assigned this Feb 7, 2017
@anonimal
Copy link
Collaborator Author

@EinMByte ok, sounds great. I'll give you a ping next week. Now that milestone issues are nearing completion, I planned on getting back to this issue once the osx/windows issues we spoke about (not NTCP) were resolved.

@anonimal
Copy link
Collaborator Author

anonimal commented Sep 7, 2018

NOTICE: THIS ISSUE HAS BEEN MOVED TO GitLab. Please continue the discussion there. See #1013 for details.

@anonimal anonimal closed this as completed Sep 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants