Connecting over ethernet causes infinite loop #24

Open
GuzTech opened this issue Mar 29, 2020 · 12 comments

@GuzTech

GuzTech commented Mar 29, 2020

I have been testing Litex on the Colorlight 5A-75B board and I have connected to it with wishbone-tool over Etherbone several times.

After some modifications to the SoC, I noticed that I couldn't connect to it anymore. Then I saw that it has nothing to do with the board, as wishbone-tool gives me this in a loop that I cannot CTRL-C out of:

ERROR [wishbone_tool::bridge::ethernet] ethernet connection was closed: peek IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) @ 82001820
INFO [wishbone_tool::bridge::ethernet] Re-opened ethernet host 192.168.1.50:1234

This happens all the time, even if I disable all network ports. I invoke it like this:

wishbone-tool --ethernet-host 192.168.1.50 --server terminal --csr-csv=csr.csv

This is with the latest commit, and compiled with Rust version 1.42.

@mithro
Contributor

mithro commented Mar 29, 2020

@xobs, @enjoy-digital - Thoughts?

@xobs
Member

xobs commented Mar 30, 2020

Can you run it with RUST_LOG=debug? Do you have a CPU configured?

@xobs
Member

xobs commented Mar 30, 2020

The socket has a 1000 ms (one second) timeout, and if that timeout expires (and you're on a Unix-like system) it will generate EAGAIN, or "Resource temporarily unavailable": https://doc.rust-lang.org/std/net/struct.TcpStream.html#platform-specific-behavior-1
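A minimal sketch of that platform-specific behavior, using only Rust's standard library: it binds a local UDP socket that nothing ever sends to (standing in for a board that never answers), sets a read timeout, and checks which `ErrorKind` the expired timeout produces. The helper name `recv_with_timeout` and the 200 ms value are illustrative assumptions, not wishbone-tool's code.

```rust
use std::io::ErrorKind;
use std::net::UdpSocket;
use std::time::{Duration, Instant};

/// Waits for one datagram, giving up after `timeout`.
/// An expired timeout is returned as an io::Error, not as Ok(0).
fn recv_with_timeout(sock: &UdpSocket, timeout: Duration) -> std::io::Result<usize> {
    sock.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 64];
    let (n, _from) = sock.recv_from(&mut buf)?;
    Ok(n)
}

fn main() -> std::io::Result<()> {
    // Ephemeral local port that nobody sends to: a board that never replies.
    let sock = UdpSocket::bind("127.0.0.1:0")?;
    let start = Instant::now();
    let err = recv_with_timeout(&sock, Duration::from_millis(200)).unwrap_err();
    println!(
        "after {:?}: kind = {:?}, os error = {:?}",
        start.elapsed(),
        err.kind(),
        err.raw_os_error()
    );
    // Unix-like systems report WouldBlock (EAGAIN, errno 11 on Linux);
    // Windows reports TimedOut.
    assert!(matches!(err.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut));
    Ok(())
}
```

On Linux this reproduces the `code: 11, kind: WouldBlock` error seen in the log, which suggests that message is simply what an unanswered read looks like rather than a crash in the tool.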

With RUST_LOG=debug we can see more of what's going on.

Also, if you're connecting directly to the board (as opposed to going through litex_server), make sure you DO NOT add --ethernet-tcp to the command line.

@enjoy-digital
Member

@GuzTech: The colorlight target currently has timing issues (litex-hub/litex-boards#40). Despite that, the target in litex-boards is working correctly, but before trying wishbone-tool I would recommend trying to ping the board manually. This would validate that the hardware IP/UDP stack is behaving correctly and that wishbone-tool can operate. If you are not able to ping it, it's more a gateware/timing issue than a wishbone-tool issue, and I can have a look at that if you share a design that reproduces the issue.

@GuzTech
Author

GuzTech commented Mar 30, 2020

@enjoy-digital Yesterday I was trying the colorlight target and everything worked except for the SDRAM. Then I checked the nextpnr log and saw that both the system clock and the ethernet clock (125 MHz) fail timing, like you said. I asked on the 1BitSquared Discord and @daveshah1 suggested that I could try lowering the system clock to 40 MHz, which passes timing (ethernet still doesn't, but is close to 125 MHz). After this, everything stopped working.

So as you suggested, I tried to ping the board and it fails. So I re-synthesized with the system clock set back to 125 MHz and now I can ping the board and wishbone-tool also works. A design with negative slack is playing Russian roulette, but of course this has nothing to do with the wishbone-tool issue.

@xobs Here is the output when I run it with RUST_LOG=debug:

ERROR [wishbone_tool::bridge::ethernet] ethernet connection was closed: peek IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }) @ 82001820
DEBUG [wishbone_tool::bridge] Peek failed, trying again: IoError(Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" })
INFO [wishbone_tool::bridge::ethernet] Re-opened ethernet host 192.168.1.50:1234
DEBUG [wishbone_tool::bridge] Peek failed, trying again: NotConnected

I'm directly connecting to the board, and the command I invoke (in the OP) does not use --ethernet-tcp. So the problem seems to be that whenever it tries to connect to an unavailable host, it gets into this loop, and that is unrelated to the FPGA board.

@xobs
Copy link
Member

xobs commented Mar 30, 2020

That's kind of the design, but I agree it's not clear that's what's going on.

wishbone-tool doesn't know if your board has crashed, isn't connected, or isn't programmed. It's designed to let you run the command and it will wait for you to connect the device, at least at the PHY layer. It does this so that, for example, you can connect GDB to the board and it will stay connected even if you reflash the FPGA.
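The "wait until the device shows up" behavior can be illustrated with a generic retry loop. This is a hypothetical sketch, not wishbone-tool's implementation: `retry_until_ok` and the capped exponential backoff are assumptions made for illustration, and the closure in `main` simulates a board that fails its first three reads (for example, while it is still being reflashed).

```rust
use std::thread;
use std::time::Duration;

/// Calls `attempt` until it succeeds, sleeping between failures with a
/// capped exponential backoff. Returns the value and the number of tries.
/// `attempt` is a hypothetical stand-in for a single bridge read.
fn retry_until_ok<T, E: std::fmt::Debug>(
    mut attempt: impl FnMut() -> Result<T, E>,
    initial: Duration,
    max: Duration,
) -> (T, u32) {
    let mut delay = initial;
    let mut tries = 0;
    loop {
        tries += 1;
        match attempt() {
            Ok(v) => return (v, tries),
            Err(e) => {
                eprintln!("attempt {} failed: {:?}; retrying in {:?}", tries, e, delay);
                thread::sleep(delay);
                // Double the wait, but never beyond `max`.
                delay = (delay * 2).min(max);
            }
        }
    }
}

fn main() {
    // Simulated board: no answer for the first three reads, then a value.
    let mut calls = 0;
    let (value, tries) = retry_until_ok(
        || {
            calls += 1;
            if calls <= 3 { Err("no response") } else { Ok(0xdead_beef_u32) }
        },
        Duration::from_millis(10),
        Duration::from_millis(40),
    );
    println!("read {:#x} after {} tries", value, tries); // read 0xdeadbeef after 4 tries
}
```

Because a loop like this never gives up on its own, it also needs some other way for the user to interrupt it.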

@GuzTech
Copy link
Author

GuzTech commented Mar 30, 2020

Sure, that makes sense. But why am I not able to CTRL-C out of it? I have to kill the process if I want out.

@xobs
Copy link
Member

xobs commented Mar 30, 2020

What was the command you used to run wishbone-tool?

@GuzTech
Copy link
Author

GuzTech commented Mar 30, 2020

wishbone-tool --ethernet-host 192.168.1.50 --server terminal --csr-csv=csr.csv

I have also tried litex-devmem2, and I can connect to the board properly. When I specify an invalid target address, I can CTRL-C out of it.

@xobs
Copy link
Member

xobs commented Mar 30, 2020

In a separate channel, we determined that the board in question is failing to meet timing by more than 3x (requested: 125 MHz, actual: 41.51 MHz). The link is unstable, so it is spending a lot of time retrying the connection.

The terminal "server" is managed in a function called "terminal_client()". This server attempts to read from the serial port IRQ status register, and if that fails then it polls the console. However, due to how wishbone-tool aggressively tries to re-establish the connection, it actually gets stuck in https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/src/server/mod.rs#L457-L466 waiting for a response.

Furthermore, the terminal server takes over the console, preventing you from e.g. sending "Control-C". This keystroke combination is only checked further down in https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/src/server/mod.rs#L483-L486

So if:

  1. you're using the terminal server, and
  2. you're connecting via Etherbone, and
  3. the connection is direct, using UDP, and
  4. the board is not providing reliable communication,

then it will get stuck waiting for the board to respond and never check for Control-C.
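One way to keep Control-C reachable is to check for it between bounded waits instead of only after a successful read. The sketch below is hypothetical: `poll_until_ok_or_cancelled`, `Exit`, and the simulated `peek` are illustrative names, not wishbone-tool's API. The `stop` flag stands in for whatever notices Control-C. Since the terminal server owns the console, that would plausibly be the code reading keystrokes (assuming raw mode, where Control-C arrives as the byte 0x03 rather than as SIGINT), or a signal handler such as one installed with the third-party `ctrlc` crate. The approach only works because each read is bounded by the socket timeout.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// How the polling loop ended.
#[derive(Debug, PartialEq)]
enum Exit {
    Cancelled,
    Got(u32),
}

/// Polls `peek` until it returns a value, but checks `stop` before every
/// attempt so the loop can always be broken, even if the board never answers.
/// `peek` must return within its own timeout for this to stay responsive.
fn poll_until_ok_or_cancelled(stop: &AtomicBool, mut peek: impl FnMut() -> Option<u32>) -> Exit {
    loop {
        if stop.load(Ordering::SeqCst) {
            return Exit::Cancelled;
        }
        if let Some(v) = peek() {
            return Exit::Got(v);
        }
    }
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));

    // Stand-in for the code that sees Control-C: raise the flag after 50 ms.
    let flag = Arc::clone(&stop);
    let handler = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        flag.store(true, Ordering::SeqCst);
    });

    // Unreachable board: every read waits 10 ms (its "timeout") and fails.
    let result = poll_until_ok_or_cancelled(&stop, || {
        thread::sleep(Duration::from_millis(10));
        None
    });
    handler.join().unwrap();
    println!("{:?}", result); // prints "Cancelled"
}
```

The key design choice is ordering: the cancellation check sits inside the retry loop, so it runs even when no read ever succeeds.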

@enjoy-digital
Member

@GuzTech: since the LiteEth core is currently running in sys_clk domain, sys_clk needs to be >= 125MHz for the IP/UDP MAC to work, that's the reason it's actually set to 125MHz in the target file.
I'm planning to work on this, but for now I don't have a solution that is both functional and meets timing.
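A back-of-the-envelope check of the 125 MHz requirement, assuming (this is not stated in the thread) that the core handles one byte per sys_clk cycle at the gigabit line rate implied by the 125 MHz ethernet clock. The last line is the timing shortfall reported earlier (125 MHz requested, 41.51 MHz achieved).

```latex
\begin{aligned}
f_{\mathrm{sys}} &\ge \frac{1000\ \mathrm{Mbit/s}}{8\ \mathrm{bit/cycle}} = 125\ \mathrm{MHz},\\
40\ \mathrm{MHz} \times 8\ \mathrm{bit/cycle} &= 320\ \mathrm{Mbit/s} = 0.32 \times 1000\ \mathrm{Mbit/s},\\
\frac{125\ \mathrm{MHz}}{41.51\ \mathrm{MHz}} &\approx 3.01.
\end{aligned}
```

Under that assumption, a 40 MHz sys_clk could move only about a third of the line rate, which would be consistent with pings failing after the clock was lowered.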

@xobs
Copy link
Member

xobs commented Nov 14, 2020

This is likely related to #33 and was fixed in 41f8c81

Can you try v0.7.8 and see if it solves the issue?


4 participants