# Scalable Network Stack supporting TCP/IP, RoCEv2, UDP/IP at 10-100 Gbit/s

#### Table of contents
1. [Getting Started](#gettingstarted)
2. [Compiling HLS modules](#compiling)
3. [Interfaces](#interfaces)
    1. [TCP/IP](#tcp-interface)
    2. [RoCE](#roce-interface)
4. [Benchmarks](#benchmarks)
5. [Publications](#publications)
6. [Citation](#citation)
7. [Contributors](#contributors)


<a name="gettingstarted"></a>
## Getting Started

### Prerequisites
- Xilinx Vivado 2019.1
- cmake 3.0 or higher

Supported boards (out of the box):
- Xilinx VC709
- Xilinx VCU118
- Alpha Data ADM-PCIE-7V3

<a name="compiling"></a>
## Compiling (all) HLS modules and installing them into your IP repository

0. Optionally specify the location of your IP repository:
```
export IPREPO_DIR=/home/myname/iprepo
```

1. Create a build directory:
```
mkdir build
cd build
```

2.a) Configure the build directly:
```
cmake .. -DDATA_WIDTH=64 -DCLOCK_PERIOD=3.1 -DFPGA_PART=xcvu9p-flga2104-2L-e -DFPGA_FAMILY=ultraplus -DVIVADO_HLS_ROOT_DIR=/opt/Xilinx/Vivado/2019.1
```

2.b) Alternatively, you can use one of the board names to configure your build:
```
cmake .. -DDEVICE_NAME=vcu118
```

All cmake options:

| Name                        | Values                | Description                                                             |
| --------------------------- | --------------------- | ----------------------------------------------------------------------- |
| DEVICE_NAME                 | <vc709,vcu118,adm7v3> | Supported devices                                                       |
| NETWORK_BANDWIDTH           | <10,100>              | Bandwidth of the Ethernet interface in Gbit/s, default depends on board |
| FPGA_PART                   | <name>                | Name of the FPGA part, e.g. xc7vx690tffg1761-2                          |
| FPGA_FAMILY                 | <7series,ultraplus>   | Name of the FPGA part family                                            |
| DATA_WIDTH                  | <8,16,32,64>          | Data width of the network stack in bytes                                |
| CLOCK_PERIOD                | <nanoseconds>         | Clock period in nanoseconds, e.g. 3.1 for 100G, 6.4 for 10G             |
| TCP_STACK_MSS               | <value>               | Maximum segment size of the TCP/IP stack                                |
| TCP_STACK_WINDOW_SCALING_EN | <0,1>                 | Enabling the TCP window scaling option                                  |
| VIVADO_HLS_ROOT_DIR         | <path>                | Path to the Vivado HLS directory, e.g. /opt/Xilinx/Vivado/2019.1        |

3. Build the HLS IP cores and install them into the IP repository:
```
make installip
```

For an example project including the TCP/IP stack or the RoCEv2 stack with DMA to host memory, check out our Distributed Accelerator OS [DavOS](https://github.com/fpgasystems/davos).


## Working with individual HLS modules

1. Set up the build directory, e.g. for the TCP module:

```
$ cd hls/toe
$ mkdir build
$ cd build
$ cmake .. -DFPGA_PART=xcvu9p-flga2104-2L-e -DDATA_WIDTH=8 -DCLOCK_PERIOD=3.1
```

2. Run the C simulation:
```
$ make csim
```

3. Run C synthesis:
```
$ make synthesis
```

4. Generate the HLS IP core:
```
$ make ip
```

5. Install the HLS IP core into the IP repository:
```
$ make installip
```

<a name="interfaces"></a>
## Interfaces
All interfaces use the AXI4-Stream protocol. For AXI4-Streams carrying network/data packets, we use the following definition in HLS:
```
template <int D>
struct net_axis
{
    ap_uint<D>   data;
    ap_uint<D/8> keep;
    ap_uint<1>   last;
};
```

<a name="tcp-interface"></a>
### TCP/IP

#### Open Connection
To open a connection, the destination IP address and TCP port have to be provided through the `s_axis_open_conn_req` interface.
The TCP stack answers this request through the `m_axis_open_conn_rsp` interface, which provides the sessionID and a boolean indicating whether the connection was opened successfully.

Interface definition in HLS:
```
struct ipTuple
{
    ap_uint<32> ip_address;
    ap_uint<16> ip_port;
};
struct openStatus
{
    ap_uint<16> sessionID;
    bool success;
};

void toe(...
    hls::stream<ipTuple>&    openConnReq,
    hls::stream<openStatus>& openConnRsp,
    ...);
```

#### Close Connection
To close a connection, the sessionID has to be provided to the `s_axis_close_conn_req` interface. The TCP/IP stack does not provide a notification upon completion of this request; however, it is guaranteed that the connection is closed eventually.

Interface definition in HLS:
```
hls::stream<ap_uint<16> >& closeConnReq,
```

#### Open a TCP port to listen on
To open a port to listen on (e.g. as a server), the port number has to be provided to `s_axis_listen_port_req`. The port number has to be in the range of active ports: 0 - 32767. The TCP stack will respond through the `m_axis_listen_port_rsp` interface, indicating whether the port was set to the listen state successfully.

Interface definition in HLS:
```
hls::stream<ap_uint<16> >& listenPortReq,
hls::stream<bool>&         listenPortRsp,
```

#### Receiving notifications from the TCP stack
The application using the TCP stack can receive notifications through the `m_axis_notification` interface. A notification either indicates that new data is available or that a connection was closed.
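As an illustration, the dispatch an application might perform on such a notification can be sketched in plain C++. This is a software model only, not the HLS code: the structs mirror the `appNotification` and `appReadRequest` fields of the stack, with standard integer types standing in for `ap_uint<>`, and `has_data`/`to_read_request` are hypothetical helper names.

```cpp
#include <cstdint>

// Software model of the appNotification fields, with standard integer
// types standing in for the HLS ap_uint<> types.
struct AppNotification {
    uint16_t sessionID;
    uint16_t length;
    uint32_t ipAddress;
    uint16_t dstPort;
    bool     closed;
};

// Software model of appReadRequest: asks for `length` bytes on a session.
struct AppReadRequest {
    uint16_t sessionID;
    uint16_t length;
};

// A notification announces readable data only if it carries a non-zero
// length and does not signal a closed connection.
bool has_data(const AppNotification& n) {
    return n.length > 0 && !n.closed;
}

// Turn a data notification into the read request the application would
// forward on the rxDataReq stream.
AppReadRequest to_read_request(const AppNotification& n) {
    return AppReadRequest{n.sessionID, n.length};
}
```

In the actual HLS design these values travel on `hls::stream` interfaces; the branch on `closed` versus `length` is the same.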
Interface definition in HLS:
```
struct appNotification
{
    ap_uint<16> sessionID;
    ap_uint<16> length;
    ap_uint<32> ipAddress;
    ap_uint<16> dstPort;
    bool        closed;
};

hls::stream<appNotification>& notification,
```

#### Receiving data
If data is available on a TCP/IP session, i.e. a notification was received, the data can be requested through the `s_axis_rx_data_req` interface. The data as well as the sessionID are then received through the `m_axis_rx_data_rsp_metadata` and `m_axis_rx_data_rsp` interfaces.

Interface definition in HLS:
```
struct appReadRequest
{
    ap_uint<16> sessionID;
    ap_uint<16> length;
};

hls::stream<appReadRequest>&   rxDataReq,
hls::stream<ap_uint<16> >&     rxDataRspMeta,
hls::stream<net_axis<WIDTH> >& rxDataRsp,
```

Waveform of receiving a (data) notification, requesting data, and receiving the data:

#### Transmitting data
When an application wants to transmit data on a TCP connection, it first has to check if enough buffer space is available. This check/request is done through the `s_axis_tx_data_req_metadata` interface. If the response from the TCP stack through the `m_axis_tx_data_rsp` interface is positive, the application can send the data through the `s_axis_tx_data_req` interface. If the response is negative, the application can retry by sending another request on the `s_axis_tx_data_req_metadata` interface.
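The buffer-space handshake above can be modeled with a small sketch in plain C++. This is illustrative only: `AppTxRsp` mirrors the HLS `appTxRsp` fields with standard integer types in place of `ap_uint<>`, and `can_transmit` is a hypothetical helper name.

```cpp
#include <cstdint>

// Software model of the appTxRsp fields returned on m_axis_tx_data_rsp,
// with standard integer types standing in for the HLS ap_uint<> types.
struct AppTxRsp {
    uint16_t sessionID;
    uint16_t length;          // requested length, echoed back
    uint32_t remaining_space; // 30-bit buffer-space field in the HLS struct
    uint8_t  error;           // 2-bit error field; 0 means success
};

// A positive response (error == 0) means buffer space was granted and the
// payload may be streamed on s_axis_tx_data_req; otherwise the application
// retries the metadata request on s_axis_tx_data_req_metadata.
bool can_transmit(const AppTxRsp& rsp) {
    return rsp.error == 0;
}
```

An application would typically loop: send `appTxMeta`, read the response, and either stream the payload or back off and resend the metadata request.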
Interface definition in HLS:
```
struct appTxMeta
{
    ap_uint<16> sessionID;
    ap_uint<16> length;
};
struct appTxRsp
{
    ap_uint<16> sessionID;
    ap_uint<16> length;
    ap_uint<30> remaining_space;
    ap_uint<2>  error;
};

hls::stream<appTxMeta>&        txDataReqMeta,
hls::stream<appTxRsp>&         txDataRsp,
hls::stream<net_axis<WIDTH> >& txDataReq,
```

Waveform of requesting a data transmit and transmitting the data:

<a name="roce-interface"></a>
### RoCE (RDMA over Converged Ethernet)

#### Load Queue Pair (QP)
Before any RDMA operations can be executed, the queue pairs have to be established out-of-band (e.g. over TCP/IP) by the hosts. The host can then load a QP into the RoCE stack through the `s_axis_qp_interface` and `s_axis_qp_conn_interface` interfaces.

Interface definition in HLS:
```
typedef enum {RESET, INIT, READY_RECV, READY_SEND, SQ_ERROR, ERROR} qpState;

struct qpContext
{
    qpState     newState;
    ap_uint<24> qp_num;
    ap_uint<24> remote_psn;
    ap_uint<24> local_psn;
    ap_uint<16> r_key;
    ap_uint<48> virtual_address;
};
struct ifConnReq
{
    ap_uint<16>  qpn;
    ap_uint<24>  remote_qpn;
    ap_uint<128> remote_ip_address;
    ap_uint<16>  remote_udp_port;
};

hls::stream<qpContext>& s_axis_qp_interface,
hls::stream<ifConnReq>& s_axis_qp_conn_interface,
```

#### Issue RDMA commands
RDMA commands can be issued to the RoCE stack through the `s_axis_tx_meta` interface. If a command transmits data, the data can either originate from host memory, as specified by `local_vaddr`, or from the application on the FPGA. In the latter case, `local_vaddr` is set to 0 and the data is provided through the `s_axis_tx_data` interface.

Interface definition in HLS:
```
typedef enum {APP_READ, APP_WRITE, APP_PART, APP_POINTER, APP_READ_CONSISTENT} appOpCode;

struct txMeta
{
    appOpCode   op_code;
    ap_uint<24> qpn;
    ap_uint<48> local_vaddr;
    ap_uint<48> remote_vaddr;
    ap_uint<32> length;
};
hls::stream<txMeta>&           s_axis_tx_meta,
hls::stream<net_axis<WIDTH> >& s_axis_tx_data,
```

Waveform of issuing an RDMA read request:

Waveform of issuing an RDMA write request where data on the FPGA is transmitted:

<a name="benchmarks"></a>
## Benchmarks
(Coming soon)

<a name="publications"></a>
## Publications
- D. Sidler, G. Alonso, M. Blott, K. Karras et al., *Scalable 10Gbps TCP/IP Stack Architecture for Reconfigurable Hardware,* in FCCM'15, [Paper](http://davidsidler.ch/files/fccm2015-tcpip.pdf), [Slides](http://fccm.org/2015/pdfs/M2_P1.pdf)
- D. Sidler, Z. Istvan, G.
Alonso, *Low-Latency TCP/IP Stack for Data Center Applications,* in FPL'16, [Paper](http://davidsidler.ch/files/fpl16-lowlatencytcpip.pdf)

<a name="citation"></a>
## Citation
If you use the TCP/IP stack in your project, please cite one of the following papers and/or link to the github project:
```
...
  booktitle={FPL'16},
  title={{Low-Latency TCP/IP Stack for Data Center Applications}},
}
@PHDTHESIS{sidler2019innetworkdataprocessing,
  author = {Sidler, David},
  school = {ETH Zurich},
  year = {2019},
  copyright = {In Copyright - Non-Commercial Use Permitted},
  title = {In-Network Data Processing using FPGAs},
}
```

<a name="contributors"></a>
## Contributors
- [David Sidler](http://github.com/dsidler), [Systems Group](http://systems.ethz.ch), ETH Zurich
- [Monica Chiosa](http://github.com/chipet), [Systems Group](http://systems.ethz.ch), ETH Zurich
- [Mario Ruiz](https://github.com/mariodruiz), HPCN Group of UAM, Spain
- [Kimon Karras](http://github.com/kimonk), former Researcher at Xilinx Research, Dublin
- [Lisa Liu](http://github.com/lisaliu1), Xilinx Research, Dublin