Gigabit Ethernet and Fibre Channel Technology

Gigabit Ethernet Is Closely Related To Fibre Channel Technology, going back to 1988!

The engineering roots of Gigabit Ethernet go back to the original specifications for Fiber Channel (…most commonly spelled, “Fibre Channel”). This document provides a perspective on both the original Fibre Channel technology and the evolved Gigabit standards, including 8B/10B signal encoding.

Perspective On Fibre Channel

An understanding of the IEEE 802.3z standards for Gigabit Ethernet begins with a perspective on an earlier technology called Fibre Channel. (The word “Fiber” is typically presented with its European spelling, “Fibre”, when talking about Fibre Channel technology).

In 1988 the American National Standards Institute (ANSI) chartered the Fibre Channel Working Group (X3T11) to develop a standard for high-speed data transfer between computing devices. Bear in mind that this was only one year after the introduction of 16 Mb/s Token-Ring (802.5), so the idea of a 1 Gigabit standard, which is what Fibre Channel ultimately offered, was quite phenomenal. Fibre Channel equipment is now available from the major vendors (Cisco, Cabletron, Bay Networks, etc.) and has become a firmly established standard.

In 1993, the Fibre Channel Systems Initiative (FCSI) was formed by Hewlett Packard Corp, IBM, and Sun Microsystems. The goal of FCSI was to promote the new Fibre Channel standards through the development of “FCSI Profiles”: implementation guidelines for manufacturers and integrators. While these Profiles are not “standards” in the strictest sense of the word, they do serve as a conformance model for interoperability testing between vendors.

The goal of Fibre Channel was to provide a high speed data transfer technology for the exchange of data between workstations, mainframes, supercomputers, storage devices, display devices, and other peripherals. In one very broad sense, Fibre Channel may be thought of as a “next generation SCSI” bus architecture. (SCSI, “Small Computer System Interface”, is a high-speed bus used to connect disk drives, tape drives, and other peripherals to computer systems). While SCSI transfers data at a rate of roughly 800 Mb/s, Fibre Channel provides up to 2 Gigabits/second (bi-directional aggregate bandwidth in full-duplex mode, that is, 1 Gb/s in each direction). A SCSI connection between a computer and a disk subsystem can be no more than a few feet while a Fibre Channel connection can be up to 10 km (or more, with current-technology high-power fiber repeaters).

The difference between a “channel” interconnection between devices and a “network” connection is fundamentally based on the relationship between the devices. A network, for the purposes of this discussion, is the aggregate communications community in which a number of devices may communicate to a number of other devices. One or more communications protocols operate to provide the control and management of these conversations. Networks, and their associated protocols, are intended to perform properly in a broad range of possible situations and with a broad range of potential requirements.

A “channel”, on the other hand, is a direct (or switched) point-to-point connection between two communicating devices. The implementation of a communications “channel” typically implies a high dependence on the hardware elements making up the transmit and receive side of the connection. Because of the very specific nature of the channel connection, the overhead and associated performance sacrifice necessary in the more generalized network communications architecture are largely eliminated. The trade-off is in favor of speed and throughput at the cost of general-purpose connectivity.

The Fibre Channel standards attempt to combine aspects of “channel” connectivity with those of network connectivity. The communication path between two devices, the point-to-point connection itself, is referred to as the “fabric”. Fibre Channel supports its own high speed serial protocol across the fabric as well as other protocols including FDDI, SCSI, HIPPI, and others. (These acronyms are defined in the FC-4 Interface Layer discussion, below). In each case, the design objective of the fabric protocol is to achieve high speed transfers of large amounts of data.

The Fibre Channel Fabric consists of a link with two unidirectional fibers transmitting in opposite directions. The Fabric itself allows for more complex interconnections than a physical point-to-point connection (although the data transfer is always either point-to-point or point-to-multipoint). Three physical implementations of the Fabric topology are possible: point-to-point, crosspoint switched, and arbitrated loop.

Fibre Channel is specified to operate at 133 Mb/s, 266 Mb/s, 530 Mb/s and 1 Gb/s. The 1 Gigabit/second implementation of Fibre Channel served as the physical layer basis for Gigabit Ethernet. This layer is referred to as FC-0. The “layers” of functional operation in the Fibre Channel architecture are represented in the diagram below.

FC-0 Physical Layer

FC-0 specifies a laser light safety system to protect against injury. The laser power output exceeds the limits defined as safe and, consequently, if an open fiber condition occurs in the link, the receiver on the open connection detects the loss of signal. It then pulses its transmitting laser at a low duty cycle which both meets the safety requirements and serves as an open-connection signal to the other side of the link. In response to the open-connection pulse signal, the other side starts pulsing its transmit laser (onto the open connection) with the same low duty cycle. When the link is fixed, the receiver port that originally detected loss of signal now detects the pulsing laser from the other side. This signals that the link is reconnected and the two sides engage in a handshaking procedure that brings the link back up.

FC-1 Conversion Layer

The FC-1 layer converts transmitted data from its 8-bit binary representation into a 10-bit code word. The codes are engineered to guarantee a sufficient number of ‘1’ bits to allow for clock synchronization. This 8B/10B coding is carried over into Gigabit Ethernet. A complete discussion of 8B/10B may be found in the discussion of Gigabit Ethernet.

FC-2 Signaling Layer

FC-2 provides the signaling protocol for Fibre Channel. Flow control, framing, and sequencing of data are performed by the FC-2 layer. Fibre Channel provides for a variety of connectionless and reliable forms of data transfer, including negotiation and policing of Quality Of Service in the form of bandwidth guarantees. The data transfer may take one of the following classes of service:

Class 1 Service
Dedicated connections with a guarantee of 100% bandwidth availability
Class 2 Service
Frame-switching in a connectionless mode with no guarantee of delivery, order, or latency. If frames are dropped then a “Busy” notification is returned to the sender.
Class 3 Service
Identical to Class 2 but no Busy frames are returned.
“Intermix” Class
A variant of Class 1 in which Class 1 frames are guaranteed a prespecified amount of bandwidth but Class 2 or Class 3 frames may be multiplexed onto the channel when sufficient bandwidth is available to share the link.

FC-3 Common Services

These services are Striping, Hunt groups, and Multicasting. Striping is the process of using multiple ports in parallel to increase the effective aggregate bandwidth of a connection between two devices. Hunt groups allow more than one port to respond to the same alias address, thereby improving efficiency by decreasing the chance of reaching a busy port. Multicasting is the delivery of a single transmission to multiple destination ports.

FC-4 Interface Layer

The interfaces to upper layer protocol operations include:

Small Computer System Interface (SCSI)
Intelligent Peripheral Interface (IPI)
High Performance Parallel Interface (HIPPI) Framing Protocol
Internet Protocol (IP)
ATM Adaptation Layer for computer data (AAL5)
Link Encapsulation (FC-LE)
Single Byte Command Code Set Mapping (SBCCS)
IEEE 802.2

Gigabit Ethernet / 802.3z

The MAC Layer

Gigabit Ethernet allows connection between two devices in either full-duplex or half-duplex mode. In half-duplex mode, Gigabit Ethernet uses the CSMA/CD access method, just like its 10 and 100 Mb/s predecessors. In full-duplex mode, the MAC uses frame-based flow control as defined in the IEEE 802.3x standard.

The Reconciliation Sublayer and Gigabit Media Independent Interface (GMII)

These two layers connect the MAC sublayer to the physical (PHY) layer entities. The interconnection is via a 125 MHz bus with an 8-bit-wide data path.

P1394.2, promoted primarily by Intel Corp., is a gigabit hybrid of IEEE P1394 (a.k.a. Firewire) and IEEE 1596-1992, Scalable Coherent Interface (SCI). This hybrid marries a modified Fibre Channel physical layer to extend the memory bus outside the CPU chassis. The host interface as proposed by Intel currently is not compliant with the original 1394 standard. But 1394.2 shares similarities with the first official version of 1394.0. It provides external devices with high-bandwidth and low-latency access to memory. The physical layer works at 1.25 GHz to provide a 1 Gb/s data rate. Developers of Gigabit Ethernet and P1394.2 hope to use the same specification.

The topology of P1394.2 is of a logical ringlet. Up to 63 nodes can be connected to a ringlet, and up to 1,023 ringlets can be connected through bus bridges. The bus supports directed and multicast addressing. Flow control is provided in asynchronous mode via acknowledge packets that force retransmits, and through prenegotiation in isochronous operation. A transmission priority bit provides prioritized bandwidth.

Six bits address up to 63 nodes on a bus; addressing the 64th node causes a broadcast to all nodes. This scheme also supports multiple buses bridged together: 10 of the bits extend the reach across as many as 1,023 buses, for a total of 64,449 nodes (63 nodes on each of 1,023 buses; addressing the 1,024th bus automatically accesses the local bus).

Address specs

This leaves 48 bits for addresses in each node. Each address specifies a byte in a 32-bit word, so every node has a vast 256-Tbyte (262,144-Gbyte) address space.
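The address-space figure above can be checked with a short sketch. The text gives only the field widths (10 bus bits, 6 node bits, 48 offset bits); the field order used here (bus number in the top bits, then node, then offset) and the function name are assumptions made for illustration.

```python
# Field widths taken from the text; the ordering of the fields is an assumption.
BUS_BITS, NODE_BITS, OFFSET_BITS = 10, 6, 48

def split_address(addr64):
    """Split a 64-bit address into (bus, node, offset) fields."""
    offset = addr64 & ((1 << OFFSET_BITS) - 1)
    node = (addr64 >> OFFSET_BITS) & ((1 << NODE_BITS) - 1)
    bus = (addr64 >> (OFFSET_BITS + NODE_BITS)) & ((1 << BUS_BITS) - 1)
    return bus, node, offset

# Size of the per-node address space: 2**48 bytes.
bytes_per_node = 1 << OFFSET_BITS
print(bytes_per_node // 2**40, "Tbyte")   # 256
print(bytes_per_node // 2**30, "Gbyte")   # 262144

# Hypothetical address: bus 5, node 17, byte offset 0x1000.
print(split_address((5 << 54) | (17 << 48) | 0x1000))
```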

P1394.2 supports two types of traffic: asynchronous transactions and isochronous streams.

Directed and multicast addressing are supported. Directed addressing is used for communicating between a requester and a responder and, true to the memory model, supports transactions such as read, write, lock and move. P1394.2 partitions large data transfers into 64-byte data packets, which are constrained to access 64-byte aligned blocks. The smaller size reduces the latency of other transactions and reduces the size of buffers in bridges.

Unlike 1394.0, which is now appearing in many consumer and computer products, P1394.2 still has not been submitted for official IEEE review.

The Physical Layer (PHY)

The standards for Gigabit Ethernet include operation over a variety of physical interconnection media including:

62.5 µm and 50 µm multimode fiber (MMF)
Single mode fiber (SMF)
Short copper links (25 meters)
Horizontal copper (100 meters)

A variety of PHY types have been specified in the Gigabit Ethernet draft standards. These support a broad range of distances, with options optimized for important cost/distance design points. PHYs incorporated into the IEEE 802.3z draft standard include 1000BASE-CX, which supports interconnection of equipment clusters; 1000BASE-SX, which is targeted for horizontal building cabling; and 1000BASE-LX, which supports building backbone cabling and campus interconnections.

A second task force, IEEE 802.3ab, is developing the standard for 1000BASE-T, which will provide 1 Gb/s Ethernet over 4-pair Category 5 unshielded twisted pair wiring for up to 100 meters. This task force has the special expertise in Digital Signal Processing (DSP) technology necessary to develop this technology. This group is building on the technical foundation developed by the 802.3z Task Force for 1000BASE-X, but is working on a longer timeline.

1000BASE-T is designed to work over Category 5 cabling that has been installed according to the ANSI/TIA/EIA-568-A-1995 cabling standard. Higher grade cables (sometimes referred to as Category 6 and Category 7) are new, high-performance copper cabling technologies that are currently under development. If standardized, both cabling systems will provide higher bandwidth (Category 6 cabling may accommodate 200 MHz and Category 7 may accommodate 600 MHz) and better crosstalk and noise immunity. Standards defining the cable performance and installation practices for both Category 6 and Category 7 are now under development.

Physical Coding Sublayer (PCS)

The PCS provides the 8B/10B encoder/decoder functions (adopted from the FC-1 Fibre Channel specification).

The PCS examines each incoming octet passed down by the GMII and encodes it into a ten-bit code group; this is referred to as 8B/10B encoding. Each octet is given a code group name according to the bit arrangement.

Each octet is broken down into two groups; the first group contains the three most significant bits (y) and the second group contains the remaining five bits (x). Each data code group is named /Dx.y/, where the value of x represents the decimal value of the five least significant bits and y represents the value of the three most significant bits. For example:

/D0.0/ = 000 00000

/D6.2/ = 010 00110

/D30.6/ = 110 11110
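The naming rule can be expressed in a few lines. This is a minimal sketch of the /Dx.y/ convention described above; the function name is hypothetical.

```python
def code_group_name(octet):
    """Return the /Dx.y/ name of a data octet (0..255)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    x = octet & 0b00011111   # decimal value of the five least significant bits
    y = octet >> 5           # decimal value of the three most significant bits
    return f"/D{x}.{y}/"

print(code_group_name(0b00000000))  # /D0.0/
print(code_group_name(0b01000110))  # /D6.2/
print(code_group_name(0b10110101))  # /D21.5/
```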

There are also 12 special octets which may be encoded into ten bits. The PCS differentiates between special and data code words via a signal passed to it from the GMII. Special code words follow the same naming convention as data code words except they are named /Kx.y/ rather than /Dx.y/.

Motivation For 8B/10B Encoding

One of the motivations behind the use of 8B/10B encoding lies with the ability to control the characteristics of the code words such as the number of ones and zeros and the consecutive number of ones or zeros. Another motivation behind the use of 8B/10B encoding is the ability to use special code words which would be impossible if no encoding was performed.

Features And Operation Of 8B/10B Encoding

Every ten bit code group must fit into one of the following three possibilities:

Five ones and five zeros
Four ones and six zeros
Six ones and four zeros

This helps limit the number of consecutive ones and zeros between any two code groups.
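As a rough illustration, the ones/zeros rule can be tested directly on a ten-bit pattern. Note that this is only a necessary condition: a pattern that passes may still be absent from the code tables. The function name and the string representation of code groups are assumptions of this sketch.

```python
ALLOWED_ONES = {4, 5, 6}   # six, five, or four zeros respectively

def has_legal_weight(code_group):
    """Check the ones/zeros rule for a ten-character string of '0' and '1'."""
    if len(code_group) != 10 or set(code_group) - {"0", "1"}:
        raise ValueError("expected ten characters, each '0' or '1'")
    return code_group.count("1") in ALLOWED_ONES

print(has_legal_weight("0111010100"))  # True  (five ones)
print(has_legal_weight("1101011001"))  # True  (six ones)
print(has_legal_weight("1111111000"))  # False (seven ones)
```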

A special sequence of seven bits, called a comma, is used by the PMA in aligning the incoming serial stream. The comma is also used by the PCS in acquiring and maintaining synchronization. The following bit patterns represent the comma:

0011111 (comma+)

1100000 (comma-)

The comma cannot be transmitted across the boundaries of any two adjacent code groups unless an error has occurred. In this way the PMA is able to determine code group boundaries.
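The alignment idea can be sketched as a search for either comma pattern in the received bit stream. This sketch assumes that the comma occupies the first seven bits of the code group that carries it (as it does in the ten-bit pattern 0011111010) and that the stream is free of errors; the function names are hypothetical.

```python
COMMAS = ("0011111", "1100000")   # comma+ and comma-

def find_alignment(bits):
    """Return the index of the first comma in a '0'/'1' string, or None."""
    hits = [i for i in (bits.find(c) for c in COMMAS) if i != -1]
    return min(hits) if hits else None

def split_code_groups(bits):
    """Cut the stream into ten-bit code groups, starting at the first comma."""
    start = find_alignment(bits)
    if start is None:
        return []
    aligned = bits[start:]
    usable = len(aligned) - len(aligned) % 10   # drop a trailing partial group
    return [aligned[i:i + 10] for i in range(0, usable, 10)]

# Two stray bits, then a comma-bearing code group, then a data code group.
stream = "10" + "0011111010" + "1010101010"
print(find_alignment(stream))      # 2
print(split_code_groups(stream))   # ['0011111010', '1010101010']
```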

DC balancing is achieved through the use of a running disparity calculation. Running disparity is designed to keep the number of ones transmitted by a station equal to the number of zeros transmitted by that station. This should keep the DC level balanced halfway between the ‘one’ voltage level and the ‘zero’ voltage level. Running disparity can take on one of two values: positive or negative. In the absence of errors, the running disparity value is positive if more ones have been transmitted than zeros and the running disparity value is negative if more zeros have been transmitted than ones since power-on or reset. Running disparity is explained in much more detail in section 2 of PCS Fundamentals.

Ten-bit code groups can be categorized into data (Dx.y), special (Kx.y) and invalid code groups. Each code group has two possible encoding values based upon the current running disparity. Table 36-1 contains all of the valid encodings of data bits 00-FF. Table 36-2 contains the 12 valid special code groups. Invalid code groups are ten-bit code groups which haven’t been defined within tables 36-1 or 36-2, and those code groups which are received or transmitted with the wrong running disparity.

The table below gives examples of how four different eight-bit code groups are encoded based upon the current running disparity value. If the current running disparity is negative then the encoded value will come from the Current RD- column. If the current running disparity is positive then the encoded value will come from the Current RD+ column. It is possible for the ten bit code groups to be identical for both columns of a given code group. An example of this is D28.5 below.

The RD- column contains code groups which do not contain more zeros than ones. This is because in the absence of errors, the current negative running disparity value shows that more zeros have been transmitted than ones, and so a code group with more ones than zeros will have to be transmitted before another code group with more zeros than ones can be transmitted. The RD+ column contains no code groups which contain more ones than zeros for the opposite reasons. Consider the following examples of the effect of Running Disparity on the encoding value used to represent different data bytes.

Examples Of Eight-Bit Code Groups

Code Group Name	Actual Byte Being Encoded	RD- Encoding Value	RD+ Encoding Value	Effect on RD after Sending
D1.0	000 00001	011101 0100	100010 1011	same
D4.1	001 00100	110101 1001	001010 1001	flip
D28.5	101 11100	001110 1010	001110 1010	same
K28.5	101 11100	001111 1010	110000 0101	flip

Subclause 36.2.4.4 defines how the running disparity of transmitted and received ten-bit code groups is calculated. Each ten-bit code group is broken down into two sub-blocks, the first of which is the most significant six-bits and the second is the remaining four bits. The running disparity at the end of the six-bit sub-block is the running disparity at the beginning of the four-bit sub-block. The running disparity at the end of the four-bit sub-block is the running disparity at the end of the code group.
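The sub-block procedure can be sketched as follows. The majority rule follows from the description above; the four special patterns (000111 and 0011 force positive, 111000 and 1100 force negative) are included from my reading of the subclause and should be treated as an assumption to verify against the standard. Function names are hypothetical.

```python
def sub_block_rd(sub_block, rd_in):
    """Running disparity ('+' or '-') at the end of one sub-block."""
    ones = sub_block.count("1")
    zeros = len(sub_block) - ones
    if ones > zeros or sub_block in ("000111", "0011"):
        return "+"
    if zeros > ones or sub_block in ("111000", "1100"):
        return "-"
    return rd_in   # balanced sub-block: disparity carries over unchanged

def code_group_rd(code_group, rd_in):
    """Running disparity after a ten-bit code group (string of '0'/'1')."""
    rd_mid = sub_block_rd(code_group[:6], rd_in)   # six-bit sub-block first
    return sub_block_rd(code_group[6:], rd_mid)    # then the four-bit sub-block

# Encodings taken from the example table, starting from RD-.
print(code_group_rd("0111010100", "-"))  # D1.0  -> '-' (same)
print(code_group_rd("1101011001", "-"))  # D4.1  -> '+' (flip)
print(code_group_rd("0011101010", "-"))  # D28.5 -> '-' (same)
```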

Running Disparity of Transmitted Code Groups

Although the initial state of Running Disparity is specified as Negative for a transmitter and is configurable for a receiver, a misconfiguration between two devices will be renegotiated during the link partner auto negotiation process.

Ordered Sets:

Notation

The notation used for ordered sets is similar to that used for code groups. Code groups are written as either /Dx.y/ or /Kx.y/. Ordered sets are written in the form of /XY/ where X is a letter and Y is sometimes used and contains a number. The defined ordered sets are: /C/, /C1/, /C2/, /I/, /I1/, /I2/, /R/, /S/, /T/ and /V/.

Definition

Consist of either one, two or four code groups

The first code group must be a special code group

The second code group must be a data code group.

Defined ordered sets

/C/ = Configuration (/C1/ or /C2/)

/C1/ = /K28.5/D21.5/config_reg[7:0]/config_reg[15:8]/

/C2/ = /K28.5/D2.2/config_reg[7:0]/config_reg[15:8]/

/K28.5/ is used as the first code group because it contains a comma which is a unique data pattern that was defined previously. The reception of this code group will not happen during a data packet unless there is a data error. This makes it very useful for use with very specific ordered sets such as Idle and Configuration.

Continuous repetition of ordered sets /C1/ alternating with /C2/. It is used to convey the 16-bit Configuration Register to the link partner.

/C1/ will flip the current running disparity after the transmission of /D21.5/. This is because /K28.5/ will flip the running disparity and /D21.5/ will maintain the current running disparity.

/C2/ will sustain the current running disparity after the transmission of /D2.2/. This is because both /K28.5/ and /D2.2/ flip the current running disparity.

/D21.5/ and /D2.2/ were chosen for their high bit transition density.
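A small sketch shows how the two Configuration ordered sets carry the 16-bit register. Code groups are represented by name only (no ten-bit encoding is performed), and starting the alternation with /C1/ is an assumption of this example; the function names are hypothetical.

```python
def config_ordered_set(config_reg, use_c1):
    """Return the four code groups of /C1/ or /C2/ as a list of labels."""
    if not 0 <= config_reg <= 0xFFFF:
        raise ValueError("config_reg must fit in 16 bits")
    second = "D21.5" if use_c1 else "D2.2"
    low = config_reg & 0xFF    # config_reg[7:0] is sent first
    high = config_reg >> 8     # config_reg[15:8] follows
    return ["K28.5", second, f"0x{low:02X}", f"0x{high:02X}"]

def config_stream(config_reg, count):
    """Alternate /C1/ and /C2/ for `count` ordered sets, beginning with /C1/."""
    return [config_ordered_set(config_reg, use_c1=(i % 2 == 0))
            for i in range(count)]

for ordered_set in config_stream(0x4020, 2):
    print(ordered_set)
# ['K28.5', 'D21.5', '0x20', '0x40']
# ['K28.5', 'D2.2', '0x20', '0x40']
```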

/I/ = IDLE (/I1/ or /I2/)

/I1/ = /K28.5/D5.6/

/I2/ = /K28.5/D16.2/

Transmitted continuously and repetitively whenever the GMII interface is idle (TX_EN and TX_ER are both inactive). It provides a continuous fill pattern to establish and maintain clock synchronization.

/I1/ is transmitted only if the current running disparity value is positive, in which case it is used to correct to the RD- state.

/I2/ is transmitted when the current running disparity value is negative. It is used to maintain the RD- state.

/D5.6/ and /D16.2/ were chosen for their high bit transition density.
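The choice between the two IDLE ordered sets reduces to a test of the current running disparity. A minimal sketch, using '+' and '-' as an assumed representation of the disparity value and a hypothetical function name:

```python
def next_idle(running_disparity):
    """Pick the IDLE ordered set for the current running disparity."""
    if running_disparity == "+":
        return "I1", ["K28.5", "D5.6"]    # corrects to the RD- state
    if running_disparity == "-":
        return "I2", ["K28.5", "D16.2"]   # maintains the RD- state
    raise ValueError("running_disparity must be '+' or '-'")

print(next_idle("+"))  # ('I1', ['K28.5', 'D5.6'])
print(next_idle("-"))  # ('I2', ['K28.5', 'D16.2'])
```

Because both ordered sets leave the link in the RD- state, at most one /I1/ is needed at the start of an idle period; every IDLE after it is an /I2/.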

/S/ = Start_of_Packet delimiter

/S/ = /K27.7/

Used to delineate the starting boundary of a data sequence.

/T/ = End_of_Packet delimiter

/T/ = /K29.7/

Used to delineate the ending boundary of a packet. The EPD is transmitted by the PCS following each de-assertion of TX_EN on the GMII, which follows the last data octet composing the FCS of the MAC packet.

The EPD consists of either /T/R/I/ or /T/R/R/.

/R/ = Carrier_Extend

/R/ = /K23.7/

Used to extend the duration of a carrier event.

Used to separate packets within a burst of packets.

Used to pad the last or only packet of a burst of packets so that the subsequent /I/ is aligned on an even-numbered code group boundary.

Used in the EPD.

/R/ is required within the EPD to meet the Hamming distance requirement for ending delimiters.

/V/ = Error_Propagation

/V/ = /K30.7/

The presence of Error_Propagation (or an invalid code group) on the medium denotes a collision artifact or an error condition. Transmitted upon the assertion of TX_EN and TX_ER from the GMII, or the assertion of TX_ER with the de-assertion of TX_EN while TXD<7:0> is not equal to 0F.

Physical Medium Attachment (PMA)

B) PCS Functions

The PCS can be broken down into four major functions:

Synchronization Process
Transmit Process
Receive Process
Auto-negotiation Process

Services provided to the GMII:

Encoding/Decoding of GMII data octets to/from ten-bit code groups (8B/10B) for communication within the underlying PMA.

Carrier Sense (CRS) and Collision Detect (COL) indications.

Managing the auto-negotiation process by informing it when it has lost synchronization of the received code_groups. Auto-negotiation can be instructed to restart if /C/ ordered_sets are received from the other station after the link has been established.

The purpose of the PCS synchronization process is to verify that the PMA is correctly aligning code groups from the serial stream it is receiving. Synchronization is acquired upon the reception of three ordered_sets each starting with a code_group containing a comma. Each comma must be followed by an odd number of valid data code_groups. No invalid code_groups can be received during the reception of these three ordered_sets.

Once synchronization is acquired, the synchronization process begins counting the number of invalid code_groups received. That count is incremented for every code_group received that is invalid or contains a comma in an odd code_group position. That count is decremented for every four consecutive valid code_groups received (a comma received in an even code group position is considered valid). The count never goes below zero and if it reaches four, sync_status is set to FAIL.

The following section discusses exactly how the PCS acquires synchronization based upon the code groups being received by the underlying PMA.

Acquiring Synchronization

After powering on or resetting, the PCS synchronization process does not have synchronization and it is in the LOSS_OF_SYNC state. The synchronization process looks for the reception of a /COMMA/ (a code group containing a comma). It then assigns that code group to be an evenly aligned code group. The next code group received is assigned to be an odd code group. Code groups received thereafter are alternately assigned to even and odd code group alignments.

Thus, synchronization is achieved upon the reception of three ordered_sets each starting with a code_group containing a comma. Each comma must be followed by an odd number of valid data code_groups. No invalid code_groups can be received during the reception of these three ordered_sets.

Synchronization is acquired if three consecutive ordered sets which each begin with a /COMMA/ are received. The /COMMA/ must be followed by an odd number of valid /D/ code groups. The number of valid /D/’s following the /COMMA/ doesn’t have an upper limit. The synchronization process moves to the SYNC_ACQUIRED_1 state and sets the flag sync_status=OK when synchronization is acquired. If at any time prior to acquiring synchronization the PCS receives a /COMMA/ in an odd code group or if it receives an /INVALID/, a code group that isn’t found in the correct running disparity column of tables 36-1 or 36-2, the PCS synchronization process returns to the LOSS_OF_SYNC state.

The following section defines how the PCS can lose synchronization once it has been acquired.

Maintaining and Losing Synchronization

While in the SYNC_ACQUIRED_1 (SA1) state, the PCS synchronization process examines each new code_group.

If the code_group is a valid data code_group or contains a comma when rx_even is FALSE, the PCS asserts the variable cggood and the synchronization process toggles the rx_even variable. Otherwise, the PCS asserts the variable cgbad and the process moves to the SA2 state, toggles the rx_even variable, and sets the variable good_cgs to 0.

If the next code_group is a valid code_group which causes the PCS to assert the variable cggood, the process transitions to the SA2A state, toggles the rx_even variable, and increments good_cgs. Otherwise it continues on to the SA3 state.

While in the SA2A state, the process examines each new code_group. For each code_group which causes the PCS to assert cggood, the variable good_cgs is incremented. If good_cgs reaches three and if the next code_group received asserts cggood, the process returns to the SA1 state. Otherwise, the process transitions to the SA3 state.

Once in the SA3 state, the process may return to the SA2 state via the SA3A state using the same mechanisms that take the process from the SA2 state to the SA1 state. However, another invalid code_group or comma received when rx_even is TRUE will take the process to the SA4 state.

If the process fails to return to the SA3 state via the SA4A state, it will transition to LOS where sync_status is set to FAIL.

Thus, once sync_status is set to OK, the synchronization process begins counting the number of invalid code_groups received. That count is incremented for every code_group received that is invalid or contains a comma when rx_even is TRUE. That count is decremented for every four consecutive valid code_groups received (a comma received when rx_even is FALSE is considered valid). The count never goes below zero and if it reaches four, sync_status is set to FAIL.
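The counting behaviour summarized above can be modelled with two counters. This is a simplified sketch of that summary, not a transcription of the full state diagram: each received code_group is reduced to a boolean (True where the PCS would assert cggood, False where it would assert cgbad), and the function name is hypothetical.

```python
def track_sync(events):
    """Return 'OK' or 'FAIL' after processing an iterable of booleans."""
    bad_count = 0   # outstanding invalid code_groups
    good_run = 0    # consecutive good code_groups since the last adjustment
    for good in events:
        if good:
            good_run += 1
            if good_run == 4:                     # four consecutive good ones
                bad_count = max(0, bad_count - 1) # never goes below zero
                good_run = 0
        else:
            bad_count += 1
            good_run = 0
            if bad_count == 4:
                return "FAIL"
    return "OK"

print(track_sync([False] * 4))                      # FAIL
print(track_sync([False] * 3 + [True] * 4 + [False]))  # OK
```

In the second call the four good code_groups reduce the count from three to two, so the final bad code_group raises it only to three and synchronization is retained.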

The PMA performs the 10-bit serialize/deserialize functions (which is analogous to the ANSI 10-bit SERDES chip). It receives 10-bit encoded data at 125 MHz from the PCS and delivers serialized data to the PMD sublayer. In the reverse direction, the PMA receives serialized data from the PMD and delivers deserialized 10-bit data to the PCS.

The PCS and the PMA are both contained within the physical layer of the OSI reference model. The PCS and the Gigabit Media Independent Interface (GMII) communicate with one another via 8-bit parallel data lines and several control lines. The PCS is responsible for encoding each octet passed down from the GMII into ten-bit code groups. The PCS is also responsible for decoding ten-bit code groups passed up from the PMA into octets for use by the upper layers. The PCS also controls the auto-negotiation process, which allows two separate gigabit devices to establish a link that both are capable of using.

The PMA is responsible for serializing each ten bit code group received from the PCS and sending the serialized data to the PMD. The PMA is responsible for deserializing every ten bit code group received from the PMD and passing it to the PCS. The PMA is also responsible for aligning the incoming serial data stream prior to passing ten bit code words up to the PCS.

Physical Medium Dependent Sublayer (PMD)

The PMD is the physical connection to the medium. This may consist of a 780 or 1300 nm wavelength optical driver for fiber connections and a transceiver for 25 and 100 meter maximum length transmission over Category 5 UTP or Twinax. For a CSMA/CD implementation the maximum network diameter should not exceed 200 meters regardless of media type.

Switched Gigabit Ethernet Considerations

In full-duplex mode, a Gigabit Ethernet switch can provide an aggregate 2 Gb/s bandwidth to each node on the network. Congestion at the switch is controlled through the 802.3x standard frame-based flow control mechanism. If a switch is experiencing congestion it can send a flow control message (a special 64 byte packet with a unique ID type) to the attached station to stop sending packets for a specified period of time or until the switch sends another flow control message indicating a zero delay time.
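For illustration, a flow control frame of this kind can be assembled in a few lines. The specific field values below (the reserved multicast destination address, the MAC Control type 0x8808, the PAUSE opcode 0x0001, and a 16-bit pause time counted in units of 512 bit times) come from my understanding of 802.3x rather than from this text, so treat them as assumptions to verify. The function name is hypothetical.

```python
import struct

PAUSE_DEST = bytes.fromhex("0180C2000001")  # assumed reserved multicast address
MAC_CONTROL_TYPE = 0x8808                   # assumed type value for MAC Control
PAUSE_OPCODE = 0x0001                       # assumed opcode for PAUSE

def build_pause_frame(src_mac, pause_quanta):
    """Build a PAUSE frame without its FCS (60 bytes; the FCS makes 64)."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_quanta must fit in 16 bits")
    fields = struct.pack("!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_quanta)
    frame = PAUSE_DEST + src_mac + fields
    return frame.ljust(60, b"\x00")          # zero padding up to minimum size

src = bytes.fromhex("020000000001")          # made-up source address
stop = build_pause_frame(src, 0xFFFF)        # ask the station to stop sending
resume = build_pause_frame(src, 0)           # zero delay: sending may resume
print(len(stop), stop[12:18].hex())
```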

The Transmission Process

PCS Transmit Process

The PCS transmit process is responsible for the 8B/10B encoding of the data octets from the GMII.

The PCS transmit process sends the signal ‘transmitting’ to the Carrier Sense function of the PCS whenever the PCS transmit process is sending out data packets. The signal ‘receiving’ is sent out by the PCS receive process whenever it is receiving packets. The PCS transmit process is responsible for checking to see if the PCS is both sending and receiving data. If (receiving = 1 AND transmitting = 1) then the collision signal COL is sent to the GMII.

The Carrier Sense mechanism of the PCS reports Carrier events to the GMII via the CRS signal. CRS is asserted when either receiving or transmitting is 1 and is de-asserted when both transmitting and receiving are 0. (Note: CRS is asserted for repeaters when receiving=TRUE and de-asserted when receiving=FALSE.)
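The COL and CRS conditions are simple boolean functions of the two internal signals. A minimal sketch, with hypothetical function names and a `repeater` flag standing in for the repeater case noted above:

```python
def collision(transmitting, receiving):
    """COL is signalled when the PCS is sending and receiving at once."""
    return transmitting and receiving

def carrier_sense(transmitting, receiving, repeater=False):
    """CRS as reported to the GMII."""
    if repeater:
        return receiving              # repeaters track 'receiving' only
    return transmitting or receiving

for tx in (False, True):
    for rx in (False, True):
        print(tx, rx, "COL =", collision(tx, rx), "CRS =", carrier_sense(tx, rx))
```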

PCS Receive Process

The PCS receive process is responsible for decoding the incoming 10-bit code groups into their corresponding octets for transmission by the GMII. The receive process sends out the signal ‘receiving’ to the PCS transmit process and the Carrier Sense process for use in notifying the upper layers that there is carrier on the line or that a collision has occurred.

Carrier Detection is used by the MAC for deferral purposes and by the PCS transmit process for collision detection. A carrier event is signaled by the assertion of ‘receiving’. The signal ‘receiving’ is asserted upon the reception of a code group with at least a two bit difference from the /K28.5/ code group for code groups received in an even code group position.

/K28.5/ = 101 11100

/K27.7/ = 111 11011

False Carrier is defined as a carrier event packet not beginning with /S/. The receive process replaces the beginning code group with 0E and asserts RX_ER when a false carrier event is received.

The incoming code groups are checked for the existence of /INVALID/ data or mis-aligned /COMMA/s. The receive process also checks to see if an /I/ or /C/ ordered set is received prior to receiving the EPD. If this occurs then the packet has ended early because once the /S/ has been received the receive process expects to see a series of /D/ code groups followed by the /T/R/R/ or /T/R/I/. If /I/ or /C/ arrived prior to /T/ then an error has occurred.

The /K28.5/ code group is used at the beginning of the /C/ and /I/ ordered sets. While the PCS receive process is receiving a packet, it expects to see /T/ prior to seeing the next /K28.5/. A one bit error in some of the data code groups can cause them to become /K28.5/. If this one bit error happens to be followed by /D21.5/ or /D2.2/ then the beginning of a /C/ ordered_set will have been created. To account for the possibility that a one bit error has occurred, the PCS receive process checks the received code groups three code groups at a time. Doing this allows the receive process to treat /K28.5/ received in an odd code group position as an /INVALID/ code group rather than assuming a /C/ or /I/ ordered_set had been received. If /K28.5/ is received in an even code_group position and is followed by /D21.5/ or /D2.2/ the receive process assumes a /C/ ordered_set is being received. If these two code_groups are followed by two more /D/ code_groups, the receive process indicates to the auto-negotiation process that a /C/ ordered_set has indeed been received via the RX_UNITDATA.indicate(/C/) signal.

Physical Medium Attachment (PMA) sublayer Fundamentals

PMA Functions

The PMA sublayer is responsible for aligning the incoming serial stream of data. The PMA receive process is allowed to lose up to four 10-bit code groups in the alignment process. This is referred to as code slippage. The PMA receive process aligns the incoming code groups by finding /COMMA/s in the data stream and aligning the subsequent code groups according to the /COMMA/. Alignment by the PMA is essential if the PCS receive process is to operate correctly. If mis-aligned /COMMA/s and/or data are repeatedly sent by the PMA, synchronization cannot be obtained by the PCS.
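
As a rough illustration of comma alignment, the following Python sketch searches a received bit string for a comma and then cuts the stream into 10-bit code groups starting at that point. The comma bit patterns (0011111 and 1100000) are taken from the 8B/10B code and are stated here as an assumption of the example; the function name is hypothetical. Real hardware does this with a shift register on a continuous stream, not on a stored string.

```python
from typing import List, Optional

# The two comma bit patterns of the 8B/10B code (assumed for this sketch).
COMMAS = ("0011111", "1100000")


def align_on_comma(bits: str) -> Optional[List[str]]:
    """Return the 10-bit code groups that start at the first comma found,
    or None when the bit string contains no comma."""
    starts = [i for i in (bits.find(c) for c in COMMAS) if i != -1]
    if not starts:
        return None
    start = min(starts)
    # Drop any trailing bits that do not fill a complete code group.
    usable = len(bits) - (len(bits) - start) % 10
    return [bits[i:i + 10] for i in range(start, usable, 10)]
```

The bits that precede the first comma are discarded, which mirrors the idea that some code groups may be lost while alignment is being established.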

The PMA is responsible for the serialization of the incoming 10-bit code groups from the PCS. This is accomplished through the use of a 10x clock multiplier. The PCS transmits the code groups in parallel at 125 MHz and the PMA sends them out bitwise at a rate of 1.25 Gb/s (a 1.25 GHz bit clock).

The PMA is also responsible for the deserialization of the data coming in from the PMD. The data arrives serially at a rate of 1.25 Gb/s and is transmitted in parallel to the PCS at a rate of 125 MHz.
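
The relationship between the parallel and serial rates, and the serialize/deserialize round trip, can be sketched as follows. Ten bits per code group at 125 million code groups per second gives 1.25 billion bits per second. The constant and function names are hypothetical, and the sketch assumes that the most significant bit of each integer is sent first.

```python
from typing import List

PARALLEL_CLOCK_HZ = 125_000_000   # code groups per second on the PCS side
BITS_PER_CODE_GROUP = 10          # the 10x multiplier


def serial_bit_rate() -> int:
    """Serial bit rate implied by the parallel clock and code-group width."""
    return PARALLEL_CLOCK_HZ * BITS_PER_CODE_GROUP


def serialize(code_groups: List[int]) -> List[int]:
    """Flatten 10-bit code groups into a bit list, MSB first (assumed order)."""
    bits = []
    for cg in code_groups:
        for shift in range(BITS_PER_CODE_GROUP - 1, -1, -1):
            bits.append((cg >> shift) & 1)
    return bits


def deserialize(bits: List[int]) -> List[int]:
    """Rebuild 10-bit code groups from an already aligned bit list."""
    groups = []
    whole = len(bits) - len(bits) % BITS_PER_CODE_GROUP
    for i in range(0, whole, BITS_PER_CODE_GROUP):
        value = 0
        for b in bits[i:i + BITS_PER_CODE_GROUP]:
            value = (value << 1) | b
        groups.append(value)
    return groups
```

Note that deserialize only works on a stream that is already aligned to code-group boundaries, which is why the comma alignment performed by the PMA has to come first.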

The PMA is also responsible for recovering the clock from the incoming data stream. A Phase-Locked Loop (PLL) circuit is included in both the PMA receive and transmit circuitry.

Startup Protocol

The PCS start-up protocol for auto-negotiating devices can be divided into two components:

Synchronization Process
Auto-Negotiation Process

Both auto-negotiating and manually configured devices are unable to interpret any received code_group until synchronization has been acquired. Once synchronization has been acquired the PCS is then able to receive and interpret the incoming code_groups.

Auto-negotiating devices begin with the variable xmit set to CONFIGURATION. Before data transmission can begin, auto-negotiating devices first need to receive three consecutive, consistent /C/ ordered_sets. Consistent /C/ ordered_sets must contain the same code_groups within the last two code_groups of each /C/ ordered_set (ignoring the ACK, or Acknowledge, bit). Once three consecutive, consistent /C/ ordered_sets have been received the auto-negotiation process looks for three consecutive, consistent /C/ ordered_sets which have the ACK bit set to 1. After a period of time the variable xmit is set to IDLE, at which point the device begins transmitting /I/ ordered_sets. After a specific amount of time xmit is set to DATA and the auto-negotiating device is then able to transmit and receive data, assuming the partner device also received three consecutive, consistent /C/ ordered_sets followed by three consecutive, consistent /C/ ordered_sets with the ACK bit set to 1.
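
The "three consecutive, consistent" test can be sketched as a small Python function. Each /C/ ordered_set is represented here only by the 16-bit configuration word carried in its last two code_groups. The position of the ACK bit (bit 14) and the function name are assumptions made for this illustration, and the timers that move xmit from CONFIGURATION to IDLE to DATA are not modeled.

```python
from typing import Iterable

ACK_BIT = 1 << 14   # assumed position of the ACK bit in the 16-bit word


def three_consistent(config_words: Iterable[int], require_ack: bool = False) -> bool:
    """True once three consecutive words match, ignoring the ACK bit.

    With require_ack=True, only words that have the ACK bit set count,
    and a word without ACK restarts the run.
    """
    run = 0
    previous = None
    for word in config_words:
        if require_ack and not (word & ACK_BIT):
            run, previous = 0, None
            continue
        masked = word & ~ACK_BIT & 0xFFFF
        run = run + 1 if masked == previous else 1
        previous = masked
        if run >= 3:
            return True
    return False
```

A device following the text would first need three_consistent(words) to hold, and then three_consistent(later_words, require_ack=True), before leaving the configuration phase.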

Conclusion

Whether a device is an auto-negotiating device or a manually configured device, synchronization is the heart of the entire PCS transmit and receive process. The PMA is responsible for monitoring the incoming serial bit stream and aligning it into code_groups. If the PMA fails to recognize and properly align commas, the PCS will be unable to receive any code_groups from its link partner.

Manually configured devices require properly aligned commas in order to acquire and maintain synchronization from the incoming /I/ ordered_sets. Auto-negotiating devices have the added burden of needing to recognize /C/ ordered_sets and differentiate them from /I/ ordered_sets.

Prior to transmitting data, auto-negotiating devices are required to receive a certain number of consecutive, consistent /C/ ordered_sets. This is essential in order to ensure that both auto-negotiating devices will be communicating on the same terms.

Once synchronization has been acquired and the communication protocol has been determined between the link partners, the devices are allowed to transmit data to one another. To help ensure that the devices will be able to correctly interpret the received code_groups, an 8B/10B encoding scheme is utilized. Such a coding scheme, along with the use of running disparity, allows for a predictable and controlled set of code_groups that will be transmitted across the channel. Because of the number of extra code_groups created by the encoding scheme, special control words can be implemented for use in the interpacket gap, start-of-packet and end-of-packet delimiters, and configuration ordered_sets, among others.

Initially, Gigabit Ethernet will be deployed as a fast backbone for 100-Mbit switches. Dataquest Inc. forecasts a volume of about 2 million nodes in 1999.