Voice over IP has been implemented in various ways using both proprietary and open protocols and standards. Examples of technologies used to implement Voice over Internet Protocol include:
• H.323
• IMS
• SIP
• RTP
A notable proprietary implementation is the Skype network. Other examples of specific implementations and a comparison between them are available in Comparison of VoIP software.
An open standard is a standard that is publicly available and has various rights to use associated with it, and may also have various properties of how it was designed (e.g. open process).
The terms “open” and “standard” have a wide range of meanings associated with their usage. The term “open” is usually restricted to royalty-free technologies while the term “standard” is sometimes restricted to technologies approved by formalized committees that are open to participation by all interested parties and operate on a consensus basis.
The definitions of the term “open standard” used by academics, the European Union and some of its member governments or parliaments such as Denmark, France, and Spain preclude open standards requiring fees for use, as do the New Zealand and the Venezuelan governments. Among standards organisations, the W3C ensures that its specifications can be implemented on a Royalty-Free (RF) basis.
Many definitions of the term “standard” permit patent holders to impose “reasonable and non-discriminatory” royalty fees and other licensing terms on implementers and/or users of the standard. For example, the rules for standards published by the major internationally recognized standards bodies such as the IETF, ISO, IEC, and ITU-T permit their standards to contain specifications whose implementation will require payment of patent licensing fees. Among these organizations, only the IETF and ITU-T explicitly refer to their standards as “open standards”, while the others refer only to producing “standards”. The IETF and ITU-T use definitions of “open standard” that allow “reasonable and non-discriminatory” patent licensing fee requirements.
The term “open standard” is sometimes coupled with “open source” with the idea that a standard is not truly open if it does not have a complete free/open source reference implementation available.
Open standards which specify formats are sometimes referred to as open formats.
Many specifications that are sometimes referred to as standards are proprietary and only available under restrictive contract terms (if they can be obtained at all) from the organization that owns the copyright on the specification. As such, these specifications are not considered to be fully open.
H.323 is an umbrella Recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, multimedia transport and control, and bandwidth control for point-to-point and multi-point conferences.
It is widely implemented by voice and videoconferencing equipment manufacturers, is used within various Internet real-time applications such as GnuGK and NetMeeting, and is widely deployed worldwide by service providers and enterprises for both voice and video services over Internet Protocol (IP) networks.
It is a part of the ITU-T H.32x series of protocols, which also address multimedia communications over Integrated Services Digital Network (ISDN), Public Switched Telephone Network (PSTN) or Signaling System 7 (SS7), and 3G mobile networks.
H.323 Call Signaling is based on the ITU-T Recommendation Q.931 protocol and is suited for transmitting calls across networks using a mixture of IP, PSTN, ISDN, and QSIG over ISDN. A call model, similar to the ISDN call model, eases the introduction of IP telephony into existing networks of ISDN-based PBX systems, including transitions to IP-based Private Branch eXchanges (PBXs).
Within the context of H.323, an IP-based PBX might be an H.323 Gatekeeper or other call control element that provides service to telephones or videophones. Such a device may provide or facilitate both basic services and supplementary services, such as call transfer, park, pick-up, and hold.
While H.323 provides basic telephony functionality and interoperability, its particular strength lies in multimedia communication functionality designed specifically for IP networks.
The Session Initiation Protocol (SIP) is a signaling protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP). Other feasible application examples include video conferencing, streaming multimedia distribution, instant messaging, presence information and online games. The protocol can be used for creating, modifying and terminating two-party (unicast) or multiparty (multicast) sessions consisting of one or several media streams. The modification can involve changing addresses or ports, inviting more participants, adding or deleting media streams, etc.
SIP was originally designed by Henning Schulzrinne and Mark Handley starting in 1996. The latest version of the specification is RFC 3261[1] from the IETF Network Working Group.[2] In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular systems.
SIP is an application-layer protocol in the TCP/IP suite. It is designed to be independent of the underlying transport layer; it can run on Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Stream Control Transmission Protocol (SCTP).[3] It is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP),[4] allowing for direct inspection by administrators.
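Because SIP is text-based and borrows its message shape from HTTP, a request can be built and read back with ordinary string handling. The following is a minimal sketch, not a conforming SIP stack: the helper names `build_sip_request` and `parse_sip_request` are hypothetical, the addresses and header values are made-up examples, and the parser deliberately ignores details such as repeated headers and compact header forms.

```python
def build_sip_request(method, uri, headers, body=""):
    """Serialize a SIP request: start line, headers, blank line, body."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Content-Length counts the bytes of the body, not the characters.
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_sip_request(text):
    """Split a SIP request into (method, uri, version, headers, body).

    Simplification: headers go into a dict keyed by lowercase name, so a
    repeated header (for example several Via lines) keeps only the last value.
    """
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    method, uri, version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


if __name__ == "__main__":
    # All values below are illustrative placeholders.
    message = build_sip_request(
        "OPTIONS",
        "sip:bob@example.com",
        [
            ("Via", "SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds"),
            ("Max-Forwards", "70"),
            ("To", "<sip:bob@example.com>"),
            ("From", "<sip:alice@example.com>;tag=1928301774"),
            ("Call-ID", "a84b4c76e66710@client.example.com"),
            ("CSeq", "1 OPTIONS"),
        ],
    )
    print(message)  # human-readable, which is what makes inspection easy
    print(parse_sip_request(message)[:3])
```

The same serialized text could be carried over UDP, TCP, or SCTP unchanged, which is one way to see the transport independence described above.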
The Real-time Transport Protocol (RTP) defines a standardized packet format for delivering audio and video over the Internet. It was developed by the Audio-Video Transport Working Group of the IETF and first published in 1996 as RFC 1889, which was superseded by RFC 3550 in 2003.
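To make the idea of a standardized packet format concrete, the sketch below packs and unpacks the 12-byte fixed RTP header. The field layout is taken from RFC 3550 rather than from the text above, and the function names are hypothetical. CSRC entries, header extensions, and padding are left out for brevity, so this is an illustration of the format and not a complete implementation.

```python
import struct

RTP_VERSION = 2
# Fixed header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
_FIXED_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP header with no padding, extension, or CSRCs."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return _FIXED_HEADER.pack(
        byte0, byte1, sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(packet):
    """Decode the fixed header fields from the start of an RTP packet."""
    byte0, byte1, sequence, timestamp, ssrc = _FIXED_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Example values are arbitrary; payload type 0 is assumed to mean
    # PCMU audio, as in the common static payload type assignments.
    header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160,
                             ssrc=0x12345678)
    print(header.hex())
    print(unpack_rtp_header(header + b"\x00" * 160))
```

The sequence number and timestamp are what let a receiver reorder packets and reconstruct playout timing, which is why they sit in every packet's fixed header.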
RTP is used extensively in communication and entertainment systems that involve streaming media, such as telephony, video teleconference applications and web-based push to talk features. For these it carries media streams controlled by H.323, MGCP, Megaco, SCCP, or Session Initiation Protocol (SIP) signaling protocols, making it one of the technical foundations of the Voice over IP industry.
RTP is usually used in conjunction with the RTP Control Protocol (RTCP). While RTP carries the media streams (e.g., audio and video) or out-of-band signaling (DTMF), RTCP is used to monitor transmission statistics and quality of service (QoS) information. When both protocols are used in conjunction, RTP is usually originated and received on even port numbers, whereas RTCP uses the next higher odd port number.
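The even/odd port convention described above amounts to a one-line calculation. The helper below is a small sketch with a hypothetical name, `rtcp_port_for`; it simply rejects an odd or out-of-range RTP port rather than trying to correct it, which is a design choice of this example and not something the text prescribes.

```python
def rtcp_port_for(rtp_port):
    """Return the RTCP port paired with an even RTP port (the next odd port)."""
    if not 0 < rtp_port <= 65534:
        raise ValueError(f"RTP port out of range: {rtp_port}")
    if rtp_port % 2 != 0:
        raise ValueError(f"RTP port should be even, got {rtp_port}")
    return rtp_port + 1


if __name__ == "__main__":
    rtp_port = 5004  # arbitrary example port
    print(rtp_port, rtcp_port_for(rtp_port))
```

Note that this is only the usual convention; signaling protocols can also negotiate the ports explicitly.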
“This article is brought to you by Gus Woltmann”.

