Notes on "TCP/IP Sockets Programming", Part 5

Date: 2010-09-11  Source: 龍蝦

Chapter 5: Sending and Receiving Data

There is no magic: any programs that exchange information must agree on how that information will be encoded (represented as a sequence of bits), as well as which program sends what information when, and how the information received affects the behavior of the program. This agreement regarding the form and meaning of information exchanged over a communication channel is called a protocol.

Most application protocols are defined in terms of discrete messages made up of sequences of fields. Each field contains a specific piece of information encoded as a sequence of bits.

Definition of "platform":

By “platform” in this book we mean the combination of compiler, operating system, and hardware architecture. The gcc compiler with the Linux operating system, running on Intel’s IA-32 architecture, is an example of a platform.

5.1.1 Sizes of Integers

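Determining the sizes of integer types on a platform.

Two things to note about sizeof():

First, sizeof(char) is always 1. So in C, a "byte" is the space occupied by a variable of type char, and the unit of sizeof() is really sizeof(char).

Second, the predefined constant CHAR_BIT gives the number of bits needed to represent a value of type char.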

Here are a couple of things to note about sizeof(). First, the language specifies that sizeof(char) is always 1. Thus in the C language a “byte” is the amount of space occupied by a variable of type char, and the units of sizeof() are actually sizeof(char). But exactly how big is a C-language “byte”? That’s the second thing: the predefined constant CHAR_BIT (from <limits.h>) tells how many bits it takes to represent a value of type char: usually 8, but possibly 10 or even 32.

The C99 language standard specification offers a solution in the form of a set of optional types: int8_t, int16_t, int32_t, and int64_t (along with their unsigned counterparts uint8_t, etc.) all have the size (in bits) indicated by their names. On a platform where CHAR_BIT is eight, these are 1-, 2-, 4- and 8-byte integers, respectively. Although these types may not be implemented on every platform, each is required to be defined if any native primitive type has the corresponding size. (So if, say, the size of an int on the platform is 32 bits, the “optional” type int32_t is required to be defined.)
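
As a rough illustration of the points above, the sketch below reports the sizes of a few integer types on whatever platform compiles it. The function names bits_in() and print_integer_sizes() are made up for this note, and the printed values depend on the platform.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int8_t ... int64_t */
#include <stdio.h>

/* Width in bits of an object that occupies size_in_bytes "C bytes". */
size_t bits_in(size_t size_in_bytes) {
    return size_in_bytes * CHAR_BIT;
}

/* Print the sizes of some native and fixed-width integer types. */
void print_integer_sizes(void) {
    printf("CHAR_BIT        = %d\n", CHAR_BIT);
    printf("sizeof(char)    = %zu\n", sizeof(char));   /* always 1 */
    printf("sizeof(short)   = %zu\n", sizeof(short));
    printf("sizeof(int)     = %zu\n", sizeof(int));
    printf("sizeof(long)    = %zu\n", sizeof(long));
    printf("sizeof(int8_t)  = %zu\n", sizeof(int8_t));
    printf("sizeof(int16_t) = %zu\n", sizeof(int16_t));
    printf("sizeof(int32_t) = %zu\n", sizeof(int32_t));
    printf("sizeof(int64_t) = %zu\n", sizeof(int64_t));
}
```

Calling print_integer_sizes() from main() on two different platforms is a quick way to see which native types differ, while the fixed-width types keep the widths their names promise.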

5.1.2 Byte Ordering

There are two obvious choices: start at the “right” end of the number, with the least significant bits (so-called little-endian order), or at the left end, with the most significant bits (big-endian order). (Note that the ordering of bits within bytes is, fortunately, handled by the implementation in a standard way.)

Most protocols that send multibyte quantities in the Internet today use big-endian byte order; in fact, it is sometimes called network byte order. The byte order used by the hardware (whether it is big- or little-endian) is called the native byte order.

Addresses and ports that cross the Sockets API are always in network byte order.
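
A minimal sketch of the conversion, assuming a POSIX system that provides htonl() and friends in <arpa/inet.h>. The helpers put_u32_be() and htonl_matches_manual() are hypothetical names used only here; they show that htonl() produces the same byte layout as writing the most significant byte first by hand.

```c
#include <arpa/inet.h>   /* htonl(), ntohl(), htons(), ntohs() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store value in out[0..3] in big-endian (network) byte order.
   Shifts work on values, so this is independent of native byte order. */
void put_u32_be(uint8_t out[4], uint32_t value) {
    out[0] = (uint8_t)(value >> 24);   /* most significant byte first */
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;           /* least significant byte last */
}

/* Return 1 if htonl() yields the same bytes in memory as put_u32_be(). */
int htonl_matches_manual(uint32_t value) {
    uint8_t manual[4];
    uint8_t via_htonl[4];
    uint32_t net = htonl(value);

    put_u32_be(manual, value);
    memcpy(via_htonl, &net, sizeof net);
    return memcmp(manual, via_htonl, sizeof manual) == 0;
}
```

For example, a port number would be stored as htons(8080) before being placed in a socket address structure, and converted back with ntohs() when read out.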

5.1.3 Signedness and Sign Extension

Given k bits, we can represent values in the range $-2^{k-1}$ through $2^{k-1} - 1$ using two’s-complement. Note that the most significant bit (msb) tells whether the value is non-negative (msb=0) or negative (msb=1). On the other hand, a k-bit unsigned integer can encode values in the range 0 through $2^{k} - 1$ directly.

The signedness of the integers being transmitted should be determined by the range of values that need to be encoded.

Some care is required when dealing with integers of different signedness because of sign extension.

1. When a signed value is copied to any wider type, the additional bits are copied from the sign (i.e., most significant) bit.

2. The value of an unsigned integer type is, reasonably enough, not sign-extended.

One final point to remember: when expressions are evaluated, values of variables are widened (if needed) to the “native” (int) size before any computation occurs. Thus, if you add the values of two char variables together, the type of the result will be int, not char.
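
The sketch below illustrates both rules and the promotion to int. All function names are invented for this note. get_u16_be() shows one common way to avoid accidental sign extension when rebuilding an integer from a buffer of plain char, which may be a signed type.

```c
#include <stddef.h>
#include <stdint.h>

/* Widening a signed 8-bit value copies the sign bit into the new bits. */
int32_t widen_signed(int8_t v) {
    return v;
}

/* Widening an unsigned 8-bit value fills the new bits with zeros. */
int32_t widen_unsigned(uint8_t v) {
    return v;
}

/* Adding two chars is done at int width, so the sum has type int. */
size_t size_of_char_sum(void) {
    char a = 1, b = 2;
    return sizeof(a + b);
}

/* Rebuild a 16-bit big-endian value from two received bytes.
   Each byte is cast to uint8_t first so that a negative char
   is not sign-extended into the high bits of the result. */
uint16_t get_u16_be(const char *buf) {
    return (uint16_t)(((uint8_t)buf[0] << 8) | (uint8_t)buf[1]);
}
```

Without the uint8_t casts, a byte such as 0xFF held in a signed char would widen to an int with all bits set and corrupt the high byte of the result.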

#include <stdio.h>
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

Note that the sizes are given in units of sizeof(char), while the return values of these functions are the number of objects read/written, not the number of bytes.
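
To make the return-value convention concrete, here is a small sketch that writes an array to a temporary file and reads it back. The helper roundtrip_objects() is a hypothetical name, and tmpfile() stands in for a stream wrapped around a socket. The values are written in native byte order, so this is not a portable wire format.

```c
#include <stdint.h>
#include <stdio.h>

/* Write count 32-bit values to a temporary file and read them back
   into out. Returns the object count reported by fread(), or 0 if
   the temporary file could not be created or fully written. */
size_t roundtrip_objects(const uint32_t *values, uint32_t *out, size_t count) {
    FILE *fp = tmpfile();
    size_t got = 0;

    if (fp == NULL)
        return 0;

    /* size is the size of ONE object; nmemb is how many objects. */
    if (fwrite(values, sizeof(uint32_t), count, fp) == count) {
        rewind(fp);
        got = fread(out, sizeof(uint32_t), count, fp);
    }
    fclose(fp);
    return got;
}
```

With three values the function returns 3, the number of objects, even though 3 * sizeof(uint32_t) bytes moved through the stream.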

The C language rules for laying out data structures include specific alignment requirements, including that the fields within a structure begin on certain boundaries based on their type. The main points of the requirements can be summarized as follows:
1. Data structures are maximally aligned. That is, the address of any instance of a structure (including one in an array) will be divisible by the size of its largest native integer field.
2. Fields whose type is a multibyte integer type are aligned to their size (in bytes). Thus, an int32_t integer field’s beginning address is always divisible by four, and a uint16_t integer field’s address is guaranteed to be divisible by two.

To enforce these constraints, the compiler may add padding between the fields of a structure.
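
The effect of padding can be observed with sizeof and offsetof. The structure sample_msg below is a made-up layout for illustration only, and the exact padding depends on the platform. On many common platforms it occupies 12 bytes although its fields add up to 7.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical message layout, used only to show padding. */
struct sample_msg {
    uint8_t  kind;     /* 1 byte */
    uint32_t amount;   /* typically preceded by 3 bytes of padding */
    uint16_t code;     /* typically followed by 2 bytes of tail padding */
};

/* Sum of the field sizes, ignoring any padding. */
size_t sample_msg_payload_size(void) {
    return sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t);
}

/* Number of padding bytes the compiler added to the structure. */
size_t sample_msg_padding(void) {
    return sizeof(struct sample_msg) - sample_msg_payload_size();
}
```

Because the padding differs between platforms, sending sizeof(struct sample_msg) raw bytes is not a reliable encoding. Writing each field separately avoids depending on the layout.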

Strings and Text

The C99 extensions standard defines a type wchar_t (“wide character”) to store characters from charsets that may use more than one byte per symbol. In addition, various library functions are defined that support conversion between byte sequences and arrays of wchar_t, in both directions. (In fact, there is a wide character string version of virtually every library function that operates on character strings.) To convert back and forth between wide strings and encoded char (byte) sequences suitable for transmission over the network, we would use the wcstombs() (“wide character string to multibyte string”) and mbstowcs() functions.

#include <stdlib.h>
size_t wcstombs(char *restrict s, const wchar_t *restrict pwcs, size_t n);
size_t mbstowcs(wchar_t *restrict pwcs, const char *restrict s, size_t n);
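
A minimal usage sketch of the two functions, with hypothetical wrapper names wide_to_bytes() and bytes_to_wide(). The wrappers reserve room for a terminator, because neither function writes one when the output limit is reached. Results for non-ASCII text depend on the current locale.

```c
#include <stdlib.h>   /* wcstombs(), mbstowcs() */
#include <wchar.h>    /* wchar_t */

/* Convert a wide string to a multibyte string in the current locale.
   Returns the number of bytes written (terminator excluded), or
   (size_t)-1 if a character cannot be converted or dst has no room. */
size_t wide_to_bytes(char *dst, size_t dst_size, const wchar_t *src) {
    size_t n;

    if (dst_size == 0)
        return (size_t)-1;
    n = wcstombs(dst, src, dst_size - 1);
    if (n != (size_t)-1)
        dst[n] = '\0';   /* always terminate the result */
    return n;
}

/* Convert a multibyte string to a wide string in the current locale.
   Returns the number of wide characters written (terminator excluded),
   or (size_t)-1 on an invalid sequence or if dst has no room. */
size_t bytes_to_wide(wchar_t *dst, size_t dst_len, const char *src) {
    size_t n;

    if (dst_len == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, dst_len - 1);
    if (n != (size_t)-1)
        dst[n] = L'\0';
    return n;
}
```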

The bad news is that C99’s wide character facilities are not designed to give the programmer explicit control over the encoding scheme. Indeed, they assume a single, fixed charset defined according to the “locale” of the platform. Although the facilities support a variety of charsets, they do not even provide the programmer any way to learn which charset or encoding is in use. In fact, the C99 standard states in several situations that the effect of changing the locale’s charset at runtime is undefined. What this means is that if you want to implement a protocol using a particular charset, you’ll have to implement the encoding yourself.
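
As one possible illustration of implementing an encoding yourself, the sketch below encodes a single Unicode code point as UTF-8. The choice of UTF-8 is an assumption made for this example (the notes do not name a charset), and utf8_encode() is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, uint8_t *out) {
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not valid */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Because the byte values are computed explicitly, the output does not depend on the platform's locale.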

Constructing, Framing, and Parsing Messages

A clean design further decomposes the process into two parts. The first is concerned with framing, or marking the boundaries of the message, so the receiver can find it in the stream. The second is concerned with the actual encoding of the message, whether it is represented using text or binary data. Notice that these two parts can be independent of each other, and in a well-designed protocol they should be separated.

Two general techniques enable a receiver to unambiguously find the end of the message:

1. Delimiter-based: The end of the message is indicated by a unique marker, a particular, agreed-upon byte (or sequence of bytes) that the sender transmits immediately following the data.
2. Explicit length: The variable-length field or message is preceded by a length field that tells how many bytes it contains. The length field is generally of a fixed size; this limits the maximum size message that can be framed.
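
The two techniques can be sketched on in-memory buffers as follows. Everything here is an assumption made for illustration: the newline delimiter, the 2-byte big-endian length field, and all four function names. A real program would read the bytes from a socket rather than from an array.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DELIMITER '\n'   /* assumed delimiter for this sketch */

/* Delimiter-based: copy msg into out and append the delimiter.
   Returns the framed size, or 0 if it does not fit or if msg itself
   contains the delimiter (which would confuse the receiver). */
size_t frame_delimited(uint8_t *out, size_t out_size,
                       const uint8_t *msg, size_t msg_len) {
    if (msg_len + 1 > out_size || memchr(msg, DELIMITER, msg_len) != NULL)
        return 0;
    memcpy(out, msg, msg_len);
    out[msg_len] = DELIMITER;
    return msg_len + 1;
}

/* Length of the first delimited message in buf (delimiter excluded),
   or -1 if no complete message has arrived yet. */
long next_delimited(const uint8_t *buf, size_t buf_len) {
    const uint8_t *end = memchr(buf, DELIMITER, buf_len);
    return end == NULL ? -1 : (long)(end - buf);
}

/* Explicit length: 2-byte big-endian length prefix, then the data.
   The fixed-size prefix caps the message at 65535 bytes. */
size_t frame_length(uint8_t *out, size_t out_size,
                    const uint8_t *msg, size_t msg_len) {
    if (msg_len > UINT16_MAX || msg_len + 2 > out_size)
        return 0;
    out[0] = (uint8_t)(msg_len >> 8);
    out[1] = (uint8_t)(msg_len & 0xFF);
    memcpy(out + 2, msg, msg_len);
    return msg_len + 2;
}

/* Payload length of the first length-prefixed message in buf,
   or -1 if the prefix or the payload is still incomplete. */
long next_length_prefixed(const uint8_t *buf, size_t buf_len) {
    size_t len;

    if (buf_len < 2)
        return -1;
    len = ((size_t)buf[0] << 8) | buf[1];
    return buf_len - 2 < len ? -1 : (long)len;
}
```

Both receivers return -1 while a message is incomplete, which reflects the fact that a stream may deliver a message in several pieces.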

This article comes from a CSDN blog. Please credit the source when reposting: http://blog.csdn.net/custa/archive/2010/07/12/5728195.aspx