XScale alignment

Date: 2009-03-31 Source: Jeffoery

Reposted from:

http://lecs.cs.ucla.edu/wiki/index.php/XScale_alignment

From CSL Wiki

1 The Problem
- 1.1 Why silently ignoring unaligned memory accesses can be a problem
- 1.2 An Example
2 Solutions
- 2.1 Outline
- 2.2 Identifying alignment problems
  - 2.2.1 Use gcc
  - 2.2.2 Use the kernel
- 2.3 Rewrite your code
  - 2.3.1 Add padding
  - 2.3.2 Just rewrite the code (best solution!)
- 2.4 Use the packed attribute
- 2.5 Use the aligned attribute (second best solution!)
- 2.6 Have the kernel find the problem for you
  - 2.6.1 0 - ignore
  - 2.6.2 1 - warn
  - 2.6.3 2 - fixup
  - 2.6.4 3 - fixup+warn
  - 2.6.5 4 - signal
  - 2.6.6 5 - signal+warn
3 Beyond this document

The Problem

A nice explanation of how ARM/XScale only does word accesses for reads and writes, and how things get mixed up when you try to do loads and stores with pointers that are not on word boundaries.

A nice intro about how certain programming styles lead to situations like the one below.

The below example is a little contrived, but in certain styles of programming (embedded, network), it can be common.

This document is a work in progress... any and all suggestions/criticism are welcome (send them to mlukac at lecs.cs.ucla.edu). Feel free to make minor changes in wording and a little in order, but if you want to make major changes like removing sections or seriously reordering stuff, please warn me beforehand.

Why silently ignoring unaligned memory accesses can be a problem

If neither you nor the OS is fixing unaligned memory accesses, this is the kind of behavior you are likely to see:

If the contents of memory look like this:

 
 memory address     0    1    2    3    4    5  .....
 (bytes)          +----+----+----+----+----+
 memory contents  |0x0a|0x0b|0x0c|0x0d|0x0e| .....
                  +----+----+----+----+----+

If you do a 32-bit wide read starting from byte 1, you want to see 0x0e0d0c0b on a little-endian processor: the contents of the 4 contiguous bytes starting from address 1.

However, what you will actually read on a little-endian processor is 0x0a0d0c0b: the contents of the 32-bit aligned word starting from address 0 (0x0d0c0b0a), rotated right by one byte so that the byte at address 1 lands in the low-order position.

In other words, the problem will look like memory corruption (the CPU will not return the data which is 'really in' the address you specify), and if you are reading pointers from unaligned memory, it can cause segmentation faults later on if that pointer is dereferenced.

An Example

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>

#define DATA_SIZE 20

typedef struct _bar {
    int8_t data1;
    int8_t data2[DATA_SIZE];
} bar_t;

typedef struct _foo {
    char *b;
} foo_t;

int main(void)
{
    bar_t bar = {0};

    // foo points to a chunk of memory that is *not* 32-bit aligned
    foo_t *foo = (foo_t *)(bar.data2);

    // good_foo is a 'valid' pointer, pointing to a chunk of 32-bit aligned memory
    char *good_foo = malloc(sizeof(char));
    if (good_foo == NULL)
        return 1;

    // store good_foo in b (so b points to valid, aligned memory,
    // but b itself is not stored at a 32-bit aligned address)
    foo->b = good_foo;

    printf("\n");
    printf("sizeof(bar)=%zu, sizeof(foo)=%zu\n",
           sizeof(bar_t), sizeof(foo_t));
    printf("\n");
    printf("bar is at mem location %p\n", (void *)&bar);
    printf("bar.data2 is at mem location %p\n", (void *)bar.data2);
    printf("foo is at mem location %p\n", (void *)foo);
    printf("... so foo is potentially now not on a word (4 byte) boundary\n\n");
    printf("foo->b=%p should equal good_foo=%p\n", (void *)foo->b, (void *)good_foo);
    printf("If they are not, dereferencing foo->b would most likely cause a segfault\n");

    printf("\n\n");
    free(good_foo);
    return 0;
}

Solutions

Outline

Identifying alignment problems
Rewrite your code
Use the packed attribute
Use the aligned attribute
Have the kernel fix the alignment problem

Identifying alignment problems

Use gcc

gcc can help you identify alignment problems. I have not tested it extensively, but it seems to work for relatively simple alignment problems like the example above. Adding -Wcast-align to your compile flags tells gcc to print a warning whenever it thinks there will be an alignment problem. From the gcc man page:

 -Wcast-align
     Warn whenever a pointer is cast such that the required alignment of the target is
     increased. For example, warn if a "char *" is cast to an "int *" on machines where
     integers can only be accessed at two- or four-byte boundaries.

Other flags that might be helpful:

 -Wpadded
     Warn if padding is included in a structure, either to align an element of the structure
     or to align the whole structure. Sometimes when this happens it is possible to
     rearrange the fields of the structure to reduce the padding and so make the structure
     smaller.

 -Wpacked
     Warn if a structure is given the packed attribute, but the packed attribute has no
     effect on the layout or size of the structure. Such structures may be mis-aligned
     for little benefit. For instance, in this code, the variable "f.x" in "struct bar"
     will be misaligned even though "struct bar" does not itself have the packed attribute:

         struct foo {
             int x;
             char a, b, c, d;
         } __attribute__((packed));
         struct bar {
             char z;
             struct foo f;
         };

Use the kernel

The arm-linux kernel can also identify alignment problems. Every unaligned access to memory causes an alignment trap, and the arm-linux kernel lets you specify how to deal with that trap. The default is to silently ignore the unaligned access. The other options are to have the kernel print a message when there is an unaligned access, to send a signal to the process (typically this kills the process), or to fix up the access (at the cost of a few more instructions). The printing and signaling methods can help identify the problems. For more information on how to use these kernel features, please see the sections below.

Rewrite your code

Add padding

One quick fix is to manually pad your structs so that all the important data members sit on 4-byte boundaries. This way, when you assign to the structs and cast them around, you will always be using correctly aligned addresses. Keep in mind, though, that this is not the best possible fix because it becomes an inconvenience: as your code develops, the structs and their meaning or use may change, so the padding will have to change too.

Padding can be applied in the example above. In the bar struct, 3 int8_t's can be added after the first data member as follows:

typedef struct _bar {
    int8_t data1;
    int8_t __pad1[3];
    int8_t data2[DATA_SIZE];
} bar_t;

This will force data2 to sit on a 4-byte boundary. So, the assignment to the foo data structure will not have any alignment problems.

Just rewrite the code (best solution!)

A more difficult but long-term solution is to rewrite your code so that you do not need to cast structs around. Well-designed network protocols (like the IP stack) already do this; all the headers are carefully aligned on 32-bit boundaries. However, this is not always possible: custom protocols, or protocols designed for 8- or 16-bit architectures, have no alignment issues on those architectures but can have problems on 32-bit platforms. A rewrite is also sometimes impractical, because the amount of code that needs to be rewritten (and then tested) can be very large. It is always easier to design things with alignment in mind (like IP) than to try to rectify the problem later. (To do: add something more about this not always being possible without large rewrites, and about the usual network programming styles having the patterns that lead to alignment problems.)

The arm-linux kernel has a feature that can help you identify any alignment issues that you may have. The kernel can print a message on the stargate serial console whenever an unaligned access is performed. For instructions on how to do this, please see the sections below.

Use the packed attribute

By default, gcc pads structs so that each data member sits on the boundary its type requires (4 bytes for a 32-bit int or pointer on ARM). This means that structs may appear larger than the byte count of the data members. For instance, if you create a struct with an int8_t and a char*, sizeof will return 8 bytes for the size of the struct and not 5. This is because padding was added between the two data members. gcc will also pad out structs if they fall short of a 4-byte boundary; for instance, when running the example code above, the size of bar is 24, not 21, even though there is no padding between the data1 and data2 members. There is no padding between data1 and data2 because the element type of the data2 array is only one byte long. Because of this, the automatic padding done by gcc is not good enough to fix the alignment problems.

The packed attribute, which can be added to structs, provides two useful features, one of which helps solve the alignment problems. The first is more relevant to cross-platform network programming: the packed attribute prevents gcc from adding any padding to the struct, essentially preventing gcc from attempting to fix the alignment problems with padding. The second is that gcc adds the extra code needed to deal properly with the misaligned memory accesses that result from not introducing padding to align the data members. If applied to the correct structs, this can fix the alignment problems at the cost of the extra instructions gcc introduces for each access to the data members of that struct.

To fix the example code above, the foo_t struct definition would now look like:

typedef struct _foo {
 char *b;
} __attribute__ ((packed)) foo_t;

Since this struct has only one data member, its size remains the same. However, for any memory access that goes through this struct, the extra instructions that gcc adds will rotate the bytes correctly so there are no alignment problems.

Using the packed attribute does have the drawback of adding extra instructions to every access to the data members of the packed structs.

You can actually see the extra instructions by telling gcc to output the assembly for the example code above with the -S flag.

A common question is, should I just use the packed attribute everywhere? I do not know the answer to this besides saying that it may slow down your program. Maybe someone else can fill us in.

Use the aligned attribute (second best solution!)

The aligned attribute is added to individual data members to tell gcc to add enough padding before the data member to make it sit on the specified byte boundary. To explain this better, consider the example code above. To fix the alignment problem using the aligned attribute, we would add the aligned attribute, with a multiple of 4 as the parameter, to the data2 member of the bar struct. So, it would look like:

typedef struct _bar {
 int8_t data1;
 int8_t data2[DATA_SIZE] __attribute__ ((aligned(4)));
} bar_t;

This is equivalent to adding 3 bytes of padding between the data1 and data2 members as in a previous example, except that gcc will automatically do it for you so it removes the management overhead.

The parameter to the aligned attribute specifies which byte boundary the data member should be padded to. For ARM-based processors it only makes sense to use 4 (the ARM word size is 32 bits), since anything smaller could lead to more alignment issues and anything larger will waste memory.

If rewriting your code is not possible, this is an ideal solution. This solution also does not add extra instructions to your code as the packed attribute does.

Have the kernel find the problem for you

The arm-linux kernel provides a proc interface which provides information about the number of unaligned accesses as well as the ability to change the kernel behavior on unaligned accesses. The proc file is:

 /proc/cpu/alignment

Simply 'cat'ing the file will give you the number of unaligned accesses by user space programs, by the kernel, and the number of the types of unaligned accesses:

stargate-79:~# cat /proc/cpu/alignment 
User: 30
System: 1183781
Skipped: 0
Half: 755121
Word: 428660
Multi: 0
User faults: 0 (ignored)
stargate-79:~#

Any time there is unaligned access to memory, an alignment trap happens. The alignment proc file allows you to specify how the kernel behaves when an alignment trap happens. To set the different behaviors below, just echo the number to the alignment proc file. For instance, if I wanted to set the kernel to just give a warning whenever there is an alignment problem, I would type the following command at the console:

echo 1 > /proc/cpu/alignment

The following is a description of the various behaviors the kernel offers for dealing with alignment traps:

0 - ignore

This is the default behavior compiled into the arm-linux kernel. All alignment traps are ignored by the kernel, and no attempt is made to notify the user that there is a problem, except to keep track of the number of traps.

1 - warn

In this mode, the kernel prints an error saying that there was an alignment trap. The error typically goes to the serial console, but can be directed to various log files using syslogd or a similar program. The error message is only useful for identifying whether your code actually has alignment issues.

Alignment trap: align (32124) PC=0x00008478 Instr=0xe5823000 Address=0xbffffc85 Code 0xffffffff
Alignment trap: align (32124) PC=0x000084dc Instr=0xe5931000 Address=0xbffffc85 Code 0x00

2 - fixup

In this mode, the kernel fixes the alignment for all unaligned accesses. This does introduce extra overhead, just like the packed attribute does.

3 - fixup+warn

This is equivalent to having mode 1 and mode 2 on at the same time, so the kernel fixes the alignment and prints the alignment trap to the console.

4 - signal

In this mode, the kernel sends a SIGBUS signal to the process. Unless you have implemented a signal handler, your process will be killed and 'Bus Error' will be printed to the console. Combining this mode with a debugger is the most useful way to find the unaligned accesses in your code.

5 - signal+warn

This is equivalent to having mode 1 and mode 4 on at the same time, so the kernel sends a signal to the process and prints the alignment trap to the console.

If you are interested, the kernel code that creates the proc file as well as the alignment trap handler is located in arch/arm/mm/alignment.c

Beyond this document

Hopefully after reading this document you understand the alignment issues on the XScale platform and have an idea of how to deal with them.

There are some other things that may be useful or interesting to look into. In particular, gcc has some flags that may help you debug problems, or redesign or optimize your code: -Wpacked -Wpadded -malignment-traps -mno-alignment-traps