LCS hdu3795 Diff
时间:2011-03-14 来源:while(2.0)
http://acm.hdu.edu.cn/showproblem.php?pid=3795
Problem Description
In computing, diff is a file comparison utility that outputs the differences between two files. It is typically used to show the changes between a file and a former version of the same file. Diff displays the changes made per line for text files.
The operation of diff is based on solving the longest common subsequence problem which is as follows: Given two sequences A = a1, a2, ..., aM, and B = b1, b2, ..., bN, find the length, k, of the longest sequence C = c1, c2, ..., ck such that C is a subsequence of both A and B. As an example, if
A = d, y, n, a, m, i, c and B = p, r, o, g, r, a, m, m, i, n, g
then the longest common subsequence is a, m, i( {3,4,5} from A, {5,6,8} from B) and has length 3.
You may find {5,7,8} from B is also a, m, i, but {5,6,8} is lexicographic smaller, so we choose the former one. We always choose the lexicographic smallest one.
From the longest common subsequence it's only a small step to get diff-like output:
dyn-progr+m+c-ng+
where '-' means a deletion from and '+' means an addition to the first string.
Now you are supposed to simulate the diff operation.
Input
Your program must read test cases from standard input.
The input file consists of several test cases. Each case contains the contents of two files. The case starts with two non-negative integers N and M (both <= 50), then followed by N+M lines, each contains a string of no more than 80 characters. The first N lines are the contents of the first file, while the second M lines are of the second file.
The input is finished by a negative N.
Output
For each test case, your program must output to standard output. If there is no difference found between the two files, print in a line "No difference found". If there is nothing in common between the two files, print in a line "Totally different". Otherwise, first print in a line the length of the longest sequence. Then for each line of the first file, print the line number, and output the differences in the diff-like format line by line, as shown by the sample output.
这题据说是浙大去年保研题,最长公共子列,动态规划。
用一个二维数组保存最长公共子列的长度,len[i][j]表示的是第一个字符串第i位之前(不包括),和第二个字符串第j位之前的最长公共子列的长度。因此一开始len[0][0]=0,因为之前没有任何字符。矩阵的第一行和第一列也是0,因为对于其中一个字符串,所要的是第0位之前,是空串。然后左上往右下递推,如果a[i]=b[j],也就是最长公共子列的长度增加一。否则就看len[i][j-1]和len[i-1][j]哪个大就是哪个。另外这题需要求到最长公共子列,不仅是长度,除了len矩阵之外,再用一个path矩阵记录下DP过程中的走向。path[i][j]为0表示当时a[i] == b[j],是从(i-1,j-1)走过来的。a[i] != b[j]的时候又两个可能的来源,(i-1,j)和(i,j-1),分别几位-1和1。DP完了看矩阵右下角那个元素,然后依次倒着走回来就得到了最长公共子列。
这题真正厉害是输出,我的理解是以前一段文字为准,每一行另起一个line #,如果碰到第二段文本换行也要换行,并且第二段没匹配完的字符串要继续和第一段下一行的进行比较。我是把所有行贴到一起,对两个长串求公共子列,然后输入的时候记下换行符的位置,但是换行符本身不参加LCS,用队列省了两个循环变量。之后输出,先求出公共子列,由于是反序,用一个栈倒过来。每次循环输出非公共字符,末尾加上+-号,为了方便判断换行,将要输出的串暂时存起来,碰到换行就输出,然后把temp清空,如果连续两个换行就会忽略掉后一个。如果是第一段文字导致换行,还输出一个line #行号。
output函数处理temp,根据栈中保存的公共字符,把所有字符写到temp,直到碰到公共字符。文本用字符串不用string也是为了最后可以给output函数传入末尾那个'\0'作为结束符,不用区别对待。
代码丑死了,不想搞了以后再说吧:
#include <stdio.h> #include <string> #include <string.h> #include <vector> #include <stack> #include <queue> using namespace std; static const int MAX_LEN = 80 * 50 + 1; char txt[2][MAX_LEN]; int len[MAX_LEN][MAX_LEN]; int path[MAX_LEN][MAX_LEN]; int LCS(const char* a, const char* b) { int la = strlen(a); int lb = strlen(b); len[0][0] = 0; path[0][0] = -1; for (int i = 1; i <= la; ++i) len[i][0] = 0, path[i][0] = -1; for (int j = 1; j <= lb; ++j) len[0][j] = 0, path[0][j] = -1; for (int i = 1; i <= la; ++i) { for (int j = 1; j <= lb; ++j) { if (a[i - 1] == b[j - 1]) len[i][j] = len[i - 1][j - 1] + 1, path[i][j] = 0; else if (len[i - 1][j] > len[i][j - 1]) len[i][j] = len[i - 1][j], path[i][j] = -1; // from above else len[i][j] = len[i][j - 1], path[i][j] = 1; // from left } } return len[la][lb]; } queue<int> endpos[2]; stack<char> common; char mark[] = "-+"; void init () { txt[0][0] = txt[1][0] = 0; } void read(int i, int n) { int len = 0; while (n--) { char line[100]; gets(line); len += strlen(line); strcat(txt[i], line); if (n != 0) endpos[i].push(len); } } void buildstack() { int len1 = strlen(txt[0]); int len2 = strlen(txt[1]); int x = len1; int y = len2; while (x > 0 && y > 0) { if (path[x][y] == 0) { common.push(txt[0][x - 1]); --x, --y; } else if (path[x][y] == -1) --x; else --y; } } string temp; void addMark(int i) { if (temp != "" && temp[temp.length()-1] != '+' && temp[temp.length()-1] != '-') temp += mark[i]; } int line = 2; void addNewline(int i) { if (temp != "" && temp[temp.length()-1] != '\n') { addMark(i); printf ("%s\n", temp.c_str(), mark[i]); if (i == 0) printf ("line #%d\n", line++); temp = ""; } } void checkend(int i, int index) { if (!endpos[i].empty() && index == endpos[i].front()) { endpos[i].pop(); addNewline(i); } } void output(int i, int& index, char end) { const char* str = txt[i]; if (str[index] == end) { ++index; checkend(i, index); return; } while (str[index] != end) { temp += str[index]; ++index; checkend(i, index); } ++index; addMark(i); checkend(i, index); return; } int main() { while (1) { int n, m; scanf ("%d", &n); if (n < 0) break; scanf ("%d", &m); getchar(); init (); read(0, n); read(1, m); if (strcmp(txt[0], txt[1]) == 0) { printf ("No difference found\n"); continue; } int lcs = LCS(txt[0], txt[1]); if (lcs == 0) { printf ("Totally different\n"); continue; } printf ("%d\n", lcs); buildstack(); int i = 0, j = 0; printf ("line #1\n"); temp = ""; line = 2; while (!common.empty()) { char c = common.top(); common.pop(); output(0, i, c); output(1, j, c); } output(0, i, 0); output(1, j, 0); printf ("%s\n", temp.c_str()); } return 0; } |