ProxyLab
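Start date: 22.2.27
OS: Ubuntu 20.04
Link: CS:APP3e
Before we begin
Lab environment bugs
Running ./driver.sh in the proxylab-handout directory produced two bugs.
bug 1: the net-tools package is not installed
line 117: netstat: command not found
...
This happens because the net-tools package is not installed. Run the following command to install it (you will be asked for your user password):
sudo apt install net-tools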
bug 2: part II & part III cannot be graded
*** Concurrency ***
Starting tiny on port 25178
Starting proxy on port 26265
Starting the blocking NOP server on port 12673
Timeout waiting for the server to grab the port reserved for it
Terminated
This is a Python environment problem. Change the first line of the Python file nop-server.py from #!/usr/bin/python to #!/usr/bin/python3.
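Bugs in the curl test
The port number comes from the command ./port-for-user.pl, which assigned 55200. I gave that port to tiny web and, for convenience, manually added 1 and gave 55201 to the proxy.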
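First, open three terminals: one for the tiny web server (start it from the proxylab-handout/tiny directory), one for the proxy, and one for running the curl command.
/* boot tiny */
./tiny 55200
/* boot proxy */
./proxy 55201
/* command: curl */
curl -v --proxy http://localhost:55201/ http://localhost:55200/home.html
If the ports for proxy and tiny are wrong, the three terminals were not started separately, or the Firefox network proxy is misconfigured, curl fails like this: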
* Trying 127.0.0.1:55201...
* TCP_NODELAY set
* connect to 127.0.0.1 port 55201 failed: Connection refused
* Failed to connect to localhost port 55201: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 55201: Connection refused
If the URI (web address) for tiny web is typed incorrectly (here localhoast instead of localhost), curl reports:
host:55201/ http://localhoast:55200/home.html
* Trying 127.0.0.1:55201...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 55201 (#0)
> GET http://localhoast:55200/home.html HTTP/1.1
> Host: localhoast:55200
> User-Agent: curl/7.68.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
duile@ubuntu:~/Desktop/csapp_lab/prox
and the proxy's terminal shows:
Accepted connection from (localhost 48746)
getaddrinfo failed (localhoast:55200): Temporary failure in name resolution
Open_clientfd error: Resource temporarily unavailable
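Firefox network proxy configuration
- The port number comes from the command ./port-for-user.pl, which assigned 55200. I gave that port to tiny web and, for convenience, added 1 and gave 55201 to the proxy.
- As shown in the figure (Firefox => Settings => Network Settings)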
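Background you should understand first
- sprintf(), sscanf()
- fprintf()
- strstr(), strcmp()
- The RIO package
- telnet, curl
  telnet 127.0.0.1 4500 can be used to check whether a connection to that port works (4500 is a free port obtained from the script ./free-port.sh)
- int Open_clientfd(char *hostname, char *port)
  - In the Open family of functions declared in csapp.h, port is a string (char *), not an int
- \r\n, which has to be appended to new_request_hdr
  - Carriage return first (\r), then newline (\n)
- In HTTP, the server's default port number is 80
- To stop the server or the proxy: ctrl + c
- The difference between a URI and a URL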
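Two of the points above (sscanf() for splitting a request line, and the fact that Open_clientfd wants the port as a string) can be sketched in a few lines. The helper names parse_request_line and port_to_string are hypothetical and not part of csapp.h; the 64-byte buffers are an assumption made to keep the example small.

```c
#include <stdio.h>

/* Hypothetical helper: split "METHOD URI VERSION" into three fields.
 * Each output buffer must hold at least 64 bytes (63 chars + NUL).
 * Returns 1 when all three fields were found, 0 otherwise. */
int parse_request_line(const char *line, char *method, char *uri, char *version)
{
    /* %63s stops at whitespace, so the trailing "\r\n" is not copied */
    return sscanf(line, "%63s %63s %63s", method, uri, version) == 3;
}

/* Hypothetical helper: turn an int port into the string form that
 * Open_clientfd expects. snprintf writes into `out`; it prints nothing.
 * Returns 1 on success, 0 if the text did not fit. */
int port_to_string(int port, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%d", port);
    return n > 0 && (size_t)n < outlen;
}
```

With a line such as "GET http://localhost:55200/home.html HTTP/1.1\r\n", the three buffers end up holding the method, the uri, and the version, and port_to_string(55200, buf, sizeof buf) leaves the text 55200 in buf.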
Reference links
15-213: Intro to Computer Systems: Schedule for Fall 2015
- rec13.pdf
- 25-sync-advanced.pdf
writeup
Be sure to read it carefully!
The official site has started updating for Spring 2022!
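part I
Task: implement a simple proxy that only needs to handle sequential requests (one request at a time).
First, understand what a proxy is. It has two main jobs (these are also what doit has to implement):
- Facing a request from the client, the proxy acts as a server: it receives the request, repackages it, and forwards it to the server
- Facing a response from the server, the proxy acts as a client: it receives the response and forwards it unchanged to the client
Feeling lost at first is normal. The code for this part is based mainly on tiny.c, so understand tiny.c before you start writing.
- Data is passed around through fds (file descriptors); this also requires the main functions of the RIO package
- Note that sprintf() does not print anything; it writes the formatted characters into the buffer its first argument points to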
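The idea of passing data through file descriptors can be shown without any networking. The sketch below copies everything readable from one descriptor to another, which is the core of what the proxy does when it forwards a response. It uses plain read()/write() rather than the RIO package, and the name relay is hypothetical.

```c
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: copy everything readable from from_fd to to_fd.
 * Returns the total number of bytes copied, or -1 on error. */
long relay(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(from_fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {               /* write() may be short: loop until done */
            ssize_t w = write(to_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;                       /* read() returned 0: end of input */
}
```

The same function works on pipes, files, and connected sockets, because all of them are just descriptors. The short-write loop is the same concern that rio_writen handles in the RIO package.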
To make things easier to follow, the code names the real client real_client and the real server real_server.
Here is what the two functions do.
parse_uri
Purpose: extract the hostname, port, and path from the uri
normal uri => http://hostname:port/path
e.g.
uri => http://www.cmu.edu:450/hub/index.html
hostname => www.cmu.edu
port => 450 (if the uri has no port, the default is 80, the default server port in HTTP)
path => /hub/index.html
hostname, port, and path each have two cases to handle, six cases in total:
- hostname: whether http:// is present
- port: whether the port itself is present
- path: whether the path itself is present
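The six cases above can be handled in one pass over the string. The sketch below is a non-mutating variant written for illustration, not the parse_uri from the reference code; the name split_uri, the single len parameter shared by both output buffers, and the choice to default an absent path to "/" are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of parse_uri that leaves `uri` untouched.
 * `host` and `path` must each hold `len` bytes (len >= 2). */
void split_uri(const char *uri, char *host, char *path, size_t len, int *port)
{
    /* hostname: skip the scheme if "://" is present */
    const char *p = strstr(uri, "://");
    p = p ? p + 3 : uri;

    /* the hostname ends at ':' (port follows), '/' (path follows), or the end */
    size_t span = strcspn(p, ":/");
    size_t n = span < len - 1 ? span : len - 1;
    memcpy(host, p, n);
    host[n] = '\0';
    p += span;

    /* port: default 80 unless ":digits" is present and in range */
    *port = 80;
    if (*p == ':') {
        long v = strtol(p + 1, NULL, 10);
        if (v > 0 && v <= 65535)
            *port = (int)v;
        p += strcspn(p, "/");
    }

    /* path: the rest of the string, or "/" when nothing is left */
    snprintf(path, len, "%s", *p == '/' ? p : "/");
}
```

For http://www.cmu.edu:450/hub/index.html this yields www.cmu.edu, 450, and /hub/index.html; for a bare www.cmu.edu it yields port 80 and path "/".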
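build_new_request_hdr (named print_and_build_hdr in the reference code)
- Purpose: repackage the old request from real_client into a new request and forward it to real_server
- First read the hdr of the old request, then build the new request's hdr from it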
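As a rough picture of what the rewritten request looks like on the wire, here is a minimal sketch that formats only the request line and the headers the proxy always replaces. It leaves out the User-Agent header and the pass-through of the client's other headers, and the name build_request is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write the rewritten request into `out`.
 * Returns the number of bytes written, or -1 if it did not fit. */
int build_request(char *out, size_t outlen,
                  const char *host, const char *port, const char *path)
{
    int n = snprintf(out, outlen,
                     "GET %s HTTP/1.0\r\n"          /* request line, always HTTP/1.0 */
                     "Host: %s:%s\r\n"
                     "Connection: close\r\n"
                     "Proxy-Connection: close\r\n"
                     "\r\n",                        /* blank line ends the headers */
                     path, host, port);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return n;
}
```

Every line ends in \r\n, and the request ends with an empty \r\n line; a server waits for that blank line before it answers.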
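curl test results
(For the GET request, readers can test it themselves: GET http://www.cmu.edu/hub/index.html HTTP/1.1)
Reference code: see the complete listing at the end of part III, which also contains the code for this part.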
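part II
Implement concurrency, so that two or more clients can send requests at the same time.
Follow the textbook's material on thread-based concurrency; the code is almost a direct copy, but note two points (the book mentions them too):
- Allocate a separate block of memory for each peer thread's connected descriptor, which prevents a race between peer threads
- Each peer thread should detach itself first, then free that memory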
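The two points above can be tried out without sockets. In this sketch each worker receives its own heap-allocated int (standing in for a connected descriptor), detaches itself, and frees the block. The counters, the condition variable, and the names worker and spawn_workers are hypothetical and exist only so the caller can observe that every detached thread ran.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int handled_sum;   /* hypothetical stand-in for "work done" */
static int finished;      /* number of workers that have completed */

static void *worker(void *vargp)
{
    int connfd = *(int *)vargp;       /* copy the value out of the private block */
    pthread_detach(pthread_self());   /* detach first: nobody will join this thread */
    free(vargp);                      /* then release the block the creator allocated */

    pthread_mutex_lock(&lock);
    handled_sum += connfd;            /* a real proxy would call doit(connfd) here */
    finished++;
    pthread_cond_signal(&all_done);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start one detached worker per value and wait until all of them have run.
 * Returns the sum of the values the workers saw. */
int spawn_workers(const int *fds, int n)
{
    int started = 0, sum;

    pthread_mutex_lock(&lock);
    handled_sum = 0;
    finished = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++) {
        pthread_t tid;
        int *p = malloc(sizeof(*p));  /* one block per thread, never a shared variable */
        if (p == NULL)
            continue;
        *p = fds[i];
        if (pthread_create(&tid, NULL, worker, p) != 0) {
            free(p);
            continue;
        }
        started++;
    }

    pthread_mutex_lock(&lock);
    while (finished < started)
        pthread_cond_wait(&all_done, &lock);
    sum = handled_sum;
    pthread_mutex_unlock(&lock);
    return sum;
}
```

If the loop passed the address of a single shared int instead of a fresh block, the next iteration could overwrite it before the new thread had read it, which is exactly the race the first bullet warns about.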
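Textbook material (English edition): implementing concurrency
Reference code: see main and thread in the complete listing at the end of part III.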
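part III
Purpose: add a cache to the proxy that stores recently used objects
- If the requested object is already in the cache, forward it to the client straight from the cache
- If the requested object is not in the cache, forward the response to the client and, at the same time, store the response in the cache (it may or may not need to be split into individual objects)
Note that an object is one part of a page, not the whole page.
Structure of an object
/* structure of one object (also one cache block) */
typedef struct {
    char *url;
    char *content;
    int *cnt;      /* LRU: the count of uses */
    int *is_used;  /* equals 0 => obj can't be used; equals 1 => obj can be used */
} object;
The value of is_used (0 or 1) means different things to the reader and the writer.
- *(cache[i].is_used) == 1
  - For reader(): the obj exists, can be used, and can be sent to the real client
  - For writer(): the obj is already in the cache; if its cnt is the smallest, it can be replaced by a new obj
- *(cache[i].is_used) == 0
  - For reader(): the obj does not exist, cannot be used, and cannot be sent to the real client
  - For writer(): the obj has no content yet and is an empty slot in the cache, so the new obj can be inserted directly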
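The LRU eviction policy used here follows the usual idea (approximated with a per-object use count)
- That is, evict the least recently used object
- The conventional way to count: cnt = 0 (or the smallest cnt) marks an object that is rarely used, and the largest cnt marks one that is used often
- The reverse way to count: cnt = 0 (or the smallest cnt) marks an object that is used often, and the largest cnt marks one that is rarely used
- My code uses the conventional way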
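The slot choice described above (take an empty slot if there is one, otherwise evict the object with the smallest count) fits in a short function. This sketch works on two plain int arrays instead of the object struct to stay self-contained, and the name pick_slot is hypothetical.

```c
/* Hypothetical helper: choose the slot a new object should go into.
 * is_used[i] is 0 for an empty slot; cnt[i] is the use count of slot i. */
int pick_slot(const int *is_used, const int *cnt, int n)
{
    int victim = 0;
    for (int i = 0; i < n; i++) {
        if (!is_used[i])
            return i;            /* empty slot: insert without evicting anything */
        if (cnt[i] < cnt[victim])
            victim = i;          /* smallest count seen so far */
    }
    return victim;               /* cache is full: evict the least-used object */
}
```

Starting victim at 0 matters: if slot 0 already has the smallest count, no later comparison succeeds, and the function still returns a valid index.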
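The cache uses the reader-priority variant of the readers-writers model
- There can be many readers at once, which allows concurrency, but only one writer
- Starvation can occur:
  - Definition of starvation: a thread blocks indefinitely and cannot make progress
  - Reader priority can cause it: readers keep arriving, so the writer waits indefinitely
  For part III this means the cache keeps being read but can never be written (updated)
- The code follows the textbook (a lock for the reader count, plus a writer lock)
  - When the first reader arrives, it takes the writer lock, so no writer can enter; when the last reader leaves, it releases the writer lock, so a waiting writer can proceed. This is what gives readers priority
  - Note that the proxy is multithreaded, so there can be many readers, but only one writer is allowed at a time
- Textbook material (English edition): reader priority in the readers-writers model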
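The first-reader-in / last-reader-out rule can be written as four small functions around two POSIX semaphores. This is a minimal sketch of the reader-priority scheme, using sem_wait/sem_post directly instead of the P/V wrappers from csapp.h; the function names and writer_try_enter (a non-blocking probe added only so the behaviour can be observed from one thread) are hypothetical.

```c
#include <semaphore.h>

static int readcnt;                        /* number of readers currently inside */
static sem_t readcnt_mutex, writer_mutex;  /* protect readcnt / the shared data */

void rw_init(void)
{
    readcnt = 0;
    sem_init(&readcnt_mutex, 0, 1);
    sem_init(&writer_mutex, 0, 1);
}

void reader_enter(void)
{
    sem_wait(&readcnt_mutex);
    if (++readcnt == 1)          /* first reader in: lock writers out */
        sem_wait(&writer_mutex);
    sem_post(&readcnt_mutex);
}

void reader_exit(void)
{
    sem_wait(&readcnt_mutex);
    if (--readcnt == 0)          /* last reader out: let a writer in */
        sem_post(&writer_mutex);
    sem_post(&readcnt_mutex);
}

void writer_enter(void) { sem_wait(&writer_mutex); }
void writer_exit(void)  { sem_post(&writer_mutex); }

/* Hypothetical probe: returns 1 if the writer lock was acquired, 0 if busy. */
int writer_try_enter(void)
{
    return sem_trywait(&writer_mutex) == 0;
}
```

While at least one reader is inside, writer_try_enter() keeps returning 0; it succeeds only after the last reader has called reader_exit(). That is also why a steady stream of readers can starve the writer.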
For the number of objects in the cache, I chose 10
#define NUMBERS_OBJECT 10
- because MAX_OBJECT_SIZE * 10 = 1024000, which is roughly MAX_CACHE_SIZE = 1049000
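The arithmetic behind that choice can be checked at compile time. The sketch below assumes MAX_OBJECT_SIZE is 102400, which is the value implied by the product above; the two function names are hypothetical.

```c
#include <stddef.h>

#define MAX_CACHE_SIZE 1049000
#define MAX_OBJECT_SIZE 102400
#define NUMBERS_OBJECT 10

/* Compile-time check: all full-size slots together must fit in the cache budget. */
_Static_assert((long)MAX_OBJECT_SIZE * NUMBERS_OBJECT <= MAX_CACHE_SIZE,
               "object slots exceed MAX_CACHE_SIZE");

/* Bytes taken by NUMBERS_OBJECT objects of maximum size. */
size_t cache_payload_bytes(void)
{
    return (size_t)MAX_OBJECT_SIZE * NUMBERS_OBJECT;
}

/* Bytes of the budget left over after those slots. */
size_t cache_slack_bytes(void)
{
    return (size_t)MAX_CACHE_SIZE - cache_payload_bytes();
}
```

Ten slots use 1024000 bytes and leave 25000 bytes of the budget unused; an eleventh full-size slot would need 1126400 bytes and would no longer fit.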
Reference code
#include <stdio.h>
#include "csapp.h"

/* Recommended max cache and object sizes */
#define MAX_CACHE_SIZE 1049000
#define MAX_OBJECT_SIZE 102400
/* number of objects in the cache */
#define NUMBERS_OBJECT 10

/* You won't lose style points for including this long line in your code */
static const char *user_agent_hdr = "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n";
static const char *conn_hdr = "Connection: close\r\n";
static const char *proxy_hdr = "Proxy-Connection: close\r\n";

/* structure of one object (also one cache block) */
typedef struct {
    char *url;
    char *content; /* kept as a NUL-terminated string, so only text objects are cached faithfully */
    int *cnt;      /* LRU: the count of uses */
    int *is_used;  /* equals 0 => obj can't be used; equals 1 => obj can be used */
} object;

/* Global variables */
static object *cache;
static int readcnt;                       /* count of readers */
static sem_t readcnt_mutex, writer_mutex; /* readcnt_mutex protects readcnt; writer_mutex guards the cache */

/* helper functions */
void doit(int client_fd);
void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg);
void parse_uri(char *uri, char *hostname, char *path, int *port);
void print_and_build_hdr(rio_t *real_client, char *new_request, char *hostname, char *port);
static void append_hdr(char *dst, const char *src);
void *thread(void *varge_ptr);
void init_cache(void);
static void init_mutex(void);
int reader(int fd, char *url);
void writer(char *buf, char *url);

/* boot proxy as server: accept connections from clients */
int main(int argc, char **argv)
{
    int listenfd, *connfd_ptr;
    char hostname[MAXLINE], port[MAXLINE];
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;
    pthread_t tid;

    /* Check command line args */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <port>\n", argv[0]);
        exit(1);
    }
    init_cache();
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd_ptr = Malloc(sizeof(int)); /* one block per thread to avoid a race */
        *connfd_ptr = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        Getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE,
                    port, MAXLINE, 0);
        printf("Accepted connection from (%s, %s)\n", hostname, port);
        Pthread_create(&tid, NULL, thread, connfd_ptr);
    }
    return 0;
}

/*
 * Thread routine
 */
void *thread(void *varge_ptr)
{
    int connfd = *((int *)varge_ptr);
    Pthread_detach(pthread_self());
    doit(connfd);
    Free(varge_ptr);
    Close(connfd);
    return NULL; /* a void * function must return a value */
}

/*
 * doit - handle one HTTP request/response transaction
 */
void doit(int client_fd)
{
    int real_server_fd;
    char buf[MAXLINE], method[MAXLINE], url[MAXLINE], version[MAXLINE];
    char uri[MAXLINE], obj_buf[MAX_OBJECT_SIZE]; /* obj_buf must hold a whole object */
    rio_t real_client, real_server;
    char hostname[MAXLINE], path[MAXLINE];
    int port;

    /* Read request line from real client */
    Rio_readinitb(&real_client, client_fd);
    if (!Rio_readlineb(&real_client, buf, MAXLINE))
        return;
    if (sscanf(buf, "%s %s %s", method, uri, version) != 3) {
        clienterror(client_fd, buf, "400", "Bad Request",
                    "Proxy could not parse the request line");
        return;
    }
    strcpy(url, uri); /* keep a copy as the cache key: parse_uri modifies uri */
    if (strcasecmp(method, "GET")) {
        clienterror(client_fd, method, "501", "Not Implemented",
                    "Proxy does not implement this method");
        return;
    }

    /* if the requested object is in the cache, reader() has already sent it */
    if (reader(client_fd, url)) {
        fprintf(stdout, "%s from cache\n", url);
        return;
    }

    /* prepare to parse the uri and build the new request */
    parse_uri(uri, hostname, path, &port);
    char port_str[16]; /* large enough for any int in decimal */
    sprintf(port_str, "%d", port); /* Open_clientfd takes the port as a string */
    real_server_fd = Open_clientfd(hostname, port_str); /* proxy (as client) connects to real server */
    if (real_server_fd < 0) {
        printf("connection failed\n");
        return;
    }
    Rio_readinitb(&real_server, real_server_fd);
    char new_request[MAXLINE];
    snprintf(new_request, sizeof(new_request), "GET %s HTTP/1.0\r\n", path);
    print_and_build_hdr(&real_client, new_request, hostname, port_str);

    /* proxy as client sends the request to the web server */
    Rio_writen(real_server_fd, new_request, strlen(new_request));

    /* then proxy as server forwards the response to the real client */
    int char_nums;
    int obj_size = 0;
    while ((char_nums = Rio_readlineb(&real_server, buf, MAXLINE)) > 0) {
        Rio_writen(client_fd, buf, char_nums);
        /* collect the object for the cache while it still fits */
        if (obj_size + char_nums < MAX_OBJECT_SIZE)
            memcpy(obj_buf + obj_size, buf, char_nums);
        obj_size += char_nums; /* always count, so oversized objects are detected */
    }
    Close(real_server_fd);
    if (obj_size < MAX_OBJECT_SIZE) {
        obj_buf[obj_size] = '\0';
        writer(obj_buf, url);
    }
}

/*
 * clienterror - returns an error message to the client
 */
void clienterror(int fd, char *cause, char *errnum,
                 char *shortmsg, char *longmsg)
{
    char buf[MAXLINE], body[MAXBUF];

    /* Build the HTTP response body in one call (sprintf must not read its own output buffer) */
    snprintf(body, sizeof(body),
             "<html><title>Tiny Error</title>"
             "<body bgcolor=\"ffffff\">\r\n"
             "%s: %s\r\n"
             "<p>%s: %s\r\n"
             "<hr><em>The Tiny Web server</em>\r\n",
             errnum, shortmsg, longmsg, cause);

    /* Print the HTTP response */
    sprintf(buf, "HTTP/1.0 %s %s\r\n", errnum, shortmsg);
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-type: text/html\r\n");
    Rio_writen(fd, buf, strlen(buf));
    sprintf(buf, "Content-length: %d\r\n\r\n", (int)strlen(body));
    Rio_writen(fd, buf, strlen(buf));
    Rio_writen(fd, body, strlen(body));
}

/*
 * parse_uri - get hostname, port, path from the uri sent by the real client
 *             (modifies uri in place)
 */
void parse_uri(char *uri, char *hostname, char *path, int *port)
{
    *port = 80; /* default port */

    /* normal uri => http://hostname:port/path */
    /* eg. uri => http://www.cmu.edu:8080/hub/index.html */
    char *ptr_hostname = strstr(uri, "//");
    if (ptr_hostname)
        ptr_hostname += 2;  /* hostname_eg1. uri => http://hostname... */
    else
        ptr_hostname = uri; /* hostname_eg2. uri => hostname... <= NOT "http://" */

    /* split the path off first, so a ':' inside the path is not taken for a port */
    char *ptr_path = strchr(ptr_hostname, '/');
    if (ptr_path) {
        /* path_eg1. uri => .../path */
        strncpy(path, ptr_path, MAXLINE - 1);
        path[MAXLINE - 1] = '\0';
        *ptr_path = '\0'; /* only hostname[:port] remains */
    } else {
        /* path_eg2. uri => ... <= NOT "/path": request the root */
        strcpy(path, "/");
    }

    char *ptr_port = strchr(ptr_hostname, ':');
    if (ptr_port) {
        /* port_eg1. uri => ...hostname:port... */
        *ptr_port = '\0'; /* c-style: the end of string(hostname) is '\0' */
        sscanf(ptr_port + 1, "%d", port); /* on failure the default port is kept */
    }
    /* port_eg2. uri => ...hostname... <= NOT ":port" */
    strncpy(hostname, ptr_hostname, MAXLINE - 1);
    hostname[MAXLINE - 1] = '\0';
}

/*
 * append_hdr - append src to dst only if the result still fits in MAXLINE bytes
 */
static void append_hdr(char *dst, const char *src)
{
    if (strlen(dst) + strlen(src) < MAXLINE)
        strcat(dst, src);
}

/*
 * print_and_build_hdr - read the old request_hdr, then build the new request_hdr
 */
void print_and_build_hdr(rio_t *real_client, char *new_request, char *hostname, char *port)
{
    char temp_buf[MAXLINE];

    /* copy the old request_hdr, except the headers the proxy rewrites */
    while (Rio_readlineb(real_client, temp_buf, MAXLINE) > 0) {
        if (strcmp(temp_buf, "\r\n") == 0) break; /* blank line: end of headers */
        if (strncasecmp(temp_buf, "Host:", 5) == 0) continue;
        if (strncasecmp(temp_buf, "User-Agent:", 11) == 0) continue;
        if (strncasecmp(temp_buf, "Connection:", 11) == 0) continue;
        if (strncasecmp(temp_buf, "Proxy-Connection:", 17) == 0) continue;
        append_hdr(new_request, temp_buf);
    }

    /* build the new request_hdr */
    snprintf(temp_buf, sizeof(temp_buf), "Host: %s:%s\r\n", hostname, port);
    append_hdr(new_request, temp_buf);
    append_hdr(new_request, user_agent_hdr);
    append_hdr(new_request, conn_hdr);
    append_hdr(new_request, proxy_hdr);
    append_hdr(new_request, "\r\n");
}

/*
 * initialize the cache
 */
void init_cache(void)
{
    init_mutex();
    readcnt = 0; /* the global counter, not a new local variable */
    /* cache is an array of NUMBERS_OBJECT objects */
    cache = (object *)Malloc(sizeof(object) * NUMBERS_OBJECT);
    for (int i = 0; i < NUMBERS_OBJECT; i++) {
        cache[i].url = (char *)Malloc(sizeof(char) * MAXLINE);
        cache[i].content = (char *)Malloc(sizeof(char) * MAX_OBJECT_SIZE);
        cache[i].cnt = (int *)Malloc(sizeof(int));
        cache[i].is_used = (int *)Malloc(sizeof(int));
        *(cache[i].cnt) = 0;
        *(cache[i].is_used) = 0;
    }
}

/*
 * initialize the mutex
 */
static void init_mutex(void)
{
    Sem_init(&readcnt_mutex, 0, 1);
    Sem_init(&writer_mutex, 0, 1);
}

/*
 * reader - read from cache to real client; returns 1 on a cache hit, 0 otherwise
 */
int reader(int fd, char *url)
{
    int hit = -1; /* index of the cached obj, or -1 => obj not in cache */

    P(&readcnt_mutex);
    readcnt++;
    if (readcnt == 1) /* First in: lock out the writer */
        P(&writer_mutex);
    V(&readcnt_mutex);

    /* obj is in the cache: write its content to the fd of the real client */
    for (int i = 0; i < NUMBERS_OBJECT; i++) {
        if (*(cache[i].is_used) && (strcmp(url, cache[i].url) == 0)) {
            hit = i;
            Rio_writen(fd, cache[i].content, strlen(cache[i].content));
            break;
        }
    }

    P(&readcnt_mutex);
    if (hit >= 0)
        (*(cache[hit].cnt))++; /* counted under readcnt_mutex so readers don't race */
    readcnt--;
    if (readcnt == 0) /* Last out: let the writer in */
        V(&writer_mutex);
    V(&readcnt_mutex);
    return hit >= 0;
}

/*
 * writer - write from real server to cache
 */
void writer(char *buf, char *url)
{
    P(&writer_mutex);
    int insert_or_evict_i = 0;
    int min_cnt = *(cache[0].cnt);
    /* LRU: find an empty obj to insert into, or the obj of min cnt to evict */
    for (int i = 0; i < NUMBERS_OBJECT; i++) {
        if (*(cache[i].is_used) == 0) { /* insert */
            insert_or_evict_i = i;
            break;
        }
        if (*(cache[i].cnt) < min_cnt) { /* evict */
            insert_or_evict_i = i;
            min_cnt = *(cache[i].cnt);
        }
    }
    strcpy(cache[insert_or_evict_i].url, url);
    strcpy(cache[insert_or_evict_i].content, buf);
    *(cache[insert_or_evict_i].cnt) = 0;
    *(cache[insert_or_evict_i].is_used) = 1;
    V(&writer_mutex);
}
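Summary
- Completion date: 22.3.13
- File descriptors are really handy for passing data around
- I often took for granted what a function does; if you have not seen a function before, look it up carefully, for example sprintf()
- I broke off for a long stretch in the middle to work on the xv6 labs, and also rested for two or three days (playing galgames), but I finished in the end!
- That wraps up my journey through the CS:APP labs. They were often torture, but I learned a lot and now have a rough understanding of how computers work. For the xv6 labs I still need to improve my debugging skills a lot
- Lately I've been listening to SHE'S "Letter", Sekai no Owari's 《花鳥風月》, and Eason's 《人生马拉松》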