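VPS SSH Connection Timeout

This morning I found that SSH connections to my DigitalOcean (DO) VPS were failing. This post records the symptoms and how I tracked down and resolved the problem.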

Symptoms

Every client I tried (SecureCRT, the shell command line, and so on) reported the same thing:

Connection timed out.

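Basic checks

  1. First I pinged the VPS. It replied normally, with latency no different from usual.
  2. Next I opened the web app on port 80 (this very blog), which also worked fine.
  3. Finally I logged in to the VPS through the web console, checked the ufw firewall and fail2ban settings, and went through every log file that might be relevant (syslog, ufw.log, fail2ban.log, and so on).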
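
The checks above can be complemented by probing the SSH port directly, since a silent timeout (packets dropped somewhere) and an active refusal (the host answered with a reset) point to different causes. The following is a minimal sketch, not part of my original troubleshooting; the function name check_tcp_port and the returned labels are made up for illustration.

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'.

    'timeout' suggests packets are being dropped silently on the way
    (or on the way back); 'refused' means something answered with a reset.
    """
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a DNS failure.
        return f"error: {exc}"
```

Calling it once for port 80 and once for the SSH port of the VPS would show whether only the SSH port is affected. In my case the SSH client reported a timeout rather than a refusal.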

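Analysis 1

All of the checks above came back normal. In fact, I had logged in to the VPS over SSH just the previous evening to do some package updates and similar chores,

and had not changed any firewall or related configuration since then.

Given that port 80 on the VPS was reachable and ping packets were getting through, the question now was:

why are the TCP packets that set up the SSH connection being dropped, and where?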

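Route tracing

This is where the route tracing tool traceroute comes in.

For usage and options, see its man page (man traceroute); I won't repeat them here.

Here are the logs from my four traceroute runs: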

Test 1
-d enable socket-level debugging
-I send ICMP ECHO probes

> traceroute -dI [my vps ip] 
traceroute to [my vps ip] ([my vps ip]), 30 hops max, 40 byte packets 
1 ([my test source ip1]) 0.453 ms 0.941 ms 1.171 ms 
2 ([my test source ip2]) 0.238 ms 0.283 ms 0.335 ms 
3 10.0.0.17 (10.0.0.17) 0.156 ms 0.163 ms 0.160 ms 
4 * * * 
5 * * * 
6 * * * 
7 61.148.155.65 (61.148.155.65) 2.869 ms * * 
8 * * * 
9 219.158.104.238 (219.158.104.238) 41.377 ms 40.596 ms 40.583 ms 
10 219.158.11.202 (219.158.11.202) 35.299 ms 35.298 ms 35.321 ms 
11 219.158.97.26 (219.158.97.26) 38.758 ms 38.758 ms 38.752 ms 
12 219.158.30.162 (219.158.30.162) 251.722 ms 251.725 ms 251.791 ms 
13 sjo-b21-link.telia.net (213.248.73.189) 283.383 ms 283.386 ms 283.460 ms 
14 digitalocean-ic-306499-sjo-b21.c.telia.net (62.115.45.22) 284.334 ms 284.398 ms 284.380 ms 
15 [my vps ip] ([my vps ip]) 284.910 ms 283.683 ms 283.283 ms 

Test 2
-T send TCP SYN probes (port 80 by default)

> traceroute -dT [my vps ip] 
traceroute to [my vps ip] ([my vps ip]), 30 hops max, 40 byte packets 
1 ([my test source ip1]) 0.423 ms 0.986 ms 1.215 ms 
2 ([my test source ip2]) 0.236 ms 0.253 ms 0.304 ms 
3 10.0.0.17 (10.0.0.17) 0.170 ms 0.170 ms 0.161 ms 
4 * * * 
5 202.106.42.149 (202.106.42.149) 7.831 ms 7.942 ms 8.045 ms 
6 * bt-228-025.bta.net.cn (202.106.228.25) 2.227 ms 61.148.146.221 (61.148.146.221) 3.694 ms 
7 123.126.7.145 (123.126.7.145) 3.309 ms 61.148.155.65 (61.148.155.65) 2.543 ms 2.522 ms 
8 124.65.194.37 (124.65.194.37) 4.050 ms 2.536 ms 124.65.194.105 (124.65.194.105) 4.466 ms 
9 219.158.104.238 (219.158.104.238) 37.699 ms 37.538 ms 37.509 ms 
10 219.158.11.202 (219.158.11.202) 33.708 ms 219.158.3.70 (219.158.3.70) 36.789 ms 219.158.11.202 (219.158.11.202) 33.743 ms 
11 219.158.97.26 (219.158.97.26) 35.827 ms 35.867 ms 35.825 ms 
12 219.158.30.162 (219.158.30.162) 262.187 ms 262.146 ms 262.077 ms 
13 sjo-b21-link.telia.net (213.248.73.189) 3231.431 ms 3231.232 ms 230.541 ms 
14 digitalocean-ic-306499-sjo-b21.c.telia.net (62.115.45.22) 231.839 ms 231.776 ms 231.677 ms 
15 (198.199.99.234) 236.235 ms 244.375 ms (198.199.99.242) 231.921 ms 
16 [my vps ip] ([my vps ip]) 232.670 ms 1430.643 ms 232.605 ms 

Test 3
-T send TCP SYN probes
-p set the destination port

> traceroute -dT -p [my vps ssh port] [my vps ip] 
traceroute to [my vps ip] ([my vps ip]), 30 hops max, 40 byte packets 
1 ([my test source ip1]) 0.468 ms 1.009 ms 1.222 ms 
2 ([my test source ip2]) 0.251 ms 0.261 ms 0.316 ms 
3 10.0.0.17 (10.0.0.17) 0.153 ms 0.144 ms 0.139 ms 
4 * * * 
5 202.106.42.97 (202.106.42.97) 6.910 ms 6.996 ms 7.050 ms 
6 61.148.146.221 (61.148.146.221) 1.839 ms 61.148.146.209 (61.148.146.209) 1.800 ms bt-228-025.bta.net.cn (202.106.228.25) 3.031 ms 
7 123.126.7.145 (123.126.7.145) 4.441 ms 61.148.155.65 (61.148.155.65) 3.470 ms 3.448 ms 
8 124.65.194.105 (124.65.194.105) 1.013 ms 124.65.194.37 (124.65.194.37) 2.270 ms 124.65.194.133 (124.65.194.133) 1.746 ms 
9 219.158.104.238 (219.158.104.238) 40.904 ms 40.731 ms 40.699 ms 
10 219.158.11.202 (219.158.11.202) 71.651 ms 219.158.11.22 (219.158.11.22) 60.150 ms 219.158.11.202 (219.158.11.202) 71.482 ms 
11 219.158.97.26 (219.158.97.26) 76.786 ms 76.750 ms 76.731 ms 
12 219.158.30.162 (219.158.30.162) 239.041 ms 239.003 ms 238.990 ms 
13 sjo-b21-link.telia.net (213.248.73.189) 267.914 ms 268.764 ms 268.599 ms 
14 digitalocean-ic-306499-sjo-b21.c.telia.net (62.115.45.22) 251.226 ms 271.041 ms 270.587 ms 
15 (198.199.99.234) 275.339 ms 275.530 ms * 
16 * * * 
17 * * * 
18 * * * 
19 * * * 
20 * * * 
21 * * * 
22 * * * 
23 * * * 
24 * * * 
25 * * * 
26 * * * 
27 * * * 
28 * * * 
29 * * * 
30 * * * 

Test 4
Default options (with the Linux traceroute, the one that supports -T, the default probe is a UDP datagram)

> traceroute -d [my vps ip] 
traceroute to [my vps ip] ([my vps ip]), 30 hops max, 40 byte packets 
1 ([my test source ip1]) 0.444 ms 0.895 ms 1.105 ms 
2 ([my test source ip2]) 0.291 ms 0.287 ms 0.320 ms 
3 10.0.0.17 (10.0.0.17) 0.190 ms 0.188 ms 0.173 ms 
4 * * * 
5 202.106.42.97 (202.106.42.97) 6.829 ms 202.106.42.149 (202.106.42.149) 14.271 ms 14.416 ms 
6 61.148.155.37 (61.148.155.37) 3.040 ms bt-228-025.bta.net.cn (202.106.228.25) 5.109 ms 2.139 ms 
7 61.148.155.65 (61.148.155.65) 2.435 ms 2.409 ms 2.391 ms 
8 124.65.194.37 (124.65.194.37) 5.372 ms 124.65.194.105 (124.65.194.105) 2.931 ms 124.65.194.37 (124.65.194.37) 5.334 ms 
9 219.158.104.238 (219.158.104.238) 37.441 ms 37.440 ms 37.403 ms 
10 219.158.11.202 (219.158.11.202) 33.857 ms 33.840 ms 33.773 ms 
11 219.158.97.26 (219.158.97.26) 37.147 ms 37.167 ms 37.111 ms 
12 219.158.30.162 (219.158.30.162) 242.141 ms 242.643 ms 243.297 ms 
13 sjo-b21-link.telia.net (213.248.73.189) 230.799 ms 232.114 ms 230.689 ms 
14 digitalocean-ic-306499-sjo-b21.c.telia.net (62.115.45.22) 231.678 ms * digitalocean-ic-302451-sjo-b21.c.telia.net (62.115.34.18) 231.579 ms 
15 (198.199.99.234) 235.753 ms 235.752 ms (198.199.99.242) 231.714 ms 
16 * * * 
17 * * * 
18 * * * 
19 * * * 
20 * * * 
21 * * * 
22 * * * 
23 * * * 
24 * * * 
25 * * * 
26 * * * 
27 * * * 
28 * * * 
29 * * * 
30 * * *
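
The four runs above differ only in the probe type and the destination port. As a rough sketch, assuming the Linux traceroute (the variant that accepts -T), the command lines can be built like this. The helper name traceroute_cmd is hypothetical, and 192.0.2.10 and port 2222 in the usage note are placeholders, not my real values.

```python
from typing import List, Optional


def traceroute_cmd(host: str, probe: str = "udp",
                   port: Optional[int] = None) -> List[str]:
    """Build a traceroute argument list for one of the probe types used above.

    probe: 'icmp' (-I), 'tcp' (-T), or 'udp' (no flag, the default method).
    port:  passed via -p; its exact meaning depends on the probe method
           (for TCP it is the destination port).
    """
    flags = {"icmp": ["-I"], "tcp": ["-T"], "udp": []}
    if probe not in flags:
        raise ValueError(f"unknown probe type: {probe!r}")
    cmd = ["traceroute", "-d"] + flags[probe]
    if port is not None:
        cmd += ["-p", str(port)]
    cmd.append(host)
    return cmd
```

For example, traceroute_cmd("192.0.2.10", "tcp", 2222) yields the same shape of command as test 3, and the list can be passed to subprocess.run. ICMP and TCP probes usually need root privileges.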

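Analysis 2

The traceroute results show that traffic to the VPS gets as far as hop 14,

digitalocean-ic-306499-sjo-b21.c.telia.net

and that the trouble starts right after this router, from hop 15 on: in tests 3 and 4 nothing comes back beyond hop 15 (either the outbound probes or the returning replies are being dropped).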
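
Reading long logs by eye is error-prone, so here is a small sketch that finds the last hop that answered in a traceroute log. It assumes the output format shown in the logs above (hop number first, "* * *" for silent hops). The names parse_hops and last_responding_hop are illustrative, and 192.0.2.10 in the sample is a documentation placeholder standing in for the VPS address.

```python
import re
from typing import List, Optional, Tuple

# A hop line starts with the hop number; the header line does not.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def parse_hops(output: str) -> List[Tuple[int, bool]]:
    """Return (hop_number, responded) for every hop line in the log."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # skip the header and any blank lines
        hop, rest = int(match.group(1)), match.group(2)
        # A hop counts as responding if at least one probe reports an RTT.
        hops.append((hop, " ms" in rest))
    return hops


def last_responding_hop(output: str) -> Optional[int]:
    """Highest hop number that answered at least one probe, or None."""
    responding = [hop for hop, ok in parse_hops(output) if ok]
    return max(responding) if responding else None


sample = """traceroute to 192.0.2.10 (192.0.2.10), 30 hops max, 40 byte packets
14 digitalocean-ic-306499-sjo-b21.c.telia.net (62.115.45.22) 251.226 ms 271.041 ms 270.587 ms
15 (198.199.99.234) 275.339 ms 275.530 ms *
16 * * *
17 * * *
"""
print(last_responding_hop(sample))
```

The sample is an excerpt of test 3. Comparing the last responding hop across runs with different probe types shows where the paths diverge.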

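Further verification

With help from my friend @PengLi (thanks!), I borrowed his VPS, which is with the same provider but in a different data center.

From the same egress IP on my side, I ran traceroute to his VPS and found that it connected normally. Here is the log: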

> traceroute -dT -p [PengLi ssh port] [PengLi vps ip]
traceroute to [PengLi vps ip] ([PengLi vps ip]), 30 hops max, 40 byte packets
 1   ([my test ip1])  6.978 ms  7.130 ms  7.340 ms
 2   ([my test ip2])  0.223 ms  0.255 ms  0.290 ms
 3  10.0.0.25 (10.0.0.25)  0.174 ms  0.157 ms  0.175 ms
 4  * * *
 5  202.106.42.97 (202.106.42.97)  8.040 ms  8.132 ms 202.106.42.149 (202.106.42.149)  9.023 ms
 6  * bt-228-025.bta.net.cn (202.106.228.25)  2.306 ms *
 7  123.126.6.241 (123.126.6.241)  1.524 ms 61.51.113.137 (61.51.113.137)  3.164 ms  3.121 ms
 8  123.126.0.69 (123.126.0.69)  3.860 ms 61.148.152.138 (61.148.152.138)  2.424 ms 123.126.0.69 (123.126.0.69)  5.773 ms
 9  219.158.101.54 (219.158.101.54)  4.663 ms 123.126.0.69 (123.126.0.69)  4.780 ms 219.158.101.54 (219.158.101.54)  7.490 ms
10  219.158.101.54 (219.158.101.54)  7.463 ms  5.916 ms  5.774 ms
11  219.158.97.238 (219.158.97.238)  62.962 ms 219.158.103.2 (219.158.103.2)  319.895 ms 219.158.97.238 (219.158.97.238)  62.778 ms
12  las-bb1-link.telia.net (213.248.94.125)  318.339 ms * *
13  * * *
14  nyk-bb1-link.telia.net (213.155.137.124)  403.562 ms nyk-bb1-link.telia.net (80.91.252.226)  415.773 ms nyk-bb2-link.telia.net (62.115.137.42)  443.101 ms
15  * * nyk-bb1-link.telia.net (80.91.252.226)  3414.478 ms
16  * digitalocean-ic-306498-nyk-b3.c.telia.net (62.115.45.10)  426.854 ms *
17   (162.243.188.230)  414.822 ms digitalocean-ic-306498-nyk-b3.c.telia.net (62.115.45.10)  426.696 ms digitalocean-ic-306497-nyk-b3.c.telia.net (62.115.45.6)  424.103 ms
18  [PengLi vps ip] ([PengLi vps ip])  424.674 ms  3404.148 ms  28794.190 ms

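Contacting support and the outcome

I took the logs above to the VPS provider's customer service. Their attitude and response time (considering the time difference) were both pretty good,

but their conclusion came down to one sentence: it is a problem with the VPS's own firewall.

Yet SSH still could not connect after I disabled ufw, which is the part I could not make sense of.

In the end, not wanting to sink more time into it, I changed the SSH port, and to my surprise the connection worked...

I posted a request for help online, and someone suggested that the GFW might have blocked the IP (together with the port).

But I doubt that such a small VPS (small not only in size, its traffic is pitifully low too) would catch the GFW's attention so easily.

Either way, the SSH connection failure can be considered solved for now.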

