Notes on URL Encoding

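In the image and video URLs, neither the path nor the query parameters are encoded. Most HTTP clients handle some special characters by default, but characters such as the space, =, + and & are not handled by default, so the caller has to encode them before passing the URL in.
Characters that make a URL invalid when left unencoded: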

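  • Space: an HTTP client truncates the URL at the space by default, whereas a browser encodes it. The encoding is **%20 or +** (+ is only valid for a space inside a form-encoded query string).
  • =: neither HTTP clients nor browsers encode it. = is the separator between a parameter key and its value, but for now a separator can still be told apart from a literal = by checking whether the = is preceded by a "&key" pattern. A literal = encodes to %3D.
  • +: an HTTP client treats it as an encoded space, but a literal + actually encodes to **%2B** (see the figure below).
  • &: treated as the separator between two parameters. A literal & encodes to %26.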
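The encodings listed above can be checked with the JDK alone. The following is a minimal sketch using `java.net.URLEncoder`, which applies form encoding (so a space becomes `+`); the class and method names are made up for illustration and are not part of the original project.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ReservedCharDemo {
    // Form-encode a single query value: space -> '+', '=' -> %3D, '+' -> %2B, '&' -> %26.
    static String encodeValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Same as above, but writes spaces as %20. This is safe because any
    // literal '+' in the input has already been turned into %2B.
    static String encodeValueStrict(String raw) {
        return encodeValue(raw).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeValue(" "));            // +
        System.out.println(encodeValue("="));            // %3D
        System.out.println(encodeValue("+"));            // %2B
        System.out.println(encodeValue("&"));            // %26
        System.out.println(encodeValueStrict("a b+c"));  // a%20b%2Bc
    }
}
```

Note that `URLEncoder` is meant for individual values, not whole URLs: passing a complete URL would also encode `:`, `/`, `?` and the separators.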

Figure: URL encoding and decoding

Comparison of problem URLs

| URL | Result | Difference | Notes |
| --- | --- | --- | --- |
| https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210420/19:35:01/f2456a6129354dd8add2e78eb8c8d3f0.png?Expires=1618918561&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=/xZwBJTd48JHTuHzaLo3fbL1HRA= | success | = | |
| https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210421/17:30:47/139a79198bea4be68c37c70ff8e2631a.png?Expires=1619083847&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=w+KuWECzz90C8+ctyvVMGGClobo= | failure | += | |
| https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210420/16:24:10/fd9e6382213e47c8abc7af5cd7fc92e9.png?Expires=1618907110&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=uG7E81+QppUIzEWES2GN9Gea1gY= | failure | += | |
| https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210421/17:30:47/139a79198bea4be68c37c70ff8e2631a.png?Expires=1619083847&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=w%2BKuWECzz90C8%2BctyvVMGGClobo%3D | ok | | Encoded URL |
| https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210421/17:16:20/0e2965e480814206834f58af95f13ea1.jpeg?Expires=1619082981&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=rGSqvsMvNJ5mU+qFM25iQSejQ8s= | failure | += | |
| https://prod-iot-media.oss-cn-beijing.aliyuncs.com/uploadFiles/C2/video/20210421/19:24:31/1384830400968790016_1f9035102706cf7715362c7974111385.mp4?Expires=1619090671&OSSAccessKeyId=LTAI5t8AYuA3UNyDAnqyuPVm&Signature=93iWiU+h2iDVh13Kuvj+T9PVttE= | failure | += | |

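Investigation showed that the main problem is the + character. Here it is a literal character produced by RSA or Base64 encoding (the Base64 alphabet includes +, / and the = padding character). An HTTP client, browsers included, cannot tell whether a "+" at this position stands for an encoded space or for the character itself. Our program therefore cannot hard-code a rule for it, and only the requesting side can confirm what the character means.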
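The ambiguity can be reproduced with the JDK's `java.net.URLDecoder`, which follows form-decoding rules. This is a sketch under the assumption that the receiving side decodes the query in a similar way. The signature strings are taken from the comparison above, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PlusAmbiguityDemo {
    // Decode a query value with form rules: '+' -> space, %XX -> byte.
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String raw = "w+KuWECzz90C8+ctyvVMGGClobo=";
        String encoded = "w%2BKuWECzz90C8%2BctyvVMGGClobo%3D";

        // The unencoded '+' is read as a space, so the signature no longer matches.
        System.out.println(decode(raw));      // w KuWECzz90C8 ctyvVMGGClobo=

        // The percent-encoded form round-trips to the original signature.
        System.out.println(decode(encoded));  // w+KuWECzz90C8+ctyvVMGGClobo=
    }
}
```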

Solution

For the encoding, add the following jar dependency:

<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>

In the code, a single extra line is enough:

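import java.util.BitSet;
import org.apache.commons.httpclient.URI;
import org.apache.commons.httpclient.util.URIUtil;

String url = "https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210421/17:16:20/0e2965e480814206834f58af95f13ea1.jpeg?Expires=1619082981&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=rGSqvsMvNJ5mU+qFM25iQSejQ8s=";

// Encode the URL so that each literal + becomes %2B.
// Work on a copy: URI.allowed_query is a shared static BitSet, and clearing
// a bit on it directly would change query encoding for the whole JVM.
BitSet set = (BitSet) URI.allowed_query.clone();
set.clear('+'); // This line handles the plus sign. It encodes every + as %2B, so it is not suitable when the meaning of + is unknown.
url = URIUtil.encode(url, set, "utf-8"); // may throw URIException

System.out.println(url);
// Prints the encoded URL, which can be opened directly in a browser:
// https://sit-iot-media.oss-cn-beijing.aliyuncs.com/upload-temp/pic/20210421/17:16:20/0e2965e480814206834f58af95f13ea1.jpeg?Expires=1619082981&OSSAccessKeyId=LTAI4GByEtbtPtCKGaRbAkJV&Signature=rGSqvsMvNJ5mU%2BqFM25iQSejQ8s=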
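If adding commons-httpclient is not an option, a similar result can be sketched with the JDK alone by encoding only the value of the affected parameter. The helper below is hypothetical and not part of the original solution. It assumes the value is still raw (running it on an already encoded URL would double-encode it), and it also encodes = and / inside that value, which matches the encoded URL in the comparison above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SignedUrlFixer {
    // Hypothetical helper: percent-encode the raw value of one query parameter
    // and leave the rest of the URL untouched.
    static String encodeParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to do
        }
        StringBuilder out = new StringBuilder(url.substring(0, q + 1));
        String[] pairs = url.substring(q + 1).split("&");
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                out.append('&');
            }
            String pair = pairs[i];
            int eq = pair.indexOf('='); // first '=' separates key and value
            if (eq >= 0 && pair.substring(0, eq).equals(name)) {
                out.append(name).append('=').append(encode(pair.substring(eq + 1)));
            } else {
                out.append(pair);
            }
        }
        return out.toString();
    }

    private static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // example.com is a placeholder host used only for this illustration.
        String url = "https://example.com/a.jpeg?Expires=1619082981&Signature=rGSqvsMvNJ5mU+qFM25iQSejQ8s=";
        System.out.println(encodeParam(url, "Signature"));
    }
}
```

Limiting the encoding to one named parameter avoids touching a + elsewhere in the URL whose meaning is not known.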