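When you crawl a target site with Java or Python, the browser is redirected correctly, but the request sent from code always comes back with status code 200.
The fix is to send browser-like request headers such as the following; adjust them as needed: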
Map<String, String> headers = Map.of(
    "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding", "gzip, deflate, sdch, br",
    "Accept-Language", "zh-CN,zh;q=0.8",
    "Connection", "keep-alive",
    // Host must name the site being crawled (www.baidu.com is only an example);
    // HttpURLConnection normally sets it from the URL itself
    "Host", "www.baidu.com",
    "Upgrade-Insecure-Requests", "1",
    "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
);
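Why the headers change the outcome can be reproduced locally. The sketch below assumes (the article does not state the cause) that the site decides whether to redirect based on the User-Agent header. The class `UserAgentRedirectDemo`, its methods, and the `/target` path are hypothetical; the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in for the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class UserAgentRedirectDemo {

    /** Starts a local stand-in server that redirects only browser-like clients. */
    public static HttpServer startServer() {
        try {
            // Port 0 lets the operating system pick a free port
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String ua = exchange.getRequestHeaders().getFirst("User-Agent");
                if (ua != null && ua.contains("Mozilla")) {
                    // Browser-like client: answer with a redirect
                    exchange.getResponseHeaders().add("Location", "/target");
                    exchange.sendResponseHeaders(302, -1);
                } else {
                    // Anything else: plain 200 with no body
                    exchange.sendResponseHeaders(200, -1);
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends a GET request and returns the status code; a null userAgent keeps Java's default. */
    public static int statusFor(String url, String userAgent) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
            connection.setInstanceFollowRedirects(false);
            if (userAgent != null) {
                connection.setRequestProperty("User-Agent", userAgent);
            }
            int code = connection.getResponseCode();
            connection.disconnect();
            return code;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
        // Default Java User-Agent: the stand-in server answers 200
        System.out.println(statusFor(url, null));
        // Browser-like User-Agent: the stand-in server answers 302
        System.out.println(statusFor(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"));
        server.stop(0);
    }
}
```

A real site may look at other headers as well, which is why the whole browser header set is sent rather than the User-Agent alone.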
The redirect target can then be read from the Location response header:
String redirectedUrl = connection.getHeaderField("Location");
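The Location value is not always an absolute URL; HTTP allows a relative reference such as `/login`. A minimal sketch of resolving it against the requested address, assuming both strings are syntactically valid URIs (`URI.create` throws `IllegalArgumentException` otherwise). The class name `LocationResolver`, the method `resolveLocation`, and the example.com addresses are hypothetical.

```java
import java.net.URI;

public class LocationResolver {

    /**
     * Resolves a Location header value against the URL that was requested.
     * Returns the request URL itself when there is no Location value.
     */
    public static String resolveLocation(String requestUrl, String location) {
        if (location == null || location.trim().isEmpty()) {
            return requestUrl;
        }
        URI base = URI.create(requestUrl);
        // URI.resolve needs a non-empty base path, so "http://host" is treated as "http://host/"
        if (base.getPath() == null || base.getPath().isEmpty()) {
            base = base.resolve("/");
        }
        return base.resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        String requested = "https://example.com/news/list?page=2";
        System.out.println(resolveLocation(requested, "/login"));                 // absolute path
        System.out.println(resolveLocation(requested, "detail.html"));            // relative path
        System.out.println(resolveLocation(requested, "https://m.example.com/")); // already absolute
    }
}
```

Passing `connection.getHeaderField("Location")` as the second argument gives an address that can be requested directly in the next step.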
The complete Java method that sends a GET request and returns the redirect address:
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

/**
 * Returns the address that the given URL redirects to.
 *
 * @param url the address to request
 * @return the value of the Location header; the original url if the response
 *         carries no Location header; null on an error response or exception
 */
public static String sendGetRequestWithRedirect(String url) {
    try {
        URL getUrl = new URL(url);
        HttpURLConnection connection = (HttpURLConnection) getUrl.openConnection();
        connection.setRequestMethod("GET");
        // Custom request headers that imitate a browser
        Map<String, String> headers = Map.of(
            "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Encoding", "gzip, deflate, sdch, br",
            "Accept-Language", "zh-CN,zh;q=0.8",
            "Connection", "keep-alive",
            "Upgrade-Insecure-Requests", "1",
            "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
        );
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            connection.setRequestProperty(entry.getKey(), entry.getValue());
        }
        // Do not follow redirects automatically; otherwise the 301/302 response
        // and its Location header are never visible to this code
        connection.setInstanceFollowRedirects(false);
        int responseCode = connection.getResponseCode();
        // 303, 307 and 308 are redirects as well, so they are accepted next to 301 and 302
        if (responseCode == HttpURLConnection.HTTP_OK
                || responseCode == HttpURLConnection.HTTP_MOVED_PERM
                || responseCode == HttpURLConnection.HTTP_MOVED_TEMP
                || responseCode == HttpURLConnection.HTTP_SEE_OTHER
                || responseCode == 307
                || responseCode == 308) {
            String redirectedUrl = connection.getHeaderField("Location");
            // A redirect carries the new address in Location; without it, keep the original address
            return redirectedUrl != null ? redirectedUrl : url;
        } else {
            // Any other status is treated as an error response
            System.out.println("Error response code: " + responseCode);
            return null;
        }
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}
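The method above stops at the first Location header, but a site may redirect several times before it serves the page. A minimal sketch of following the chain with a hop limit, written as an extension rather than as part of the original method. The class `RedirectChain`, the method `resolveFinalUrl`, the 5-second timeouts, and the local `/a`, `/b`, `/c` test server are all hypothetical, and the browser headers are left out to keep the example short.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

public class RedirectChain {

    /** Follows at most maxHops redirects and returns the last address reached. */
    public static String resolveFinalUrl(String startUrl, int maxHops) {
        String current = startUrl;
        try {
            for (int hop = 0; hop < maxHops; hop++) {
                HttpURLConnection connection = (HttpURLConnection) new URL(current).openConnection();
                connection.setInstanceFollowRedirects(false);
                connection.setConnectTimeout(5000);
                connection.setReadTimeout(5000);
                int code = connection.getResponseCode();
                String location = connection.getHeaderField("Location");
                connection.disconnect();
                boolean redirect = code == 301 || code == 302 || code == 303
                        || code == 307 || code == 308;
                if (!redirect || location == null) {
                    return current; // not a redirect: this is the final address
                }
                // Location may be relative, so resolve it against the current address
                current = URI.create(current).resolve(location).toString();
            }
            return current; // hop limit reached, for example in a redirect loop
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Local test server: /a redirects to /b, /b redirects to /c, everything else answers 200. */
    public static HttpServer startDemoServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String path = exchange.getRequestURI().getPath();
                if (path.equals("/a")) {
                    exchange.getResponseHeaders().add("Location", "/b");
                    exchange.sendResponseHeaders(302, -1);
                } else if (path.equals("/b")) {
                    exchange.getResponseHeaders().add("Location", "/c");
                    exchange.sendResponseHeaders(301, -1);
                } else {
                    exchange.sendResponseHeaders(200, -1);
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startDemoServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        // Prints the address ending in /c after two hops
        System.out.println(resolveFinalUrl(base + "/a", 5));
        server.stop(0);
    }
}
```

The hop limit keeps a redirect loop from running forever; when the limit is hit, the last address seen is returned instead of an error.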