黄亿华 / webmagic

HttpClientDownloader does not check the response status code

Backlog
leisure opened this issue

HttpClientDownloader does not check the response status code, and acceptStatCode in Site can only be populated one code at a time.
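As a side note on the acceptStatCode limitation, one possible workaround is to generate the whole range of codes instead of listing them by hand. The following is only a minimal sketch: the class name `AcceptedStatusCodes` and its `range` helper are hypothetical, and it assumes the resulting set would be handed to a `Site.setAcceptStatCode(Set<Integer>)` setter, which is not shown in this issue.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper, not part of webmagic.
final class AcceptedStatusCodes {

    private AcceptedStatusCodes() {
    }

    /** Returns every status code in the half-open range [fromInclusive, toExclusive). */
    static Set<Integer> range(int fromInclusive, int toExclusive) {
        return IntStream.range(fromInclusive, toExclusive)
                .boxed()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> success = range(200, 300);
        // Assumption: the set would then be passed on, e.g. site.setAcceptStatCode(success).
        System.out.println("accepted codes: " + success.size());
    }
}
```

This keeps the check in the Site configuration, at the cost of a 100-element set for the [200, 300) range.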

We generally treat responses with a status code in [200, 300) as successful. HttpClientDownloader can be extended as follows to add a status code check:

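import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Request;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.downloader.HttpClientDownloader;

public class CustomHttpClientDownloader extends HttpClientDownloader {

    private static final Logger LOG = LoggerFactory.getLogger(CustomHttpClientDownloader.class);

    @Override
    protected Page handleResponse(Request request, String charset, HttpResponse httpResponse, Task task) throws IOException {
        // Let the parent class build the Page from the response
        Page page = super.handleResponse(request, charset, httpResponse, task);
        int code = page.getStatusCode();
        // Status code check: as written, this accepts [200, 500)
        if (HttpStatus.SC_OK <= code && code < HttpStatus.SC_INTERNAL_SERVER_ERROR) {
            return page;
        } else {
            LOG.warn("Failed to download [{}], status code: {}, outside the accepted range [{}-{})", request.getUrl(), code, HttpStatus.SC_OK, HttpStatus.SC_INTERNAL_SERVER_ERROR);
            // Mark the page as failed
            page.setDownloadSuccess(false);
        }
        return page;
    }

}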


Comments (1)

nicoooo 2019-12-12 16:48

Brother Hua, HttpStatus.SC_INTERNAL_SERVER_ERROR stands for status code 500; it should be changed to HttpStatus.SC_MULTIPLE_CHOICES.
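To make the difference between the two upper bounds concrete, here is a minimal, dependency-free sketch of the [200, 300) check. The class `StatusCodeCheck` and its method are hypothetical; the two constants are redeclared locally with the same values as `HttpStatus.SC_OK` (200) and `HttpStatus.SC_MULTIPLE_CHOICES` (300) so the example compiles without Apache HttpClient.

```java
// Hypothetical stand-alone illustration, not part of webmagic.
final class StatusCodeCheck {

    static final int SC_OK = 200;               // same value as HttpStatus.SC_OK
    static final int SC_MULTIPLE_CHOICES = 300; // same value as HttpStatus.SC_MULTIPLE_CHOICES

    private StatusCodeCheck() {
    }

    /** Returns true only for status codes in the half-open range [200, 300). */
    static boolean isSuccess(int code) {
        return SC_OK <= code && code < SC_MULTIPLE_CHOICES;
    }

    public static void main(String[] args) {
        int[] samples = {200, 204, 301, 404, 500};
        for (int code : samples) {
            System.out.println(code + " -> " + isSuccess(code));
        }
    }
}
```

With an upper bound of 500, codes such as 301 and 404 would pass the check; with 300 they are rejected, which matches the [200, 300) range described in the issue.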
