Common PHP Functions for Scraping Pages
Published: 2015-11-02 22:26:29
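While working with the QQ login API, fetching the server page with file_get_contents or curl kept returning 502 Bad Gateway.
Handling the request with fsockopen did not have this problem. See the function below, which works with both http and https pages.
Of course, your PHP must be configured with openssl first.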
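<?php
/**
 * Fetch a page over a raw socket; works for both http and https URLs.
 * https requires PHP to be configured with openssl.
 * Returns the response body, or false on failure.
 */
function getContent($url)
{
    $url_info = parse_url($url);
    if (!$url_info || empty($url_info['host'])) {
        return false;
    }

    // Choose the transport prefix and default port from the URL scheme.
    $is_https = isset($url_info['scheme']) && strtolower($url_info['scheme']) === 'https';
    $scheme = $is_https ? 'ssl://' : '';
    $port = isset($url_info['port']) ? $url_info['port'] : ($is_https ? 443 : 80);

    $fid = fsockopen($scheme . $url_info['host'], $port, $errno, $errstr, 30);
    if (!$fid) {
        return false;
    }

    // Send a minimal HTTP/1.0 GET request.
    $path = (isset($url_info['path']) ? $url_info['path'] : '/')
          . (isset($url_info['query']) ? '?' . $url_info['query'] : '');
    fputs($fid, "GET " . $path . " HTTP/1.0\r\n"
        . "Host: " . $url_info['host'] . "\r\n"
        . "Connection: close\r\n\r\n");

    // Read until the server closes the connection.
    $data = '';
    while (!feof($fid)) {
        $line = fgets($fid, 128);
        if ($line === false) {
            break;
        }
        $data .= $line;
    }
    fclose($fid);

    // The body starts after the first blank line (CRLF CRLF).
    $pos = strpos($data, "\r\n\r\n");
    if ($pos === false) {
        return false;
    }
    return substr($data, $pos + 4);
}
?>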
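Since the original problem was a 502 response, it can help to look at the status line before trusting the body. The following is a minimal sketch, not part of the original function: `splitHttpResponse` is a hypothetical helper name, and it assumes the raw response uses CRLF line endings as HTTP requires.

```php
<?php
/**
 * Split a raw HTTP response into status code, header block and body.
 * Hypothetical helper for illustration; not from the original article.
 */
function splitHttpResponse($raw)
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return false; // no header/body separator found
    }
    $head = substr($raw, 0, $pos);
    $body = substr($raw, $pos + 4);

    // The status line looks like "HTTP/1.0 200 OK".
    $status = 0;
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $head, $m)) {
        $status = (int) $m[1];
    }
    return array('status' => $status, 'head' => $head, 'body' => $body);
}

// Sample response text, made up for this example.
$raw = "HTTP/1.0 502 Bad Gateway\r\nContent-Type: text/html\r\n\r\n<h1>502</h1>";
$parts = splitHttpResponse($raw);
echo $parts['status'], "\n"; // 502
echo $parts['body'], "\n";   // <h1>502</h1>
```

A caller could then treat any status outside the 200 range as a failure instead of returning an error page as if it were content.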
Here is also a commonly used curl scraping function:
<?php
/**
 * curl scraping function
 *
 * @param string       $url         URL to fetch
 * @param string       $postdata    POST data to submit; leave empty for a non-POST request
 * @param string       $pre_url     spoofed referer URL
 * @param string|false $proxyip     proxy IP, or false for no proxy
 * @param string       $compression compression method of the target URL's content
 *
 * @return string|false content of the target URL, or false on failure
 */
function curl_getContent($url, $postdata = '', $pre_url = 'https://www.baidu.com', $proxyip = false, $compression = 'gzip, deflate')
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5); // 5-second timeout

    // Build random (spoofed) IP headers.
    $client_ip = rand(1, 254) . '.' . rand(1, 254) . '.' . rand(1, 254) . '.' . rand(1, 254);
    $x_ip = rand(1, 254) . '.' . rand(1, 254) . '.' . rand(1, 254) . '.' . rand(1, 254);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-FORWARDED-FOR:' . $x_ip, 'CLIENT-IP:' . $client_ip));

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it

    if ($postdata != '') {
        curl_setopt($ch, CURLOPT_POST, 1);                // submit as POST
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);  // the full POST body as one string
    }

    // Referer: fall back to the current request URL when none is given.
    if (!$pre_url && isset($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'])) {
        $pre_url = "http://" . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
    }
    if ($pre_url) {
        curl_setopt($ch, CURLOPT_REFERER, $pre_url);
    }

    if ($proxyip) {
        curl_setopt($ch, CURLOPT_PROXY, $proxyip); // proxy server
    }

    if ($compression != '') {
        curl_setopt($ch, CURLOPT_ENCODING, $compression); // accepted content encodings
    }

    // Alternative mobile User-Agent:
    // Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; c8650 Build/GWK74) AppleWebKit/533.1
    // (KHTML, like Gecko)Version/4.0 MQQBrowser/4.5 Mobile Safari/533.1s
    // Send a "User-Agent" header with the request.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11');
    curl_setopt($ch, CURLOPT_HEADER, 0);         // do not include HTTP headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects

    $result = curl_exec($ch);
    curl_close($ch);

    if ($result === false) {
        return false; // request failed or timed out
    }

    // Convert GBK to UTF-8.
    if (!mb_check_encoding($result, 'utf-8')) {
        $result = mb_convert_encoding($result, 'utf-8', 'gbk');
    }
    return $result;
}
?>
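The GBK-to-UTF-8 step at the end of `curl_getContent` can be tried on its own without making a request. This is a minimal sketch that assumes the mbstring extension is loaded; `toUtf8` and its `$fallback` parameter are hypothetical names introduced here. Note that the check is a heuristic: it assumes anything that is not valid UTF-8 is GBK, which may not hold for every site.

```php
<?php
/**
 * Return $text as UTF-8, converting from $fallback when it is not
 * already valid UTF-8. Hypothetical helper for illustration.
 */
function toUtf8($text, $fallback = 'GBK')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// The two characters of "Chinese text" (U+4E2D U+6587) encoded as GBK bytes.
$gbk = "\xD6\xD0\xCE\xC4";
$utf8 = toUtf8($gbk);
echo bin2hex($utf8), "\n";           // e4b8ade69687
echo toUtf8("already UTF-8"), "\n";  // already UTF-8
```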
Usage:
<?php
$url = "http://www.xxx.com/";
$post = "pageIndex=1&pageCount=500";
$data = json_decode(curl_getContent($url, $post), true);
print_r($data);
?>
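The usage above writes the POST string by hand and decodes whatever comes back. As a hedged variation, the POST body can be built with `http_build_query`, which also URL-encodes the values, and the decode step can reject responses that are not JSON. `decodeJsonResponse` is a hypothetical helper name, and the parameter names are simply the ones from the example above.

```php
<?php
// Build the POST body from an array instead of concatenating by hand.
$post = http_build_query(array('pageIndex' => 1, 'pageCount' => 500));
echo $post, "\n"; // pageIndex=1&pageCount=500

/**
 * Decode a JSON response body into an array.
 * Returns null when the input is not a non-empty string or is not valid JSON.
 * Hypothetical helper for illustration.
 */
function decodeJsonResponse($response)
{
    if (!is_string($response) || $response === '') {
        return null; // e.g. the fetch function returned false
    }
    $data = json_decode($response, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return null; // body was not JSON (an HTML error page, for example)
    }
    return $data;
}

// Sample bodies, made up for this example.
var_dump(decodeJsonResponse('{"total":2}') === array('total' => 2)); // bool(true)
var_dump(decodeJsonResponse('<html>502</html>'));                    // NULL
```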