How to call Python from Java to download web pages

2017-11-02 08:00, Thursday. Category: JAVA tutorials

This post is based on: http://tonl.iteye.com/blog/1918245

Python version: 2.7, 64-bit Windows build.

Download Python: http://www.python.org/getit/

Choose Python 2.7.5 Windows X86-64 Installer (Windows AMD64 / Intel 64 / X86-64 binary [1], does not include source) and install it.

First, write the following spider.py script:

# -*- coding: utf-8 -*-
from urllib import urlopen
import os
import sys


class Spider:
    """
    Download the web pages listed in the given file.
    """
    def __init__(self, filename, downloadPath):
        """
        Store the URL list file and the download directory; exit if either is missing.
        """
        if not os.path.isfile(filename):
            print 'the given file does not exist, the program will exit'
            sys.exit(1)
        if not os.path.isdir(downloadPath):
            print 'the given download path does not exist, the program will exit'
            sys.exit(1)
        self.fname = filename
        self.dpath = downloadPath

    def download(self):
        """
        Download each web page listed in the file, one URL per line.
        """
        fp = open(self.fname, 'r')
        for line in fp:
            # Strip the trailing newline and padding; skip blank lines.
            url = line.strip()
            if not url:
                continue
            # Build a file name from the alphanumeric characters of the URL.
            if 'html' in url:
                tempname = filter(str.isalnum, url).replace('html', '.html')
            else:
                tempname = filter(str.isalnum, url) + '.html'
            self.download_html(url, os.path.join(self.dpath, tempname))
        fp.close()

    def download_html(self, website, filename):
        """
        Fetch the given URL and save the response body to filename.
        """
        response = urlopen(website)
        data = response.read()
        # 'wb' overwrites the file on a rerun and writes the bytes unchanged.
        fp = open(filename, 'wb')
        fp.write(data)
        fp.close()


def test():
    """
    Entry point: argv[1] is the URL list file, argv[2] is the download directory.
    """
    filename = sys.argv[1]
    downloadPath = sys.argv[2]
    spider = Spider(filename, downloadPath)
    spider.download()


if __name__ == '__main__':
    test()

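The script above takes two arguments. The first is a file listing the addresses of the pages to download, one per line, typically in the following format (websites.txt):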

http://blog.csdn.net/fansy1990  
http://www.baidu.com

The second argument is the directory where the downloaded pages are saved.

You can then run the script from the command line:

python D:\spider.py D:\websites.txt D:\download_tmp

Then look in D:\download_tmp for the downloaded files. If they are there, the setup is correct.
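
To know which file names to look for, the naming rule in spider.py can be reproduced on its own. The sketch below is not part of the original script: `page_filename` is a hypothetical helper that mirrors the rule (keep the alphanumeric characters of the address, then add or restore the `.html` suffix), written so that it runs under both Python 2.7 and Python 3.

```python
def page_filename(url):
    """Mirror spider.py's naming rule for one URL (hypothetical helper)."""
    # Keep only the alphanumeric characters, as filter(str.isalnum, ...) does.
    name = ''.join(ch for ch in url.strip() if ch.isalnum())
    if 'html' in url:
        # The dot was filtered out above, so put it back before "html".
        return name.replace('html', '.html')
    return name + '.html'


# The two addresses from websites.txt.
for url in ['http://blog.csdn.net/fansy1990', 'http://www.baidu.com']:
    print(page_filename(url))
```

For the two addresses in websites.txt this prints httpblogcsdnnetfansy1990.html and httpwwwbaiducom.html, so those are the files that should appear in the download directory.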

Finally, write the following Java program. It needs the jython-*.jar package imported into the project (I downloaded version 2.2):

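package test;

import java.io.IOException;

public class PyTest {

    /**
     * Runs spider.py as an external process and waits for it to finish.
     *
     * @param args unused
     * @throws IOException if the python process cannot be started
     * @throws InterruptedException if the wait is interrupted
     */
    public static void main(String[] args) throws IOException, InterruptedException {
        String py_path = "D:\\spider.py";
        String websites = "D:\\websites.txt";
        // Output directory; spider.py requires it to exist already.
        String outDir = "D:\\tmp";
        // Pass the arguments separately so paths containing spaces stay intact,
        // and inherit stdout/stderr so the script's messages are visible.
        Process pr = new ProcessBuilder("python", py_path, websites, outDir)
                .inheritIO()
                .start();
        int exitCode = pr.waitFor();
        System.out.println("done ... (exit code " + exitCode + ")");
    }

}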

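To run the program above from Eclipse, open the Environment settings of the run configuration and add a PATH variable whose value is the Python installation directory.

When it runs, the following message appears: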
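
If you are unsure which directory to use as the PATH value, one way to find it is to ask the interpreter itself. This is a minimal sketch, assuming a standard Windows installation where python.exe sits directly in the installation directory; run it with the same Python you used on the command line. The variable name `install_dir` is only illustrative.

```python
import os
import sys

# sys.executable is the full path of the running interpreter (python.exe on
# Windows); its parent directory is the value to put in the PATH variable,
# assuming the standard installer layout.
install_dir = os.path.dirname(sys.executable)
print(install_dir)
```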

*sys-package-mgr*: can't create package cache dir, '*jython-2.2.jar\cachedir\packages'

This message can be ignored; it does not affect the program.

 
