I previously wrote a post on how to get the <title> from a page's HTML source using URLConnection.
http://sarc.io/index.php/java/339-get-from-remote-web-page-httpurlconnection
A little while ago, there was also a post introducing Apache HttpClient version 4.5.2.
http://sarc.io/index.php/miscellaneous/396-3-apache-news-rave
Below is a simple class that uses Apache HttpClient to fetch a page's HTML source and print the response code along with the source.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class Http {

    private static final String TARGET_URL = "http://apache.org";

    public void printHTMLSource() throws IOException {
        // try-with-resources closes the client, response and reader even if an exception occurs
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpGet request = new HttpGet(TARGET_URL);

            try (CloseableHttpResponse response = client.execute(request)) {
                System.out.println("- Response Code : " + response.getStatusLine().getStatusCode());

                // Read the body line by line; the page is assumed to be UTF-8 encoded
                StringBuilder htmlSource = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(response.getEntity().getContent(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        htmlSource.append(line).append('\n'); // keep the original line breaks
                    }
                }

                System.out.println("- Result : " + htmlSource);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        new Http().printHTMLSource();
    }
}
The libraries I used are:
- httpclient-4.5.2.jar
- httpcore-4.4.4.jar
- commons-logging-1.2.jar
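The earlier URLConnection post was about pulling the <title> out of a page. The following is a minimal sketch of doing the same on the fetched source, assuming the HTML has already been read into a String (for example the htmlSource built by the class above). TitleExtractor and extractTitle are hypothetical names introduced here, not part of HttpClient. A regular expression is not a real HTML parser: entities such as &amp; are left undecoded, and unusual markup may not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the original post): extracts the text of the
// first <title> element from an HTML string with a regular expression.
final class TitleExtractor {

    // CASE_INSENSITIVE matches <TITLE> or <Title>; DOTALL lets a title span lines;
    // [^>]* allows attributes such as <title lang="en">.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private TitleExtractor() {
    }

    // Returns the trimmed title text, or null if no <title> element is found.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><head>\n<title>\n  Example Page\n</title>\n</head><body></body></html>";
        System.out.println(extractTitle(html));
    }
}
```

The lazy (.*?) group stops at the first closing </title>, so only the first title is returned even if the page contains more than one.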