Gnuboard QA - Fetching another site's source

타버린나무

2023.10.09 18:31:38 Views 2629 (222.♡.♡.24)

Body

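I want to fetch the source of Site A.

To start with,

Site A's source shows up fine with right-click > View Source.

What I've tried:

curl

fopen("Site A")

file_get_contents("Site A")

For example, when I fetch the source and look at it: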

$source = file_get_contents("https://www.ppomppu.co.kr"); // fetch the file (source)
echo htmlspecialchars($source); // prints nothing

echo $source; // displayed garbled

$source = file_get_contents("https://sir.kr"); // fetch the file (source)
echo htmlspecialchars($source); // displays fine

echo $source; // displayed garbled

$source = file_get_contents("Site A"); // fetch the file (source)
echo htmlspecialchars($source); // only source telling me to reconnect because of a security issue

echo $source; // only source telling me to reconnect because of a security issue

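In the examples above, for ppomppu I can't tell whether the source was fetched or not,

for sir.kr I can confirm the source (text) was fetched fine,

and for Site A something is fetched, but it is only the code for a message telling me to reconnect because of a security issue.

Is there a way to fetch the page (text) exactly as View Source shows it?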

#php #html

Answers: 5

Answer by 배르만

2023-10-09 20:04:45 119.♡.♡.81

If this is caused by an encoding problem, you can try converting to a fixed encoding, as follows.


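<?php
const ENCODING_TO = 'UTF-8';
header('Content-Type: text/html; charset=' . ENCODING_TO);

// Fetch a page, convert it to ENCODING_TO, and escape it for display as text.
function url_front_source($url) {
    $source = file_get_contents($url);
    if ($source === false) {
        return ''; // request failed; nothing to show
    }

    // 1) Prefer the charset the page declares; quotes around the value are optional.
    $encoding_from = null;
    $charset = null;
    if (preg_match('!charset\s*=\s*["\']?([\w-]+)!i', $source, $charset) === 1) {
        $encoding_from = $charset[1];
    }
    // 2) Otherwise guess between encodings common on Korean sites (strict mode).
    if (empty($encoding_from)) {
        $encoding_from = mb_detect_encoding($source, ['UTF-8', 'EUC-KR'], true);
    }
    // 3) Fall back to ASCII if detection fails.
    if (empty($encoding_from)) {
        $encoding_from = 'ASCII';
    }

    $encoding_from = strtoupper($encoding_from);
    $encoding_to = strtoupper(ENCODING_TO);

    if ($encoding_from != $encoding_to) {
        $converted = iconv($encoding_from, $encoding_to, $source);
        if ($converted !== false) { // keep the original bytes if conversion fails
            $source = $converted;
        }
    }

    // ENT_SUBSTITUTE keeps htmlspecialchars() from returning '' on invalid bytes.
    return htmlspecialchars($source, ENT_QUOTES | ENT_SUBSTITUTE, ENCODING_TO);
}

// echo url_front_source('https://www.ppomppu.co.kr');
echo url_front_source('https://sir.kr');
?>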
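
As for why `echo htmlspecialchars($source);` printed nothing for ppomppu: a likely explanation (an assumption, since the page's encoding was not confirmed in this thread) is that the page is not UTF-8. `htmlspecialchars()` returns an empty string when the input is not valid in the target encoding, unless `ENT_SUBSTITUTE` or `ENT_IGNORE` is set; before PHP 8.1 the default flags did not include `ENT_SUBSTITUTE`. The sketch below shows this with hard-coded EUC-KR bytes rather than a real download. `escape_for_display` is a hypothetical helper name, and the mbstring extension is assumed to be loaded.

```php
<?php
// "한글" encoded as EUC-KR. These four bytes are not valid UTF-8.
$euc_kr_bytes = "\xC7\xD1\xB1\xDB";

// Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string.
$escaped_raw = htmlspecialchars($euc_kr_bytes, ENT_COMPAT, 'UTF-8');
var_dump($escaped_raw === ''); // bool(true)

// Hypothetical helper: convert to UTF-8 first, then escape for display.
function escape_for_display(string $bytes, string $from_encoding): string {
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', $from_encoding);
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escape_for_display($euc_kr_bytes . '<b>', 'EUC-KR'), "\n"; // 한글&lt;b&gt;
```

So an empty result does not necessarily mean the download failed; checking `strlen($source)` before escaping is a quick way to tell the two cases apart.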

Answer by 웹메이킹

2023-10-09 18:39:29 118.♡.♡.157

Websites generally use various methods to block web scraping, and one of them may be a User-Agent or Referer check.

How about changing or adding the User-Agent in the following form?


// Send a browser-like User-Agent instead of PHP's default.
$context = stream_context_create([
    'http' => [
        'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36',
    ],
]);
$source = file_get_contents("Site A", false, $context); // "Site A" stands for the real URL
echo htmlspecialchars($source);
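
If the User-Agent alone is not enough, the Referer check mentioned above can be tried the same way. The following is only a sketch under assumptions: `build_browser_like_context_options` and `last_http_status_code` are hypothetical helper names, and the Referer and Accept-Language values are guesses, since what Site A actually checks is unknown. Looking at the status code helps tell a blocked request from a normal page.

```php
<?php
// Hypothetical helper: stream-context options that resemble a browser request.
function build_browser_like_context_options(string $referer): array {
    return [
        'http' => [
            'method' => 'GET',
            'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36',
            'header' => [
                'Referer: ' . $referer,                      // assumed value
                'Accept-Language: ko-KR,ko;q=0.9,en;q=0.8',  // assumed value
            ],
            'timeout' => 10,
            'ignore_errors' => true, // still return the body on 4xx/5xx responses
        ],
    ];
}

// Hypothetical helper: status code of the last response in the header list.
function last_http_status_code(array $response_headers): ?int {
    $status = null;
    foreach ($response_headers as $line) {
        if (preg_match('!^HTTP/\S+\s+(\d{3})!', $line, $m) === 1) {
            $status = (int) $m[1]; // later status lines come from followed redirects
        }
    }
    return $status;
}

// Usage (commented out so the sketch makes no request on its own):
// $context = stream_context_create(build_browser_like_context_options('https://example.com/'));
// $source = file_get_contents('Site A', false, $context); // replace with the real URL
// echo last_http_status_code($http_response_header ?? []), "\n";
// echo htmlspecialchars($source, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
```

If the site builds the page with JavaScript or a cookie-based challenge, extra headers alone may still return only the "reconnect" page.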