Thread: [PHP] Getting information from sites

  1. #1
    jw06 (Registered User, 657 posts)

    [PHP] Getting information from sites

    I was wondering if there is any way to get particular information from a web site, such as the <meta> tag information or links inside the page that have http://www.blah.com in them. I know how to open the site, read it, and store the whole page, but I don't know how to get certain bits out of the source.

    If this is confusing let me know.

    Thanks!
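
    One possible way to do the link part, as a minimal sketch: once the page source is in a string, PHP's DOM extension can parse it so that each link's href can be checked for a substring. The helper name linksContaining() and the sample HTML are made up for illustration, and the sketch assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper: return every href in $html that contains $needle.
function linksContaining(string $html, string $needle): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '' && strpos($href, $needle) !== false) {
            $matches[] = $href;
        }
    }
    return $matches;
}

// Sample page, invented for this example.
$html = '<html><head><title>t</title></head><body>'
      . '<a href="http://www.blah.com/a">A</a>'
      . '<a href="http://other.example/b">B</a>'
      . '<a href="http://www.blah.com/c?x=1">C</a>'
      . '</body></html>';

print_r(linksContaining($html, 'http://www.blah.com'));
```

    On the sample page this returns the two www.blah.com links and skips the other one. Parsing the page avoids depending on how the HTML happens to be split across lines.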

  2. #2
    Here are a few snippets from things I've done. They should point you in the right direction. I'm not sure EXACTLY what you want, but these examples step through the source of HTML pages line by line, look for certain things, and act on them ^_^

    A sort of hacky script to get MapQuest images:
    Code:
    function getMapquestImages($address, $csz, $lot_number)
    {
        // $csz is expected in the form "City, ST 12345".
        list($city, $statezip) = explode(', ', $csz);
        list($state, $zip) = explode(' ', $statezip);
        $address = str_replace(".", "", $address);
        $zip = str_replace("-", "%2d", $zip);
        $url = "http://www.mapquest.com/maps/map.adp?country=US&countryid=US&addtohistory=&searchtype=address&cat=&address=" . rawurlencode($address) . "&city=" . rawurlencode($city) . "&state=" . rawurlencode($state) . "&zipcode=" . $zip . "&search=%20%20Search%20%20&searchtab=address";
        $file = fopen($url, "r");
        if (!$file) return false;
        // Read the result page line by line until we find either the map
        // image itself or a link to the page that holds it.
        while (($line = fgets($file, 1024)) !== false) {
            if (preg_match('/name=mqmap border=0 src="(.*?)"/i', $line, $m)) {
                $image_location = $m[1];
                break;
            }
            else if (strpos($line, "MapQuest Found:") !== false) {
                fgets($file, 1024);              // skip one line
                $url_line = fgets($file, 1024);  // the link is on the next one
                if (preg_match('/href="(.*?)">/i', $url_line, $m)) {
                    $location = "http://www.mapquest.com" . $m[1];
                }
                break;
            }
            else if (strpos($line, "Results") !== false) {
                fgets($file, 1024);              // skip two lines
                fgets($file, 1024);
                $url_line = fgets($file, 1024);
                if (preg_match('/<a class=small href=(.*?)>/i', $url_line, $m)) {
                    $location = "http://www.mapquest.com" . $m[1];
                }
                break;
            }
        }
        fclose($file);
    
        // If we only found a link, open that page and look for the image there.
        if (isset($location)) {
            $file = fopen($location, "r");
            if (!$file) return false;
            while (($line = fgets($file, 1024)) !== false) {
                if (preg_match('/name=mqmap border=0 src="(.*?)"/i', $line, $m)) {
                    $image_location = $m[1];
                    break;
                }
            }
            fclose($file);
        }
    
        // Give up if no map image was found on either page.
        if (!isset($image_location)) return false;
    
        // Convert the GIF map to a JPEG.
        $im = @imagecreatefromgif($image_location);
        if (!$im) return false;
        imagejpeg($im, "../../images/lot$lot_number.jpg", 80);
        imagedestroy($im);
    
        // Save a second copy with the "sold" overlay stamped on top.
        $jpg = @imagecreatefromjpeg("../../images/lot$lot_number.jpg");
        $sold_overlay = @imagecreatefrompng('sold.png');
        if ($jpg && $sold_overlay) {
            $src_w = imagesx($sold_overlay);
            $src_h = imagesy($sold_overlay);
            imagealphablending($jpg, true);
            imagecopy($jpg, $sold_overlay, 0, 0, 0, 0, $src_w, $src_h);
            imagejpeg($jpg, '../../images/lot' . $lot_number . '_sold.jpg', 80);
            imagedestroy($jpg);
        }
        return "lot$lot_number.jpg";
    }
    A script I used to grab some Pantone conversions and put them into a DB:
    Code:
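    // Assumes $db is an already-open mysqli connection (not shown here).
    $file = fopen("http://www.seoconsultants.com/css/colors/conversion/800.asp", "r");
    if (!$file) die("Unable to open remote file");
    
    $stmt = $db->prepare("INSERT INTO pantone_conversions VALUES (?, ?, ?, ?, ?)");
    $capture = false;
    while (($line = fgets($file, 1024)) !== false) {
        // Stop capturing at the end of the <pre> block.
        if (strpos($line, "</pre>") !== false) $capture = false;
        if ($capture) {
            // Each row inside <pre> is fixed-width: code, R, G, B, hex.
            $code = trim(substr($line, 0, 8));
            $r = (int) trim(substr($line, 8, 8));
            $g = (int) trim(substr($line, 16, 8));
            $b = (int) trim(substr($line, 24, 8));
            $hex = trim(substr($line, 33, 8));
            $stmt->bind_param("siiis", $code, $r, $g, $b, $hex);
            $stmt->execute();
        }
        // Start capturing on the line after the opening <pre>.
        if (strpos($line, "<pre>") !== false) $capture = true;
    }
    fclose($file);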
    If you've got any questions about anything, let me know. If I'm completely off base on what you want to do, let me know that too :b

  3. #3
    jw06 (Registered User, 657 posts)
    Lol. Well, I've been working on a search engine, and I'm at the point of making a web spider to automatically crawl content and add it to an index. What I need the spider to do, though, is get the <meta></meta> tags of the website it's on. I don't have a clue how I would grab the content between the two tags. I also want it to grab any link that is on the web page so it can follow it later and keep updating. Does that explain it a little better?
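
    A rough sketch of what that spider step could look like, under a few assumptions. A <meta> tag normally has no closing tag, so the information sits in its name and content attributes, not between two tags. The function name extractPageInfo() and the sample page below are hypothetical, and the code assumes PHP's DOM extension is available. PHP also has a built-in get_meta_tags() that reads meta tags straight from a file or URL, if only the meta part is needed.

```php
<?php
// Hypothetical spider helper: pull the <meta name=... content=...> pairs
// and every distinct link target out of one page's HTML.
function extractPageInfo(string $html): array
{
    if ($html === '') {
        return ['meta' => [], 'links' => []];
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Meta information lives in attributes, keyed here by lower-cased name.
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    // Gather hrefs in document order, then drop repeats.
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }

    return ['meta' => $meta, 'links' => array_values(array_unique($links))];
}

// Sample page, invented for this example.
$html = '<html><head>'
      . '<meta name="description" content="A test page">'
      . '<meta name="Keywords" content="php, spider">'
      . '</head><body>'
      . '<a href="/about">About</a>'
      . '<a href="http://www.example.com/">Home</a>'
      . '<a href="/about">About again</a>'
      . '</body></html>';

$info = extractPageInfo($html);
print_r($info);
```

    The returned links are whatever appears in href, so relative ones such as /about would still need to be resolved against the page's URL before the spider can follow them, and the spider would want to remember which URLs it has already visited.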

  4. #4
    jw06 (Registered User, 657 posts)
    ^^bump^^


Copyright 1999 - 2012