Thread: [PHP] Getting information from sites

  1. #1
    jw06 (Registered User, 657 posts)

    I was wondering if there is any way to get particular information from a web site, such as the <meta> tag information or other links inside the page that have http://www.blah.com in them. I know how to open the site, read it, and store the whole page, but I don't know how to get certain bits out of the source.

    If this is confusing let me know.

    Thanks!

  2. #2
    Here are a few snippets from things I've done. They should point you in the right direction. I'm not sure EXACTLY what you want, but these examples step through the source of HTML pages line by line, look for certain things, and act on them ^_^

    A sort of hacky script to get MapQuest images:
    Code:
    function getMapquestImages($address, $csz, $lot_number)
    {
        // $csz is expected in the form "City, ST 12345".
        // explode() replaces split(), which was removed in PHP 7.
        list($city, $statezip) = explode(', ', $csz);
        list($state, $zip) = explode(' ', $statezip);
        $address = str_replace(".", "", $address);
        $zip = str_replace("-", "%2d", $zip);
        $url = "http://www.mapquest.com/maps/map.adp?country=US&countryid=US&addtohistory=&searchtype=address&cat=&address=" . rawurlencode($address) . "&city=" . rawurlencode($city) . "&state=" . rawurlencode($state) . "&zipcode=" . $zip . "&search=%20%20Search%20%20&searchtab=address";
        $file = fopen($url, "r");
        if (!$file) return false;

        // Pattern for the map image tag. preg_match() replaces eregi(), which was removed in PHP 7.
        $image_pattern = '/name=mqmap border=0 src="([^"]*)"/i';

        // Step through the page line by line, looking for either the map image
        // itself or a link to the page that holds it.
        while (($line = fgets($file, 1024)) !== false) {
            if (preg_match($image_pattern, $line, $match)) {
                $image_location = $match[1];
                break;
            }
            else if (strpos($line, "MapQuest Found:") !== false) {
                fgets($file, 1024); // skip one line
                $url_line = (string) fgets($file, 1024);
                if (preg_match('/href="([^"]*)">/i', $url_line, $match)) {
                    $location = "http://www.mapquest.com" . $match[1];
                }
                break;
            }
            else if (strpos($line, "Results") !== false) {
                fgets($file, 1024); // skip two lines
                fgets($file, 1024);
                $url_line = (string) fgets($file, 1024);
                if (preg_match('/<a class=small href=([^>]*)>/i', $url_line, $match)) {
                    $location = "http://www.mapquest.com" . $match[1];
                }
                break;
            }
        }
        fclose($file);

        // If the first page only linked to the map page, fetch that page and
        // look for the image tag there.
        if (isset($location)) {
            $file = fopen($location, "r");
            if (!$file) return false;
            while (($line = fgets($file, 1024)) !== false) {
                if (preg_match($image_pattern, $line, $match)) {
                    $image_location = $match[1];
                    break;
                }
            }
            fclose($file);
        }

        // Give up if no image URL was found on either page.
        if (!isset($image_location)) return false;

        // Convert the GIF map to a JPEG.
        $im = @imagecreatefromgif($image_location);
        if (!$im) return false;
        imagejpeg($im, "../../images/lot$lot_number.jpg", 80);
        imagedestroy($im);

        // Save a second copy with the "sold" overlay drawn on top.
        $jpg = @imagecreatefromjpeg("../../images/lot$lot_number.jpg");
        $sold_overlay = @imagecreatefrompng('sold.png');
        if (!$jpg || !$sold_overlay) return false;
        $src_w = imagesx($sold_overlay);
        $src_h = imagesy($sold_overlay);
        imagealphablending($jpg, true);
        imagecopy($jpg, $sold_overlay, 0, 0, 0, 0, $src_w, $src_h);
        imagejpeg($jpg, '../../images/lot' . $lot_number . '_sold.jpg', 80);
        imagedestroy($sold_overlay);
        imagedestroy($jpg);
        return "lot$lot_number.jpg";
    }
    A script I used to grab some Pantone conversions and put them into a DB:
    Code:
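    $file = fopen("http://www.seoconsultants.com/css/colors/conversion/800.asp", "r");
    if (!$file) die("Unable to open remote file");

    // Assumption: $db is an already-open mysqli connection. A prepared statement
    // replaces mysql_query(), which was removed in PHP 7.
    $stmt = $db->prepare("INSERT INTO pantone_conversions VALUES (?, ?, ?, ?, ?)");

    // Only the lines between <pre> and </pre> hold the fixed-width data rows.
    $capture = false;
    while (($line = fgets($file, 1024)) !== false) {
        if (strpos($line, "</pre>") !== false) $capture = false;
        if ($capture) {
            // Each field sits in a fixed-width column.
            $code = trim(substr($line, 0, 8));
            $r = (int) trim(substr($line, 8, 8));
            $g = (int) trim(substr($line, 16, 8));
            $b = (int) trim(substr($line, 24, 8));
            $hex = trim(substr($line, 33, 8));
            $stmt->bind_param('siiis', $code, $r, $g, $b, $hex);
            $stmt->execute();
        }
        if (strpos($line, "<pre>") !== false) $capture = true;
    }
    $stmt->close();
    fclose($file);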
    If you've got any questions about anything, let me know. If I'm completely off base on what you want to do, let me know that too :b
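
Both scripts above share one pattern: read the page one line at a time and stop at the first line that matches a regular expression. As a rough sketch only, that pattern could be pulled into a small helper. The name findFirstMatch() is made up for illustration, and the demo reads from an in-memory stream instead of a real URL so it runs without network access.

```php
<?php
// Hypothetical helper (not from the original post): return the first capture
// group of $pattern from the first matching line of $stream, or null if no
// line matches.
function findFirstMatch($stream, string $pattern): ?string
{
    while (($line = fgets($stream)) !== false) {
        if (preg_match($pattern, $line, $m)) {
            return $m[1];
        }
    }
    return null;
}

// Demo input: an in-memory stream standing in for a page opened with fopen().
$stream = fopen('php://memory', 'r+');
fwrite($stream, "<html>\n<img name=mqmap border=0 src=\"/maps/1.gif\">\n</html>\n");
rewind($stream);

$src = findFirstMatch($stream, '/name=mqmap border=0 src="([^"]*)"/i');
fclose($stream);

echo $src, "\n";
```

With a real page, the stream would come from fopen($url, "r"), as in the scripts above. Line-by-line matching only works when the text being searched for sits on a single line of the source.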

  3. #3
    jw06 (Registered User, 657 posts)
    Lol. Well, I've been working on a search engine, and I'm at the point of making a web spider to automatically search content and add it to an index. What I need the spider to do, though, is get the <meta></meta> tags of the website it's on. I don't have a clue how I would grab the content between the two tags. I also want it to grab any link on the web page so it can follow it later and keep updating. Does that explain it a little better?
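
For the two tasks described here (reading the <meta> tags and collecting links to follow), one possible approach is to parse the page with PHP's DOM extension instead of matching line by line. The sketch below is only an illustration: it assumes the DOM extension is enabled, the function name extractMetaAndLinks() and the sample HTML are made up, and the page is passed in as a string that a spider would have already downloaded.

```php
<?php
// Hypothetical helper (not from the thread): collect <meta name=... content=...>
// pairs and the distinct <a href> values from an HTML string.
function extractMetaAndLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }

    return ['meta' => $meta, 'links' => array_values(array_unique($links))];
}

// Made-up sample page standing in for downloaded HTML.
$html = '<html><head><meta name="Description" content="A test page">'
      . '<meta name="keywords" content="php, spider"></head>'
      . '<body><a href="http://www.blah.com/a">A</a> <a href="/local">B</a>'
      . '<a href="http://www.blah.com/a">A again</a></body></html>';

$result = extractMetaAndLinks($html);

// Keep only the links that contain a given host, as asked in the first post.
$blahLinks = array_values(array_filter($result['links'], function ($href) {
    return strpos($href, 'http://www.blah.com') !== false;
}));

print_r($result['meta']);
print_r($blahLinks);
```

If only the meta tags are needed, PHP's built-in get_meta_tags() reads them directly from a file or URL. Relative links such as /local would still have to be resolved against the page's own address before the spider could follow them.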

  4. #4
    jw06 (Registered User, 657 posts)
    ^^bump^^

Copyright 1999 - 2012