Breaking Code

June 29, 2010

Using Google Search from your Python code

Filed under: Tools, Web applications — Mario Vilas @ 6:31 pm

Hi everyone. Today I’ll be showing you a quick script I wrote to make Google searches from Python. There are previous projects that do the same thing (actually, they do it better), namely Googolplex by Sebastian Wain and xgoogle by Peteris Krumins, but unfortunately they’re no longer working. Maybe the simplicity of this script will keep it working a little longer… 🙂

The interface is extremely simple: the module exports only one function, called search().

        # Get the first 20 hits for: "Breaking Code" WordPress blog
        from google import search
        for url in search('"Breaking Code" WordPress blog', stop=20):
            print(url)

You can control which of the Google Search pages to use (via the top-level domain), which language to search in, how many results per page, which result to start from and when to stop, and how long to wait between queries. However, the only mandatory argument is the query string; everything else has a default value.

        # Get the first 20 hits for "Mariposa botnet" in Google Spain
        from google import search
        for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
            print(url)

A word of caution, though: if you wait too little between requests or make too many of them, Google may block your IP for a while. This is especially annoying when you’re behind a corporate proxy – I won’t be held responsible when your coworkers suddenly develop an urge to kill you! 😀
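The throttling idea behind the pause parameter is simple enough to sketch on its own. Note that polite_get below is a hypothetical helper, not part of the module, and the fetch callback stands in for the real HTTP request:

```python
import time

def polite_get(urls, pause=2.0, fetch=lambda u: u):
    """Visit each URL, sleeping `pause` seconds between requests.

    `fetch` is a stand-in for the real HTTP call; the actual module
    does an equivalent sleep between the results pages it requests.
    """
    results = []
    for i, u in enumerate(urls):
        if i:  # No need to sleep before the very first request.
            time.sleep(pause)
        results.append(fetch(u))
    return results
```

Raising pause trades speed for a lower chance of being blocked; the default of 2 seconds is a compromise.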

Below are the download links (source code and Windows installers) and the source code for you to read online. Enjoy! 🙂


  • Version 1.0 (initial release).
  • Version 1.01 (fixed the IOError exception bug).
  • Version 1.02 (fixed the missing href bug reported by Rahul Sasi and the duplicate results bug reported by Slawek).
  • Version 1.03 (extracts the hidden links from the results page, thanks ubershmekel!).
  • Version 1.04 (added support for BeautifulSoup 4, thanks alxndr!).
  • Version 1.05 (added compatibility with Python 3.x, a better command line parser, and some improvements by machalekj).
  • Version 1.06 (added an option to only grab the relevant results, instead of all possible links from each result page, as requested by user Nicky and others).


Source code: google-1.06.tar.gz

Windows 32-bit installer: google-1.06.win32.msi

Windows 64-bit installer:


Source code

Get the source code from GitHub:

    #!/usr/bin/env python

    # Python bindings to the Google search engine
    # Copyright (c) 2009-2014, Mario Vilas
    # All rights reserved.
    # Redistribution and use in source and binary forms, with or without
    # modification, are permitted provided that the following conditions are met:
    #     * Redistributions of source code must retain the above copyright notice,
    #       this list of conditions and the following disclaimer.
    #     * Redistributions in binary form must reproduce the above copyright
    #       notice,this list of conditions and the following disclaimer in the
    #       documentation and/or other materials provided with the distribution.
    #     * Neither the name of the copyright holder nor the names of its
    #       contributors may be used to endorse or promote products derived from
    #       this software without specific prior written permission.

    __all__ = ['search']

    import os
    import sys
    import time

    if sys.version_info[0] > 2:
        from http.cookiejar import LWPCookieJar
        from urllib.request import Request, urlopen
        from urllib.parse import quote_plus, urlparse, parse_qs
    else:
        from cookielib import LWPCookieJar
        from urllib import quote_plus
        from urllib2 import Request, urlopen
        from urlparse import urlparse, parse_qs

    # Lazy import of BeautifulSoup.
    BeautifulSoup = None

    # URL templates to make Google searches.
    url_home          = "http://www.google.%(tld)s/"
    url_search        = "http://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&btnG=Google+Search"
    url_next_page     = "http://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&start=%(start)d"
    url_search_num    = "http://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&num=%(num)d&btnG=Google+Search"
    url_next_page_num = "http://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&num=%(num)d&start=%(start)d"

    # Cookie jar. Stored at the user's home folder.
    home_folder = os.getenv('HOME')
    if not home_folder:
        home_folder = os.getenv('USERHOME')
        if not home_folder:
            home_folder = '.'   # Use the current folder on error.
    cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
        cookie_jar.load()
    except Exception:
        pass

    # Request the given URL and return the response page, using the cookie jar.
    def get_page(url):
        Request the given URL and return the response page, using the cookie jar.

        @type  url: str
        @param url: URL to retrieve.

        @rtype:  str
        @return: Web page retrieved for the given URL.

        @raise IOError: An exception is raised on error.
        @raise urllib2.URLError: An exception is raised on error.
        @raise urllib2.HTTPError: An exception is raised on error.
        request = Request(url)
        request.add_header('User-Agent',
                           'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)')
        cookie_jar.add_cookie_header(request)
        response = urlopen(request)
        cookie_jar.extract_cookies(response, request)
        html =
        response.close()
            cookie_jar.save()
        except Exception:
            pass
        return html

    # Filter links found in the Google result pages HTML code.
    # Returns None if the link doesn't yield a valid result.
    def filter_result(link):
        try:

            # Valid results are absolute URLs not pointing to a Google domain.
            o = urlparse(link, 'http')
            if o.netloc and 'google' not in o.netloc:
                return link

            # Decode hidden URLs.
            if link.startswith('/url?'):
                link = parse_qs(o.query)['q'][0]

                # Valid results are absolute URLs not pointing to a Google domain.
                o = urlparse(link, 'http')
                if o.netloc and 'google' not in o.netloc:
                    return link

        # Otherwise, or on error, return None.
        except Exception:
            pass
        return None

    # Returns a generator that yields URLs.
    def search(query, tld='com', lang='en', num=10, start=0, stop=None, pause=2.0,
               only_standard=False):
        """
        Search the given query string using Google.

        @type  query: str
        @param query: Query string. Must NOT be url-encoded.

        @type  tld: str
        @param tld: Top level domain.

        @type  lang: str
        @param lang: Language.

        @type  num: int
        @param num: Number of results per page.

        @type  start: int
        @param start: First result to retrieve.

        @type  stop: int
        @param stop: Last result to retrieve.
            Use C{None} to keep searching forever.

        @type  pause: float
        @param pause: Lapse to wait between HTTP requests.
            A lapse too long will make the search slow, but a lapse too short may
            cause Google to block your IP. Your mileage may vary!

        @type  only_standard: bool
        @param only_standard: If C{True}, only returns the standard results from
            each page. If C{False}, it returns every possible link from each page,
            except for those that point back to Google itself. Defaults to C{False}
            for backwards compatibility with older versions of this module.

        @rtype:  generator
        @return: Generator (iterator) that yields found URLs. If the C{stop}
            parameter is C{None} the iterator will loop forever.
        """

        # Lazy import of BeautifulSoup.
        # Try to use BeautifulSoup 4 if available, fall back to 3 otherwise.
        global BeautifulSoup
        if BeautifulSoup is None:
            try:
                from bs4 import BeautifulSoup
            except ImportError:
                from BeautifulSoup import BeautifulSoup

        # Set of hashes for the results found.
        # This is used to avoid repeated results.
        hashes = set()

        # Prepare the search string.
        query = quote_plus(query)

        # Grab the cookie from the home page.
        get_page(url_home % vars())

        # Prepare the URL of the first request.
        if start:
            if num == 10:
                url = url_next_page % vars()
            else:
                url = url_next_page_num % vars()
        else:
            if num == 10:
                url = url_search % vars()
            else:
                url = url_search_num % vars()

        # Loop until we reach the maximum result, if any (otherwise, loop forever).
        while not stop or start < stop:

            # Sleep between requests.
            time.sleep(pause)

            # Request the Google Search results page.
            html = get_page(url)

            # Parse the response and process every anchored URL.
            soup = BeautifulSoup(html)
            anchors = soup.find(id='search').findAll('a')
            for a in anchors:

                # Leave only the "standard" results if requested.
                # Otherwise grab all possible links.
                if only_standard and (
                        not a.parent or != "h3"):
                    continue

                # Get the URL from the anchor tag.
                try:
                    link = a['href']
                except KeyError:
                    continue

                # Filter invalid links and links pointing to Google itself.
                link = filter_result(link)
                if not link:
                    continue

                # Discard repeated results.
                h = hash(link)
                if h in hashes:
                    continue
                hashes.add(h)

                # Yield the result.
                yield link

            # End if there are no more results.
            if not soup.find(id='nav'):
                break

            # Prepare the URL for the next request.
            start += num
            if num == 10:
                url = url_next_page % vars()
            else:
                url = url_next_page_num % vars()

    # When run as a script...
    if __name__ == "__main__":

        from optparse import OptionParser, IndentedHelpFormatter

        class BannerHelpFormatter(IndentedHelpFormatter):
            "Just a small tweak to optparse to be able to print a banner."
            def __init__(self, banner, *argv, **argd):
                self.banner = banner
                IndentedHelpFormatter.__init__(self, *argv, **argd)
            def format_usage(self, usage):
                msg = IndentedHelpFormatter.format_usage(self, usage)
                return '%s\n%s' % (self.banner, msg)

        # Parse the command line arguments.
        formatter = BannerHelpFormatter(
            "Python script to use the Google search engine\n"
            "By Mario Vilas (mvilas at gmail dot com)\n")
        parser = OptionParser(formatter=formatter)
        parser.set_usage("%prog [options] query")
        parser.add_option("--tld", metavar="TLD", type="string", default="com",
                          help="top level domain to use [default: com]")
        parser.add_option("--lang", metavar="LANGUAGE", type="string", default="en",
                          help="produce results in the given language [default: en]")
        parser.add_option("--num", metavar="NUMBER", type="int", default=10,
                          help="number of results per page [default: 10]")
        parser.add_option("--start", metavar="NUMBER", type="int", default=0,
                          help="first result to retrieve [default: 0]")
        parser.add_option("--stop", metavar="NUMBER", type="int", default=0,
                          help="last result to retrieve [default: unlimited]")
        parser.add_option("--pause", metavar="SECONDS", type="float", default=2.0,
                          help="pause between HTTP requests [default: 2.0]")
        parser.add_option("--all", dest="only_standard",
                          action="store_false", default=True,
                          help="grab all possible links from result pages")
        (options, args) = parser.parse_args()
        query = ' '.join(args)
        if not query:
            parser.print_help()
            sys.exit(2)
        params = [(k, v) for (k, v) in options.__dict__.items()
                  if not k.startswith('_')]
        params = dict(params)

        # Run the query.
        for url in search(query, **params):
            print(url)


  1. Hey, that’s pretty cool – I used to do the same with html5lib, BeautifulSoup and mechanize but you’d better remove the code quickly – this violates Google’s TOS.

    Comment by cryzed — June 30, 2010 @ 6:45 am

  2. Hey, this is really cool! I kept getting an IOError when trying to instantiate cookiejar – .google-cookie does not exist. Any ideas as to why it wouldn’t be creating the cookie? For now, I just caught the exception and passed, which is probably dangerous.

    Secondly, how can I modify this to get the number of results for a specific query? I’m not sure how to parse the html accordingly, because I can’t quite just print out the variable ‘html’ to see what it looks like.


    Comment by Bob — July 5, 2010 @ 6:30 am

  3. Ah sorry for the spam. I actually figured out both my questions, haha. Using soup.prettify(), I outputted the html and was able to look at BeautifulSoup documentation and parse accordingly.

    I removed the IOException catching, and it turns out that the .google-cookie was created, just not on the first pass of the program (or at least this is what I think happened?).

    Comment by Bob — July 5, 2010 @ 7:07 am

  4. @cryzed: Thanks for the comment! I don’t think posting the code violates the TOS, but using it may. I’m not sure exactly which is covered by the TOS and which isn’t, really. 😦

    @Bob: Not spam at all! 🙂

    I think maybe different versions of the cookiejar module throw different exceptions. The first run is supposed to raise an exception since the file doesn’t exist yet, but it should be a cookiejar.LoadError when calling load(), not an IOError when instantiating the object.

    Catching the exception and running it once was the right call, but I should think of a more elegant solution…

    Comment by Mario Vilas — July 5, 2010 @ 8:36 am

  5. Good one!
    Only one point, the xgoogle project is still working at the moment.


    Comment by Emilio — August 6, 2010 @ 10:05 pm

  6. @Emilio: Thanks! Good to know xgoogle is working now, at the time I wrote this it had the “0 results” bug too.

    Comment by Mario Vilas — August 9, 2010 @ 5:03 am

  7. xgoogle is still giving me “0 results”. Is this fixed or is something else going on…

    Comment by Dave — August 23, 2010 @ 3:02 pm

  8. Hi,

    look at the last comments in the post of catonmat. There is a minor fix. Currently I am using a functional version after this fix.


    Comment by Emilio — August 23, 2010 @ 3:36 pm

  9. Well done,

    I just removed youtube links by replacing:
    if o.netloc and ('google' not in o.netloc):
    with:
    if o.netloc and ('google' not in o.netloc) and ('youtube' not in o.netloc):

    Thanks for this code !
    I Hope it will work for a long time.

    Comment by NaN — August 27, 2010 @ 11:19 pm

  10. Mario

    There is a problem with getting results from google search using your script.

    If google do not show local business results, your script works great but if shows (map and links) I get duplicate results like:
    (these are urls by map)

    and results from 1 to 10 on first page do not show

    is this clear? i can send you screenshot with incorrect urls


    Comment by Slawek — October 1, 2010 @ 9:36 pm

  11. […] a quick and dirty fix IMHO the best python google API written by Mario Vilas from breakingcode […]

    Pingback by Using Google Search from your Python code [fixed] « Ulisses Castro Security Labs — November 21, 2010 @ 3:32 am

  12. […] Google has undergone a lot of changes since 2001 and Googolplex and other  libraries like xgoogle are now part of Internet history. A similar new library  is available at Mario Vilas Google Search Python blog post as Quickpost: Using Google Search from your Python code. […]

    Pingback by Google Search NoAPI « Data Big Bang Blog — January 20, 2011 @ 8:03 pm

  13. […] Google, information gathering, LinkedIn, open source, python, recon, search, tool, web Breaking Code This entry was posted in Breaking Code and tagged code, from, Google, Python, Quickpost, Search, […]

    Pingback by Quickpost: Using Google Search from your Python code | — January 24, 2011 @ 9:13 am

  14. It works fine. Thanks..

    Comment by urkera — August 14, 2011 @ 2:57 am

  15. Very good. Only working google searcher that i’ve found. using the fix it works very fine

    Comment by Juliano Costa Machado — October 24, 2011 @ 2:00 am

  16. Found this update that works even if you get the funky fake url’s google is using (/?url=

    Comment by ubershmekel — February 13, 2012 @ 8:23 pm

  17. Nice! I’ll patch it right away, thanks! 🙂

    Comment by Mario Vilas — February 13, 2012 @ 8:48 pm

  18. i only get 10 results per a page no matter if i set higher number for num

    anyone find a fix for this?

    Comment by dan — March 21, 2012 @ 7:51 am

  19. ok after some more googling.. finally found the fix for being limited to 10 results:

    add “as_qdr=all” to the url

    Comment by dan — March 21, 2012 @ 8:38 am
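dan’s workaround amounts to appending one query-string parameter to the search URL. A small sketch of what that looks like (the helper and the example URL below are made up for illustration):

```python
def add_as_qdr(url):
    # Append dan's as_qdr=all parameter to a search URL,
    # using '&' when a query string is already present.
    sep = '&' if '?' in url else '?'
    return url + sep + 'as_qdr=all'
```

For example, `add_as_qdr('http://example.test/search?q=x')` yields `http://example.test/search?q=x&as_qdr=all`.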

  20. @dan: Interesting! That limit didn’t exist when I originally wrote the script, it must be a new parameter to the URL. I’ll fix the script right away, thanks!

    Comment by Mario Vilas — March 21, 2012 @ 11:26 am

  21. @dan: Odd, I can’t seem to reproduce your results. Adding as_qdr=all seems to have no effect (good or bad), and setting a higher number for “num” is working fine for me in the first place…

    I’ll take note of this just in case, but for now I won’t be modifying the script since I can’t reproduce the problem.

    Comment by Mario Vilas — March 21, 2012 @ 11:36 am

  22. hmm yes that is odd.

    i believe what is causing it for me is google’s new “Google Instant” feature, and adding as_qdr=all seems to shut it off.

    would be nice to figure out why it is happening for me.

    well i did a couple tests… and what TLD are you using, like .com or something else? it seems the 10 results limit is only happening to me with .com, and not with other TLDs (i tested .ca and, and both give me the full number of results without adding as_qdr=all)

    Comment by dan — March 21, 2012 @ 5:16 pm

  23. I think this is happening or not depending on the query you make. Reading up on this: it says the as_qdr parameter controls how old are the results it gives you.

    This is going to become a more complex problem IMHO. I need to read the whole document and see what can be implemented in the script.

    Comment by Mario Vilas — March 22, 2012 @ 6:07 pm

  24. It’s highly unfortunate Google doesn’t have an actual API for searching (their Custom Search API is intentionally crippled), unlike Bing and Yahoo. I can’t use Bing or Yahoo though because Google consistently gives me the best, the most, and the most accurate results for things I’m trying to retrieve, so I’ve been using hacks like this instead.

    This is really good though; thank you for it. It’d be great if you keep maintaining it for a while.

    Comment by Anorov — March 23, 2012 @ 11:32 am

  25. I’m getting the following error

    >> python ‘2012 movies’

    Traceback (most recent call last):
    File “”, line 211, in
    for url in search(query, stop=20):
    File “”, line 175, in search
    soup = BeautifulSoup.BeautifulSoup(html)
    File “/usr/lib/pymodules/python2.6/”, line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
    File “/usr/lib/pymodules/python2.6/”, line 1230, in __init__
    File “/usr/lib/pymodules/python2.6/”, line 1263, in _feed
    File “/usr/lib/python2.6/”, line 108, in feed
    File “/usr/lib/python2.6/”, line 148, in goahead
    k = self.parse_starttag(i)
    File “/usr/lib/python2.6/”, line 266, in parse_starttag
    % (rawdata[k:endpos][:20],))
    File “/usr/lib/python2.6/”, line 115, in error
    raise HTMLParseError(message, self.getpos())
    HTMLParser.HTMLParseError: junk characters in start tag: u'{t:119}); class=gbzt’, at line 1, column 32127

    Comment by John — May 17, 2012 @ 12:28 am

  26. I think that may be a problem with the version of BeautifulSoup. At least I can’t seem to reproduce it, using the exact same search query.

    Comment by Mario Vilas — May 17, 2012 @ 12:34 am

  27. Thanks for you help. What version of BeautifulSoup would you recommend? I’m running it on Ubuntu

    Comment by John — May 17, 2012 @ 6:00 am

  28. You’re welcome! I’m using version 3.2.0 on Windows, but I see in the webpage that the latest 3.x version is 3.2.1, so I’d go with that. Version 4.x probably won’t work.

    Comment by Mario Vilas — May 17, 2012 @ 10:57 am

  29. Hi Mario, thanks for your great package.
    I get the very same error as John.
    What is most interesting, it runs great on Windows, but throws an error on LInux.
    – Windows 7, Python 2.6.6, BeautifulSoup 3.2.1
    – BackTrack 5 R2, Python 2.6.5, BeautifulSoup 3.2.1
    I have managed to make it work by commenting out the lines below in, but that is not the best solution IMHO 😦

    offset = offset + len(self.__starttag_text)
    self.error("junk characters in start tag: %r"
    % (rawdata[k:endpos][:20],))

    Comment by Dejan — May 18, 2012 @ 1:37 pm

  30. UPDATE:
    It seems that my BackTrack was using BeautifulSoup from /usr/shared/pyshared instead of 3.2.1 from /usr/local/lib/python2.6/dist-packages/
    It works now 🙂

    Comment by Dejan — May 18, 2012 @ 2:18 pm

  31. […] though.) The code is also available on GitHub, together with a small third-party library (via Mario Vilas) that’s used to access Google search results. Here’s the […]

    Pingback by How to answer a question: a simple system | DDI — June 13, 2012 @ 6:30 pm

  32. Project started in 2010 and still working like a charm in July 2012… AWESOME 🙂

    Comment by topo — July 2, 2012 @ 6:22 pm

  33. Thanks! 😀

    Comment by Mario Vilas — July 3, 2012 @ 1:25 pm

  34. I work with it correctly when using stand alone python script. But when I run this script from Django, it encounters “HTTP Error 503: Service Unavailable”.
    Is there any possible solution?

    File “” in get_page
    82. response = urllib2.urlopen(request)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in urlopen
    126. return, data, timeout)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in open
    406. response = meth(req, response)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in http_response
    519. ‘http’, request, response, code, msg, hdrs)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in error
    438. result = self._call_chain(*args)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in _call_chain
    378. result = func(*args)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in http_error_302
    625. return, timeout=req.timeout)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in open
    406. response = meth(req, response)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in http_response
    519. ‘http’, request, response, code, msg, hdrs)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in error
    444. return self._call_chain(*args)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in _call_chain
    378. result = func(*args)
    File “/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/” in http_error_default
    527. raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

    Exception Type: HTTPError at
    Exception Value: HTTP Error 503: Service Unavailable

    Comment by papalagi — July 7, 2012 @ 1:05 pm

  35. @papalagi: I haven’t ever used Django, but is it possible that it does some global change that affects the behavior of urllib2? It could also be something in your environment, from the traceback it seems there was a 302 (URL redirection) before the 503 error. Try sniffing the network traffic, maybe whatever is going wrong will stand out there.

    Comment by Mario Vilas — July 7, 2012 @ 5:57 pm

  36. Hi Mario, first thanks for you class.

    I have a little problem, so a i have the next search

    for url in search(‘gato’, num=100, start=0,stop=700):

    The first page to visit is

    In the browser, contains 100 or more results, but, with your class, get very few records (about 16 of the first page).

    Does your know why?


    Comment by Pedro — July 24, 2012 @ 9:37 pm

  37. Hi Mario, thank for the awesome tool. It was working fine but now I’m getting the following error

    Traceback (most recent call last):
    File “”, line 216, in
    for url in search(query, stop=11):
    File “”, line 162, in search
    get_page(url_home % vars())
    File “”, line 84, in get_page
    response = urllib2.urlopen(request)
    File “/usr/lib/python2.7/”, line 126, in urlopen
    return, data, timeout)
    File “/usr/lib/python2.7/”, line 391, in open
    response = self._open(req, data)
    File “/usr/lib/python2.7/”, line 409, in _open
    ‘_open’, req)
    File “/usr/lib/python2.7/”, line 369, in _call_chain
    result = func(*args)
    File “/usr/lib/python2.7/”, line 1185, in http_open
    return self.do_open(httplib.HTTPConnection, req)
    File “/usr/lib/python2.7/”, line 1160, in do_open
    raise URLError(err)

    Comment by Johny — July 24, 2012 @ 11:05 pm

  38. Hello Pedro! I’m guessing the difference is in the real-time search API. Currently the Google Search page uses this new API which is far better (among other things because it works over Ajax instead of a full page reload, and has less restrictions than the old API). This script uses the old API which is still maintained for backwards compatibility with older browsers, and it’s so on purpose to make sure it still works with no changes over a longer time – but it also means the search results won’t be exactly the same.

    Sebastian Wain (the author of Googolplex) has recently written a new Google Search script that works on the Ajax API instead, if you need to get results closer to the web page’s you’ll want to try this one out 🙂


    Comment by Mario Vilas — July 25, 2012 @ 9:29 am

  39. Hi Johny! That traceback seems incomplete (there’s no error message for the URLError exception), but that kind of error usually means some connectivity problem. Make sure you can reach normally from that machine and the Python interpreter isn’t prevented by SELinux from creating sockets and connecting to the outside (some free hosting providers do that to prevent misuse). Good luck! 🙂

    Comment by Mario Vilas — July 25, 2012 @ 9:36 am

  40. Thanks Mario. I’ll try to debug as you said. I also think that it may be some connectivity problem.

    Comment by Johny — July 27, 2012 @ 8:13 pm

  41. Hi, awesome binding, I’ve been searching a long time for something that works, I created this using your binding :

    Comment by Lukas Nemec — September 10, 2012 @ 12:12 pm

  42. Thankyou for this amazing piece of code.

    Comment by xdr — April 23, 2013 @ 2:47 am

  43. Thank you for your comment! 🙂

    Comment by Mario Vilas — April 23, 2013 @ 10:10 am

  44. HI Mario!
    First of all – thanks for this cool library!

    I tried to use proxy to avoid blocking my home IP by google. After around 100 search requests, I was blocked and even when I disabled proxy it turned out that my home IP was also blocked! I suspect this is because of cookies. Am I right?
    How can I use proxy with your library and avoid google’s penalties. Can I disable cookies or tweak something else?

    Comment by Olexiy Logvinov (@OlexiyL) — May 23, 2013 @ 11:33 am

  45. Thanks! 🙂

    Honestly I’ve never tried circumventing the IP block, but I’d suggest deleting the cookie only after you know you’ve been banned. The script stores the cookie in a file called “.google-cookie” in the home directory for the current user, you can try deleting the file. From Python code that imports instead of running it, you could try this code: import google; google.cookie_jar.clear()

    Let me know if it works!

    Comment by Mario Vilas — May 23, 2013 @ 1:11 pm

  46. Hello Mario,

    thanks alot for this great tool! Im very interested in getting the estimated search result number from google. Do you plan to implement this function and if not can you point me to a direction how this can be done by modifying your great module? Thanks again, best regards!

    Comment by zwieback86 — July 9, 2013 @ 11:27 am

  47. Me again, I already got it done by myself. It was really easy didnt expect that. Greets!

    Comment by zwieback86 — July 9, 2013 @ 8:54 pm

  48. @zwieback86: Hehehe, you coded it faster than I got around to looking at the blog comments! 😉

    Comment by Mario Vilas — July 10, 2013 @ 10:17 am

  49. I am extremely thankful for this tool and appreciate you taking the time to code it. I was using a windows version of a google scraper previously, but since I don’t like to use windows and was unfamiliar with the language it was coded in, this became the perfect solution. One thing that I miss about the windows version though is that I could specify a text file of proxies for it to use and it was able to do lots of searches without google being able to stop it. I am not sure how to incorporate this into the code myself, do you have any suggestions?

    Comment by Linco — August 2, 2013 @ 7:30 pm

  50. Sounds like a great feature to add! Since this script uses urllib2 to make the HTTP queries, you’d have to set the proxy like this:

    Comment by Mario Vilas — August 2, 2013 @ 8:25 pm
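The proxy setup referred to in the comment above can be sketched with urllib2’s standard proxy handler (the proxy address below is a placeholder; on Python 3 the same classes live in urllib.request):

```python
# Route all subsequent urlopen() calls through an HTTP proxy.
# The proxy address below is a placeholder, not a real server.
try:
    from urllib.request import ProxyHandler, build_opener, install_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, build_opener, install_opener        # Python 2

proxy = ProxyHandler({'http': ''})
opener = build_opener(proxy)
install_opener(opener)  # From here on, urlopen() uses the proxy.
```

To rotate through a list of proxies, one could call install_opener() again with a new ProxyHandler between searches.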

  51. Always get 503 Service Unavailable in all my 3 VPS and my local.


    Comment by Zeray Rice — October 24, 2013 @ 4:37 pm

  52. Seems to be working on my machine… Can you tell me anything else, like what country are you trying from, or what search query are you using?

    Comment by Mario Vilas — October 24, 2013 @ 4:43 pm

  53. It doesn’t work I get 403 Forbidden error

    Comment by asdasda — December 31, 2013 @ 11:35 am

  54. No idea why the 1.03 link is broken, I can see it in the directory index and I didn’t change anything. Sourceforge hosting sucks I guess. 😦

    Try this instead for version 1.03:

    The link to the latest version seems to be working for me. But if it doesn’t anymore just go to Github instead, it’s much more reliable.

    Comment by Mario Vilas — December 31, 2013 @ 5:44 pm

  55. Cool tool, thanks! I noticed that it grabs a lot of urls off the page in addition to the standard 10 big blue search results. For anyone who wants to eliminate all the extra urls, I added

    if not == "h3": continue

    immediately after the line that reads

    for a in anchors:

    Comment by Nicky — April 13, 2014 @ 8:24 pm

  56. Thanks! I just committed a change to add that as an option. 🙂

    Comment by Mario Vilas — April 14, 2014 @ 2:11 pm

  57. Hi, I am thinking if we could get results in a specific time range? For example, if we want the results from 01/01/2009 to 01/01/2010, can we pass this parameter to your function? I think it should be fine. So could you please make it clear?

    Comment by David — April 23, 2014 @ 10:18 pm

  58. Hi David. Does the Google Search legacy API support this? I don’t recall such option… but if there is, let me know how to use it and I’ll add it to the script. 🙂

    Comment by Mario Vilas — April 23, 2014 @ 11:57 pm

  59. Hi, I think there could be some way to add a specific time range, though I tried and do not know why it always failed.
    You could pass one more parameter in the url, like this: tbs=cdr:1,cd_min:1/1/2009,cd_max:2/1/2009,lr:lang_1en
    so “tbs=cdr:1,cd_min:1/1/2009,cd_max:2/1/2009” means to search for results between 1/1/2009 and 2/1/2009.
    The thing is, whenever I request this url in my program, it always returns the original search results; by “original” I mean the plain query without the time range. However, when I open the same url in the browser, it returns the desired results.
    Besides, what puzzles me is that if I save the html of the search results, it turns out there is no link information in the html file. The only way to save the links of the query is to save the full web archive. So I am really lost about it. Could you please have a look, so we can discuss it?
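    A minimal sketch of composing such a url, assuming the tbs values quoted in this comment (they come from observing the browser, not from anything this script officially supports):

```python
# Hypothetical sketch: building a Google search url with the date-range
# operator described above (tbs=cdr:1,cd_min:...,cd_max:...). The values
# are taken from the comment, not from the script's documented options.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2, as used in the thread

params = [
    ("q", "mariposa botnet"),
    ("tbs", "cdr:1,cd_min:1/1/2009,cd_max:2/1/2009"),
]
url = "http://www.google.com/search?" + urlencode(params)
print(url)
```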

    Comment by David — May 3, 2014 @ 6:23 pm

  60. Also, you could see this post about the parameters of google search urls. I tried for about one week to figure out how to get the results of a time-range search programmatically, but it always fetches the original search results instead of the time-range search. Hope you can find a way! Thanks!

    Comment by David — May 3, 2014 @ 6:25 pm

  61. It works differently in the browser because at some point Google switched to a new API using Ajax, so all you’ll see in the HTML is the JavaScript code to access the API, not the search results. This script uses the legacy API that was in place before the switch, so while it’s more convenient (no Ajax, the HTML is parseable) some options that were added later to the search engine may not work.

    Comment by Mario Vilas — May 3, 2014 @ 6:36 pm

  62. If so, how could we solve this problem in order to get time range search results?

    Comment by David — May 3, 2014 @ 8:07 pm

  63. May I ask a follow-up question: I am trying to obtain about 1000 results, so I set the maximum to 1000; however, the script ends after 230. Another search word returns another number. Why doesn’t it reach 1000, when putting the same word in the search engine in a browser reports more like 672,000 results? Where does the limitation come from? I will also add: the script doesn’t fail, it simply stops.

    Comment by stysia — September 15, 2014 @ 10:26 am

  64. @stysia: that’s surely a limitation of the Google legacy API itself, as the script doesn’t do much more than passing the arguments to Google. You can try running the same query in your browser (with JavaScript disabled, to force the old website) to see what went wrong, but I bet you’re simply hitting some kind of hardcoded limit.

    Another thing that could happen is that you start getting captcha prompts instead of results. I’m unsure of what triggers this exactly, but of course the script will find no results and stop working when that happens.

    Comment by Mario Vilas — September 15, 2014 @ 12:07 pm

  65. @Mario Vilas: I know that if I don’t pause long enough between queries, the captcha problem you explain in the latter part will probably appear. With JavaScript disabled, the response is still given in the thousands. Literally, my search query is one single word, and I would like to put the found urls in a file, so I have in my code:

    for url in search("word", stop=1000, pause=10):
        time.sleep(...)  # a random amount of time, to avoid being blocked
        # write down the URL

    And the maximum I got was 370 so far. :S But thanks for the hints about the reason.
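    The loop described above can be written with the search function passed in, so the pacing logic can be tried without touching Google at all; `fake_search` below is a stand-in for google.search, and the 370-hit cap just mimics the behaviour reported in this comment:

```python
# Sketch of the paced collection loop above. `search_fn` stands in for
# google.search; the extra sleep is the commenter's idea for avoiding
# blocks (set to zero here so the sketch runs instantly).
import random
import time

def collect_urls(search_fn, query, stop=1000, min_wait=0.0, max_wait=0.0):
    urls = []
    for url in search_fn(query, stop=stop, pause=10):
        urls.append(url)
        time.sleep(random.uniform(min_wait, max_wait))  # extra pause
    return urls

def fake_search(query, stop=1000, pause=0):
    # Pretend Google stopped returning hits after 370, as observed above.
    for i in range(min(stop, 370)):
        yield "http://example.com/%d" % i

print(len(collect_urls(fake_search, "word", stop=1000)))  # 370, not 1000
```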

    Comment by stysia — September 15, 2014 @ 3:09 pm

  66. Hello Mario:
    I’m using Python 2.7 and I’m making a bot for IRC.
    I tested the code in Python and it works, except that if I put stop=3 or stop=1 it always shows me 20 results.

    I want to use your tool in my bot to show the result of a search made by some user in the channel, but I got the following error:
    TypeError: search() got an unexpected keyword argument ‘lang’

    And when I don’t pass any parameter, I’m told that at least one parameter is required.
    Can you help me, please?

    the part of the code of my bot:

    vartmp = line[1]
    vartmp2 = vartmp.replace('$g ', '')
    searchgg = vartmp2
    for res in search(searchgg, lang='es', stop=3):
        print res
        #send_msg("%s\n" % url.encode("utf8"))

    Thank you.


    Comment by Maximiliano Faccone — September 23, 2014 @ 6:04 am

  67. @Max: My guess is you have imported more than one function called “search” and you’re now calling the wrong one 🙂 that’s the only thing I can think of that can explain the “lang” parameter being missing.

    As for the “stop” parameter, that really depends on the Google legacy API, the script only passes all parameters to Google and returns whatever Google gives it. My suggestion is to use all default parameters and filter out what you want using Python.

    Comment by Mario Vilas — September 23, 2014 @ 11:52 am


  69. @Mario: Thanks for this. I want to run this code in a loop for multiple queries, and I only want to extract the first search result. What is the daily limit for such searches, and is it possible to circumvent it? Thanks

    Comment by Ankit Jain — January 12, 2015 @ 11:16 pm

  70. Thanks for the info…!

    Comment by Vijay Kumar — January 23, 2015 @ 7:24 pm

  71. @Ankit: the daily limit really depends on Google, and I can’t make any assurances on it. I’d run it as slowly as possible to avoid problems. Another trick would be to use multiple proxies to get different IP addresses.
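    A rough sketch of the multiple-proxies trick: the proxy addresses below are placeholders, and urllib.request is Python 3’s name for the urllib2 module used elsewhere in this thread.

```python
# Rotating through several proxies, roughly as suggested above, so
# successive requests come from different IP addresses. The proxy
# addresses are hypothetical placeholders.
import itertools
import urllib.request

PROXIES = itertools.cycle([
    "http://proxy-one.example:8080",   # placeholder
    "http://proxy-two.example:8080",   # placeholder
])

def opener_with_next_proxy():
    """Build an opener routing the next request through the next proxy."""
    proxy = next(PROXIES)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler), proxy

opener, used = opener_with_next_proxy()
print(used)  # http://proxy-one.example:8080
```

    Each call to opener_with_next_proxy() then uses the following proxy in the cycle.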

    Comment by Mario Vilas — January 29, 2015 @ 2:28 pm

  72. for url in search('Happy Pi Day', lang='en', num=1, start=0, stop=1, pause=2.0):

    This returns 5 URLs… I want it to return only 1, 2 or 3 URLs. How do I go about it?

    Comment by Shravan Kale — March 14, 2015 @ 3:27 pm

  73. Hi Shravan! That depends on the Google API, not on this script – I suppose they don’t have that much granularity. In this case you’ll have to filter out the extra results yourself (be careful, you may actually get /fewer/ results too!).
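    One way to do that filtering is to cap the result iterator at an exact count; this is a sketch, not part of the script, and the range() iterator merely stands in for google.search:

```python
# Capping the result iterator yourself, as suggested above: take at most
# n items, regardless of how many a page of results actually yields.
from itertools import islice

def take(n, iterable):
    return list(islice(iterable, n))

# With the script you would write something like:
#     take(3, search('Happy Pi Day'))
print(take(3, iter(range(10))))  # [0, 1, 2]
```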

    Comment by Mario Vilas — March 17, 2015 @ 4:11 am

  74. Hi Mario, will changing the User-Agent make a difference to the search results?

    Comment by Nikhil Nayanar — June 21, 2015 @ 2:22 am

  75. Also, I notice that some of the results visible on the first page in the browser and the links obtained from the script above aren’t the same. What could be the reason for that?

    Comment by Nikhil Nayanar — June 21, 2015 @ 4:42 am

  76. @Nikhil: no idea, try it and let me know! As for the different results, it’s because this script uses the legacy API before the JavaScript UI was added, so it’s likely to be missing the “customizations” made per user nowadays.

    Comment by Mario Vilas — June 21, 2015 @ 11:10 am

  77. Yeah, I tried with a Chrome agent and it gives me the same results. Also, I did use the new API with JSON, and that too skips the same results. Could it be that Google doesn’t return the customized results?

    Comment by Nikhil Nayanar — June 21, 2015 @ 10:16 pm

  78. Apparently not! Did you try accessing Google from IP addresses in different countries? I bet they do customize based on the country you access from.

    It’d also be interesting to log in to Google Apps and send the cookies along with your queries, to see if you get customized results for your user.

    Comment by Mario Vilas — June 22, 2015 @ 8:22 am

  79. I meant, could it be that Google does not send back customized results when we query it from Python?

    Comment by Nikhil Nayanar — June 22, 2015 @ 1:22 pm

  80. I would not expect it to – the python code is using a legacy API.

    Comment by Mario Vilas — June 22, 2015 @ 2:16 pm

  81. Yes, that’s a valid point. By the way, I did have one more question: are the search templates you have used from some repository of Google’s?

    Comment by Nikhil Nayanar — June 22, 2015 @ 3:08 pm

  82. Hi Mario,
    Thank you for the awesome tool.
    I faced an issue: even with a User-Agent added to the request, I get ‘urllib2.HTTPError: HTTP Error 503: Service Unavailable’.
    Any solution?

    Comment by Prat — August 10, 2015 @ 10:46 am

  83. Hi Prat. I’m afraid I can’t figure out what happened to you just with that information – I would guess you either have a proxy breaking things or you exceeded the maximum number of requests.

    Comment by Mario Vilas — August 10, 2015 @ 8:32 pm

  84. @Nikhil: I have used nothing from Google, all the code is mine.

    Comment by Mario Vilas — August 10, 2015 @ 8:33 pm

  85. Hi Mario,
    after that exception I’m not able to send more requests for a day.
    Yes, I exceeded 100 requests. I wanted to confirm: can this tool handle sending more than 100 requests?

    Comment by Prat — August 11, 2015 @ 5:16 am

  86. Hi Mario,

    Using a User-Agent will disguise the script as a browser, so I should be able to send more than 100 requests. Is that correct?

    Comment by Prat — August 13, 2015 @ 6:37 pm

  87. No, I don’t think that would work. Wish it did though, it’d be so simple! 😉

    Comment by Mario Vilas — August 13, 2015 @ 7:01 pm

  88. any other solution for sending more than 100 requests ?

    Comment by Prat — August 14, 2015 @ 5:24 am

  89. […] Here’s my approach to this: […]

    Pingback by » Python:Google Search from a Python App — September 1, 2015 @ 8:22 pm

  90. Is it possible to get not just the result URLs but also the descriptions? Or just the descriptions?

    Comment by Atis — March 6, 2016 @ 12:37 pm

  91. Sure, I suppose so. This was a quick script for getting search results, it was never meant to be a full-blown API, but you can certainly patch it to make it work that way too. Instead of extracting the links you would get a different tag using BeautifulSoup.
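    A sketch of that patch: besides each link, pull the anchor text as the title and a neighbouring tag’s text as the description. The toy snippet below is parsed with the stdlib’s xml.etree so it runs standalone; the script itself uses BeautifulSoup, and the class names here are illustrative, not Google’s real markup:

```python
# Extracting (url, title, description) triples instead of bare links.
# The snippet and its class names are made up for illustration; the real
# results page must be inspected to find the right tags.
import xml.etree.ElementTree as ET

snippet = """<div id="search">
  <div class="g">
    <h3><a href="http://example.com/1">First title</a></h3>
    <span class="st">First description.</span>
  </div>
</div>"""

root = ET.fromstring(snippet)
results = []
for g in root.findall(".//div[@class='g']"):
    a = g.find(".//h3/a")                    # the link and its title text
    desc = g.find(".//span[@class='st']")    # the description text
    results.append((a.get("href"), a.text, desc.text))

print(results)
```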

    Comment by Mario Vilas — March 7, 2016 @ 6:07 pm

  92. Can you change the inputs to use input() rather than having to write out what you want to search for in the code? I am writing an AI program and everything that is asked is done through IDLE rather than in the scripting interface, if that makes sense?

    Comment by David Myers — April 3, 2016 @ 11:53 am

  93. I’m sorry, I didn’t understand your request. 😦 Can you try elaborating a bit? What is the problem you are trying to solve?

    Comment by Mario Vilas — April 3, 2016 @ 12:15 pm

  94. Basically I’m trying to write a program like J.A.R.V.I.S. from Iron Man in Python, and I want to add a Google search feature. The program works fine when the user inputs what they want in the code itself, but I want the code to ask a question such as “What is the subject you are searching for?” and then “Are there any keywords to search for specifically?” and use the answers through a variable. I can’t get it to work due to the use of the quotes, as this turns the text into a literal rather than using a variable name such as X, for example. I’m just wondering whether there would be a way around this?

    Comment by David Myers — April 4, 2016 @ 10:12 pm

  95. Maybe you should post your code somewhere and ask for help that way, it’s a bit confusing otherwise 🙂

    Comment by Mario Vilas — April 5, 2016 @ 12:37 pm

  96. if Message == ("Google"):
        Y = raw_input("What would you like for me to search for?")
        Z = raw_input("Are there any keywords that you would like me to use?")
        from google import search
        for url in search('"Y"Z', stop=10):

    This is the current code that I am using for the Google search, but Y and Z are user inputs, and if I run it now it brings up a list of websites with “Y” and “Z” literally in them, which isn’t what I would like it to do. Could you see if anything can be done to get this to work correctly? Thank you

    Comment by David Myers — April 5, 2016 @ 1:29 pm

  97. Maybe you want something like this: search('"%s" %s' % (Y, Z), stop=10)
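    Spelled out with stand-in values (Y and Z replace the raw_input() answers from the previous comment):

```python
# The fix above spelled out: interpolate the variables into the query
# string instead of quoting their names. Y and Z are stand-ins for the
# raw_input()/input() answers.
Y = "Iron Man"
Z = "JARVIS python"
query = '"%s" %s' % (Y, Z)
print(query)  # "Iron Man" JARVIS python
```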

    Comment by Mario Vilas — April 5, 2016 @ 1:44 pm

  98. Works perfectly, thanks man. 🙂

    Comment by David Myers — April 5, 2016 @ 6:51 pm

  99. You’re welcome! 🙂

    Comment by Mario Vilas — April 5, 2016 @ 7:54 pm

  100. Hi, Mario!

    Thank you very much for this simple program; I was struggling with headers and this helped me receive a valid response. I have a somewhat specific question (regarding “modern” Google results). Is it possible to fetch the results from the part of the page that is generated when you search for a person or a band? For example, if you search for “mesut ezil”, on the right side of the page there will be an image and some text from the Wikipedia page.

    I suppose not, because it is “generated”, as I said, so it’s not available in the static HTML, but maybe you know something to try out?

    Thanks in advance and regards!

    Comment by Kristijan Bartol — April 8, 2016 @ 11:44 am

  101. Hi! For something like that I would go with Selenium.

    Comment by Mario Vilas — April 8, 2016 @ 4:40 pm

  102. There is an error as follows when I run the code.
    File “C:\Python27\lib\site-packages\”, line 204, in search
    anchors = soup.find(id=’search’).findAll(‘a’)
    AttributeError: ‘NoneType’ object has no attribute ‘findAll’

    Comment by Guosheng Kang — July 8, 2016 @ 8:49 am

  103. @Guosheng Kang: it doesn’t seem to be happening to me. I think maybe you have some sort of network issue, like a transparent proxy mangling your requests, or a wifi hotspot redirecting you somewhere else…

    Comment by Mario Vilas — July 8, 2016 @ 10:33 am

  104. […] Using Google Search from your Python code | Breaking Code Quick script in Python to make Google searches. Can be used as a command line script or imported as a module. Download: […]

    Pingback by検索結果を取得するモジュール | 基本から学ぶPython3修練場 — December 4, 2016 @ 4:22 am


  106. Hi Mario,
    first of all, thank you for this module; it works like a charm and has helped me a lot in the past.
    However, recently I started deploying my Python applications to an Amazon Web Services instance. My code is a Flask application which uses your module for searching Google. When run from my local machine everything works fine, but as soon as it is deployed on AWS, Google answers all the requests with a 503 error. I’ve tested curling Google and sending my own request with urllib2, and both work; only when I use your library does Google answer with a 503. The only thing I suspect right now is that it might be down to your use of cookies, but I’m not too sure how that works.

    Do you have any idea what might cause this problem, or did you even encounter something similar before? Any help would be appreciated.

    Thank You!

    Comment by Mustafa Rashed — February 21, 2017 @ 12:02 pm

  107. Hi, Mario,
    I really like your solution; I think it’s easy to use and elegant.
    However, I have struggled to alter it to get the titles as well as the urls.
    I’m hoping that you can give me some advice on retrieving the titles by altering the current solution you’ve written.

    thanks a lot!

    Comment by mary huang — April 12, 2017 @ 1:51 pm

  108. Hi Mario – thanks so much for sharing! After hours of scouring the internet for this exact solution, I was so pleased to find this page! Quick question: I need to extract exactly 10 urls from a google search. I am using ‘stop=10’ but for some reason this sometimes returns more or fewer than 10 results. Any idea why this would occur? Thanks again!!

    Comment by Mary — May 17, 2017 @ 4:48 pm

  109. […] from their Python code. If you are interested in learning more about this module then please see this […]

    Pingback by welcome — May 28, 2017 @ 2:39 pm

  110. Hi Mario, thanks a lot for the code. I am trying to get only one url by setting stop=1, but as pointed out by Mary it is giving more than 10 urls. I tried changing num as well, still to no avail. Could you please look into it? Thank you.

    Comment by Srik — June 5, 2017 @ 10:54 am


  112. I’m getting a 503 after about 60 iterations of my loop. I can let my program run as long as it needs to, as long as Google isn’t going to flag it and cause the script to fail. Any suggestions on how long to set the pause so that it doesn’t get blocked? There are about 94,000 values it needs to search. I have a Stack Overflow question up as well here:

    Comment by TJ — June 16, 2017 @ 12:33 am

  113. Hey there – try:

    try:
        req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
        con = urllib2.urlopen(req)
        stuff = con.read()
        for reference in list_of_references:
            results = re.findall(reference, stuff)
            if not results:
                pass  # this reference was not found in the page
    except Exception:
        return "Unable to access source code"

    Let me know if that makes sense!

    Comment by madagascarmary — June 19, 2017 @ 2:42 pm
