Hi everyone. Today I’ll be showing you a quick script I wrote to make Google searches from Python. There are previous projects doing the same thing (actually, doing it better), namely Googolplex by Sebastian Wain and xgoogle by Peteris Krumins, but unfortunately they’re no longer working. Maybe the simplicity of this script will keep it working a little longer… 🙂
The interface is extremely simple: the module exports only one function, called search().
    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from google import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)
You can control which of Google's country-specific search sites to use, which language to search in, how many results to fetch per page, which result to start from and when to stop, and how long to wait between queries. However, the only mandatory argument is the query string; everything else has a default value.
    # Get the first 20 hits for "Mariposa botnet" in Google Spain
    from google import search
    for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
        print(url)
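To give a feel for what those options mean, here is a rough sketch of how they could map onto a results-page URL. It is only an illustration, not the module's actual code: build_search_url() and page_offsets() are hypothetical helpers, and the hl, q, num and start query parameters are assumptions about how the search page is addressed.

```python
from urllib.parse import urlencode


def build_search_url(query, tld="com", lang="en", num=10, start=0):
    """Hypothetical helper: assemble a results-page URL from the options.

    The parameter names in the query string (hl, q, num, start) are
    assumptions made for this sketch.
    """
    params = {"hl": lang, "q": query, "num": num, "start": start}
    return "https://www.google.%s/search?%s" % (tld, urlencode(params))


def page_offsets(num=10, start=0, stop=20):
    """Hypothetical helper: result offsets of the pages needed to reach stop."""
    return list(range(start, stop, num))


# One URL per results page for the first 20 hits on Google Spain.
for offset in page_offsets(num=10, start=0, stop=20):
    print(build_search_url("Mariposa botnet", tld="es", lang="es", start=offset))
```

With 10 results per page and stop=20, that means two page requests, which is worth keeping in mind when thinking about how many queries you are really making.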
A word of caution, though: if you wait too little between requests or make too many of them, Google may block your IP for a while. This is especially annoying when you’re behind a corporate proxy. I won’t be held responsible when your coworkers suddenly develop an urge to kill you! 😀
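If you run several searches in a row from your own script, it may help to space them out yourself as well. Below is a minimal sketch of one way to do that. The throttled() helper is hypothetical (it is not part of the module), and the 5 second pause is just an example value, not a known safe limit.

```python
import time


def throttled(items, pause=2.0, sleep=time.sleep):
    """Yield each item, sleeping `pause` seconds between consecutive items.

    Hypothetical helper for this sketch. The `sleep` argument exists only
    so the delay can be swapped out when testing.
    """
    first = True
    for item in items:
        if not first:
            sleep(pause)  # wait before handing out the next item
        first = False
        yield item


queries = ['"Breaking Code" WordPress blog', 'Mariposa botnet']
for query in throttled(queries, pause=5.0):
    # Run the actual search for `query` here, e.g. with search(query, stop=20).
    print("Searching for:", query)
```

There is no delay before the first item, only between consecutive ones, so a single query is not slowed down at all.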
Below are the download links (source code and Windows installers) and the source code for you to read online. Enjoy! 🙂
- Version 1.0 (initial release).
- Version 1.01 (fixed the IOError exception bug).
- Version 1.02 (fixed the missing href bug reported by Rahul Sasi and the duplicate results bug reported by Slawek).
- Version 1.03 (extracts the hidden links from the results page, thanks ubershmekel!).
- Version 1.04 (added support for BeautifulSoup 4, thanks alxndr!).
- Version 1.05 (added compatibility with Python 3.x, a better command line parser, and some improvements by machalekj).
- Version 1.06 (added an option to only grab the relevant results, instead of all possible links from each result page, as requested by user Nicky and others).