Thinking Sphinx: Searching By Location And Keyword

Thinking Sphinx is a great search library for Ruby projects. It has clean syntax and some nice options for indexing, searching and sorting. In this tutorial I'll show how you can use TS to perform searches based on both a geographical point and a keyword.

The Problem

Let's say that we're building a Rails app to index independent coffee shops in our town. Every coffee shop has a name, a description and some comments, as well as a latitude and longitude value so that we can place it on a map. We want to allow our users to search through the coffee shops in our database by providing keywords and a location. We want our searches to take into account both the relevance of the results based on keyword matches and their proximity to the location given.

The Solution

First, we'll of course need to install Sphinx and Thinking Sphinx. It's probably also a good idea to read this introduction to Sphinx.

This tutorial also uses the Geokit gem, so be sure to grab that as well.

The Model

Once Sphinx and Thinking Sphinx are installed, we are ready to define the indexes on our model. This tells Sphinx which fields to store in its index for searching, which attributes we want to have available for sorting and filtering, as well as any other properties we want to define.

# Table name: coffee_shops
#
#  id              :integer(4)      not null, primary key
#  name            :string(255)
#  description     :text
#  lat             :float
#  lng             :float

class CoffeeShop < ActiveRecord::Base
  
  has_many :comments

  PER_PAGE = 10
  
  define_index do
    # fields
    indexes :name
    indexes :description
    indexes comments.body, :as => :comments
    
    # attributes
    has 'RADIANS(lat)', :as => :lat,  :type => :float    
    has 'RADIANS(lng)', :as => :lng,  :type => :float 
       
    # properties
    set_property :latitude_attr  => 'lat'
    set_property :longitude_attr => 'lng'
    set_property :field_weights  => { 'name'        => 10,
                                      'description' => 2,
                                      'comments'    => 1 }
  end
  
end

The fields in the index tell Sphinx that we want our search to look at the name and description of our coffee shops, as well as the body of any comments that have been made. Notice that we're able to index not only fields that are in our coffee_shops table, but also fields from associated records - in this case the bodies of our comments.

Defining the lat and lng attributes is necessary for geography-based searches. The big gotcha here is that Sphinx needs these attributes to be stored as radians, whereas most geocoding APIs (such as Google's) return decimal degrees. The SQL expression 'RADIANS(lat)' does this conversion for you at indexing time. If you happen to have your lat and lng stored as radians already, however, you can just define your attributes like this:

# attributes
has :lat  
has :lng
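
If you would rather do the conversion in Ruby before saving a record, the arithmetic is the same one that RADIANS() performs in SQL. Here is a minimal sketch; to_radians is a hypothetical helper name, not part of Thinking Sphinx, and the coordinates are made-up values:

```ruby
# Hypothetical helper (not part of Thinking Sphinx):
# convert decimal degrees to radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Illustrative coordinates in decimal degrees.
lat_deg = 45.0
lng_deg = -90.0

lat_rad = to_radians(lat_deg)
lng_rad = to_radians(lng_deg)
```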

Finally, we define our properties. The :latitude_attr and :longitude_attr properties tell Sphinx which attributes to use for its geography calculations. The :field_weights property defines how much weight to give each indexed field: a match on the name of one of our coffee shops should count for more in the relevance score than a match on one of our comment bodies.

The Search Class

Since we want to keep our controllers lean and write re-usable code, it's a good idea to move our search logic into its own class. Let's create the file search.rb in our lib folder for this.

include Geokit::Geocoders

class Search
  
  METERS_PER_MILE = 1609.344
  SORT_EXPRESSION = "@weight * @weight / @geodist"
  
  def self.execute(keywords, var = {})
    # Local variables rather than class-level instance variables, so
    # state from one search cannot leak into the next.
    search_options = { :page => var[:page] || 1,
                       :per_page => CoffeeShop::PER_PAGE }
    
    unless var[:location].blank?
      geocode = MultiGeocoder.geocode(var[:location])
      
      # Only use the location if geocoding succeeded with usable accuracy.
      if geocode.success && geocode.accuracy > 1
        # Sphinx expects the anchor point in radians.
        lat = (geocode.lat / 180.0) * Math::PI
        lng = (geocode.lng / 180.0) * Math::PI
        
        search_options.merge!(:geo => [lat, lng],
                              :sort_mode => :expr,
                              :sort_by => SORT_EXPRESSION,
                              :with => { "@geodist" => 0.0..(5 * METERS_PER_MILE) })
      end
    end

    CoffeeShop.search(keywords, search_options)
  end
  
end

That's a lot of awesome. Let's walk through our new search class and see what's going on.

@geodist, @weight and SORT_EXPRESSION
Sphinx gives you some special attributes for sorting and filtering, including @weight and @geodist. @weight is the relevance of a search result (the larger the number, the more relevant the result) and is the default sorting option. @geodist is the distance (in meters) of the search result from the anchor point. By defining SORT_EXPRESSION to use both the @weight and @geodist attributes, we can sort in a way that takes both into account. You can add any other operators and attributes to this expression to tailor how your results are sorted. For instance, if you had a 'popularity' attribute on your model and wanted more popular coffee shops to rank better, you could define your sort expression as '@weight * @weight / @geodist + popularity' (just make sure you add 'popularity' to the list of attributes on your model index).
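
To get a feel for how the expression trades relevance against distance, here is a small plain-Ruby sketch that applies the same arithmetic to made-up results. The Result struct, the sort_score helper and all the numbers are assumptions for illustration; in a real search Sphinx computes @weight and @geodist itself and does the sorting.

```ruby
# Illustrative stand-in for search results (hypothetical data).
Result = Struct.new(:name, :weight, :geodist)

results = [
  Result.new("Far but very relevant",  30, 3000.0),
  Result.new("Near but less relevant", 10,  200.0),
  Result.new("Near and relevant",      20,  400.0)
]

# Same arithmetic as SORT_EXPRESSION: @weight * @weight / @geodist
def sort_score(result)
  result.weight * result.weight / result.geodist
end

# Highest score first, mirroring a descending sort on the expression.
ranked = results.sort_by { |r| -sort_score(r) }
ranked.each { |r| puts format("%-24s %.2f", r.name, sort_score(r)) }
```

Squaring the weight means relevance counts for more than distance, but a very distant shop still needs a much higher weight to outrank a nearby one.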

:page and :per_page
If you have WillPaginate installed, Thinking Sphinx will automatically wrap your search results in a WillPaginate collection, allowing you to use all your normal WillPaginate view helpers. Neat!

Geocoding
If we pass a :location string to our search class, we want to try to geocode it and use it as an anchor point if the geocoding is successful. We'll also need to convert our lat and lng to radians in order for them to work with Sphinx.
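
The decision of whether to use a geocoding result can be sketched on its own, without calling a real geocoder. FakeGeocode below is a hypothetical stand-in that carries only the fields our search class reads (it is not Geokit's real class), and anchor_for is an illustrative helper name:

```ruby
# Hypothetical stand-in for a geocoder result, for illustration only.
FakeGeocode = Struct.new(:success, :accuracy, :lat, :lng)

# Returns [lat, lng] in radians when the result is usable, otherwise nil.
def anchor_for(geocode)
  return nil if geocode.nil?
  return nil unless geocode.success && geocode.accuracy > 1

  [geocode.lat, geocode.lng].map { |degrees| degrees * Math::PI / 180.0 }
end

good   = FakeGeocode.new(true, 8, 90.0, -180.0)
vague  = FakeGeocode.new(true, 1, 40.0, -75.0)
failed = FakeGeocode.new(false, 0, nil, nil)

good_anchor   = anchor_for(good)    # pair of radians
vague_anchor  = anchor_for(vague)   # nil: accuracy too low
failed_anchor = anchor_for(failed)  # nil: geocoding failed
```

When no anchor comes back, the search simply runs without the geography options and falls back to plain keyword relevance.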

:geo, :sort_mode, :sort_by and :with
These options affect the sorting and filtering of our search. :geo tells Sphinx that we are doing a geography search anchored at the given lat and lng (in radians) and tells it to add the @geodist attribute to each result. :sort_mode and :sort_by tell Sphinx that we want to sort results by our SORT_EXPRESSION constant. The :with option tells Sphinx that we only want to return results within five miles of our anchor point.
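
To see what the five-mile range means in practice, here is a plain-Ruby sketch that approximates the distance with the haversine formula and checks it against the same range. The Earth radius, the helper names and the coordinates are assumptions for illustration, and Sphinx's own distance calculation may differ slightly.

```ruby
EARTH_RADIUS_METERS = 6_371_000.0 # assumed mean Earth radius
METERS_PER_MILE     = 1609.344

# Hypothetical helper: decimal degrees to radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in meters between two points given in radians
# (haversine formula); an approximation of what @geodist represents.
def distance_in_meters(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a))
end

# Same range as the :with filter in the search class.
radius = 0.0..(5 * METERS_PER_MILE)

anchor  = [to_radians(40.0),  to_radians(-75.0)]
nearby  = [to_radians(40.02), to_radians(-75.0)] # roughly 2.2 km north
distant = [to_radians(41.0),  to_radians(-75.0)] # roughly 111 km north

nearby_in_range  = radius.cover?(distance_in_meters(*anchor, *nearby))
distant_in_range = radius.cover?(distance_in_meters(*anchor, *distant))
```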

Finally, the last line performs the actual search based on the keywords and search options we've set up.

The Controller

Now that we've got our model and search class set up, executing searches from our controller is as simple as:

@coffee_shops = Search.execute(params[:query_keywords],
                               :page => params[:page],
                               :location => params[:query_location])

This will execute our query with all the parameters we care about and return a paginated collection of coffee shops sorted by relevance and distance. Also, if any or all of the parameters are blank, nothing breaks! If all parameters are blank, a Sphinx search won't even be performed, and a WillPaginate collection will be returned with our model's default sorting and PER_PAGE attributes.

Happy searching!