Thinking Sphinx is a great search library for Ruby projects. It has clean syntax and some nice options for indexing, searching and sorting. In this tutorial I'll show how you can use TS to perform searches based on both a geographical point and a keyword.
Let's say that we're building a Rails app to index independent coffee shops in our town. Every coffee shop has a name, a description and some comments, as well as a latitude and longitude value so that we can place it on a map. We want to allow our users to search through the coffee shops in our database by providing keywords and a location. We want our searches to take into account BOTH the relevance of the results based on keyword matches, as well as their proximity to the location given.
First, we'll of course need to install Sphinx and Thinking Sphinx. It's probably also a good idea to read this introduction to Sphinx.
This tutorial also uses the Geokit gem, so be sure to grab that as well.
Once Sphinx and Thinking Sphinx are installed, we are ready to define the indexes on our model. This tells Sphinx which fields to store in its index for searching, which attributes we want to have available for sorting and filtering, as well as any other properties we want to define.
```ruby
# Table name: coffee_shops
#
#  id          :integer(4)   not null, primary key
#  name        :string(255)
#  description :text
#  lat         :float
#  lng         :float
class CoffeeShop < ActiveRecord::Base
  has_many :comments

  PER_PAGE = 10

  define_index do
    # fields
    indexes :name
    indexes :description
    indexes comments.body, :as => :comments

    # attributes
    has 'RADIANS(lat)', :as => :lat, :type => :float
    has 'RADIANS(lng)', :as => :lng, :type => :float

    # properties
    set_property :latitude_attr  => 'lat'
    set_property :longitude_attr => 'lng'
    set_property :field_weights  => { 'name'        => 10,
                                      'description' => 2,
                                      'comments'    => 1 }
  end
end
```
The fields in the index tell Sphinx that we want our search to look at the name and description of our coffee shops, as well as the body of any comments that have been made. Notice that we're able to index not only fields that are in our coffee_shops table, but also fields from associated records - in this case the bodies of our comments.
Defining lat and lng attributes is necessary for geography-based searches. The big gotcha here is that Sphinx expects these attributes to be stored in radians, whereas most geocoding APIs (such as Google's) return decimal degrees. The SQL function RADIANS() performs this conversion for you at indexing time. If your lat and lng columns already store radians, however, you can define the attributes directly:
```ruby
# attributes
has :lat
has :lng
```
Finally, we define our properties. The :latitude_attr and :longitude_attr properties tell Thinking Sphinx which attributes hold the coordinates for our geographic calculations. The :field_weights property sets how much each indexed field contributes to a result's relevance: a match on the name of one of our coffee shops should count for more than a match in one of its comments.
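To get a feel for what :field_weights does, here's a deliberately simplified Ruby model in which each indexed field that contains the keyword adds its weight to the score. This is NOT Sphinx's actual ranker (the default ranker also accounts for phrase proximity and BM25 term statistics), and the `toy_weight` method and the sample shops are hypothetical, but it shows why a name match outranks a comment match.

```ruby
# Field weights copied from the define_index block above.
FIELD_WEIGHTS = { "name" => 10, "description" => 2, "comments" => 1 }

# Toy relevance model (not Sphinx's real ranker): every field whose text
# contains the keyword contributes that field's weight once.
def toy_weight(doc, keyword)
  FIELD_WEIGHTS.sum do |field, weight|
    doc.fetch(field, "").downcase.include?(keyword.downcase) ? weight : 0
  end
end

# Hypothetical coffee shops
espresso_yourself = { "name" => "Espresso Yourself", "description" => "Cozy spot", "comments" => "" }
the_grind         = { "name" => "The Grind", "description" => "Cozy spot", "comments" => "Great espresso!" }

puts toy_weight(espresso_yourself, "espresso") # 10 (match in name)
puts toy_weight(the_grind, "espresso")         # 1 (match in comments only)
```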
Since we want to keep our controllers lean and write re-usable code, it's a good idea to move our search logic into its own class. Let's create the file search.rb in our lib folder for this.
```ruby
class Search
  include Geokit::Geocoders

  METERS_PER_MILE = 1609.344
  SORT_EXPRESSION = "@weight * @weight / @geodist"

  def self.execute(keywords, var = {})
    # Local variables (not @ivars on the class) so separate
    # searches never share state.
    search_options = { :page     => var[:page] || 1,
                       :per_page => CoffeeShop::PER_PAGE }

    unless var[:location].blank?
      geocode = MultiGeocoder.geocode(var[:location])
      if geocode.success && geocode.accuracy > 1
        # Geokit returns decimal degrees; Sphinx expects radians
        lat = (geocode.lat / 180.0) * Math::PI
        lng = (geocode.lng / 180.0) * Math::PI
        search_options.merge!(:geo       => [lat, lng],
                              :sort_mode => :expr,
                              :sort_by   => SORT_EXPRESSION,
                              :with      => { "@geodist" => 0.0..(5 * METERS_PER_MILE) })
      end
    end

    CoffeeShop.search(keywords, search_options)
  end
end
```
That's a lot of awesome. Let's walk through our new search class and see what's going on.
@geodist, @weight and SORT_EXPRESSION
Sphinx gives you some special attributes for sorting and filtering, including @weight and @geodist. @weight is the relevance of a search result (the larger the number, the more relevant the result) and is the default sort order. @geodist is the distance, in meters, between the search result and the anchor point. Because SORT_EXPRESSION uses both @weight and @geodist, results are ranked by relevance and proximity together. You can add other operators and attributes to this expression to tailor how your results are sorted. For instance, if you had a 'popularity' attribute on your model and wanted more popular coffee shops to rank higher, you could define your sort expression as '@weight * @weight / @geodist + popularity' (just make sure you add 'popularity' to the list of attributes in your model's index).
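Here's a small Ruby sketch of how `@weight * @weight / @geodist` trades relevance against distance. It evaluates the same expression over a few hypothetical results in plain Ruby (the `sort_score` method and the sample numbers are invented for illustration); Sphinx does the equivalent calculation on its side.

```ruby
# Plain-Ruby version of SORT_EXPRESSION = "@weight * @weight / @geodist".
#   weight  = Sphinx relevance (bigger is better)
#   geodist = distance from the anchor point in meters
def sort_score(weight, geodist)
  weight.to_f * weight / geodist
end

# Hypothetical search results: [name, weight, geodist in meters]
results = [
  ["Very relevant, far away", 4, 6000.0],
  ["Less relevant, close by", 2, 1000.0],
  ["Very relevant, nearby",   4, 2000.0]
]

# Highest score first, as with a descending sort on the expression
ranked = results.sort_by { |_, weight, dist| -sort_score(weight, dist) }
ranked.each do |name, weight, dist|
  puts format("%-24s %.5f", name, sort_score(weight, dist))
end
```

Because the weight is squared, doubling a result's relevance offsets a fourfold increase in its distance. Note that a result sitting exactly on the anchor point has a distance of 0, which makes this Ruby version return Infinity; adding a small constant to the denominator is one way to guard against that if it matters for your data.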
:page and :per_page
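The :page and :per_page options tell Thinking Sphinx which page of results to return and how many results to put on each page. If you have WillPaginate installed, Thinking Sphinx will automatically wrap your search results in a WillPaginate collection, allowing you to use all your normal WillPaginate view helpers. Neat!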
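Under the hood, a page number and a page size boil down to an offset and a limit on the query. The hypothetical `page_window` helper below is a sketch of that arithmetic, not Thinking Sphinx's own code; it also shows why a value arriving from `params[:page]` (a string, or nil) needs converting to an integer first.

```ruby
PER_PAGE = 10 # same value as CoffeeShop::PER_PAGE

# Hypothetical helper: turn a (possibly nil or string) page param into
# the offset/limit pair a paginated query needs.
def page_window(page, per_page = PER_PAGE)
  page = page.to_i
  page = 1 if page < 1 # nil, "" or non-numeric input falls back to page 1
  { offset: (page - 1) * per_page, limit: per_page }
end

window = page_window("3")
puts "offset=#{window[:offset]} limit=#{window[:limit]}" # offset=20 limit=10
```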
geocoding
If we pass a :location string to our search class, we want to try to geocode it and use it as an anchor point if the geocoding is successful. We'll also need to convert our lat and lng to radians in order for them to work with Sphinx.
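The geocoding step can also be pulled out into a small method that is easy to test without hitting a geocoding service. In this sketch, `GeoResult` is a hypothetical stand-in for the object Geokit's `MultiGeocoder.geocode` returns (only the `success`, `accuracy`, `lat` and `lng` readers used by the search class are modeled), and `geo_options` is a hypothetical name; the hash it builds mirrors the options the search class merges in.

```ruby
METERS_PER_MILE = 1609.344
SORT_EXPRESSION = "@weight * @weight / @geodist"

# Hypothetical stand-in for a Geokit geocoding result.
GeoResult = Struct.new(:success, :accuracy, :lat, :lng)

# Returns the extra search options for a geocoded anchor point,
# or an empty hash when geocoding failed or was too imprecise.
def geo_options(geocode, radius_miles = 5)
  return {} unless geocode.success && geocode.accuracy > 1

  lat = geocode.lat / 180.0 * Math::PI # degrees -> radians
  lng = geocode.lng / 180.0 * Math::PI
  { :geo       => [lat, lng],
    :sort_mode => :expr,
    :sort_by   => SORT_EXPRESSION,
    :with      => { "@geodist" => 0.0..(radius_miles * METERS_PER_MILE) } }
end

good  = GeoResult.new(true, 6, 45.52, -122.68)
vague = GeoResult.new(true, 1, 39.8, -98.6) # accuracy too low to use

p geo_options(good)[:geo].map { |r| r.round(4) } # [0.7945, -2.1412]
p geo_options(vague)                             # {}
```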
:geo, :sort_mode, :sort_by and :with
These options affect the sorting and filtering of our search. :geo tells Thinking Sphinx that we are doing a geographic search anchored at the given latitude and longitude (in radians), and makes Sphinx calculate the @geodist attribute for each result. :sort_mode and :sort_by tell Sphinx that we want to sort results by our SORT_EXPRESSION constant. The :with option tells Sphinx that we only want to return results within five miles of our anchor point.
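If you want to sanity-check which shops should fall inside the five-mile :with range, you can compute an approximate great-circle distance yourself. This sketch uses the standard haversine formula with a mean Earth radius of 6,371 km; Sphinx's own @geodist calculation may use a slightly different formula or radius, so treat the result as an approximation. The helper names and coordinates are hypothetical.

```ruby
EARTH_RADIUS_METERS = 6_371_000.0 # mean Earth radius (assumption)
METERS_PER_MILE = 1609.344

def to_radians(degrees)
  degrees / 180.0 * Math::PI
end

# Haversine great-circle distance between two points given in radians.
def haversine_meters(lat1, lng1, lat2, lng2)
  dlat = lat2 - lat1
  dlng = lng2 - lng1
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a))
end

anchor = [to_radians(45.52), to_radians(-122.68)] # hypothetical search point
shop   = [to_radians(45.55), to_radians(-122.65)] # hypothetical coffee shop

distance = haversine_meters(*anchor, *shop)
# Same range the search class passes in :with => { "@geodist" => ... }
within_radius = (0.0..(5 * METERS_PER_MILE)).cover?(distance)
puts "#{distance.round} m, within 5 miles: #{within_radius}"
```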
Finally, the last line performs the actual search based on the keywords and search options we've set up.
Now that we've got our model and search class set up, executing searches from our controller is as simple as:
```ruby
@coffee_shops = Search.execute(params[:query_keywords],
                               :page     => params[:page],
                               :location => params[:query_location])
```
This executes our query with all the parameters we care about and returns a paginated collection of coffee shops sorted by relevance and distance. Also, if any or all of the parameters are blank, nothing breaks! If all parameters are blank, a Sphinx search won't even be performed, and a WillPaginate collection will be returned with our model's default sorting and PER_PAGE setting.
Happy searching!