This install was tested on Ubuntu 10.04.
Install Tomcat
I based my install on this How-to.
Use the package manager to install Tomcat and its related packages (admin tools, docs, and examples):
sudo apt-get install tomcat6 tomcat6-admin tomcat6-common tomcat6-user tomcat6-docs tomcat6-examples
Test the install by starting tomcat:
sudo /etc/init.d/tomcat6 start
Visit the server in a browser, or try:
wget http://localhost:8080
Some other handy commands for Tomcat:
sudo /etc/init.d/tomcat6 stop
sudo /etc/init.d/tomcat6 restart
sudo /etc/init.d/tomcat6 status
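Tomcat can take a few seconds to answer after a start or restart, so a single wget right away may fail even when the install is fine. Here is a minimal sketch of a polling helper; the name wait_for_http and its defaults are my own, not part of Tomcat or the init script.

```shell
# Poll a URL until it answers or we run out of attempts.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper, not shipped with Tomcat)
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: quiet, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        # Only wait if another attempt is coming
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (not run here):
#   sudo /etc/init.d/tomcat6 restart && wait_for_http http://localhost:8080 && echo "Tomcat is up"
```

The function returns 0 as soon as the server responds and 1 if it never does, so it chains cleanly with && after a restart.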
Now that we have a working copy of Tomcat, we are ready to move on to the Solr installation.
Install Solr
Again, use the package manager for Solr. I found a Solr/Tomcat bundle, so I’ll use that. Installing it will restart Tomcat as well.
sudo apt-get install solr-tomcat
Check that Solr is working in a browser, or try:
wget http://localhost:8080/solr
We now have a working install of Solr on Tomcat. I recommend walking through the short Solr tutorial; it will give you a quick understanding of the basics, such as creating, reading, updating, and deleting data. First download a tarball of a working release so that you have the examples. Skip to the “Indexing Data” heading, since you already have a working copy of Solr on Tomcat.
Clear out test data
We will create a small XML file that will be used to clear all data from the index.
Create the XML file that you will post to the update service:
vim delete.xml
Paste the following into the file
<delete><query>*:*</query></delete>
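If you would rather not open an editor, the same one-line file can be written straight from the shell. This is just a sketch of an alternative to the vim step; the file name delete.xml is the one used above.

```shell
# Write the delete-all query to delete.xml without opening an editor.
# Quoting 'EOF' stops the shell from expanding anything inside the heredoc,
# so the *:* query is written exactly as typed.
cat > delete.xml <<'EOF'
<delete><query>*:*</query></delete>
EOF
```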
Post the file using curl:
curl http://localhost:8080/solr/update --data-binary @delete.xml -H 'Content-type:text/xml; charset=utf-8'
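We now have an empty index, ready to modify and load. If the old documents still show up in searches, the delete has probably not been committed yet; posting <commit/> to the same update URL commits it.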
Download and Review the Movie DVD Database via CSV
Although I wanted to use IMDB, the data they provide for download is in a wacky, non-standard format. Instead, I found a little project that focuses on cataloging all DVDs ever released in region 1. We will use that instead.
Download the CSV and unzip
wget http://www.hometheaterinfo.com/download/dvd_csv.zip
unzip dvd_csv.zip
Take a moment to review the file named dvd_csv.txt. It is a pretty standard CSV document whose first line is dedicated to headers. Those headers will be used to define the search fields.
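A quick way to review the headers without scrolling through the whole file is to print them as a numbered list. This is a rough sketch: the helper name csv_columns is made up, and it assumes the header line contains no quoted commas.

```shell
# Print the columns of a CSV header as a numbered list, one per line.
# Naive on purpose: it splits on every comma, so it assumes the header
# line itself contains no quoted commas. Carriage returns are stripped
# in case the file uses Windows line endings.
csv_columns() {
    head -n 1 "$1" | tr -d '\r' | awk -F',' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# Example (run it in the directory where you unzipped the download):
#   csv_columns dvd_csv.txt
```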
Define your Solr data schema
Update your Solr schema
vim /etc/solr/conf/schema.xml
Comment out everything between the “<fields>” and “</fields>” tags, then insert the following into the document:
<fields>
<!--DVD_Title,Studio,Released,Status,Sound,Versions,Price,Rating,Year,Genre,Aspect,UPC,DVD_ReleaseDate,ID,Timestamp-->
<field name="DVD_Title" type="text" indexed="true" stored="true" />
<field name="Studio" type="text" indexed="true" stored="true" />
<field name="Released" type="text" indexed="true" stored="true" />
<field name="Status" type="text" indexed="true" stored="true" />
<field name="Sound" type="text" indexed="true" stored="true" />
<field name="Versions" type="text" indexed="true" stored="true" />
<field name="Price" type="text" indexed="true" stored="true" />
<field name="Rating" type="text" indexed="true" stored="true" />
<field name="Year" type="text" indexed="true" stored="true" />
<field name="Genre" type="text" indexed="true" stored="true" />
<field name="Aspect" type="text" indexed="true" stored="true" />
<field name="UPC" type="text" indexed="true" stored="true" />
<field name="DVD_ReleaseDate" type="text" indexed="true" stored="true" />
<field name="ID" type="text" indexed="true" stored="true" />
<field name="Timestamp" type="text" indexed="true" stored="true" />
</fields>
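Typing fifteen nearly identical lines by hand invites typos, so here is a sketch that generates the field elements from the header list instead. The function name make_fields and the output file fields.xml are my own choices; every field gets the same "text" type, exactly as in the listing above.

```shell
# Header names, copied from the comment line in the schema above.
header='DVD_Title,Studio,Released,Status,Sound,Versions,Price,Rating,Year,Genre,Aspect,UPC,DVD_ReleaseDate,ID,Timestamp'

# Emit one <field> element per comma-separated column name.
# (hypothetical helper; assumes the names contain no quotes or commas)
make_fields() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r name; do
        printf '<field name="%s" type="text" indexed="true" stored="true" />\n' "$name"
    done
}

# Write the generated elements to a scratch file for pasting into schema.xml.
make_fields "$header" > fields.xml
```

Paste the contents of fields.xml between the fields tags. If the CSV ever gains or loses a column, rerunning this with the new header keeps the schema in step.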
After the closing “</fields>” tag, make sure that everything up to the closing “</schema>” tag is commented out except for the following lines:
<uniqueKey>ID</uniqueKey>
<solrQueryParser defaultOperator="OR"/>
<defaultSearchField>DVD_Title</defaultSearchField>
Restart Tomcat
sudo /etc/init.d/tomcat6 restart
Revisit your admin page to ensure there are no errors.
Index the data
Index the data in the CSV
curl http://localhost:8080/solr/update/csv --data-binary @dvd_csv.txt -H 'Content-type:text/plain; charset=utf-8'
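Solr only makes added documents searchable once they have been committed. If your searches come back empty after the load, an explicit commit may be what is missing. A minimal sketch, assuming the same host and port as above; the SOLR_URL variable and the solr_commit name are mine.

```shell
# Base URL of the Solr webapp; the default matches the install above.
SOLR_URL=${SOLR_URL:-http://localhost:8080/solr}

# Post an explicit <commit/> so pending adds and deletes become searchable.
# (hypothetical helper name; the update URL is the one used earlier)
solr_commit() {
    curl -sf "$SOLR_URL/update" \
        --data-binary '<commit/>' \
        -H 'Content-type:text/xml; charset=utf-8'
}

# Example (not run here):
#   solr_commit && echo "committed"
```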
Search some fields
Now open the Solr admin interface and enter some search terms.
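The same searches can be run from the command line, which is handy for scripting. Here is a sketch that queries Solr's select handler with curl; the solr_search name, the SOLR_URL variable, and the example search terms are placeholders of mine, not anything from the dataset.

```shell
# Base URL of the Solr webapp; the default matches the install above.
SOLR_URL=${SOLR_URL:-http://localhost:8080/solr}

# Run a query against the select handler and print the raw response.
# $1: query string, $2: number of rows to return (default 5)
solr_search() {
    # -G turns the --data-urlencode pairs into a URL-encoded GET query string
    curl -sf -G "$SOLR_URL/select" \
        --data-urlencode "q=$1" \
        --data-urlencode "rows=${2:-5}" \
        --data-urlencode 'indent=on'
}

# Examples (not run here; the search terms are arbitrary):
#   solr_search 'matrix'             # searches the default field, DVD_Title
#   solr_search 'Studio:warner' 10   # searches a specific field
```

A bare term is matched against DVD_Title because that is the defaultSearchField set in the schema; prefix the term with a field name and a colon to search any other field.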
Voila!