AWS CloudSearch in Five Minutes

AWS CloudSearch is a self-service, push-button search engine service in the AWS cloud. CloudSearch is based on the Apache Solr search engine, and it gives you a way to configure and deploy Solr without having to do it manually. Its configuration is easy to use: you can set up index fields manually or from sample data. When you provide sample data, the index setup process recognizes the data types automatically. You can then review the setup and add, change, or delete index fields. These are basic Solr schema configurations; however, the CloudSearch configuration does not allow the more complicated configurations that you can do in a Solr schema.
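
To make the index field setup concrete, here is a minimal Python sketch of the same idea done from code instead of the console. The type guessing is my own rough illustration and is not the algorithm CloudSearch uses when it analyzes a sample file; the helper names and the sample rows are hypothetical. The dictionaries follow the `IndexField` shape accepted by boto3's `define_index_field`, and `apply_index_fields` assumes boto3 is installed and AWS credentials are configured.

```python
def guess_field_type(values):
    """Guess a CloudSearch field type from sample string values.

    Illustrative only: this is NOT how the CloudSearch analyzer works.
    """
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return "text"
    # Try the narrowest numeric type first, then fall back to text.
    for field_type, cast in (("int", int), ("double", float)):
        try:
            for value in non_empty:
                cast(value)
            return field_type
        except (TypeError, ValueError):
            continue
    return "text"


def build_index_fields(rows):
    """Build IndexField dicts from a list of sample rows (dicts of strings)."""
    names = sorted({name for row in rows for name in row})
    return [
        {
            "IndexFieldName": name,
            "IndexFieldType": guess_field_type([row.get(name) for row in rows]),
        }
        for name in names
    ]


def apply_index_fields(domain_name, index_fields):
    """Send the field definitions to an existing domain (needs boto3 and credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client("cloudsearch")
    for field in index_fields:
        client.define_index_field(DomainName=domain_name, IndexField=field)
    # Rebuild the index so the new field definitions take effect.
    client.index_documents(DomainName=domain_name)


if __name__ == "__main__":
    sample = [
        {"title": "Alien", "year": "1979", "rating": "8.5"},
        {"title": "Up", "year": "2009", "rating": "8"},
    ]
    for field in build_index_fields(sample):
        print(field)
```

As in the console, you would review the guessed fields before applying them, since a column of numeric-looking IDs is usually better indexed as `literal` than as `int`.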

Though a Solr schema allows you to tune your index fields, I favor a schema-less search engine for many reasons.

  1. Log into the AWS dashboard.
  2. Click “CloudSearch” – this brings up the CloudSearch dashboard.
  3. Click “Create New Domain”
    1. Choose “Analyze sample file from my local machine.” Make sure the first column has data. If the column is empty, the analyzer returns an error, and the error does not indicate what is wrong with your file.
    2. Set up the access policy
    3. Review and confirm your search engine setup
    4. Confirming triggers the setup and deployment of Solr. It takes a few minutes to deploy the Solr search engine.
  4. Once CloudSearch is deployed and available, upload documents through the dashboard. Click on the search server.
    1. Click on Upload Document
    2. Click on Select Document
    3. Click Continue, review, and finish

    The document is now indexed on the server.

  5. Query the indexed data – The default query parser is ‘Simple.’ CloudSearch also provides Lucene, Structured, and DisMax query parsers.
    1. Click on “Run a Test Search”
    2. Type some text that appears in the document and run the search.
    3. The results are presented in the browser.
  6. “Availability Options” doubles the search instance by replicating the Solr instance. The replica is deployed in a different Availability Zone so that search is highly available.

    This option is not the same as SolrCloud. SolrCloud is a robust way of deploying the Solr search engine for HA and failover, and it makes sharding an easy and simple task.
  7. “Scaling Options” provides a way to define the number of replicas.
  8. “Analysis Schemes” is where you configure your Solr analysis schemes. This option allows you to add stopword, stemming, and synonym schemes.
  9. “Expressions” lets you define numeric expressions for sorting or filtering your search results. “Expressions are numeric expressions that are based on the data in your index and evaluated at search time. You can specify expressions in your search requests to sort search results or filter search results.”
  10. “Suggesters” – “Suggestions are based on the contents of a particular text field. When you request suggestions for a search string, the string is used as a prefix. Amazon CloudSearch searches the text field for the prefix and returns a list of matching documents. Click Add Suggester to define a new suggester. Click Edit to update an existing suggester.”
  11. “Monitoring” provides a view into search service usage.
  12. “Access Policies” allows you to configure authorization and authentication policies.
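
The upload and query steps above can also be scripted. The sketch below only builds the payloads, using the Python standard library: a document batch in the JSON format CloudSearch accepts (a list of `add` operations, each with an `id` and `fields`), and a search URL for the 2013-01-01 search API with the `q.parser` parameter selecting one of the four query parsers. The function names, the `id` key in the sample record, and the endpoint host are hypothetical placeholders; substitute the search endpoint shown on your own domain's dashboard.

```python
import json
from urllib.parse import urlencode

# The four query parsers CloudSearch offers; "simple" is the default.
VALID_PARSERS = {"simple", "structured", "lucene", "dismax"}


def build_document_batch(records, id_field="id"):
    """Turn plain dicts into a CloudSearch document batch (a JSON string)."""
    batch = []
    for record in records:
        # Everything except the id becomes an indexed field.
        fields = {key: value for key, value in record.items() if key != id_field}
        batch.append({"type": "add", "id": str(record[id_field]), "fields": fields})
    return json.dumps(batch)


def build_search_url(search_endpoint, query, parser="simple", size=10):
    """Build a search request URL for a domain's search endpoint."""
    if parser not in VALID_PARSERS:
        raise ValueError(f"unknown query parser: {parser}")
    params = urlencode({"q": query, "q.parser": parser, "size": size})
    return f"https://{search_endpoint}/2013-01-01/search?{params}"


if __name__ == "__main__":
    # Hypothetical endpoint; copy the real one from the domain dashboard.
    endpoint = "search-example.us-east-1.cloudsearch.amazonaws.com"
    print(build_document_batch([{"id": 1, "title": "Alien"}]))
    print(build_search_url(endpoint, "star wars", parser="dismax"))
```

The batch string is what you would save to a file and pick in the Upload Document dialog, and the URL can be opened in a browser if the domain's access policy allows your IP address to search.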
