Leveraging Elasticsearch

Elasticsearch is a distributed search and analytics engine built for storing and querying large volumes of text. Several features make it particularly useful for applications that need full-text search:

Full-Text Search: Elasticsearch excels at full-text search, allowing for efficient searching of text data with advanced features like relevance scoring, phrase matching, and highlighting.
Scalability: It can handle large volumes of data and scale horizontally, making it suitable for big data applications.
Real-Time Search: Elasticsearch provides near real-time search capabilities, enabling users to quickly retrieve up-to-date information.
Advanced Querying: It supports complex queries, including fuzzy matching, phrase queries, and boolean queries, making it flexible for various search requirements.
Analytics: Built-in aggregations and analytics capabilities allow for in-depth data analysis and reporting.

The following examples show how I used the Elasticsearch Query DSL (Domain Specific Language) from Python, with FastAPI endpoints that query an Elasticsearch index. They are a few niche queries that helped me get the most out of Elasticsearch for my use case.

Fuzzy Search Endpoint

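from elasticsearch import Elasticsearch
from fastapi import APIRouter

# Assumed setup: the original snippet does not show how `es` and
# `search_router` are created, and the URL below is a placeholder.
es = Elasticsearch("http://localhost:9200")
search_router = APIRouter()


@search_router.get("/{id}/{query}/fuzzy")
async def fuzzy_search(id: str, query: str):
    # Note: es.search is a blocking call on the synchronous client.
    res = es.search(index="articles", body={
        "query": {
            "bool": {
                "must": [
                    # Restrict results to the given user's articles.
                    {"match": {"user_id": id}},
                    # Approximate match against title and content.
                    {
                        "multi_match": {
                            "query": query,
                            "fields": ["title", "content"],
                            "fuzziness": "AUTO"
                        }
                    }
                ]
            }
        }
    })
    # Each hit carries _id, _score and the stored document in _source.
    return res['hits']['hits']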

Explanation

Endpoint Definition: This defines an asynchronous GET endpoint that takes two path parameters: id and query.
Elasticsearch Query:
- Index: Searches within the articles index.
- Query Structure: Uses a boolean query with two conditions:
  - Match Query for user_id: Ensures the user_id field matches the provided id.
  - Multi-Match Query: Searches for the query in both the title and content fields with fuzziness set to AUTO, which lets Elasticsearch choose the allowed edit distance from the length of each term and so permits approximate matches.
Return: Returns the search results (hits) from Elasticsearch.
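
The query body described above can also be produced by a small helper, which makes it easy to inspect and test without a running cluster. The sketch below shows one possible variation rather than the code used in the endpoint: it moves the user_id condition into the bool query's filter clause with a term query, since filter clauses only include or exclude documents and do not contribute to the relevance score. The name build_fuzzy_query is hypothetical, and the term query assumes user_id is mapped as a keyword field.

```python
import json
from typing import Any, Dict


def build_fuzzy_query(user_id: str, text: str) -> Dict[str, Any]:
    """Build a search body that filters by user and fuzzy-matches the text.

    Hypothetical helper; assumes user_id is mapped as a keyword field.
    """
    return {
        "query": {
            "bool": {
                # Filter context: include/exclude only, no effect on _score.
                "filter": [{"term": {"user_id": user_id}}],
                # Query context: this clause drives the relevance score.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "content"],
                            "fuzziness": "AUTO",
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be passed to es.search(...).
    print(json.dumps(build_fuzzy_query("42", "elasticsaerch"), indent=2))
```

The returned dictionary can be passed to es.search in place of the inline body. Keeping the user restriction in filter means only the text match influences the ordering of results.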

Benefits

Fuzzy Search: The fuzziness parameter allows for matching terms that are similar but not identical to the search query, improving search results for users who may make typos or use slightly different terms.
Field Targeting: The multi_match query searches only the title and content fields, so text in other fields does not affect the results.
User-Specific Filtering: The user_id match ensures the search results are filtered to include only articles related to the specified user.
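
To make fuzziness AUTO more concrete: with its default thresholds, Elasticsearch allows no edits for terms of one or two characters, one edit for terms of three to five characters, and two edits for longer terms. The sketch below reproduces that rule in plain Python and pairs it with a simple edit-distance function that counts insertions, deletions, substitutions and swaps of adjacent characters. The names auto_max_edits, edit_distance and would_match are hypothetical, and this only illustrates the idea; it is not how Elasticsearch evaluates fuzzy queries internally.

```python
def auto_max_edits(term: str) -> int:
    """Edits allowed by fuzziness AUTO with its default thresholds (3 and 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def edit_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # A swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def would_match(query_term: str, indexed_term: str) -> bool:
    """True if indexed_term is within the AUTO edit budget of query_term."""
    return edit_distance(query_term, indexed_term) <= auto_max_edits(query_term)


if __name__ == "__main__":
    for typed, stored in [("serach", "search"), ("ab", "ac"), ("cat", "dog")]:
        print(typed, stored, would_match(typed, stored))
```

Under this rule a six-letter typo such as "serach" still finds "search", while very short terms have to match exactly, which keeps two-letter queries from matching almost everything.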

Metadata Search Endpoint

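# Uses the same `es` client and `search_router` as the fuzzy endpoint.
@search_router.get("/{id}/{query}/metadata")
async def metadata_search(id: str, query: str):
    res = es.search(index="articles", body={
        "query": {
            "bool": {
                # Every clause listed under must has to match.
                "must": [
                    # Restrict results to the given user's articles.
                    {"match": {"user_id": id}},
                    # The query text has to match a metadata key...
                    {"match": {"metadata.key": query}},
                    # ...and a metadata value as well.
                    {"match": {"metadata.value": query}}
                ]
            }
        }
    })
    return res['hits']['hits']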

Explanation

Endpoint Definition: This defines an asynchronous GET endpoint that takes two path parameters: id and query.
Elasticsearch Query:
- Index: Searches within the articles index.
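- Query Structure: Uses a boolean query with three conditions. All three sit in must, so an article is returned only when the query text matches both metadata.key and metadata.value: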
  - Match Query for user_id: Ensures the user_id field matches the provided id.
  - Match Query for metadata.key: Ensures the metadata.key field matches the provided query.
  - Match Query for metadata.value: Ensures the metadata.value field matches the provided query.
Return: Returns the search results (hits) from Elasticsearch.
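
Returning res['hits']['hits'] hands the caller the raw hit objects, each of which wraps the stored document in _source alongside fields such as _id and _score. If the API client only needs those three parts, a small helper can reshape the hits before they are returned. This is a minimal sketch: simplify_hits is a hypothetical name, and the sample response below is made-up data shaped like a search response, not output from a real cluster.

```python
from typing import Any, Dict, List, Mapping


def simplify_hits(response: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the id, score and stored document of each hit."""
    return [
        {
            "id": hit["_id"],
            "score": hit["_score"],
            "document": hit.get("_source", {}),
        }
        for hit in response["hits"]["hits"]
    ]


if __name__ == "__main__":
    # Illustrative response only; the values are invented for this example.
    sample = {
        "hits": {
            "hits": [
                {
                    "_index": "articles",
                    "_id": "a1",
                    "_score": 1.5,
                    "_source": {"user_id": "42", "title": "Hello"},
                }
            ]
        }
    }
    print(simplify_hits(sample))
```

In the endpoints, the last line would then become return simplify_hits(res), which keeps index bookkeeping fields out of the API response.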

Benefits

Metadata Search: Targets specific metadata fields (metadata.key and metadata.value), enabling precise searching within the metadata associated with articles.
User-Specific Filtering: Ensures the results are filtered to include only articles related to the specified user (user_id match).
Structured Data Search: Useful for searching structured data within the articles, such as tags, categories, or other metadata attributes, providing more granular control over search results.
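
As written, the metadata query requires the same text to match both metadata.key and metadata.value. If the intent is instead to find articles where either the key or the value matches, the two metadata clauses can move to should with minimum_should_match set to 1, while the user_id condition stays in must. The sketch below shows that variant; build_metadata_query is a hypothetical name, and this is an alternative to the endpoint above, not a description of its behaviour.

```python
import json
from typing import Any, Dict


def build_metadata_query(user_id: str, text: str) -> Dict[str, Any]:
    """Match the user's articles whose metadata key OR value matches text."""
    return {
        "query": {
            "bool": {
                # Always required: the article belongs to this user.
                "must": [{"match": {"user_id": user_id}}],
                # At least one of these has to match (see below).
                "should": [
                    {"match": {"metadata.key": text}},
                    {"match": {"metadata.value": text}},
                ],
                # Without this, should clauses are optional when must is present.
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_metadata_query("42", "category"), indent=2))
```

Articles that match on both the key and the value still score higher than those that match only one, so the strictest matches tend to appear first.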

Summary

These endpoints leverage Elasticsearch’s powerful search capabilities to provide flexible and efficient querying for text-intensive databases. The fuzzy endpoint allows for approximate matching in text fields, improving user experience by accommodating typos and variations in query terms. The metadata endpoint enables precise searching within specific metadata fields, supporting more structured and targeted searches. Both endpoints ensure user-specific filtering, providing personalised search results based on the user’s ID.
