Query DSL

Query DSL (Domain-Specific Language) is a JSON-based language used to define queries for searching and interacting with data in the Data Lake. It provides a flexible, structured way to build powerful and precise search and aggregation queries.

Query DSL is the backbone of the Data Lake querying capabilities, enabling developers to extract, filter, and analyze data efficiently.

Key Features of Query DSL

  1. JSON Format: Query DSL uses JSON to structure queries, making them readable and easy to integrate into applications.

  2. Expressiveness: Query DSL supports a wide variety of query types, allowing you to perform simple or complex searches. You can mix and match query types to handle various use cases.

  3. Compound Queries and Nesting: Query DSL combines multiple queries through compound queries such as bool, which expresses complex logic with AND, OR, and NOT conditions.

Components of Query DSL

Query Types

Query DSL includes several categories of queries:

  • Match Queries: Search for full-text matches, such as keywords in documents.

  • Term Queries: Find exact matches for structured data, like numbers or keywords.

  • Range Queries: Retrieve documents based on numeric, date, or other range filters.

  • Boolean Queries: Combine multiple queries using logical clauses like must, should, or must_not.

Filters

Filters narrow down results without affecting their relevance scores, making them ideal for tasks like exact matches, ranges, or excluding certain data.

Aggregations

Query DSL also supports aggregations for analytics, helping you summarize or extract insights from data.

Leaf query clauses

Leaf query clauses search for specific values within specific fields, as the term and range queries do. These queries are standalone and can be executed independently.

Term

Finds documents that have an exact match for a specified term in a given field. The term query is useful for locating documents based on specific values, like an IP address, firewall action, or username.

Example query:
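A minimal sketch of a term query, built as a Python dict so that the request body can be printed as JSON. The field name deviceAction and the value allow are taken from elsewhere on this page; the top-level query wrapper and the variable name are assumptions for illustration, not a confirmed Data Lake request.

```python
import json

# Assumption: the clause is sent under a top-level "query" key.
# Field name and value are illustrative; use ones that exist in your data.
term_query = {
    "query": {
        "term": {
            "deviceAction": {
                "value": "allow",
                "case_insensitive": False,  # optional; false is the default
            }
        }
    }
}

print(json.dumps(term_query, indent=2))
```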

Top-level parameters for term

<field>

(Required, object) Field you wish to search.

Parameters for <field>

value

(Required, string) The term you want to search for in the specified <field>. For a document to be returned, the term must precisely match the field’s value, taking into account whitespace and capitalization.

case_insensitive

(Optional, Boolean) Enables case-insensitive matching for ASCII characters between the provided value and the indexed field values when set to true. By default, this is set to false, meaning that case sensitivity is determined by the field’s underlying mapping.

Terms

Retrieves documents that include one or more specified exact terms within a given field. The terms query functions similarly to the term query but allows you to search for multiple values at once.

Example query:
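A minimal sketch of a terms query, again built as a Python dict and printed as JSON. The field name userName appears on this page; the two values and the top-level query wrapper are assumptions chosen for illustration.

```python
import json

# Assumption: the clause is sent under a top-level "query" key.
# A document matches if the field equals at least one listed term exactly.
terms_query = {
    "query": {
        "terms": {
            "userName": ["jane", "jaime"],
        }
    }
}

print(json.dumps(terms_query, indent=2))
```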

Top-level parameters for terms

<field>

(Optional, object) Field you wish to search.

This parameter accepts an array of terms to search for within the specified field. For a document to be returned, at least one term must match the field value exactly, including whitespace and capitalization. By default, the platform restricts the terms query to a maximum of 65,536 terms.

Wildcard query

Retrieves documents with terms that match a specified wildcard pattern.

A wildcard operator serves as a placeholder for one or more characters. For instance, the `*` operator matches zero or more characters. You can create wildcard patterns by combining wildcard operators with other characters.

This search retrieves documents where the userName field contains a term starting with `j` and ending with `e`. Examples of matching terms include `jaime`, `jane`, or `jule`:
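A sketch of such a wildcard query, assuming the clause is sent under a top-level query key. The loop at the end uses Python's fnmatchcase only as a local approximation of how `*` behaves in the pattern `j*e`; it is not the engine the Data Lake uses, and the term john is added here as a non-matching contrast.

```python
import json
from fnmatch import fnmatchcase

pattern = "j*e"  # starts with "j", ends with "e"

# Assumption: the clause is sent under a top-level "query" key.
wildcard_query = {
    "query": {
        "wildcard": {
            "userName": {"value": pattern},
        }
    }
}
print(json.dumps(wildcard_query, indent=2))

# Local illustration of the pattern semantics (case-sensitive).
for term in ("jaime", "jane", "jule", "john"):
    print(term, fnmatchcase(term, pattern))
```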

Top-level parameters for wildcard

<field>

(Required, object) Field you wish to search.

Parameters for <field>

value

(Required, string) Specifies the wildcard pattern for the terms you want to search for in the given <field>.

This parameter allows the use of two wildcard operators:

  • ? matches any single character.

  • * matches zero or more characters, including none.

IMPORTANT NOTE

Refrain from starting patterns with * or ?, as doing so can increase the number of iterations required to locate matching terms, which may negatively impact search performance.

case_insensitive

(Optional, Boolean) Enables case-insensitive matching for ASCII characters between the provided value and the indexed field values when set to true. By default, this is set to false, meaning that case sensitivity is determined by the field’s underlying mapping.

Time range query

Retrieves documents with terms that fall within a specified time range. The following query returns documents from the last 30 minutes, regardless of the time range specified in the URL to which the request is sent.
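A minimal sketch of a range query covering the last 30 minutes, built as a Python dict. The timestamp field name @timestamp is an assumption (this page does not name the field), as is the top-level query wrapper.

```python
import json

TIMESTAMP_FIELD = "@timestamp"  # assumption: replace with your date field

# "now-30m" is a date math expression: the current time minus 30 minutes.
range_query = {
    "query": {
        "range": {
            TIMESTAMP_FIELD: {
                "gte": "now-30m",
                "lte": "now",
            }
        }
    }
}

print(json.dumps(range_query, indent=2))
```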

Top-level parameters for range

<field>

(Required, object) Field you wish to search.

Parameters for <field>

  • gt (Optional) Greater than.

  • gte (Optional) Greater than or equal to.

  • lt (Optional) Less than.

  • lte (Optional) Less than or equal to.

If the <field> parameter is of a date field type, you can use date math with the parameters above.

Date math

Many parameters that accept formatted date values, such as gt and lt in range queries or from and to in date range aggregations, support date math.

Date math expressions begin with an anchor date, which can either be now or a date string followed by ||. After the anchor date, you can optionally include one or more mathematical operations:

  • +1h : Adds one hour

  • -1d : Subtracts one day

  • /d : Rounds down to the nearest day

Note that the supported time units for date math differ from those used for duration calculations. The supported units are:

  • y : Years

  • M : Months

  • w : Weeks

  • d : Days

  • h : Hours

  • H : Hours

  • m : Minutes

  • s : Seconds

Here are examples of date math expressions, assuming the current time is 2001-01-01 12:00:00:

  • now+1h: Adds one hour to the current time in milliseconds. Result: 2001-01-01 13:00:00

  • now-1h: Subtracts one hour from the current time in milliseconds. Result: 2001-01-01 11:00:00

  • now-1h/d: Subtracts one hour from the current time in milliseconds, then rounds down to 00:00 UTC. Result: 2001-01-01 00:00:00

  • 2001.02.01||+1M/d: Adds one month to the date 2001-02-01 in milliseconds and rounds down to the nearest day. Result: 2001-03-01 00:00:00
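The first three examples above can be reproduced with a small Python sketch. It is an illustrative subset written for this page, not the platform's parser: it only handles the now anchor, the fixed-length units (w, d, h, H, m, s), and /d rounding. Calendar units (y, M) and date-string anchors are deliberately left out, and the function name resolve_date_math is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Fixed-length units only; y and M need calendar arithmetic.
UNIT_TO_DELTA = {
    "w": timedelta(weeks=1),
    "d": timedelta(days=1),
    "h": timedelta(hours=1),
    "H": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
}

# Either "+N<unit>" / "-N<unit>", or the rounding operation "/d".
OPERATION = re.compile(r"([+-])(\d+)([wdhHms])|/(d)")


def resolve_date_math(expression: str, now: datetime) -> datetime:
    """Resolve a 'now'-anchored date math expression (illustrative subset)."""
    if not expression.startswith("now"):
        raise ValueError("this sketch only supports the 'now' anchor")
    result = now
    position = len("now")
    while position < len(expression):
        match = OPERATION.match(expression, position)
        if match is None:
            raise ValueError(f"unsupported operation at offset {position}")
        sign, amount, unit, rounding = match.groups()
        if rounding:
            # /d rounds down to the start of the day.
            result = result.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            delta = int(amount) * UNIT_TO_DELTA[unit]
            result = result + delta if sign == "+" else result - delta
        position = match.end()
    return result


now = datetime(2001, 1, 1, 12, 0, 0)
for expression in ("now+1h", "now-1h", "now-1h/d"):
    print(expression, "->", resolve_date_math(expression, now))
```

Operations are applied left to right, which is why now-1h/d first subtracts the hour and only then rounds down to midnight.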

IMPORTANT NOTE

Some query types involving large time ranges (days or weeks) tend to run slowly, depending on their implementation and on the amount of data they must analyze. This may impact the overall stability of the Data Lake.

time_zone parameter

The time_zone parameter allows you to adjust date values to UTC using a specified UTC offset. For instance:

  1. Specifies that the date values are adjusted using a UTC offset of +02:00.

  2. With this offset, Elasticsearch translates the date to 2024-08-23T22:00:00 UTC.

  3. The time_zone parameter does not modify the now value.
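The notes above appear to describe a request along the following lines. This is a sketch under stated assumptions: the field name @timestamp is hypothetical, and the local date 2024-08-24T00:00:00 is inferred from the +02:00 offset and the UTC result given above, not taken from a confirmed example. The second half checks the conversion with Python's datetime module.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumptions: field name and local date value (see the lead-in).
range_query = {
    "query": {
        "range": {
            "@timestamp": {
                "time_zone": "+02:00",
                "gte": "2024-08-24T00:00:00",
            }
        }
    }
}
print(json.dumps(range_query, indent=2))

# Check the offset arithmetic: midnight at +02:00 expressed in UTC.
local = datetime(2024, 8, 24, 0, 0, 0, tzinfo=timezone(timedelta(hours=2)))
as_utc = local.astimezone(timezone.utc)
print(as_utc.strftime("%Y-%m-%dT%H:%M:%S"))  # 2024-08-23T22:00:00
```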

Compound query clauses

Compound query clauses wrap other leaf or compound queries. They are used to combine multiple queries logically (for example, with the bool query) or to modify their behavior.

Query clauses function differently based on whether they are applied in a query context or a filter context.

Boolean query

A query designed to match documents based on boolean combinations of other queries. The bool query corresponds to Lucene's BooleanQuery and is constructed using one or more boolean clauses, where each clause has a specific type of occurrence. The occurrence types include:

  • must: The clause (query) is required to be present in matching documents and will influence the relevance score.

  • filter: The clause (query) is required in matching documents, but unlike `must`, its score is not factored in. Filter clauses operate in the filter context, where scoring is disregarded, and the clauses are optimized for caching.

  • should: The clause (query) is optional but preferred to appear in the matching document.

  • must_not: The clause (query) must be absent from the matching documents. These clauses are processed in the filter context, where scoring is disregarded, and they are optimized for caching. As a result, all documents are assigned a score of 0.

Query example:
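A minimal sketch of a bool query that uses all four occurrence types, built as a Python dict. The field names userName and deviceAction appear on this page; @timestamp, the values jane, allow, and deny, and the top-level query wrapper are assumptions for illustration.

```python
import json

# Assumption: the clause is sent under a top-level "query" key.
bool_query = {
    "query": {
        "bool": {
            # Required, and contributes to the relevance score.
            "must": [{"term": {"userName": {"value": "jane"}}}],
            # Required, but evaluated without scoring.
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
            # Optional; matching documents are preferred.
            "should": [{"term": {"deviceAction": {"value": "allow"}}}],
            # Documents matching this clause are excluded.
            "must_not": [{"term": {"deviceAction": {"value": "deny"}}}],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```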

Filter context

In a filter context, a query clause determines whether a document matches the query by providing a simple "Yes" or "No" answer. Filter context is commonly used for structured data filtering, such as:

  • Checking if a timestamp falls within the range of 2015 to 2016.

  • Verifying if the deviceAction field is set to allow.

The Data Lake automatically caches frequently used filters to enhance performance. Filter context applies whenever a query clause is used in a filter parameter, such as the filter or must_not parameters in a bool query or in a filter aggregation.

Query clauses can be applied in both query and filter contexts within a single search API request. In such a request, the parts play the following roles:

  1. The query parameter specifies the query context.

  2. Within the query context, the bool and two match clauses are used to evaluate how closely each document aligns with the query.

  3. The filter parameter specifies the filter context. In this context, the term and range clauses are applied to exclude documents that do not meet the criteria.
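A request of the kind annotated above might look like the following sketch: two match clauses in the query context, plus a term and a range clause in the filter context. The deviceAction check and the 2015 to 2016 range follow the filter examples on this page; the fields message and @timestamp, the match values, and the exact range bounds are assumptions.

```python
import json

search_body = {
    # The "query" parameter establishes the query context.
    "query": {
        "bool": {
            # Scored clauses: how well does each document match?
            "must": [
                {"match": {"message": "login failed"}},
                {"match": {"userName": "jane"}},
            ],
            # Filter context: yes/no checks, no scoring.
            "filter": [
                {"term": {"deviceAction": {"value": "allow"}}},
                {"range": {"@timestamp": {"gte": "2015-01-01", "lt": "2017-01-01"}}},
            ],
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Moving a clause from must to filter keeps it mandatory but removes its effect on scoring, which also makes it a candidate for the filter caching described earlier.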
