Skip to main content

Searching Datasets

Basic Queries

To search your datasets, begin by selecting a dataset from your Koverse workspace dashboard. This will present you with your dataset and data within that you have the correct ABAC attributes to view.

Dataset Search Dashboard

Additionally, you can select to search the current dataset, or all datasets via the dropdown menu to the immediate right of the search field.

Dataset Search Selector

Type your search criteria within the search field and allow a few moments as it is applied. Below you can see an example of what type of view you will encounter with a search.

Dataset Search Example

Lucene Queries

You can also pass Lucene queries through the dataset search to have more refined results by selection and/or exclusion of specific fields and values. The basic syntax structures are:

  • field:value
  • field1:value1 AND field2:value2
  • field:value1 OR field:value2

Key points to remember:

  • Search terms (with or without field) without operator specified will be AND by default
  • Search term without a specified field name will be for all fields
    • This is true even if there is a search term with a field in the query string
  • An empty search will be treated as "select all records from dataset"
  • Grouping utilizing parentheses () is unsupported
  • Partially indexed datasets will only return indexed fields from the dataset when searched.

The above types of queries could result in something similar to the below depictions. In example, a dataset including films and shows can be queried like so:

  • field:value country:Qatar

Single field:value Query

  • field1:value1 AND field2:value2... genres:Dramas, Independent AND country:United Kingdom AND rating:R AND release_year:2015

Multiple AND Condition Query

More advanced searches can be provided as well:

  • field1:value1 AND field2:value2 OR field2:value3

In example:

  • field1:value1 AND field2:value2 OR field2:value3 type:movie AND release_year:2015 OR release_year:2010

AND-OR Query

SQL Query Support

The system provides basic SQL support for querying data via the API and is not available in the UI. Queries must be formatted using SELECT * and include the dataset's UUID as the source for the data. The dataset UUID must be quoted, with escape characters for the quotes. Below is an overview of the supported features, along with examples. More details on interacting with the API can be found here.

Supported SQL Features

  • SELECT * FROM: Retrieve all columns from the dataset.
  • WHERE: Filter results based on conditions.
  • AND / OR: Combine multiple conditions in the WHERE clause.
  • EQUALS (=): Check for equality in conditions.
  • GREATER THAN (>) / GREATER THAN OR EQUALS (>=): Compare values that should be larger or equal to a specified value.
  • LESS THAN (<) / LESS THAN OR EQUALS (<=): Compare values that should be smaller or equal to a specified value.
  • LIKE: Perform pattern matching on text fields.
  • ORDER BY: Sort the query results by one or more columns.

Example Queries

In the examples below, 97d23549-fd63-41b2-9950-bcc7f566e4d4 represents the UUID of the dataset being queried.

  1. Basic SELECT with WHERE

    "SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE age > 30"

    This query retrieves all records from the dataset where the age field is greater than 30.

  2. SELECT with Multiple Conditions (AND)

    "SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE country = 'United States' AND age >= 25"

    This query retrieves all records where the country field is "United States" and the age field is 25 or older.

  3. SELECT with OR Condition

    "SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE occupation = 'Engineer' OR occupation = 'Scientist'"

    This query retrieves all records where the occupation field is either "Engineer" or "Scientist."

  4. Using LIKE for Pattern Matching

    "SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE name LIKE 'John%'"

    This query retrieves all records where the email field starts with "John".

  5. SELECT with ORDER BY

    "SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE salary > 50000 ORDER BY salary"

    ORDER BY queries require "orderedResults" to be set to true in the query payload. This query retrieves all records where the salary field is greater than 50,000 and orders the results by the salary field in ascending order.

Support for Date

KDP supports Date search for a field, if it contains ZonedDateTime formatted strings, as shown below:

  • yyyy-MM-dd'T'HHmmss.SSSXXXX
    • 2024-10-24T104416.352-0700
    • 2024-10-24T104416.352+0700
  • yyyy-MM-dd'T'HH:mm:ss.SSSXXXX
    • 2015-10-09T00:00:00.237-07:00
    • 2015-10-09T00:00:00.237+07:00

If the Date fields are in the ZonedDateTime format, then they can be queried using by =, >,<,>=,<= or ORDER BY.

If the Date fields are not in the ZonedDateTime format, they are treated as string and not as timestamp and so comparison operators (<, > etc.,) and order by will not work.

Example Queries

  1. Simple SELECT

     "select * from \"2436779a-b320-4645-925c-128410931de6\" where myDateField = '2024-08-01T065223.246-1200' "

    This query retrieves all records where the myDateField matches the given timestamp

  2. SELECT with ORDER BY

     "SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" ORDER BY myDateField"

    This query retrieves all records sorted by the field myDateField

  3. SELECT with Greater than

     "SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" WHERE myDateField >= '2024-08-12T090423.246-0330' "

    This query retrieves all records where the myDateField is greater or equal to '2024-08-12T090423.246-0330'

  4. SELECT with PART of Date

     "SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" WHERE myDateField like '2024-09%'"

    This query retrieves all records where the month of the myDateField is 09 of year 2024

Unsupported SQL Features

  • JOINS: Combining data from multiple datasets is not supported.
  • Other SQL Operations: Operations like INSERT, UPDATE, DELETE, GROUP BY, or complex functions are not supported.

For more information on using SQL queries through the API, refer to the API documentation.

Searching Your Datasets and Indexing

Datasets are automatically indexed during ingest; however, you can disable the index entirely before or after ingest if you choose. Alternatively, you may modify the index settings after data is ingested to add specific fields for indexing, or to selectively disable existing index fields. Indexing will impact how and what you can find using the dataset search function, as fields that are not indexed are not searchable. As stated previously, only indexed fields will populate as a result of your search query.

For additional information on indexing datasets see Index Management.

As an example, the below image depicts how a non-indexed dataset will present during the search:

Lucene Query Against Un-Indexed Dataset