Searching Datasets
Basic Queries
To search your datasets, begin by selecting a dataset from your Koverse workspace dashboard. This will present you with your dataset and data within that you have the correct ABAC attributes to view.
Additionally, you can select to search the current dataset, or all datasets via the dropdown menu to the immediate right of the search field.
Type your search criteria within the search field and allow a few moments as it is applied. Below you can see an example of what type of view you will encounter with a search.
Lucene Queries
You can also pass Lucene queries through the dataset search to have more refined results by selection and/or exclusion of specific fields and values. The basic syntax structures are:
field:value
field1:value1 AND field2:value2
field:value1 OR field:value2
Key points to remember:
- Search terms (with or without field) without operator specified will be
AND
by default - Search term without a specified field name will be for all fields
- This is true even if there is a search term with a field in the query string
- An empty search will be treated as "select all records from dataset"
- Grouping utilizing parentheses
()
is unsupported - Partially indexed datasets will only return indexed fields from the dataset when searched.
The above types of queries could result in something similar to the below depictions. In example, a dataset including films and shows can be queried like so:
- field:value
country:Qatar
- field1:value1 AND field2:value2...
genres:Dramas, Independent AND country:United Kingdom AND rating:R AND release_year:2015
More advanced searches can be provided as well:
field1:value1 AND field2:value2 OR field2:value3
In example:
- field1:value1 AND field2:value2 OR field2:value3
type:movie AND release_year:2015 OR release_year:2010
SQL Query Support
The system provides basic SQL support for querying data via the API and is not available in the UI. Queries must be formatted using SELECT *
and include the dataset's UUID as the source for the data. The dataset UUID must be quoted, with escape characters for the quotes. Below is an overview of the supported features, along with examples. More details on interacting with the API can be found here.
Supported SQL Features
- SELECT * FROM: Retrieve all columns from the dataset.
- WHERE: Filter results based on conditions.
- AND / OR: Combine multiple conditions in the WHERE clause.
- EQUALS (
=
): Check for equality in conditions. - GREATER THAN (
>
) / GREATER THAN OR EQUALS (>=
): Compare values that should be larger or equal to a specified value. - LESS THAN (
<
) / LESS THAN OR EQUALS (<=
): Compare values that should be smaller or equal to a specified value. - LIKE: Perform pattern matching on text fields.
- ORDER BY: Sort the query results by one or more columns.
Example Queries
In the examples below, 97d23549-fd63-41b2-9950-bcc7f566e4d4
represents the UUID of the dataset being queried.
Basic SELECT with WHERE
"SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE age > 30"
This query retrieves all records from the dataset where the
age
field is greater than 30.SELECT with Multiple Conditions (AND)
"SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE country = 'United States' AND age >= 25"
This query retrieves all records where the
country
field is "United States" and theage
field is 25 or older.SELECT with OR Condition
"SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE occupation = 'Engineer' OR occupation = 'Scientist'"
This query retrieves all records where the
occupation
field is either "Engineer" or "Scientist."Using LIKE for Pattern Matching
"SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE name LIKE 'John%'"
This query retrieves all records where the
email
field starts with "John".SELECT with ORDER BY
"SELECT * FROM \"97d23549-fd63-41b2-9950-bcc7f566e4d4\" WHERE salary > 50000 ORDER BY salary"
ORDER BY queries require "orderedResults" to be set to true in the query payload. This query retrieves all records where the
salary
field is greater than 50,000 and orders the results by thesalary
field in ascending order.
Support for Date
KDP supports Date search for a field, if it contains ZonedDateTime formatted strings, as shown below:
- yyyy-MM-dd'T'HHmmss.SSSXXXX
- 2024-10-24T104416.352-0700
- 2024-10-24T104416.352+0700
- yyyy-MM-dd'T'HH:mm:ss.SSSXXXX
- 2015-10-09T00:00:00.237-07:00
- 2015-10-09T00:00:00.237+07:00
If the Date fields are in the ZonedDateTime format, then they can be queried using by =, >,<,>=,<= or ORDER BY.
If the Date fields are not in the ZonedDateTime format, they are treated as string and not as timestamp and so comparison operators (<, > etc.,) and order by will not work.
Example Queries
Simple SELECT
"select * from \"2436779a-b320-4645-925c-128410931de6\" where myDateField = '2024-08-01T065223.246-1200' "
This query retrieves all records where the myDateField matches the given timestamp
SELECT with ORDER BY
"SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" ORDER BY myDateField"
This query retrieves all records sorted by the field myDateField
SELECT with Greater than
"SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" WHERE myDateField >= '2024-08-12T090423.246-0330' "
This query retrieves all records where the myDateField is greater or equal to '2024-08-12T090423.246-0330'
SELECT with PART of Date
"SELECT * FROM \"2436779a-b320-4645-925c-128410931de6\" WHERE myDateField like '2024-09%'"
This query retrieves all records where the month of the myDateField is 09 of year 2024
Unsupported SQL Features
- JOINS: Combining data from multiple datasets is not supported.
- Other SQL Operations: Operations like INSERT, UPDATE, DELETE, GROUP BY, or complex functions are not supported.
For more information on using SQL queries through the API, refer to the API documentation.
Searching Your Datasets and Indexing
Datasets are automatically indexed during ingest; however, you can disable the index entirely before or after ingest if you choose. Alternatively, you may modify the index settings after data is ingested to add specific fields for indexing, or to selectively disable existing index fields. Indexing will impact how and what you can find using the dataset search function, as fields that are not indexed are not searchable. As stated previously, only indexed fields will populate as a result of your search query.
For additional information on indexing datasets see Index Management.
As an example, the below image depicts how a non-indexed dataset will present during the search: