Optical Character Recognition (OCR)
What is OCR?
Optical Character Recognition, or OCR, is a technology used to recognize printed or handwritten text characters in scanned paper documents, PDF files, images, or video and convert them into machine-encoded text. This conversion enables digital manipulation of the content, facilitating tasks such as text searching, editing, storage, and digital archiving. In Koverse, OCR is used to store the otherwise non-encoded text from these sources in datasets and to make that text searchable.
Usage
Koverse's OCR functionality is available only via the API, is compatible only with S3 and URL data sources, and is disabled by default because OCR adds processing overhead that can slow ingest. You can, however, enable OCR in the API call by setting the processWithOcr boolean to true in the connectionInfo attribute, which enables text recognition processing on the ingested files. Once processing is complete, searches also run against the OCR-extracted text, allowing you to search within images and files that would not be searchable without OCR. One option for interacting with Koverse via the API is the Koverse Python Connector, which abstracts away some of the complexity.
Feel free to explore the API Reference for additional API details.
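The flag described above can be set with a small helper. This is a minimal sketch, not part of the Koverse API or the Python Connector: the function name `enable_ocr` and the constant `OCR_COMPATIBLE_TYPES` are hypothetical. Only connectionInfo, processWithOcr, and the S3/URL restriction come from the text above, and the `type` key follows the ingest request format in the Create an Ingest Job section.

```python
import copy

# Per the text above, OCR is only compatible with S3 and URL data sources.
OCR_COMPATIBLE_TYPES = {"S3", "URL"}


def enable_ocr(data_source_params: dict) -> dict:
    """Return a copy of dataSourceParams with connectionInfo.processWithOcr set to True.

    Hypothetical helper: it only edits the request payload and does not call the API.
    """
    source_type = data_source_params.get("type")
    if source_type not in OCR_COMPATIBLE_TYPES:
        raise ValueError(
            f"OCR is not available for data source type {source_type!r}"
        )
    updated = copy.deepcopy(data_source_params)  # leave the caller's dict untouched
    updated.setdefault("connectionInfo", {})["processWithOcr"] = True
    return updated
```

Because OCR can slow ingest, it may be worth calling a helper like this only for sources that actually contain images or scanned documents.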
HTTP Response Codes
When working with the API, you will encounter HTTP response codes. The table below summarizes the codes and categories you are most likely to see:
Status Code | Category | Summary |
---|---|---|
201 | Successful | Created: Indicates the request has succeeded and led to the creation of a new resource. |
4XX | Client Error | These codes indicate an error made by the client, such as a bad request or an unauthorized attempt to access a resource, e.g. 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found |
5XX | Server Error | These codes indicate an error on the server's side, meaning the server failed to fulfill a valid request, e.g. 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout |
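The categories in the table can be expressed as a small lookup. This is a minimal sketch: the function name `categorize_status` is hypothetical, and treating the whole 2XX range as Successful is an assumption based on standard HTTP semantics, since the table lists only 201.

```python
def categorize_status(code: int) -> str:
    """Map an HTTP status code to the category names used in the table above."""
    if 200 <= code <= 299:
        return "Successful"    # assumption: any 2XX, not only 201
    if 400 <= code <= 499:
        return "Client Error"  # e.g. 400, 401, 403, 404
    if 500 <= code <= 599:
        return "Server Error"  # e.g. 500, 502, 503, 504
    return "Other"             # 1XX and 3XX are not covered by the table
```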
Authenticate
Email and Password Authentication
This section will provide guidance corresponding to the Authentication for Email and Password section of the Koverse API Reference.
To authenticate via email and password, send a properly formed request to the standard authentication API endpoint, https://api.app.koverse.com/authentication, that provides the strategy (local, as in the example request below, or proxy), the email address, the password, and the workspaceId you would like to authenticate against. The response either confirms successful authentication, returning the accessToken along with the authentication and user objects, or, if the request fails, contains the name of the error, the specific error message, and the HTTP response status code.
Example Email Auth Request:
{
  "strategy": "local",
  "email": "string",
  "password": "string",
  "workspaceId": "string"
}
Example Error Response:
{
  "name": "string",
  "message": "string",
  "code": 0
}
Example Successful Response:
{
  "accessToken": "string",
  "authentication": {
    "accessToken": "string",
    "payload": {
      "email": "string",
      "exp": 0,
      "iat": 0,
      "iss": "string",
      "jti": "string",
      "sub": "string"
    },
    "strategy": "string"
  },
  "user": {
    "avatar": "string",
    "changeEmailTokenExpiration": "string",
    "createdAt": "string",
    "deletedAt": "string",
    "displayName": "string",
    "email": "string",
    "firstName": "string",
    "githubId": "string",
    "googleId": "string",
    "id": "string",
    "lastName": "string",
    "linkedAccounts": [
      "string"
    ],
    "microsoftId": "string",
    "oktaId": "string",
    "stripeCustomerId": "string",
    "updatedAt": "string",
    "verified": true,
    "workspaceCount": 0
  }
}
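The email and password flow above could be scripted along the following lines. This is a minimal sketch using only the Python standard library, not the Koverse Python Connector. The function names are hypothetical, and the POST method and the `Content-Type: application/json` header are assumptions that the text above does not state; check them against the API Reference.

```python
import json
import urllib.request

AUTH_URL = "https://api.app.koverse.com/authentication"


def build_email_auth_request(email, password, workspace_id, strategy="local"):
    """Build (but do not send) the email/password authentication request."""
    body = {
        "strategy": strategy,
        "email": email,
        "password": password,
        "workspaceId": workspace_id,
    }
    return urllib.request.Request(
        AUTH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed
        method="POST",                                 # assumed
    )


def extract_access_token(response_body: str) -> str:
    """Return accessToken from a successful response, or raise on an error body."""
    payload = json.loads(response_body)
    if "accessToken" not in payload:
        # Error responses carry name, message, and code instead.
        raise RuntimeError(
            f"{payload.get('name')}: {payload.get('message')} "
            f"(code {payload.get('code')})"
        )
    return payload["accessToken"]
```

The request can be sent with `urllib.request.urlopen(request)`. Note that `urlopen` raises `urllib.error.HTTPError` for 4XX and 5XX responses, so the error body has to be read from the exception rather than from a normal response object.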
SSO Authentication
This section will provide guidance corresponding to the Authentication for SSO section of the Koverse API Reference.
To authenticate via SSO, send a properly formed request to the SSO authentication API endpoint, https://api.app.koverse.com/authentication?workspaceId={workspaceId}, replacing {workspaceId} with the ID of your target workspace. Set the authentication strategy to one of keycloak, microsoft, google, github, okta, or custom, and provide the access_token retrieved by signing in to the SSO account itself. If the request is successful, you will receive both an authentication and a user object containing the details needed to make subsequent authenticated requests against the API.
Example SSO Auth Request:
{
  "strategy": "keycloak",
  "access_token": "string"
}
Example Error Response:
{
  "name": "string",
  "message": "string",
  "code": 0
}
Example Successful Response:
{
  "accessToken": "string",
  "authentication": {
    "accessToken": "string",
    "payload": {
      "email": "string",
      "exp": 0,
      "iat": 0,
      "iss": "string",
      "jti": "string",
      "sub": "string"
    },
    "strategy": "string"
  },
  "user": {
    "avatar": "string",
    "changeEmailTokenExpiration": "string",
    "createdAt": "string",
    "deletedAt": "string",
    "displayName": "string",
    "email": "string",
    "firstName": "string",
    "githubId": "string",
    "googleId": "string",
    "id": "string",
    "lastName": "string",
    "linkedAccounts": [
      "string"
    ],
    "microsoftId": "string",
    "oktaId": "string",
    "stripeCustomerId": "string",
    "updatedAt": "string",
    "verified": true,
    "workspaceCount": 0
  }
}
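The SSO request differs from the email flow mainly in the query parameter and the body fields. The sketch below, written with the Python standard library, shows one way to assemble it. The function name `build_sso_auth_request` and the constant `SSO_STRATEGIES` are hypothetical, and the POST method and JSON content type are assumptions not stated in the text above.

```python
import json
import urllib.parse
import urllib.request

# Strategy values listed in the text above.
SSO_STRATEGIES = {"keycloak", "microsoft", "google", "github", "okta", "custom"}


def build_sso_auth_request(strategy, access_token, workspace_id):
    """Build (but do not send) an SSO authentication request."""
    if strategy not in SSO_STRATEGIES:
        raise ValueError(f"unsupported SSO strategy: {strategy!r}")
    # URL-encode the workspace ID before placing it in the query string.
    query = urllib.parse.urlencode({"workspaceId": workspace_id})
    url = f"https://api.app.koverse.com/authentication?{query}"
    body = {"strategy": strategy, "access_token": access_token}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed
        method="POST",                                 # assumed
    )
```

The `access_token` passed in is the one obtained from the SSO provider, not the Koverse accessToken returned in the response.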
Create an Ingest Job
This section will provide guidance corresponding to the Create an Ingest Job section of the Koverse API Reference.
To create an ingest job that imports data into Koverse, send a properly formed request to the ingest API endpoint, https://api.app.koverse.com/ingest. Set the target datasetId, which must be available within your authenticated workspace, and the attributes of dataSourceParams via the following elements:
type: the type of the data source (Enum: "URL" "JDBC" "S3" "KAFKA" "OTHER")
connectionInfo: contains the configuration parameters for making a connection to ingest data
securityLabelInfo: contains the configuration parameters for the security label parser
securityLabeled: boolean, set to true if ingest source data contains security labels
You d
Example Ingest Job Request:
{
  "datasetId": "6586f21b-ad4d-4d06-a309-712af47184a2",
  "dataSourceParams": {
    "type": "URL",
    "connectionInfo": {
      "urls": "- \"https://kisp-test.s3-us-west-2.amazonaws.com/nightly_tests/test.csv\"\n- \"https://kisp-test.s3-us-west-2.amazonaws.com/nightly_tests/test.xls\"\n",
      "processAsDocument": true,
      "processWithOcr": false
    },
    "securityLabelInfo": {
      "fields": [
        "string"
      ],
      "label": "string",
      "labelHandlingPolicy": "ignore",
      "parserClassName": "simple-parser",
      "replacementString": "string"
    },
    "securityLabeled": true
  }
}
Example Error Response:
{
  "name": "string",
  "message": "string",
  "code": 0
}
Example Successful Response:
{
  "name": "string",
  "message": "string",
  "code": 202
}
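Putting the pieces together, an ingest request for a URL data source with OCR turned on might be assembled as follows. This is a sketch under several assumptions that the text above does not confirm: the function name `build_ingest_request` is hypothetical, the POST method and the `Authorization: Bearer` header scheme are assumed, and omitting securityLabelInfo when securityLabeled is false is assumed to be allowed. The `urls` string mirrors the format of the example request above.

```python
import json
import urllib.request

INGEST_URL = "https://api.app.koverse.com/ingest"


def build_ingest_request(access_token, dataset_id, urls, process_with_ocr=False):
    """Build (but do not send) an ingest job request for a URL data source."""
    body = {
        "datasetId": dataset_id,
        "dataSourceParams": {
            "type": "URL",
            "connectionInfo": {
                # Same shape as the example: one '- "<url>"' entry per line.
                "urls": "".join(f'- "{u}"\n' for u in urls),
                "processAsDocument": True,
                "processWithOcr": process_with_ocr,
            },
            # Assumption: securityLabelInfo may be left out for unlabeled data.
            "securityLabeled": False,
        },
    }
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",         # assumed
            "Authorization": f"Bearer {access_token}",  # assumed header scheme
        },
        method="POST",                                  # assumed
    )
```

Here `access_token` stands for the accessToken returned by either authentication flow. Leaving `process_with_ocr` at its default of false matches the documented default and avoids the extra processing overhead.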