Introduction


Kiara DB is an in-memory database that lets you search by any combination of search keys, with consistently high performance, without creating any composite index.

Taking SQL as an example, this means you can use any combination of columns in the select clause and any combination of conditions in the where clause, with good performance, without creating a single composite index. A composite index is simply an index built on more than one column or key.

No other database on the market offers this level of freedom to search by any key without worrying about performance. Almost all other databases use hash-based or B-Tree indexes, and the number of indexes needed to cover every combination of search keys becomes huge and unachievable as the number of columns/data attributes grows.

The table below compares how many indexes Kiara, hash-based indexes and B-Tree indexes need to serve every combination of search keys, as the number of data attributes (n) grows.

n          Hash indexes (2^n)    B-Tree indexes (1 + n(n-1)/2)    Kiara indexes (n)
1          2                     1                                1
5          32                    11                               5
10         1024                  46                               10
100        ≈1.2676506e+30        4,951                            100
500        ≈3.273391e+150        124,751                          500
1,000      ≈1.0715086e+301       499,501                          1,000
10,000     ≈1.995e+3010          49,995,001                       10,000
100,000    ≈9.99e+30102          4,999,950,001                    100,000
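The counts in the table follow directly from the formulas in its header. As a quick check, the sketch below recomputes them with java.math.BigInteger, which keeps the 2^n column exact instead of overflowing; the class IndexCounts and its methods are illustrative names, not part of the Kiara SDK.

```java
import java.math.BigInteger;

/** Recomputes the index counts from the comparison table (illustrative helper, not Kiara code). */
final class IndexCounts {
    private IndexCounts() {}

    /** Hash indexes: 2^n, the number of subsets of n attributes. */
    static BigInteger hashIndexes(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    /** B-Tree indexes: 1 + n(n-1)/2, as given in the table header. */
    static BigInteger bTreeIndexes(int n) {
        BigInteger bn = BigInteger.valueOf(n);
        BigInteger pairs = bn.multiply(bn.subtract(BigInteger.ONE)).divide(BigInteger.valueOf(2));
        return BigInteger.ONE.add(pairs);
    }

    /** Kiara indexes: n, as given in the table header. */
    static BigInteger kiaraIndexes(int n) {
        return BigInteger.valueOf(n);
    }

    /** Number of decimal digits, handy for values too large to print in full. */
    static int digits(BigInteger value) {
        return value.toString().length();
    }
}
```

For instance, digits(hashIndexes(1000)) returns 302, which is why the n = 1,000 entry is roughly 10^301: astronomically large, but still finite.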

 

The tradeoff for such an invaluable gain is that Kiara operates only on read-only data. If your data changes, you have to reload it into memory. This makes Kiara suited to data that does not change, or that changes only infrequently.

Installation


At the moment, Kiara can only be integrated with Java-based applications. To include the Kiara SDK in your project, look up the latest version in the Maven repository and add it as a dependency in your project.

Once that is done, your Java application can act as a single instance of Kiara's in-memory database.

<!-- https://mvnrepository.com/artifact/com.kapoorlabs/kiara -->
 <dependency> 
    <groupId>com.kapoorlabs</groupId> 
    <artifactId>kiara</artifactId> 
    <version>2.0.6</version> 
 </dependency>

Terminology & Basic Structure


Kiara DB is a collection of read-only data stores, each of which is a collection of POJOs (plain old Java objects). In other words, a POJO class defines the schema of a data store.

Terminology


Fig 1: Basic structure of a Kiara store.

There can be no relations between two data stores, which makes Kiara a non-relational NoSQL database. As a result, data may be duplicated across data stores, but as with other NoSQL databases, this redundancy is often a good tradeoff for a horizontally scalable architecture.

If you are familiar with SQL databases, the following comparison helps:

    • Treat a data store as an SQL table that has no relation to any other table.
    • Treat the Java class that defines the data store as the SQL schema that defines the table.

Loading Data


To load data into a data store so that it is ready to be searched, follow these steps:

Step 1

Create a data store and associate the corresponding Java class with it as the schema.

Step 2

Create a store loader for the store created in step 1.

Step 3

Loop through all the records you want to load, and pass each record to the store loader.

Step 4

As a final step, let Kiara know that data loading has been completed.

Typical data-loading code looks like this:

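import com.kapoorlabs.kiara.domain.Store;
import com.kapoorlabs.kiara.loader.StoreLoader;
...

// Step 1: create a store whose schema is YOUR_CLASS
Store<YOUR_CLASS> store = new Store<>(YOUR_CLASS.class);

// Step 2: create a loader for that store
StoreLoader<YOUR_CLASS> storeLoader = new StoreLoader<>(store);

// Step 3: pass every record to the loader (records is your own collection of YOUR_CLASS objects)
for (YOUR_CLASS record : records) {
    storeLoader.loadTable(record);
}

// Step 4: tell Kiara that loading is complete; the store is now ready to be searched
storeLoader.prepareForSearch();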
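To make the read-only lifecycle concrete, here is a deliberately simplified, hypothetical model of Steps 3 and 4: records are accepted until the store is prepared for search, after which it refuses changes. ToyStoreLoader and its methods are invented for this illustration; they are not the Kiara SDK's StoreLoader, whose internals are not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the load-then-search lifecycle; NOT the Kiara StoreLoader.
 * Records are accepted until prepareForSearch() is called; afterwards the data is read-only.
 */
final class ToyStoreLoader<T> {
    private final List<T> items = new ArrayList<>();
    private boolean preparedForSearch = false;

    /** Step 3: add one record (rejected once the store has been prepared for search). */
    void load(T item) {
        if (preparedForSearch) {
            throw new IllegalStateException("Store is read-only; reload it to change the data");
        }
        items.add(item);
    }

    /** Step 4: mark loading as complete and return a read-only view of the data. */
    List<T> prepareForSearch() {
        preparedForSearch = true;
        return Collections.unmodifiableList(items);
    }
}
```

Changing the data afterwards means building a new loader and loading everything again, which is the tradeoff described in the introduction.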

Allowed Data Types


Simple Data Types

The Java class that defines the schema of your store supports the following simple data types:
+ All Primitives
  • byte
  • short
  • char
  • int
  • long
  • float
  • double
+ All Autoboxed Objects
  • Byte
  • Short
  • Character
  • Integer
  • Long
  • Float
  • Double
+ String
  • String

Complex Data Types

In addition to the above listed simple types, Kiara supports a variety of complex types as explained further.

To define a complex type, declare a String variable and annotate it with a suitable annotation from the Kiara SDK. For example, to define a date in the format yy/MM/dd, declare the variable with the annotation as follows:

@DateFormat(value="yy/MM/dd")

String yourDateVariable;

Then populate it with a string that matches the format; for the above example, “20/05/18”.
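The mechanism is an ordinary Java pattern: the field stays a plain String, and a runtime annotation records how to interpret it. The sketch below reproduces that pattern with a hypothetical @ToyDateFormat annotation read through reflection; it is not Kiara's @DateFormat, and how the Kiara SDK reads its own annotations is not described on this page.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical stand-in for an annotation such as @DateFormat; not the Kiara SDK type. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ToyDateFormat {
    String value() default "yyyy-MM-dd";
}

/** Example schema class: the field stays a plain String; the annotation says how to read it. */
class ToyPerson {
    @ToyDateFormat("yy/MM/dd")
    String birthDate = "20/05/18";
}

/** Reads an annotated String field via reflection and parses it with the declared pattern. */
final class AnnotatedDateReader {
    private AnnotatedDateReader() {}

    static LocalDate readDate(Object pojo, String fieldName) {
        try {
            Field field = pojo.getClass().getDeclaredField(fieldName);
            ToyDateFormat format = field.getAnnotation(ToyDateFormat.class);
            if (format == null) {
                throw new IllegalArgumentException(fieldName + " has no @ToyDateFormat annotation");
            }
            field.setAccessible(true);
            String raw = (String) field.get(pojo);
            return LocalDate.parse(raw, DateTimeFormatter.ofPattern(format.value()));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }
}
```

Here readDate(new ToyPerson(), "birthDate") yields May 18, 2020.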

The supported complex types are described below.

Date Times


+ Date

To declare a date field in the POJO class that defines your store's schema, declare a String and annotate it with the @DateFormat annotation provided by the Kiara SDK.

@DateFormat()

String dateField;

(Sample String value: “2018-05-18”)

 

The @DateFormat annotation marks a String to be interpreted as a date. The default date format is yyyy-MM-dd (2018-05-18).

To specify a different date format, use the value argument.

Below is an example with the date format MM/dd/yy:

@DateFormat(value="MM/dd/yy")

String dateField;

(Sample String value: “05/18/20”)
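The format strings above use the same pattern letters as Java's java.time.format.DateTimeFormatter (yyyy for year, MM for month, dd for day). Assuming Kiara interprets them the same way (this page does not say which parser it uses), the sketch below shows how each sample value parses, and why letter case matters: mm means minute-of-hour, not month. DatePatterns is an illustrative helper, not a Kiara class.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Parses date strings with java.time patterns (illustrative helper, not Kiara code). */
final class DatePatterns {
    private DatePatterns() {}

    static LocalDate parse(String value, String pattern) {
        return LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
    }

    /** True if the value can be parsed as a full date with the given pattern. */
    static boolean isValidDate(String value, String pattern) {
        try {
            parse(value, pattern);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

With these rules, "2018-05-18" under yyyy-MM-dd and "05/18/20" under MM/dd/yy describe May 18, 2018 and May 18, 2020, while "05/18/20" under mm/dd/yy cannot yield a date at all, because no month is parsed.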

+ Date Time

To declare a date-time field in the POJO class that defines your store's schema, declare a String and annotate it with the @DateTimeFormat annotation provided by the Kiara SDK.

@DateTimeFormat

String dateTimeField;

(Sample String value: “2018-05-18 16:10:01”)

 

The @DateTimeFormat annotation marks a String to be interpreted as a date-time. The default date-time format is yyyy-MM-dd HH:mm:ss (2018-05-18 16:10:01).

To specify a different date-time format, use the value argument.

Below is an example with the date-time format MM/dd/yy HH:mm:

@DateTimeFormat(value="MM/dd/yy HH:mm")

String dateTimeField;

(Sample string value: “05/18/20 16:10”)
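Date-time patterns work the same way. Assuming again that they follow java.time.format.DateTimeFormatter conventions (HH is the 24-hour clock), the sketch below parses both sample values and shows that, once parsed, values written in different formats can be compared chronologically. DateTimePatterns is an illustrative helper, not a Kiara class.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Parses and compares date-time strings (illustrative helper, not Kiara code). */
final class DateTimePatterns {
    private DateTimePatterns() {}

    static LocalDateTime parse(String value, String pattern) {
        // HH is the 24-hour clock (00-23); hh would be the 12-hour clock.
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
    }

    /** Compares two values that may use different formats, chronologically. */
    static boolean isBefore(String a, String patternA, String b, String patternB) {
        return parse(a, patternA).isBefore(parse(b, patternB));
    }
}
```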

Ranges


+ Date Range

A Date Range is a field that has:

  • A required lower date value
  • An optional upper date value

The default date format is yyyy-MM-dd, but a custom date format can be specified using the “value” argument.

If the upper date value is not specified, the upper date value of the range is the same as the lower date value.

Consider the following examples (format MM/dd/yy) to understand how date ranges are interpreted:

Example              Lower value    Upper value
01/10/18-02/10/18    01/10/18       02/10/18
01/10/18             01/10/18       01/10/18

To declare a Date Range field in the POJO class that defines your store's schema, declare a String and annotate it with the @DateRange annotation provided by the Kiara SDK.

@DateRange

String dateRangeField;

(Sample String value: “2018-01-10-2018-01-29”)

The above example means a date range covering all dates from 2018-01-10 to 2018-01-29, inclusive of both the lower and upper date values.

To use a different date format, specify it in the value argument as below:

@DateRange(value="MM/dd/yy")

String dateRangeField;

(Sample String value: “06/01/12-06/02/12”)

This means a date range covering all dates from 2012-06-01 to 2012-06-02, inclusive.
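One subtlety of the range syntax is that '-' separates the two dates while the default yyyy-MM-dd format also contains '-'. The hypothetical parser below handles this by trying each '-' as the separator until both halves parse, falls back to a single-day range when there is no upper value, and treats both bounds as inclusive. It only illustrates the rules stated above; Kiara's own parsing logic is not documented here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical model of a date range value; not part of the Kiara SDK. */
final class SimpleDateRange {
    final LocalDate lower;
    final LocalDate upper;

    SimpleDateRange(LocalDate lower, LocalDate upper) {
        this.lower = lower;
        this.upper = upper;
    }

    static SimpleDateRange parse(String text, String pattern) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern(pattern);
        String s = text.trim();
        // Try every '-' as the separator, because the date pattern itself may contain '-'.
        for (int i = s.indexOf('-'); i >= 0; i = s.indexOf('-', i + 1)) {
            try {
                LocalDate lo = LocalDate.parse(s.substring(0, i).trim(), format);
                LocalDate hi = LocalDate.parse(s.substring(i + 1).trim(), format);
                return new SimpleDateRange(lo, hi);
            } catch (DateTimeParseException e) {
                // Not a valid split point; try the next '-'.
            }
        }
        // No upper value: the upper bound equals the lower bound (a single day).
        LocalDate single = LocalDate.parse(s, format);
        return new SimpleDateRange(single, single);
    }

    /** Inclusive of both the lower and the upper date. */
    boolean contains(LocalDate date) {
        return !date.isBefore(lower) && !date.isAfter(upper);
    }
}
```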

Lists


+ List of Numbers

To declare a list of numbers in the POJO class that defines your store's schema, declare a String and annotate it with the @CommaSeperatedNumbers annotation provided by the Kiara SDK. Then store your list of numbers as a single string of comma-delimited values.

Each number in the list must fall within the range of a Java long (Long.MIN_VALUE to Long.MAX_VALUE).

@CommaSeperatedNumbers

String listOfNumbers;

(Sample String value: “-343,12,65565”)

In the above example, listOfNumbers contains the following numeric values:

  • -343
  • 12
  • 65565
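Reading such a value back amounts to splitting on commas and parsing each entry as a long. A minimal sketch, using the illustrative class name NumberListValue (not a Kiara type):

```java
import java.util.Arrays;

/** Parses a comma-separated list of numbers into longs (illustrative helper, not Kiara code). */
final class NumberListValue {
    private NumberListValue() {}

    /** Throws NumberFormatException if an entry is not a valid long (e.g. out of range). */
    static long[] parse(String value) {
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .mapToLong(Long::parseLong)
                .toArray();
    }

    static boolean contains(String value, long target) {
        return Arrays.stream(parse(value)).anyMatch(n -> n == target);
    }
}
```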
+ List of Date Ranges

To declare a list of date ranges in the POJO class that defines your store's schema, declare a String and annotate it with the @CommaSeperatedDateRanges annotation provided by the Kiara SDK. Then store your list of date ranges as a single string of comma-delimited values.

See Date Range, under the Ranges category, for more details on the date range data type.

The default date format is yyyy-MM-dd, but you can specify a custom format in the value argument, as in the example below:

@CommaSeperatedDateRanges(value="MM/dd/yy")

String listOfDateRanges;

(Sample value: “01/01/19-05/01/19, 06/01/19-12/01/19”)

In the above example, listOfDateRanges contains the following date ranges:

  • 01/01/19-05/01/19
  • 06/01/19-12/01/19
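Checking whether a date falls in any of the listed ranges combines the comma-separated list with the inclusive range rule. The sketch below assumes a date pattern without '-' (such as MM/dd/yy) so that '-' can only separate bounds; DateRangeList is a hypothetical helper, and the same approach would apply to date-time ranges with LocalDateTime.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Checks a date against a comma-separated list of date ranges (illustrative, not Kiara code). */
final class DateRangeList {
    private DateRangeList() {}

    /**
     * Assumes the date pattern itself contains no '-', so '-' only separates lower and upper
     * values (true for MM/dd/yy, not for the default yyyy-MM-dd).
     */
    static boolean anyRangeContains(String value, String pattern, LocalDate date) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern(pattern);
        for (String range : value.split(",")) {
            String[] bounds = range.trim().split("-");
            LocalDate lower = LocalDate.parse(bounds[0].trim(), format);
            // A missing upper value means a single-day range.
            LocalDate upper = bounds.length > 1 ? LocalDate.parse(bounds[1].trim(), format) : lower;
            if (!date.isBefore(lower) && !date.isAfter(upper)) {
                return true;
            }
        }
        return false;
    }
}
```

With the sample value, March 15, 2019 falls in the first range, while May 15, 2019 falls in the gap between the two ranges.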
+ List of Date Time Ranges

To declare a list of date-time ranges in the POJO class that defines your store's schema, declare a String and annotate it with the @CommaSeperatedDateTimeRanges annotation provided by the Kiara SDK. Then store your list of date-time ranges as a single string of comma-delimited values.

See Date Time Range, under the Ranges category, for more details on the date-time range data type.

The default date-time format is yyyy-MM-dd HH:mm:ss, but you can specify a custom format in the value argument, as in the example below:

@CommaSeperatedDateTimeRanges(value="MM/dd/yy HH:mm")

String listOfDateTimeRanges;

(Sample value: “01/01/19 11:00-05/01/19 11:00, 06/01/19 10:00-12/01/19 14:00”)

In the above example, listOfDateTimeRanges contains the following date-time ranges:

  • 01/01/19 11:00-05/01/19 11:00
  • 06/01/19 10:00-12/01/19 14:00

Implicit Compression


Kiara internally stores data in a trie-based structure, which provides implicit data compression through memory de-duplication, and so optimizes memory usage for you.

Let's take a deeper look at the relation between the POJO class that defines the schema of your store and the data structure that Kiara actually builds in memory.

Let’s take a very simple example of an Airport Data store defined by the following POJO class, and some sample data that we want to load.

public class Airport  {

    private String country;

    private String region;

    private String stateCode; 

    private String cityCode;

    private String airportCode;

    private char majorAirport;

}

↑ POJO Class for Data Store

Country  Region  State  City Code  Airport Code  Major Airport
USA      SOUTH   FL     FLL        FLL           N
USA      SOUTH   FL     ORL        MCO           Y
USA      SOUTH   FL     MIA        MIA           Y
USA      SOUTH   FL     PBI        PBI           N
USA      EAST    NY     NYC        JFK           Y
USA      EAST    NY     NYC        LGA           Y
USA      EAST    NY     NYC        EWR           Y
USA      EAST    NY     IAG        IAG           N
USA      EAST    NY     BUF        BUF           N
USA      EAST    NY     HPN        HPN           N
USA      EAST    MA     BED        BED           N
USA      EAST    MA     BOS        BOS           Y
USA      EAST    MA     PVC        PVC           N
IN       NORTH   PB     ASR        ATQ           Y
IN       NORTH   PB     CHD        IXC           N
IN       NORTH   PB     LDH        LUH           N
IN       CENTER  DL     DEL        DEL           Y
IN       SOUTH   KA     MLR        IXE           N
IN       SOUTH   KA     BLR        BLR           Y

↑ Sample Data To Load

The POJO class merely defines the list of data attributes/column names (country, region, stateCode, etc.) and their data types.

The data is then stored in a trie, level by level, with each level corresponding to a particular data attribute/column from the POJO class. The following figure makes this clearer.

From the POJO class, the sample data and the internal data structure above, we can make the following observations:

  • Each data attribute/column maps to a particular level in the trie. In this example:
    • Country maps to level 1 in the trie.
    • Region maps to level 2 in the trie.
    • State Code maps to level 3, and so on.
  • The position of the field in the POJO class decides the level.
    • The country field is declared first, so it gets level 1 in the trie.
    • The region field is declared second, so it gets level 2 in the trie.
    • The majorAirport field is declared sixth, so it gets level 6 in the trie.

Most importantly, for a given trie prefix path, only unique values are stored at each level; this is where the memory de-duplication comes from.

There may be thousands or millions of records with “USA” as the country, but “USA” is stored only once, at level 1. Likewise, “USA” may have thousands of records with “EAST” as the region, but only one “EAST” node is stored under “USA”. Storing data in a trie thus provides implicit data compression through memory de-duplication.
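The de-duplication can be reproduced with a toy trie in which each level holds one column and a node is created only when a value has not been seen under the current prefix path. This is a hypothetical sketch to illustrate the idea, not Kiara's internal data structure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy trie with one level per column, to illustrate prefix-based de-duplication.
 * Hypothetical sketch only; Kiara's internal structure is not described in this detail.
 */
final class ColumnTrie {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();
    private int nodeCount = 0;       // value nodes actually stored
    private int valuesInserted = 0;  // values passed in (what a flat table would store)

    /** Inserts one record whose values are given in column (level) order. */
    void insert(List<String> row) {
        Node current = root;
        for (String value : row) {
            valuesInserted++;
            Node child = current.children.get(value);
            if (child == null) {             // value not yet seen under this prefix path
                child = new Node();
                current.children.put(value, child);
                nodeCount++;
            }
            current = child;                 // shared prefix: reuse the existing node
        }
    }

    int nodeCount() {
        return nodeCount;
    }

    int valuesInserted() {
        return valuesInserted;
    }
}
```

Loading the rows FLL, ORL (Florida) and JFK, LGA (NYC) from the sample data passes in 24 values but stores only 16 nodes: “USA” once, “SOUTH” and “FL” once for both Florida rows, and “EAST”, “NY” and “NYC” once for both NYC airports.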

Maximize Implicit Compression


To maximize implicit compression and make better use of memory, follow this simple rule:

Rule: Declare primary keys last in your POJO class.

Why?

The first field declared in the POJO class gets the first level in the trie. The trie de-duplicates the values of each column to save memory, but if that column is a primary key, no de-duplication is possible, because primary key values are already unique.

Memory savings from de-duplication follow this order:

Level 1 > level 2 > level 3, and so on.

The last level gets the least savings from de-duplication, so keep your unique values/primary keys at the last level. Doing so does not impact your search performance.
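The effect of field order can be measured without building the trie at all: a trie has exactly one node per distinct non-empty prefix of the rows. The sketch below (an illustrative helper, not Kiara code) counts those prefixes for a given column order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Counts trie nodes for a given column order (illustrative helper, not Kiara code). */
final class TrieSize {
    private TrieSize() {}

    /** A trie over the given column order has one node per distinct non-empty row prefix. */
    static int nodeCount(List<List<String>> rows, int[] columnOrder) {
        Set<List<String>> prefixes = new HashSet<>();
        for (List<String> row : rows) {
            List<String> prefix = new ArrayList<>();
            for (int column : columnOrder) {
                prefix.add(row.get(column));
                prefixes.add(new ArrayList<>(prefix)); // copy: the set must not see later changes
            }
        }
        return prefixes.size();
    }
}
```

For five airports in two countries (JFK, LGA, EWR in USA; DEL, BLR in IN), the order (country, airportCode) needs 7 nodes, while (airportCode, country) needs 10, because a unique key at level 1 leaves nothing to share below it.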

Explicit Compression


The techniques described below let you compress data explicitly, to make better use of precious heap memory.

  • Using Lists

If one data attribute/column differs across records whose other attributes/columns hold the same data, it is better to declare the differing column as a list of values and save memory by not storing the redundant data.

For example, consider the following dataset of student names, classified by field of study and family income.

Student Name     Subject           Family Income
John Doe         Computer Science  60,000-100,000
Kelly Carvalho   Computer Science  60,000-100,000
Molly Hastrich   Electronics       100,000-150,000
Sammy Kapoor     Computer Science  60,000-100,000

There can be hundreds of students studying the same subject with the same family income bracket. So rather than declaring the student name as a single String, it is better to declare it as a list of Strings and save memory by not repeating the data in the other attributes/columns.

Student Name                            Subject           Family Income
John Doe, Kelly Carvalho, Sammy Kapoor  Computer Science  60,000-100,000
Molly Hastrich                          Electronics       100,000-150,000

 

Don't worry about search capabilities with lists: Kiara provides rich, optimized search operations on the List data type.
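The merge from the first student table to the second is a group-by on the columns that stay the same. A minimal sketch with java.util collections; it assumes the merged names are stored as one comma-separated String, mirroring the table, since how a list of Strings is declared in Kiara is not covered on this page. ListCompression is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Merges rows that differ only in one column (illustrative helper, not Kiara code). */
final class ListCompression {
    private ListCompression() {}

    /**
     * Each input row is {studentName, subject, familyIncome}. Rows that share subject and
     * family income are merged into one, with their names joined into a comma-separated list.
     */
    static Map<List<String>, String> mergeNames(List<String[]> rows) {
        Map<List<String>, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            List<String> sharedColumns = Arrays.asList(row[1], row[2]);
            groups.computeIfAbsent(sharedColumns, k -> new ArrayList<>()).add(row[0]);
        }
        Map<List<String>, String> merged = new LinkedHashMap<>();
        groups.forEach((shared, names) -> merged.put(shared, String.join(", ", names)));
        return merged;
    }
}
```

Applied to the four rows of the first table, this yields the two rows of the second.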

 

 

  • Using Ranges

 

 

If one data attribute/column differs across a continuous range of values while the other attributes/columns hold the same data, it is better to declare the differing column as a range of values and save memory by not storing redundant data.

For example, consider the following dataset of weekly flight frequencies, classified by flight number:

Carrier   Flight Number  Frequency/week
Delta     DL101          5
Delta     DL102          5
Delta     DL110          5
Delta     DL112          6
American  AA1020         8
American  AA1024         8
American  AA1030         10

There can be hundreds of records with the same flight frequency for a particular carrier whose flight numbers fall within one range. So rather than declaring the flight number as a single String, it is better to declare it as a range and save memory by not repeating the data in the other attributes/columns.

Carrier   Flight Number  Frequency/week
Delta     DL101-110      5
Delta     DL112          6
American  AA1020-1024    8
American  AA1030         10
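The flight table can be collapsed the same way, grouping by carrier and frequency and keeping the lowest and highest flight number of each group. Note the assumption this makes, which the example above also relies on: every flight number inside a collapsed range (for example DL103 to DL109) either does not exist or has the same frequency. FlightRangeCompression is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collapses flight numbers into ranges (illustrative helper, not Kiara code). */
final class FlightRangeCompression {
    private FlightRangeCompression() {}

    /**
     * Each row is {carrier, flightNumber, frequencyPerWeek}, e.g. {"Delta", "DL101", "5"}.
     * Rows sharing carrier, flight-number prefix and frequency collapse into one
     * "PREFIXmin-max" range; a group with a single flight keeps its plain number.
     */
    static List<String> collapse(List<String[]> rows) {
        Map<List<String>, int[]> groups = new LinkedHashMap<>(); // key -> {min, max}
        for (String[] row : rows) {
            String flight = row[1];
            int firstDigit = 0;
            while (firstDigit < flight.length() && !Character.isDigit(flight.charAt(firstDigit))) {
                firstDigit++;
            }
            String prefix = flight.substring(0, firstDigit);
            int number = Integer.parseInt(flight.substring(firstDigit));
            List<String> key = Arrays.asList(row[0], prefix, row[2]);
            int[] range = groups.get(key);
            if (range == null) {
                groups.put(key, new int[] {number, number});
            } else {
                range[0] = Math.min(range[0], number);
                range[1] = Math.max(range[1], number);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<List<String>, int[]> entry : groups.entrySet()) {
            List<String> key = entry.getKey();
            int[] range = entry.getValue();
            String flights = key.get(1) + range[0] + (range[0] == range[1] ? "" : "-" + range[1]);
            result.add(key.get(0) + " " + flights + " " + key.get(2));
        }
        return result;
    }
}
```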

 

 

  • Using List of Ranges 

 

 

If one data attribute/column differs across several continuous ranges while the other attributes/columns hold the same data, it is better to declare the differing column as a list of ranges and save memory by not storing redundant data.

For example, consider the following temperature recordings taken throughout the day at one place. The temperature can stay the same for a few hours, vary a little, then return to a value recorded a few hours earlier, and so on.

City     State  Country  Temperature  Date Time
Monroe   CT     USA      70           2018/05/18 12:00
Monroe   CT     USA      70           2018/05/18 12:30
Monroe   CT     USA      70           2018/05/18 13:30
Monroe   CT     USA      72           2018/05/18 14:00
Monroe   CT     USA      72           2018/05/18 14:30
Monroe   CT     USA      70           2018/05/18 15:00
Monroe   CT     USA      70           2018/05/18 15:30
Monroe   CT     USA      68           2018/05/18 16:00
Monroe   CT     USA      68           2018/05/18 16:30
Monroe   CT     USA      72           2018/05/18 17:00
Monroe   CT     USA      72           2018/05/18 17:30

There can be several records with the same temperature reading across various date-time ranges for a particular place. So rather than declaring Date Time as a single date-time, it is better to declare it as a list of date-time ranges and save memory by not repeating the data in the other attributes/columns.

City     State  Country  Temperature  Date Time
Monroe   CT     USA      70           2018/05/18 12:00 - 2018/05/18 13:30, 2018/05/18 15:00 - 2018/05/18 15:30
Monroe   CT     USA      72           2018/05/18 14:00 - 2018/05/18 14:30, 2018/05/18 17:00 - 2018/05/18 17:30
Monroe   CT     USA      68           2018/05/18 16:00 - 2018/05/18 16:30
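Building the list of ranges for each temperature means scanning the readings in time order and closing a run whenever the temperature changes. A sketch under the assumption that readings arrive sorted by time, with timestamps kept as plain strings; TemperatureRanges is an illustrative name, not a Kiara class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collapses time-ordered readings into date-time ranges per value (illustrative, not Kiara code). */
final class TemperatureRanges {
    private TemperatureRanges() {}

    /**
     * Input: readings sorted by time, each {timestamp, temperature}.
     * Output: temperature -> list of "start - end" ranges, one per run of consecutive
     * readings with the same temperature.
     */
    static Map<Integer, List<String>> collapse(List<String[]> readings) {
        Map<Integer, List<String>> ranges = new LinkedHashMap<>();
        int i = 0;
        while (i < readings.size()) {
            int temperature = Integer.parseInt(readings.get(i)[1]);
            int j = i;
            // Extend the run while the next reading has the same temperature.
            while (j + 1 < readings.size() && Integer.parseInt(readings.get(j + 1)[1]) == temperature) {
                j++;
            }
            String start = readings.get(i)[0];
            String end = readings.get(j)[0];
            ranges.computeIfAbsent(temperature, t -> new ArrayList<>()).add(start + " - " + end);
            i = j + 1;
        }
        return ranges;
    }
}
```

Fed the eleven readings of the first table, this produces exactly the three rows of the second.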

 

Again Kiara provides some rich and optimal searching operations with ranges and Lists, so you can make this memory optimization with confidence!

Query Structure


You can query a data store using the query method provided by the Kiara SDK, as explained below:

Note: filterSet is an optional argument. When it is omitted, query returns List<YOUR_CLASS>, so omit filterSet when you want all attributes in the result.

This style is very similar to SQL, and the following mapping helps you understand it better:

Sample SQL


select airport_name, city_name from airport
where country = "US" AND
state = "NY";

Kiara’s Query to SQL mapping


Kiara query element    Sample SQL element
store                  airport
conditions             country = 'US' AND state = 'NY'
filter set             airport_name, city_name

Assuming you have loaded your store into a variable called airportStore, the above SQL query converts to Kiara's search query as follows:

import com.kapoorlabs.kiara.domain.Condition;
import com.kapoorlabs.kiara.search.StoreSearch;
// Operator is also provided by the Kiara SDK; import it from its SDK package.
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

...
StoreSearch storeSearch = new StoreSearch();

// select airport_name, city_name
Set<String> filterSet = new HashSet<>();
filterSet.add("airport_name");
filterSet.add("city_name");

// where country = "US" AND state = "NY"
List<Condition> conditions = new LinkedList<>();
conditions.add(new Condition("country", Operator.EQUAL, "US"));
conditions.add(new Condition("state", Operator.EQUAL, "NY"));

List<Map<String, String>> result = storeSearch.query(airportStore, conditions, filterSet);

// OR, if you want all attributes in the result, omit filterSet
List<AirportData> allAttributes = storeSearch.query(airportStore, conditions);
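To make the semantics of conditions and filterSet explicit, here is a hypothetical, brute-force model of the same query over rows held as maps: all conditions must match (AND), and only the filterSet columns are kept. It is not how StoreSearch works internally; it simply scans every row, whereas Kiara is designed to answer such queries through its indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of the query semantics above (AND-ed equality conditions
 * plus a filter set). It is NOT Kiara's StoreSearch and uses none of its index structures.
 */
final class ToyQuery {
    private ToyQuery() {}

    static List<Map<String, String>> query(List<Map<String, String>> rows,
                                           Map<String, String> equalConditions,
                                           Set<String> filterSet) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            // where: every condition must hold (AND)
            boolean matches = true;
            for (Map.Entry<String, String> condition : equalConditions.entrySet()) {
                if (!condition.getValue().equals(row.get(condition.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (!matches) {
                continue;
            }
            // select: keep only the requested columns
            Map<String, String> projected = new LinkedHashMap<>();
            for (String column : filterSet) {
                projected.put(column, row.get(column));
            }
            result.add(projected);
        }
        return result;
    }
}
```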

Let's discuss how to compose conditions in the next section.

Composing Conditions


A condition has the following three parts:

  • Column/data attribute name
  • Operator
  • Value/list of values

The following examples make this clearer:

new Condition("city", Operator.EQUAL, "NYC");

The first argument of the constructor is the column/attribute name, followed by the operator and, finally, the value/list of values for the search criteria (city = “NYC”).


new Condition("authors", Operator.CONTAINS_EITHER, "Donald E. Knuth, Robert Sedgewick");

This means the list of authors must contain either Donald E. Knuth or Robert Sedgewick.


new Condition("birth_date", Operator.LESS_THAN, "2018-05-20");

This means records with a birth date earlier than May 20, 2018 (birth_date < 2018-05-20).


Don't worry about the specific operators in these examples, as we discuss them in detail later; for now, just grasp the basic structure of a condition.
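As an illustration of what the CONTAINS_EITHER and LESS_THAN conditions above express, the sketch below evaluates them on plain String values. The helper names are hypothetical and this is not Kiara's implementation; it only encodes the meanings stated in the examples.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical evaluation of two of the operators above; not Kiara's implementation. */
final class ToyConditions {
    private ToyConditions() {}

    /** CONTAINS_EITHER: the comma-separated list holds at least one of the query values. */
    static boolean containsEither(String listValue, String queryValues) {
        Set<String> items = Arrays.stream(listValue.split(","))
                .map(String::trim)
                .collect(Collectors.toSet());
        return Arrays.stream(queryValues.split(","))
                .map(String::trim)
                .anyMatch(items::contains);
    }

    /** LESS_THAN on yyyy-MM-dd dates, compared as dates. */
    static boolean lessThan(String date, String bound) {
        return LocalDate.parse(date).isBefore(LocalDate.parse(bound));
    }
}
```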

Result Structure


Search results come in two variants:

Variant 1: Selected Attributes

When you need only a few selected attributes in the result, this variant may give you more performant queries.

List<Map<String, String>>

Variant 2: All Attributes

When you need all attributes of the POJO class you used to build your store, use this variant. It gives its consumer a better developer experience, because the consumer doesn't have to deal with a map of raw String values.

List<YOUR_CLASS>

Let's take a deeper look at what this means. Consider the following result set, returned by a search query like the one described in the Query Structure section:

AIRPORT_NAME                                   CITY_NAME
Niagara Falls Intl Airport                     Niagara Falls
Adirondack Regional Airport                    Saranac Lake
Oneida County Airport                          Utica
Syracuse Hancock Intl Airport                  Syracuse
Buffalo Niagara Intl Airport                   Buffalo
John F Kennedy Intl Airport                    New York
Stewart Intl Airport                           Newburgh
Watertown Intl Airport                         Watertown
Westchester County Airport                     White Plains
Greater Rochester Intl Airport                 Rochester
Greater Binghamton Airport/Edwin A Link Field  Binghamton
Plattsburgh Intl Airport                       Plattsburgh
Chautauqua County-Jamestown Airport            Jamestown
Corning Regional Airport                       Elmira
Tompkins County Airport                        Ithaca
Massena Intl Airport-Richards Field            Massena
Ogdensburg Intl Airport                        Ogdensburg
Long Island MacArthur Airport                  Islip
LaGuardia Airport                              New York
Albany Intl Airport                            Albany
Republic Airport                               Farmingdale

Think of the complete search result as a list of records, where each record is a Map<String, String> whose keys and values are both Strings. The keys of the map are the attribute/column names, and the values are their corresponding values.

So, in this example, the keys of the map are AIRPORT_NAME and CITY_NAME.

Let's run some examples to make the structure clear.

List<Map<String, String>> result;

result.get(0).get("AIRPORT_NAME")

This statement means: get the AIRPORT_NAME of the first record in the result set.

That is Niagara Falls Intl Airport.

result.get(5).get("CITY_NAME")

This statement means: get the CITY_NAME of the sixth record in the result set.

That is New York.

Please note: the values in the result maps are always Strings. If the column/attribute actually has another data type, such as a number, a date or a list, you have to parse the String value into the appropriate type.
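Converting result values back into typed Java values uses the standard parsers. The sketch below assumes the values come back in the same text form in which they were loaded (for dates, the default yyyy-MM-dd), which this page does not state explicitly; ResultValues and the column names used with it are hypothetical.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Map;

/** Turns String values from a result row back into typed values (illustrative, not Kiara code). */
final class ResultValues {
    private ResultValues() {}

    static long asLong(Map<String, String> row, String column) {
        return Long.parseLong(row.get(column).trim());
    }

    /** Assumes the default yyyy-MM-dd date format. */
    static LocalDate asDate(Map<String, String> row, String column) {
        return LocalDate.parse(row.get(column).trim());
    }

    /** Assumes a comma-separated list of numbers, as stored in @CommaSeperatedNumbers fields. */
    static long[] asLongList(Map<String, String> row, String column) {
        return Arrays.stream(row.get(column).split(","))
                .map(String::trim)
                .mapToLong(Long::parseLong)
                .toArray();
    }
}
```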

Simple Operations


Below is the list of query operations that can be applied to simple types, as explained in the Simple Data Types section.

Simple types include String, boolean and all numeric types.

Date Operations


Below is the list of query operations that can be applied to Date/Date-Time types, as explained in the Date Times section.