Explicit Compression
Several techniques, described below, let you compress data explicitly and make better use of precious heap memory.
If one data attribute/column differs from record to record while the other attributes/columns hold the same data, declare the differing column as a list of values and save memory by not storing the redundant data.
For example, consider a dataset of student names classified by subject of study and family income.
Student Name | Subject | Family income |
Jaohn Doe | Computer Science | 60,000-100,000 |
Kelly Carvalho | Computer Science | 60,000-100,000 |
Molly Hastrich | Electronics | 100,000-150,000 |
Sammy Kapoor | Computer Science | 60,000-100,000 |
There can be hundreds of students studying the same subject whose families fall in the same income bracket. So rather than declaring Student Name as a single String, it's better to declare it as a List of Strings and save memory by not repeating the data in the other attributes/columns.
Student Name | Subject | Family income |
Jaohn Doe, Kelly Carvalho, Sammy Kapoor | Computer Science | 60,000-100,000 |
Molly Hastrich | Electronics | 100,000-150,000 |
Don't worry about losing search capabilities with Lists: Kiara provides rich, optimized search operations on the List data type.
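The grouping described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Kiara's API: the `compress_to_lists` function and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict

# Rows from the example table: (student name, subject, family income).
rows = [
    ("Jaohn Doe", "Computer Science", "60,000-100,000"),
    ("Kelly Carvalho", "Computer Science", "60,000-100,000"),
    ("Molly Hastrich", "Electronics", "100,000-150,000"),
    ("Sammy Kapoor", "Computer Science", "60,000-100,000"),
]


def compress_to_lists(rows):
    """Group rows that share (subject, income) and collect the names in a list."""
    grouped = defaultdict(list)
    for name, subject, income in rows:
        grouped[(subject, income)].append(name)
    # One record per distinct (subject, income) pair instead of one per student.
    return [(names, subject, income) for (subject, income), names in grouped.items()]


compressed = compress_to_lists(rows)
for record in compressed:
    print(record)
```

The four input rows collapse into two records, so the subject and income values are stored once per group rather than once per student.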
If one data attribute/column differs across a continuous range while the other attributes/columns hold the same data, declare the differing column as a range of values and save memory by not storing the redundant data.
For example, consider the dataset of weekly flight frequencies classified by flight number, as below:
Carrier | Flight Number | Frequency/week |
Delta | DL101 | 5 |
Delta | DL102 | 5 |
Delta | DL110 | 5 |
Delta | DL112 | 6 |
American | AA1020 | 8 |
American | AA1024 | 8 |
American | AA1030 | 10 |
There can be hundreds of records with the same flight frequency for a particular carrier within a flight number range. So rather than declaring Flight Number as a single String, it's better to declare it as a Range and save memory by not repeating the data in the other attributes/columns.
Carrier | Flight Number | Frequency/week |
Delta | DL101-110 | 5 |
Delta | DL112 | 6 |
American | AA1020-1024 | 8 |
American | AA1030 | 10 |
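As a rough sketch of how such ranges could be derived, the plain Python below merges adjacent rows that share a carrier and frequency. It is not Kiara's API; `split_flight` and `compress_to_ranges` are hypothetical helpers, and the code assumes the rows are already ordered by carrier and flight number. Note that a range such as DL101-110 also covers numbers that never appeared in the input (for example DL105), so this compression is only safe when those numbers are unused or share the same frequency.

```python
import re
from itertools import groupby

# Rows from the example table: (carrier, flight number, frequency per week).
rows = [
    ("Delta", "DL101", 5),
    ("Delta", "DL102", 5),
    ("Delta", "DL110", 5),
    ("Delta", "DL112", 6),
    ("American", "AA1020", 8),
    ("American", "AA1024", 8),
    ("American", "AA1030", 10),
]


def split_flight(flight):
    """Split 'DL101' into its letter prefix 'DL' and numeric part 101."""
    prefix, number = re.fullmatch(r"([A-Z]+)(\d+)", flight).groups()
    return prefix, int(number)


def compress_to_ranges(rows):
    """Merge adjacent rows with the same carrier and frequency into one range."""
    compressed = []
    # groupby only merges neighbouring rows, so the input must already be
    # ordered by carrier and flight number, as in the example table.
    for (carrier, freq), group in groupby(rows, key=lambda r: (r[0], r[2])):
        numbers = [split_flight(flight) for _, flight, _ in group]
        prefix, low = numbers[0]
        _, high = numbers[-1]
        label = f"{prefix}{low}" if low == high else f"{prefix}{low}-{high}"
        compressed.append((carrier, label, freq))
    return compressed


compressed = compress_to_ranges(rows)
for record in compressed:
    print(record)
```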
If one data attribute/column differs across multiple continuous ranges while the other attributes/columns hold the same data, declare the differing column as a List of Ranges and save memory by not storing the redundant data.
For example, consider the dataset of temperature recordings taken throughout the day for a given place, as below. The temperature can stay the same for a few hours, vary a little, and then return to a value recorded a few hours earlier, and so on.
City | State | Country | Temperature | Date Time |
Monroe | CT | USA | 70 | 2018/05/18 12:00 |
Monroe | CT | USA | 70 | 2018/05/18 12:30 |
Monroe | CT | USA | 70 | 2018/05/18 13:30 |
Monroe | CT | USA | 72 | 2018/05/18 14:00 |
Monroe | CT | USA | 72 | 2018/05/18 14:30 |
Monroe | CT | USA | 70 | 2018/05/18 15:00 |
Monroe | CT | USA | 70 | 2018/05/18 15:30 |
Monroe | CT | USA | 68 | 2018/05/18 16:00 |
Monroe | CT | USA | 68 | 2018/05/18 16:30 |
Monroe | CT | USA | 72 | 2018/05/18 17:00 |
Monroe | CT | USA | 72 | 2018/05/18 17:30 |
There can be several records with the same temperature reading across various date-time ranges for a particular place. So rather than declaring Date Time as a single date-time value, it's better to declare it as a List of Date Time Ranges and save memory by not repeating the data in the other attributes/columns.
City | State | Country | Temperature | Date Time |
Monroe | CT | USA | 70 | 2018/05/18 12:00 – 2018/05/18 13:30, 2018/05/18 15:00 – 2018/05/18 15:30 |
Monroe | CT | USA | 72 | 2018/05/18 14:00 – 2018/05/18 14:30, 2018/05/18 17:00 – 2018/05/18 17:30 |
Monroe | CT | USA | 68 | 2018/05/18 16:00 – 2018/05/18 16:30 |
Again, Kiara provides rich, optimized search operations on Ranges and Lists, so you can make this memory optimization with confidence!
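The list-of-ranges compression from the temperature example can be sketched as a run-length encoding followed by a grouping step. The plain Python below is illustrative only and does not use Kiara's API; `compress_to_range_lists`, `FMT`, and `PLACE` are names assumed for the example, and the readings are assumed to be sorted by time.

```python
from collections import defaultdict
from datetime import datetime
from itertools import groupby

FMT = "%Y/%m/%d %H:%M"
PLACE = ("Monroe", "CT", "USA")

# Readings from the example table, in time order: (temperature, date time).
readings = [
    (70, "2018/05/18 12:00"),
    (70, "2018/05/18 12:30"),
    (70, "2018/05/18 13:30"),
    (72, "2018/05/18 14:00"),
    (72, "2018/05/18 14:30"),
    (70, "2018/05/18 15:00"),
    (70, "2018/05/18 15:30"),
    (68, "2018/05/18 16:00"),
    (68, "2018/05/18 16:30"),
    (72, "2018/05/18 17:00"),
    (72, "2018/05/18 17:30"),
]


def compress_to_range_lists(place, readings):
    """Turn each run of equal temperatures into a range, then group by temperature."""
    ranges_by_temp = defaultdict(list)
    # Each run of consecutive equal temperatures becomes one (start, end) range.
    for temp, run in groupby(readings, key=lambda r: r[0]):
        stamps = [datetime.strptime(ts, FMT) for _, ts in run]
        ranges_by_temp[temp].append((stamps[0], stamps[-1]))
    # One record per temperature, holding the list of its date-time ranges.
    return [(*place, temp, ranges) for temp, ranges in ranges_by_temp.items()]


compressed = compress_to_range_lists(PLACE, readings)
for city, state, country, temp, ranges in compressed:
    spans = ", ".join(f"{a.strftime(FMT)} – {b.strftime(FMT)}" for a, b in ranges)
    print(city, state, country, temp, spans)
```

Eleven readings reduce to three records, one per distinct temperature, each carrying its list of date-time ranges.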