interface RangeContainerFactory {
/**
* builds an immutable container optimized for range queries.
* Data is expected to be 32k items or less.
* The position in the “data” array represents the “id” for that instance
* in question. For the “PayrollResult” example before, the “id” might be
* the worker’s employee number, the data value is the corresponding
* net pay. E.g, data[5]=2000 means that employee #6 has net pay of 2000.
*/
RangeContainer createContainer(long[] data)
}
/**
* a specialized container of records optimized for efficient range queries
* on an attribute of the data.
*/
interface RangeContainer {
/**
* @return the Ids of all instances found in the container that
* have data value between fromValue and toValue with optional
* inclusivity. The ids should be returned in ascending order when retrieved
* using nextId().
*/
Ids findIdsInRange(longfromValue,
long toValue,
boolean fromInclusive, boolean toInclusive)
}
/**
* an iterator of Ids
*/
interface Ids {
/**
* return the next id in sequence, -1 if at end of data.
* The ids should be in sorted order (from lower to higher) to facilitate * the query distribution into multiple containers.
*/
static final shortEND_OF_IDS = -1
short nextId()
}
Validation:
Your implementation should pass the following simple JUnit test before you start tuning for performance with volume data:
public class RangeQueryBasicTest {
private RangeContainer rc
@Before
public void setUp(){
RangeContainerFactory rf = new YourRangeContainerFactory()
rc = rf.createContainer(new long[]{10,12,17,21,2,15,16}) }
@Test
public void runARangeQuery(){
Ids ids = rc.findIdsInRange(14, 17, true, true) assertEquals(2, ids.nextId())
assertEquals(5, ids.nextId())
assertEquals(6, ids.nextId()) assertEquals(Ids.END_OF_IDS, ids.nextId())
ids = rc.findIdsInRange(14, 17, true, false) assertEquals(5, ids.nextId()) assertEquals(6, ids.nextId()) assertEquals(Ids.END_OF_IDS, ids.nextId())
ids = rc.findIdsInRange(20, Long.MAX_VALUE, false, true) assertEquals(3, ids.nextId()) assertEquals(Ids.END_OF_IDS, ids.nextId())
} }
FAQ
Can't we just use a B-Tree/ NavigableMap/HashTable/BinarySearch/Lucene...
Sure, if you want. But there are a number of things we'd like to achieve:
● The container will be in memory and will need to hold lots of data, so we'd like it to be as space
efficient as possible, but not to the extent where speed is significantly compromised. Speed is
king, memory/disk can be purchased.
● Since we hate collecting garbage, we need to be careful about memory allocation during a query, as
my grandmother told me, "allocate not, gc not".
● Code should be clear and understandable.
● The rough order of trade off to consider for this case is: search performance, efficient
usage of memory, simplicity and maintainability of the code
These objectives are often at odds - we'd like to explore what's possible.