
Object Library - Dynamic data storage, type safety and validation


rhelgeby
Veteran Member
Join Date: Oct 2008
Location: 0x4E6F72776179
03-13-2013, 12:53 - Object Library - Dynamic data storage, type safety and validation

Object Library

Note: This seems to work well for me, but it's still experimental and may have bugs.

Main Features

Key/value object storage manager
Create objects with dynamic content. Data is internally stored in ADT Arrays.

Object data is accessed through get/set functions (which also perform validation).

Mutable or immutable objects
Objects can be either mutable or immutable. Immutable objects can't modify their type (add/remove keys) once created, but data in existing keys can still be modified.

Both use a type descriptor as a template. Mutable objects store a bundled descriptor so they can be modified independently, while immutable objects store a reference to a shared read-only type descriptor to save memory.

Supports built-in and custom data validation
The library supports basic validation constraints such as min/max limits in addition to a callback where the user can do custom validation of the object.

Type safe (as far as it's possible in SourcePawn)
Each key is assigned a type, and you have to use the matching get/set functions. The library checks at runtime that the correct function is used, and the compiler is able to do tag checks.
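For illustration, a rough sketch of what tag-checked access could look like. ObjLib_GetCell and ObjLib_SetCell are real natives in this library (they're discussed later in this thread), but the exact signatures, the Object: tag and the key name used here are my assumptions for this example, not documented API:

Code:
// Hypothetical usage sketch - check object.inc for the real signatures.
stock BumpDamage(Object:weapon)
{
    // The compiler tag-checks the Object: reference, and the library
    // verifies at runtime that "damage" actually stores a cell value.
    new damage = ObjLib_GetCell(weapon, "damage");

    // The matching set function also runs any validation constraints
    // attached to the "damage" key before storing the value.
    ObjLib_SetCell(weapon, "damage", damage + 10);
}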

Import (and validate) data from Valve's KeyValue file format
Creates objects based on the contents of a KeyValue file and a user-defined object type descriptor with optional validation constraints.
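As an example of the kind of input this targets, here is a small hand-written KeyValue file of weapon profiles (the section and key names are made up for illustration):

Code:
"WeaponProfiles"
{
    "ak47"
    {
        "damage"        "36"
        "clip_size"     "30"
        "description"   "Assault rifle"
    }
    "deagle"
    {
        "damage"        "54"
        "clip_size"     "7"
        "description"   "Pistol"
    }
}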

Reflection
Objects or types can be inspected at run time. Loop through keys, get data types or validation constraints.


Why Use This

This is an alternative to enumerated arrays. If you use various types of data sets, such as player profiles or weapon profiles, you don't have to create a specific storage implementation for each data set when using this library. You just need to define types and create objects.

If you have many data sets, a "hard coded" manual solution for each set will result in a lot of repetitive code.

If you also have validation constraints, that code will be repeated too.

This library will help you with everything from reading KeyValue files to storage and validation. You just need to declare types and validation constraints, and the library will enforce them.
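For comparison, this is roughly the kind of hand-written storage the library replaces. The names and limits below are made up, but the pattern - one enumerated array plus getters/setters with inlined validation - has to be repeated for every data set:

Code:
#include <sourcemod>

// Hypothetical hand-rolled storage for one data set.
enum WeaponProfile
{
    WeaponProfile_Damage,
    WeaponProfile_ClipSize,
    String:WeaponProfile_Description[64]
}

new g_WeaponProfiles[32][WeaponProfile];

stock SetWeaponDamage(profile, damage)
{
    // Validation constraints written by hand, and repeated
    // in every setter of every data set.
    if (damage < 1 || damage > 500)
    {
        ThrowError("Invalid damage value: %d", damage);
    }
    g_WeaponProfiles[profile][WeaponProfile_Damage] = damage;
}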


What It Doesn't Do

Memory management
You'll have to make sure objects and types are deleted when no longer in use. Otherwise there will be memory leaks. Read the API documentation carefully to see which functions return resources that must be released again. (Hint: cloning or creating objects and types, and the parser context object.)

It's not a tree structure
Regular KeyValue files use a tree structure. This object manager uses a plain associative array structure where each object has keys mapped to values.

However, a tree structure is indirectly supported by linking object references together. Objects can store references to other objects. Object references have their own data type so that the compiler can do tag checking on them as well.


Resource Usage

Small processing overhead
The main goal isn't a super efficient object manager, but one that's efficient enough. Because of type checking and validation there is a small overhead when modifying data. These checks are basically comparisons of primitive values and shouldn't be an issue with normal usage.

If you have code that's very busy you should consider using buffers or caches in front of the data storage. Use the SourceMod profiler to measure if this really is an issue in your code - before optimizing.

Memory overhead
Since it's a dynamic storage manager, objects need to store meta data and will use a little bit more memory than a static hard coded solution would. But it's also a much more flexible solution.

However, immutable objects are more memory efficient than mutable objects, since immutable objects share their type descriptor between objects of the same type. Mutable objects have their own private type descriptor.

Use immutable objects when you can to reduce memory overhead, especially on object types that aren't modified after creation.

The memory overhead also depends on how much space you reserve for each value entry. Memory will be wasted if you reserve more space than the longest value requires.

An object with 4 strings of 256 bytes will require about 2 KB, including the object array itself, the list of null keys and a type descriptor reference.

The type descriptor for this object uses 8-9 KB, where the trie (key name index) uses 8 KB alone (probably a candidate for the SourceMod team to optimize).
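The reservation cost is easy to see if you create an ADT array directly (plain SourceMod natives here, not the library): every element reserves the full block size, no matter how small the value stored in it is.

Code:
#include <sourcemod>

stock Handle:CreateValueStorage()
{
    // Reserve room for the longest value: 256 bytes = 64 cells per element.
    new blockSize = ByteCountToCells(256);

    // 4 keys at 64 cells each is about 1 KB of raw value storage,
    // even for keys that only ever hold a single cell.
    new Handle:data = CreateArray(blockSize, 4);

    return data;    // The caller must CloseHandle() this later.
}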

A little memory and CPU overhead is the trade-off for not having to write more code yourself. It can still be efficient if used correctly.


Examples

Example code is provided for:
  • Creating Objects
  • Adding Validation Constraints
  • Reflection
  • Parse and Validate KeyValue File
  • Types and Callbacks


Source
The newest code is available in the SourceMod project base on Google Code (the "libraries" folder in the project-components repository). It's still experimental, and parts of this library may be broken while I work on it.

More documentation and full example usage are provided in the docs folder.

An older snapshot of the library collection is attached below:
Attached: projectcomponents-r189.zip (146.0 KB)
__________________
Richard Helgeby

Zombie:Reloaded | PawnUnit | Object Library
(Please don't send private messages for support, they will be ignored. Use the forum.)

Minoost
SourceMod Donor
Join Date: Aug 2011
03-14-2013, 08:34 - Re: Object Library - Dynamic data storage, type safety and validation

Great Job!
alongub
Veteran Member
Join Date: Aug 2009
Location: Israel
04-24-2013, 02:25 - Re: Object Library - Dynamic data storage, type safety and validation

  • Is there any ObjLib_TypeOf method that returns the ObjectType of an Object?
  • How would you implement inheritance? Using ObjLib_CloneType?
  • What about performance and memory usage? Do you have some benchmarks of this vs enumerated arrays?
rhelgeby
Veteran Member
Join Date: Oct 2008
Location: 0x4E6F72776179
04-24-2013, 04:04 - Re: Object Library - Dynamic data storage, type safety and validation

ObjLib_GetTypeDescriptor is the same as TypeOf, although ObjLib_TypeOf would be a better name.

Edit: Actually I've already made ObjLib_TypeOf a long time ago (in object.inc).

Inheritance is not in my plans and probably adds too much complexity. If you've studied my code and have ideas, you're welcome to share them here or in the issue tracker on Google Code.

My first thought is to add a key for a base type in the type descriptor (ObjectType) and then modify the accessor functions to also check the base type when reading or writing data. It might add some complexity to my KeyValue parser when it's validating keys and sections. I'm not sure what side effects I'd get from doing this. Hopefully my code is flexible enough that I could implement it later.

The intention of this library is to load configuration data from KeyValue trees into objects with automatic validation, not to add object oriented programming to SourcePawn. In the Zombie:Reloaded plugin we do this "manually" with enumerated arrays in several places - with a lot of duplicated or very similar code. I want to just declare data structures and constraints and let this library handle the rest.

There obviously is a performance overhead because it has to work with meta data too, but I haven't done any benchmarks so far.

Internally it's all ADT Arrays and some ADT Tries for fast lookup in arrays. When an object is created, all keys for that object are pre-created, so it doesn't add or remove array elements when accessing the object. The arrays are also created with a predefined size where possible, so they can allocate all the memory they need right away.
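A rough sketch of that layout using plain SourceMod natives (not the library's actual code): a trie maps key names to indexes, and every key gets its data array slot up front.

Code:
#include <sourcemod>

stock CreateObjectSketch(&Handle:keyIndex, &Handle:data)
{
    keyIndex = CreateTrie();                        // key name -> array index
    data = CreateArray(ByteCountToCells(256));      // one slot per key

    // Keys are pre-created when the object is built, so get/set
    // never has to add or remove array elements afterwards.
    SetTrieValue(keyIndex, "damage", PushArrayCell(data, 0));
    SetTrieValue(keyIndex, "clip_size", PushArrayCell(data, 0));
}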

Some rough estimates of resource usage:

Read data in object, ObjLib_GetCell for instance:
  • Validation: object reference (0), null key (2), data type (2).
  • Get object type for meta data access (1).
  • Look up key name in trie to get key index (2).
  • Get object data array (1).
  • Get value in object data array (1).

The numbers are how many native calls it does to access data and meta data - 9 native calls in total for each get function. The library does a lot of native calls, but in an old benchmark (attached) I measured the overhead of a simple dynamic function call to be around 200 to 400 nanoseconds, and it's probably a bit faster when calling natives. The parameters I pass through native calls are mostly cells and a few short strings. You'd have to do a very large number of native calls before the call overhead alone affects performance, so I don't consider that an issue.
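The measurement itself can be reproduced with the SourceMod profiler along these lines (a sketch in the spirit of the attached dyncalltest.sp, not the actual benchmark code):

Code:
#include <sourcemod>

public OnPluginStart()
{
    RegServerCmd("sm_callbench", Command_CallBench);
}

public Action:Command_CallBench(args)
{
    #pragma unused args

    new iterations = 100000;
    new Handle:prof = CreateProfiler();
    new Function:func = GetFunctionByName(GetMyHandle(), "DummyTarget");

    StartProfiling(prof);
    for (new i = 0; i < iterations; i++)
    {
        // One dynamic function call per iteration.
        Call_StartFunction(GetMyHandle(), func);
        Call_PushCell(i);
        Call_Finish();
    }
    StopProfiling(prof);

    // Average time per call, in seconds.
    PrintToServer("Average call time: %f s", GetProfilerTime(prof) / float(iterations));

    CloseHandle(prof);
    return Plugin_Handled;
}

public DummyTarget(value)
{
    #pragma unused value
    // Intentionally empty; only the call overhead is measured.
}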

Then the only performance concern left is all the meta data it has to access. That is mostly getting a single cell from an array and comparing it to something. The trie lookup is the most expensive part - O(keyLength) if I'm correct - which is pretty fast when key names are short (less than ~30 characters, I suppose).

Because of this, the order of get functions is O(keyLength), which is much better than stuff like O(log n) and O(n^2). Correct me if I'm wrong here.
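Put together, the lookup part of the read path listed above boils down to roughly this (a sketch with plain SourceMod natives; the null-key and data-type validation steps are left out):

Code:
#include <sourcemod>

stock any:GetCellSketch(Handle:keyIndex, Handle:data, const String:key[])
{
    // Trie lookup: key name -> array index, O(keyLength).
    new index;
    if (!GetTrieValue(keyIndex, key, index))
    {
        ThrowError("Invalid key: %s", key);
        return 0;   // Not reached.
    }

    // Constant time read from the object data array.
    return GetArrayCell(data, index);
}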

Write data (ObjLib_SetCell):
  • Validation: object reference (0), data type (2).
  • Get object type for meta data access (1).
  • Look up key name in trie to get key index (2).
  • Get constraints object, if any (1). If none it will skip all constraint stuff below.
  • Get constraint type (1).
  • Delegate work to correct constraint handler (x).
  • Get object data array (1).
  • Store value in object data array (1).
  • Remove null flag (1).

It's quite similar to reading data, but now it has to go through constraint handlers to validate the data being set, if any. These will read various constraint settings from the constraint object and validate the value being set. Usually that's O(1) stuff, or O(keyLength) when it reads keys. It depends on what kind of constraint each key has. Some constraints allow callbacks where you can do custom validation.
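A cell min/max constraint, for instance, is just a couple of comparisons against the settings stored in the constraint object (a sketch; the parameter names are made up):

Code:
// Sketch of a cell constraint handler - O(1) comparisons.
stock bool:CheckCellConstraintSketch(value, minValue, maxValue)
{
    // minValue/maxValue stand in for settings read from the constraint object.
    return (value >= minValue && value <= maxValue);
}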

The goal isn't to have a super efficient object library, but one that's good enough. I plan to figure out a cache solution that can be used to dump and load object data to and from enumerated arrays. Then you can use enumerated arrays directly in hot areas and flush/update the cache when ready.
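A sketch of how such a cache could look on the plugin side - a plain enumerated array used in the hot path, with the object storage only touched when flushing or updating. The cache feature doesn't exist yet, so the layout and names below are purely illustrative:

Code:
#include <sourcemod>

// Hypothetical cache layout for a hot code path.
enum PlayerCache
{
    PlayerCache_Health,
    PlayerCache_Speed
}

new g_PlayerCache[MAXPLAYERS + 1][PlayerCache];

stock bool:IsPlayerLowHealth(client)
{
    // Hot path: read the plain enumerated array directly instead of
    // going through ObjLib get functions and their validation.
    return g_PlayerCache[client][PlayerCache_Health] < 20;
}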

There's an obvious memory overhead. It stores meta data about keys, and apparently the trie lookup index alone uses 8 KB with just one small key and value. Type descriptors use the most memory, but they can be shared between objects. It also creates a lot of array handles, so I'm worried it would be hell to troubleshoot memory leaks (a feature in SM for adding a short description to handles would help a lot, but again, performance).

Objects use one data array for storing both cells and strings. Lots of space is wasted for non-string keys, since it has to reserve space for string keys in every array element. Not sure if that is the case for enumerated arrays, but there you have to reserve space in the first dimension. Arrays and strings in the object data array could be moved to their own data array, but that would mess up key indexes, and I haven't figured out an elegant solution there. Internally the library relies heavily on accessing keys by index.

This is what you get with abstract libraries. You get a lot for free, but in turn there is some small overhead and more memory use (though we're still only talking about kilobytes, at most a few megabytes).

Recently I made a huge commit that adds lookup constraints: it reads a name (for instance a class name in the ZR plugin), looks it up through a custom callback or a list/trie, and replaces it with an actual (class) object reference, with validation. It can also convert a lookup name to a number, an enumerated value, an array, or another string. This is achieved by just declaring some meta data and writing the lookup callback; my KeyValue parser handles it automatically. In the old ZR plugin this took a lot of messy code; now you just need the declaration.

The attached plugin is an old benchmark I did with dynamic function calls. It does a lot of native calls so native call overhead won't be higher than this.
Attached: dyncalltest.sp (4.8 KB)
rhelgeby
Veteran Member
Join Date: Oct 2008
Location: 0x4E6F72776179
04-25-2013, 08:27 - Re: Object Library - Dynamic data storage, type safety and validation

Made some benchmarks with graphs: Object Library Benchmark

Based on this revision: r195

Server CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
RAM: 4GB

Basically we're talking about iteration times below 10 µs. Constraints and mutable objects are most expensive, but still quite fast. The rest is 1-2 µs.

What's surprising is that the iteration time isn't constant. It's doing the same thing up to a million times, but there are spikes and jumps. There might be something in my environment though - other processes or scheduled tasks affecting this. Some of the graphs also amplify the details even though the numbers are pretty stable.

Since I double the iterations for each test, the total time also increases exponentially. If there were any badly optimized code, it would grow much faster.

The objects in this test have only one key, to test the general overhead when reading and writing data. With multiple keys, key name lookup will be a tiny bit slower.
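The doubling scheme is just an outer loop around the timed section, roughly like this (a sketch of the harness shape, not the actual benchmark code):

Code:
#include <sourcemod>

stock RunScalingBench()
{
    new Handle:prof = CreateProfiler();

    // Double the iteration count for each pass. If any code path scaled
    // badly, the pass time would grow faster than 2x per step.
    for (new iterations = 1024; iterations <= 1048576; iterations *= 2)
    {
        StartProfiling(prof);
        for (new i = 0; i < iterations; i++)
        {
            // ... one ObjLib get or set call on a single-key object ...
        }
        StopProfiling(prof);

        PrintToServer("%d iterations: %f s", iterations, GetProfilerTime(prof));
    }

    CloseHandle(prof);
}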
Ermert1992
Member
Join Date: Jan 2012
Location: Germany
05-24-2013, 09:25 - Re: Object Library - Dynamic data storage, type safety and validation

Well Done!
rhelgeby
Veteran Member
Join Date: Oct 2008
Location: 0x4E6F72776179
05-24-2013, 12:02 - Re: Object Library - Dynamic data storage, type safety and validation

Thank you. I recently started working on typed collection objects and had to refactor constraint handlers. It's not tested yet and constraints may be broken.