NAV

Guide

In this tutorial we will build a system quite similar in functionality to Instagram. The following features will be implemented:

While pretty compact, this system will allow us to work with the following aspects of EGF2:

We will start with domain modelling, proceed with EGF2 deployment and necessary back-end changes and will create web, iOS and Android apps that will work with the system.

Full implementations of the back-end, iOS and Android apps are available on GitHub.

Domain Modeling

I think it makes sense to start with a User model and proceed from there.

EGF2 comes with a User object pre-defined. We can expand it with more fields and edges if necessary, but we should not remove it unless we don’t want to expose any user related APIs.

Here is how it looks, in pseudo JSON:

User
{
    "object_type": "user",
    "name": HumanName,
    "email": "<string>",
    "system": "<string, SystemUser object ID>",
    "verified": Boolean
}

I think this is enough for a start. Looking at the specification of features, it is clear that we need to support the following user roles:

Let’s define the roles.

CustomerRole
{
    "object_type": "customer_role",
    "user": User
}
AdminRole
{
    "object_type": "admin_role",
    "user": User
}

A user’s roles should be stored on the roles edge of a User object. We usually refer to an edge using the simple notation <Object type>/<edge name>, for example User/roles. ACL support assumes that roles are stored on this edge, and since we will use ACL to control access to objects and edges, we will keep users’ roles on the User/roles edge.

While we are in the users’ part of the model, let’s also add two more edges - User/followers and User/follows.

That’s enough for the users part, let’s get to the meat of the system - Posts.

Post
{
    "object_type": "post",
    "creator": User,
    "image": File,
    "description": "<string>"
}

As you can see, we specify image property as a File. This object type is supported by EGF2 out of the box.

Users create posts, so we need an edge to store a list of posts created by a User. Edge User/posts should do just fine.

We will use the AdminRole/offending_posts edge to keep track of posts that were found offensive. The Post/offended edge will hold a list of users that found this Post offensive.

Registered users can post comments for posts. A comment can be modelled as:

Comment
{
    "object_type": "comment",
    "creator": User,
    "post": Post,
    "text": "<string>"
}

We will connect Posts to Comments using Post/comments edge.

One more thing. We need to decide how user’s timeline should behave. There are at least two ways we can approach this:

We will take the second route as it scales better. There are a couple of consequences of this decision:

Both consequences can be avoided at the cost of additional processing, but I don’t think it is really necessary from the business logic standpoint.

Info about objects and edges can (and I think should) be summarised in the Model section of the system documentation; for more info see the Suggested Documentation Format section of the EGF2 documentation.

That’s it for domain modelling, at least for now. In the next section I will show you what needs to be done to implement this model with EGF2.

Configuring Objects and Edges

In the previous section I outlined what objects and edges we need to have in the system. Now we are ready to actually add them!

Open the config.json file of the client-data service. First we need to adjust the User object declaration. We have added some edges, so we need to let the system know about them. After the changes the User object declaration should look like:

    "user": {
        "code": "03",
        "GET": "self",
        "PUT": "self",
        "fields": {
            "name": {
                "type": "struct",
                "schema": "human_name",
                "required": true,
                "edit_mode": "E"
            },
            "email": {
                "type": "string",
                "validator": "email",
                "required": true,
                "unique": true
            },
            "system": {
                "type": "object_id",
                "object_types": ["system_user"],
                "required": true
            },
            "verified": {
                "type": "boolean",
                "default": false
            },
            "date_of_birth": {
                "type": "string",
                "edit_mode": "E"
            }
        },
        "edges": {
            "roles": {
                "contains": ["customer_role", "admin_role"],
                "GET": "self"
            },
            "posts": {
                "contains": ["post"],
                "GET": "self",
                "POST": "self",
                "DELETE": "self"
            },
            "timeline": {
                "contains": ["post"],
                "GET": "self"
            },
            "follows": {
                "contains": ["user"],
                "GET": "self",
                "POST": "self",
                "DELETE": "self"
            },
            "followers": {
                "contains": ["user"],
                "GET": "self"
            }
        }
    }

If you compare the original User declaration with the changed version, you will see that I made the following changes:

With the User object out of the way we can start adding objects that are totally new to the system. Let’s start with the role objects:

    "customer_role": {
        "code": "10",
        "GET": "self",
        "fields": {
            "user": {
                "type": "object_id",
                "object_types": ["user"],
                "required": true
            }
        }
    }

    "admin_role": {
        "code": "11",
        "GET": "self",
        "fields": {
            "user": {
                "type": "object_id",
                "object_types": ["user"],
                "required": true
            }
        },
        "edges": {
            "offending_posts": {
                "contains": ["post"],
                "GET": "self",
                "DELETE": "self"
            }
        }
    }

These two objects are pretty straightforward; the only thing of note here is that the AdminRole/offending_posts edge is readable and deletable by an admin, but admins can’t create it directly. The system will create these edges automatically.

The next object is Post; here is its declaration:

    "post": {
        "code": "12",
        "GET": "any",
        "PUT": "self",
        "DELETE": "self",
        "fields": {
            "creator": {
                "type": "object_id",
                "object_types": ["user"],
                "required": true,
                "auto_value": "req.user"
            },
            "image": {
                "type": "object_id",
                "object_types": ["file"],
                "required": true,
                "edit_mode": "NE"                
            },            
            "desc": {
                "type": "string",
                "required": true,
                "edit_mode": "E",
                "min": 2,
                "max": 4096                
            },
        },
        "edges": {
            "comments": {
                "contains": ["comment"],
                "GET": "any",
                "POST": "registered_user"
            },
            "offended": {
                "contains": ["user"],
                "GET": "admin_role",
                "POST": "registered_user"
            }
        }
    }

Our Post objects will be accessible by anybody, even users who have not registered with the system. Only registered users can comment and mark posts as offensive. Only admins can see who finds a particular post offensive. Please note that once a Post is created the image field is not editable, thus we have "edit_mode": "NE" for it. When "edit_mode" is omitted, as for creator, it means that the field can not be specified when an object is created and can not be changed later. Usually such fields are set automatically by the system.

Another thing of note with Post is the "auto_value" parameter in the "creator" field declaration. As you can see, the "creator" field can’t be set by a user at all. "auto_value": "req.user" specifies that this field should be populated with the User object of the currently authenticated user.
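For illustration, a Post could be created directly on the User/posts edge like this (a sketch; the exact auth header depends on your auth setup, and the File object is assumed to have been created beforehand through the file service):

# hypothetical request - creates a Post and the User/posts edge in one call;
# "creator" is filled in automatically from the authenticated user (auto_value)
curl -X POST "<gateway URL>/v1/graph/<User object ID>/posts" \
     -H "Authorization: Bearer <access token>" \
     -H "Content-Type: application/json" \
     -d '{"image": "<File object ID>", "description": "My first post"}'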

And the last object we will define is Comment:

    "comment": {
        "code": "13",
        "GET": "any",
        "PUT": "self",
        "DELETE": "self",
        "fields": {
            "creator": {
                "type": "object_id",
                "object_types": ["user"],
                "required": true,
                "auto_value": "req.user"
            },
            "post": {
                "type": "object_id",
                "object_types": ["post"],
                "required": true,
                "edit_mode": "NE"                
            },            
            "text": {
                "type": "string",
                "required": true,
                "edit_mode": "E",
                "min": 2,
                "max": 2048                
            }
        }
    }

We’ve got no edges defined for Comment objects. Comments are deletable and editable by their creators.

With the prepared configuration we can proceed to deploying the system!

Deployment Part 1

We’ve got the client-data config ready, so let’s start deploying EGF2. I will explain how we can deploy the framework and all necessary tools on a single Amazon Linux instance in AWS. Get an instance with at least 2GB of RAM; we are going to run a lot of stuff here. Also, please create an S3 bucket that will be used by the file service. An IAM role that allows full access to this S3 bucket should be configured and assigned to the instance.

But before we get to the console we need to fork the necessary framework repositories. We will start with the following services (repositories): client-data, client-api and auth.

Once you’ve got them forked please apply the changes to the client-data/config.json.

Before we deploy EGF2 services we need to install and configure RethinkDB.

RethinkDB

As we are going with a single-instance deployment it makes sense to use RethinkDB. Cassandra in this situation seems like a bit of an overkill :-)

To install RethinkDB from the development repo please run the following commands:
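The exact commands depend on your OS version; a typical sequence on Amazon Linux / CentOS looks like this (the repository URL below is an assumption, please check rethinkdb.com for the current one):

# add the RethinkDB yum repository (URL is an assumption for illustration)
sudo wget https://download.rethinkdb.com/repository/centos/7/x86_64/rethinkdb.repo \
     -O /etc/yum.repos.d/rethinkdb.repo
sudo yum install -y rethinkdb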

In order to configure RethinkDB please create a file /etc/rethinkdb/instances.d/guide-db.conf with the following content:

bind=all
server-tag=default
server-tag=us_west_2
server-tag=guide_db

Please note that server-tag=us_west_2 means that we are using an instance started in US-WEST-2 region of AWS.

And now you can start it:
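Assuming the package-provided init script (which picks up configs from instances.d), starting it looks like this:

sudo service rethinkdb start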

RethinkDB is now ready.

Deploying Services

Please install the latest stable version of Node.js.

Clone the services that we decided to deploy into the /opt directory. Run npm install in every service directory.

In order to run the client-data service we need to specify what port we want it to run on and configure the RethinkDB parameters:

{
    ...
    "port": 8000,
    "storage": "rethinkdb",
    "log_level": "debug",
    "rethinkdb": {
        "host": "localhost",
        "db": "guide"
    }
    ...
}

You can pick another port if you’d like. We just need to remember it, as we will point other services to client-data using it.

To configure auth service please set the following parameters:

{
    ...
    "port": 2016,
    "log_level": "debug",
    "client-data": "http://localhost:8000"
    ...
}

And client-api is configured with:

{
    ...
    "port": 2019,
    "log_level": "debug",
    "client-data": "http://localhost:8000",
    "auth": "http://localhost:2016"
    ...
}

As you can see, client-data is the only service that talks to RethinkDB. Other services use it for all their data needs.

Before we start the services we need to initialise the DB. We can do it easily:

node index.js --config /opt/client-data/config.json --init

Output of this task will include a line:

SecretOrganization = "<SecretOrganization object ID>"

Please use the SecretOrganization object ID string in the client-data config:

"graph": {
...
    "objects": {
        "secret_organization": "<object ID>"
    }
...
}

This will let client-data know object ID of SecretOrganization singleton.

In order to start a service please do the following:
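A minimal sketch, assuming the services were cloned into /opt and configured as above (for long-running deployments a process manager such as pm2 or forever would be a better fit):

cd /opt/<service>
node index.js --config /opt/<service>/config.json &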

Where <service> takes values “client-data”, “client-api”, “auth”. Please start client-data first, before other services.

NGINX

With our services started we need to do one more thing - install and configure NGINX with yum install -y nginx.

Create a file called api-endpoints in /etc/nginx folder. Add the following text to it:

#Client API
location /v1/graph {
    proxy_pass http://127.0.0.1:2019;
}

#Auth
location /v1/register {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/verify_email {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/login {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/logout {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/forgot_password {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/change_password {
    proxy_pass http://127.0.0.1:2016;
}
location /v1/resend_email_verification {
    proxy_pass http://127.0.0.1:2016;
}

We also need to update the /etc/nginx/nginx.conf file by adding a line in the following section:

server {
        listen       80;
        server_name  localhost;
        include api-endpoints;
...

We are adding one line - include api-endpoints;.

After that we are ready to start NGINX:
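On Amazon Linux this is simply:

sudo service nginx start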

That’s it - the system is up!

What have we got:

  1. Full Graph API - all endpoints are functional and know about our designed model.
  2. Registration, login, logout (without verification emails).

I will continue with deployments in the next post - we will add ElasticSearch, sync, pusher and file services to the system.

Deployment Part 2

In the previous section we deployed part of the services along with RethinkDB and NGINX.

Now we need to finish the job by adding ElasticSearch and the rest of the services. As a result, the search and file related endpoints will be powered, and email notifications will be sent for the email verification and forgot password features.

Let’s start with ElasticSearch.

ElasticSearch

First, import the GPG key: rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

Create a file /etc/yum.repos.d/elasticsearch.repo with the following content:

[elasticsearch-2.x]
name=Elasticsearch repository for 2.x packages
baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1

Install package:

yum install -y elasticsearch

And to start ElasticSearch please do:
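On Amazon Linux the package installs a service script, so:

sudo service elasticsearch start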

We need to add info on ES to the config files of our deployed and running services.

This line should be added to the end of client-api and auth service configs:

"elastic": { "hosts": ["localhost:9200"] }

Restart client-api and auth services.

client-api uses ES to power the search endpoint. auth needs ES to look up users by email.

We also need to configure one additional endpoint in our NGINX; please add the following to the /etc/nginx/api-endpoints file:

location /v1/search {  
    proxy_pass http://127.0.0.1:2019;
}

Now we have search endpoint ready!

Deploying Services

Please fork and then clone the file, sync and pusher services into the /opt folder.

sync

To configure sync service please change the following parameters:

{
    ...
    "log_level": "debug",
    "client-data": "http://localhost:8000",
    "queue": "rethinkdb",
    "consumer-group": "sync",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "guide",
        "table": "events",
        "offsettable": "event_offset"
    }
    ...
}

With the single-instance deployment we are using the RethinkDB changes feed feature as a system event bus. While it works great for development and testing, please note that it is not a scalable solution. For production systems please use the "queue": "kafka" setup instead. The "rethinkdb" parameter holds the config necessary for sync to connect to RethinkDB directly.

The rest of the config we will leave intact for now. It contains system ES indexes that are used by some of the services we have. I will do a separate post on sync and ES indexes shortly.

We are ready to start sync:

sync is running!

file

Let’s configure file service:

{
    "log_level": "debug",
    "port": 2018,
    "auth": "http://localhost:2016",
    "client-data": "http://localhost:8000",
    "s3_bucket": "egf2-guide-images",
    "queue": "rethinkdb",
    "consumer-group": "file",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "guide",
        "table": "events",
        "offsettable": "event_offset"
    }
}

Please note parameter "s3_bucket" - you need to use a name of an AWS S3 bucket here. Your instance should have an IAM role that allows full access to this bucket.

We will leave the parameter called "kinds" intact for now. It allows us to have predefined image resize groups: when a new image is created a kind can be specified, and the file service will then create the necessary resizes automatically.

In order to be able to resize images the file service needs ImageMagick. To install the package please do yum install ImageMagick.

And start the service:

Let’s expose the file service endpoints through NGINX; please add the following to the /etc/nginx/api-endpoints file:

location /v1/new_image {
    proxy_pass http://127.0.0.1:2018;
}

location /v1/new_file {
    proxy_pass http://127.0.0.1:2018;
}

pusher

Let’s configure pusher service:

{
    "log_level": "debug",
    "port": 2017,
    "auth": "http://localhost:2016",
    "client-data": "http://localhost:8000",
    "web_socket_port": 80,
    "email_transport": "sendgrid",
    "template_host": "localhost",
    "ignored_domains": [],
    "queue": "rethinkdb",
    "consumer-group": "pusher",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "guide",
        "table": "events",
        "offsettable": "event_offset"
    }   
}

"email_transport": "sendgrid" option specifies that we will use SendGrid to send our email notifications.

It is important to note that there is no SendGrid API key in the config file. In order to simplify config file deployment we assume that configs are not private, thus we don’t store any sensitive information in them. Instead, we have a singleton object of type SecretOrganization that stores API keys etc.

Please set the value for the key “sendgrid_api_key” to your SendGrid API key. It can be done as follows:

curl -XPUT -H "Content-Type: application/json" http://localhost:8000/v1/graph/<SecretOrganization object ID> -d '{"secret_keys":[{"key":"sendgrid_api_key","value":"<sendgrid_api_token>"}]}'

Where <SecretOrganization object ID> can be found in client-data config, "graph"/"objects"/"secret_organization" field.

After that we can start the service:

We now have all the services and tools in place that are necessary for the Guide system at this stage. The only services that have not been deployed yet are:

Both of them are not necessary at the moment.

The system we have deployed so far can be used by web and mobile developers to create and test client apps. We find it a great help to have a good part of the back-end in place as soon as possible - it helps client app development a lot.

We will continue with adding ElasticSearch indexes in the next section.

Now that we’ve got the system operational it is time to add search functionality to our back-end.

We will add search for Posts and for Users. Let’s start with separate indexes and then we can do a custom index that works for both Users and Posts simultaneously.

Automatic Indexes

In order to add the Post index please add the following text to the sync service config (found at /opt/sync/config.json), in the "elastic"/"indices" section:

...
"post": {
    "object_type": "post",
    "mapping": {
        "id": {"type": "string", "index": "not_analyzed"},
        "description": {"type": "string"},
        "created_at": {"type": "date"}
    }
}
...

Please check out ElasticSearch documentation for supported mapping options.

The User index is already partially supported for us. We need to add some fields to it though. Here is how the User index should look:

...
"user": {
    "object_type": "user",
    "mapping": {
        "id": {"type": "string", "index": "not_analyzed"},
        "first_name": {"type": "string", "field_name": "name.given"},
        "last_name": {"type": "string", "field_name": "name.family"},
        "email": {"type": "string", "index": "not_analyzed"}
    }
}
...

As you can see, we have added "first_name" and "last_name" fields to the User index declaration.

Without writing a line of code we have just supported search for Users on name and for Posts on description.

Let’s see what we can do with custom index handlers.

Custom Index

Let’s add the following text to the config, "elastic"/"indices" section:

...
"post_user": {
    "mapping": {
        "id": {"type": "string", "index": "not_analyzed"},
        "combo": {"type": "string"}
    }
}
...

We need to implement the handler for our new index. Let’s start by adding a file "post_user.js" to the "extra" folder of the sync service:

"use strict";

const config = require("../components").config;
const elastic = require("../components").elasticSearch;

const indexName = "post_user";

function isUser(event) {
    let obj = event.current || event.previous;
    return obj.object_type === "user";
}

function onPost(event) {
    return elastic.index({
        index: indexName,
        type: indexName,
        id: event.object,
        body: {
            id: event.current.id,
            combo: isUser(event) ?
                `${event.current.name.given} ${event.current.name.family}` :
                event.current.description
        }
    });
}

function onPut(event) {
    let combo;
    if (isUser(event)) {
        if (event.current.name.given !== event.previous.name.given ||
            event.current.name.family !== event.previous.name.family) {
            combo = `${event.current.name.given} ${event.current.name.family}`;
        }
    } else if (event.current.description !== event.previous.description) {
        combo = event.current.description;
    }
    if (combo) {
        return elastic.update({
            index: indexName,
            type: indexName,
            id: event.object,
            body: {doc: {combo}} // partial update requires the "doc" wrapper
        });
    }
    return Promise.resolve();
}

function onDelete(event) {
    return elastic.delete({
        index: indexName,
        type: indexName,
        id: event.object
    });
}

module.exports = {
    onPost,
    onPut,
    onDelete
};

This code reacts to events related to User and Post and stores data in a combined ES index.

The last thing we need to do is to let the system know about our new handler; please add these lines to "extra/index.js", somewhere in the handleRegistry map:

...
"POST user": require("./post_user").onPost,
"PUT user": require("./post_user").onPut,
"DELETE user": require("./post_user").onDelete,
"POST post": require("./post_user").onPost,
"PUT post": require("./post_user").onPut,
"DELETE post": require("./post_user").onDelete
...

Restart sync. We are all set!

We can now search for users, for posts, and for both at once, as follows:
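For illustration, requests against the search endpoint might look like this (the parameter names "object" and "q" are assumptions for illustration; "object" matches the parameter used with the searcher module later in this guide, but please check the client-api docs for the exact query syntax):

# hypothetical search requests - parameter names are assumptions
curl "<gateway URL>/v1/search?object=user&q=john"
curl "<gateway URL>/v1/search?object=post&q=sunset"
curl "<gateway URL>/v1/search?object=post_user&q=john"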

In the next post we will add several logic rules.

Adding Logic

There are situations when some actions need to be taken in response to a particular change in data. A change can be initiated by a user request, by some service working on a schedule, or by any other source. In case the actions deal with data modifications (as opposed to sending notifications, which we handle in pusher) we implement them as “rules” in the logic service.

We will add rules for the following situations:

  1. When a new User/follows edge is created logic should create a corresponding edge User/followers.
  2. When a User/follows edge is removed logic should remove corresponding User/followers edge.
  3. When a new User/posts edge is created logic should create a User/timeline edge for each User that follows the author of this Post (i.e. Users from the User/followers edge). Here we have an example of the “fan-out on write” approach.
  4. When a User/posts edge is removed logic should remove User/timeline edge for all followers of this User.
  5. When a new Post/offended edge is created logic needs to find all users in the system that have AdminRole and create a new edge AdminRole/offending_posts. It should check if an edge was already created before creation though - we may have multiple users offended by a single Post.

In our system documentation we will have all rules listed in a table in Logic Rules section.

Let’s start with adding files with rules’ implementations and then proceed with configuring the registry.

Rules 1 and 2

Please create a file logic/extra/user_follows.js with the following content:

"use strict";

const clientData = require("../components").clientData;

// rule #1
function onCreate(event) {
    return clientData.createEdge(event.edge.dst, "followers", event.edge.src);
}

// rule #2
function onDelete(event) {
    return clientData.deleteEdge(event.edge.dst, "followers", event.edge.src);
}

module.exports = {
    onCreate,
    onDelete
};

Please note the use of clientData. The clientData module provides an easy and convenient way to interface with the client-data service.

Rules 3 and 4

File name: logic/extra/user_posts.js, content:

"use strict";

const clientData = require("../components").clientData;

// Rule #3
function onCreate(event) {
    return clientData.forEachPage(
        last => clientData.getEdges(event.edge.src, "followers", {after: last}),
        followers => Promise.all(followers.results.map(follower =>
            clientData.createEdge(follower.id, "timeline", event.edge.dst)
        ))
    );
}

// Rule #4
function onDelete(event) {
    return clientData.forEachPage(
        last => clientData.getEdges(event.edge.src, "followers", {after: last}),
        followers => Promise.all(followers.results.map(follower =>
            clientData.deleteEdge(follower.id, "timeline", event.edge.dst)
        ))
    );
}

module.exports = {
    onCreate,
    onDelete
};

In rules 3 and 4 we have an example of paginating over pages of an edge with clientData.forEachPage.

Rule 5

This rule is a bit more interesting. We need to get a list of all admins in the system, and the Graph API has no way to get a list of objects that are not connected to some edge.

Let’s define a new automatic ES index with sync and use search endpoint for this.

Add the following text to /opt/sync/config.json, in the "elastic"/"indices" section:

...
"admin_role": {
    "object_type": "admin_role",
    "mapping": {
        "id": {"type": "string", "index": "not_analyzed"}
    }
}
...

Restart sync - we are ready to get admins.

Now we need to implement the rule, file name: logic/extra/post_offended.js, content:

"use strict";

const clientData = require("../components").clientData;
const searcher = require("../components").searcher;
const log = require("../components").logger;

// Rule #5
function onCreate(event) {
    return clientData.getGraphConfig().then(config =>
        clientData.forEachPage(
            last => searcher.search({object: "admin_role", count: config.pagination.max_count, after: last}),
            admins => Promise.all(admins.results.map(admin =>
                clientData.createEdge(admin, "offending_posts", event.edge.src)
                    .catch(err => {
                        log.error(err);
                    })
                ))
        )
    );
}

module.exports = {
    onCreate
};

Please note the way the searcher module is used here. searcher provides a way to interact with ElasticSearch without the need to work with the ES driver directly.

We have prepared the logic rule handlers; now we need to let logic know about them. Add the following lines to the logic/extra/index.js file, in the handleRegistry map:

...
    "POST user/follows": require("./user_follows").onCreate,
    "DELETE user/follows": require("./user_follows").onDelete,
    "POST user/posts": require("./user_posts").onCreate,
    "DELETE user/posts": require("./user_posts").onDelete,
    "POST post/offended": require("./post_offended").onCreate
...

Restart logic service.

Even though EGF2 can not provide implementations of custom business rules for all systems and domains, it does offer structure and a clear path for adding such rules. It suggests where the implementation should live and how it should be connected to the rest of the system, and it gives a recommendation on the way rules should be documented.

In the next post we will look into web app creation.

EigenGraph Framework V2 (EGF2) Modes

EGF2 can be deployed in three different modes:

  1. Big Data - this mode will include Spark and either HDFS or S3 persistence layer in addition to a data store. Suitable for Big Data applications
  2. Scalable - this mode is suitable for applications with any load level but without ML and Big Data processing requirements.
  3. Small mode uses RethinkDB changes feed as a queue solution and thus micro services are not horizontally scalable. The cheapest option, suitable for small apps with little load and for development purposes

Each of the modes is illustrated below in more detail.

Big Data Mode

  1. Client-data processes GET requests as follows:
    • GET object request will use caching solution and in case object is not cached it will use data store solution
    • GET edge request will retrieve edge data from data store and resolve objects using cache and, in some instances, data store
  2. For modifying requests client-data updates the cache
  3. For modifying requests client-data sends events to the queue solution
  4. For modifying requests client-data stores data in the data store
  5. Spark streaming is listening for changes from queue
  6. Spark streaming updates master data store in S3
  7. Spark streaming updates ES
  8. Micro services listen to the queue and react to events
  9. Micro services use client-data to store / retrieve data
  10. Batch jobs will send updates to client-data
  11. Batch jobs will operate on data from the master data set

Scalable Mode

  1. Client-data processes GET requests as follows:
    • GET object request will use cache and in case object is not cached it will use data store
    • GET edge request will retrieve edge data from data store and resolve objects using cache and, in some instances, data store
  2. For modifying requests client-data updates the cache
  3. For modifying requests client-data stores data in the data store
  4. For modifying requests client-data sends events to the queue solution
  5. Micro services listen to the queue
  6. Micro services work with client-data to retrieve / modify data
  7. Sync service updates ES based on events coming from the queue solution
  8. Client-api uses ES to search for data

Small Mode

  1. Client-api works with client-data to retrieve / modify data
  2. Client-data uses RethinkDB to store / get data
  3. Micro services work with client-data to store / get data
  4. Micro services listen to RethinkDB changes feed, process changes with events
  5. Sync service updates ES
  6. Client-api uses ES to search for data

Third Party Tools

Spark Streaming

Code will have to be prepared that will read the event stream from a queue solution and update the master datastore as well as all batch views incrementally. Within a batch in Spark Streaming we may want to attempt to restore proper event ordering based on event timestamps.

Spark Batch

This layer will contain all batch jobs we may have, including but not limited to:

Cache

Caching solution is utilized by client-data.

Data Store

We currently support RethinkDB and need to support Cassandra as well.

RethinkDB Data Structures

The Objects table holds all object documents. Each document contains the "id", "object_type", "created_at", "modified_at" and "deleted_at" system fields. RethinkDB document example:

{
    id: "<UUID with object code suffix>",
    object_type: "<string, object type>",
    created_at: "<date>",
    modified_at: "<date>",
    ...
}

The Edges table has the following structure:

{
    id: "<source object ID>_<edge name>_<destination object ID>",
    src: "<source object ID>",
    edge_name: "<string, edge name>",
    dst: "<destination object ID>",
    sort_by: "date, string date created",
    created_at: "<date>"
}

The edge_sorting index contains [src, edge_name, sort_by, dst].

The Events table contains events for objects and edges.

{
    id: "<timebased UUID with event object suffix>"
    // either object or edge fields must be filled
    object: "<optional, object id associated with the event>",
    edge: {
        src: "<string, source object id>",
        dst: "<string, destination object id>",
        name: "<string, edge name>"
    }, // optional, edge identificator associated with the event
    method: "<string, "POST", "PUT" or "DELETE" value>",
    previous: {
        <optional, object or edge body for previous state>
    },
    current: {
        <optional, object or edge body for current state>
    },
    created_at: "<date>"
}

The Unique table stores values of fields declared with "unique": true. Each record contains only an id field.

Cassandra Data Structures

Storing Objects with Cassandra

CREATE TABLE objects (
    id text,
    type text,
    fields map<text, text>,
    PRIMARY KEY (type, id)
);

Storing Edges with Cassandra

Edges are stored in the following table:

CREATE TABLE edges (
    src text,
    name text,
    dst text,
    sort_value text,
    PRIMARY KEY ((name, src), sort_value, dst)
)
WITH CLUSTERING ORDER BY (sort_value DESC);
CREATE INDEX ON edges (dst);

Notes:

  1. "sort_value" contains edge creation date, is sorted in DESC order
  2. We don’t need an index on "sort_value" because we have it in the clustering fields

Storing Events with Cassandra

-- "edge" is modelled here as a user-defined type so that it can be nested in the events table
CREATE TYPE edge (
    src text,
    name text,
    dst text
);

CREATE TABLE events (
    id text,
    object_type text,
    method text,
    object text,
    edge frozen<edge>,
    current map<text, text>,
    previous map<text, text>,
    created_at bigint,
    PRIMARY KEY (id)
);

Unique table

This table stores values for model fields declared with "unique": true.

CREATE TABLE unique (
        id text,
        PRIMARY KEY (id)
);

Pagination

Graph API and other endpoints may return a set of results. In this case pagination is provided with the help of the "after" and "count" query parameters.

Endpoints that return paginated results will use the following JSON wrapper:

{
        "results": [],
        "first": "<index or object ID of the first entry, string>",
        "last": "<index or object ID of the last entry, string>", // this field will not be populated in case this page is the last one
        "count": <total number of objects, int>
}

Micro Services

We will list all services that are present in the framework. For each service we will show public and internal endpoints, config parameters and extension points. Not all of the services will have something in all of the subsections. By “extension points” we mean an intended way of extending framework functionality; for example, all logic rules are provided by clients using an extension point. Internal endpoints are to be used by services and are not exposed to the Internet; public endpoints are exposed to the Internet.

All services can be configured with a config.json file, which provides default config settings. Another way is to use environment variables (env vars). Env vars are stored in a .env file in the root of the service and override the parameters from the config.json file. Please note that env vars take precedence over config.json. Also note the “egf_” prefix, which isolates the env vars of a particular service from other variables of the deployment environment.
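For example, a .env file for client-data that overrides just the port and log level could look like this (a minimal sketch following the egf_ naming convention shown below):

egf_port = 9000
egf_log_level = info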

client-data

This service will serve as a focal point for data exchange in the system. As all events in the system will have to go through client-data it should be fast, simple and horizontally scalable. It should be able to support all graph operations from either cache or data store.

Internal Endpoints

GET /v1/graph - return graph config in JSON.

GET /v1/graph/<object_id> - to get an object by ID, returns object JSON.

GET /v1/graph/<object_id1>,<object_id2>, … - to get several objects at once, returns JSON structured in the same way as paginated result in graph API.

To create a new object use POST /v1/graph with JSON body of the new object. Newly created object will be returned as a response, along with "id" and "created_at" fields set automatically. Object IDs are generated in the format: "<UUID>-<object type code>".

To update an object use PUT /v1/graph/<object_id> with JSON body containing modified fields. In order to remove a field use "delete_fields": ["<field_name1>", "<field_name2>", ...] field in the JSON body. Updated object will be returned.

To delete an object use DELETE /v1/graph/<object_id>. Objects are not deleted from the data store, "deleted_at" field is added instead. Returns the following JSON: {"deleted_at": "<date and time, RFC3339>"}.

In order to get objects that are listed on an edge use GET /v1/graph/<src_object_id>/<edge_name> with pagination parameters. Expansion is not supported by the client-data service but is supported by the “client-data” commons utility.

In order to check for an existence of an edge use GET /v1/graph/<src_object_id>/<edge_name>/<dst_object_id>. Service will return an object if the edge exists. 404 will be returned otherwise.

To create a new edge use POST /v1/graph/<src_object_id>/<edge_name>/<dst_object_id>. Returns JSON: {"created_at": "<date and time of creation for this edge>"}.

To delete an edge use DELETE /v1/graph/<src_object_id>/<edge_name>/<dst_object_id>. Returns the following JSON: {"deleted_at": "<date and time of deletion, RFC3339>"}.
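As a quick illustration of these endpoints (a sketch; client-data is assumed to listen on localhost:8000 as configured earlier, and the IDs are placeholders):

# create an object
curl -X POST http://localhost:8000/v1/graph \
     -H "Content-Type: application/json" \
     -d '{"object_type": "comment", "creator": "<User object ID>", "post": "<Post object ID>", "text": "Nice shot!"}'

# create an edge between two existing objects
curl -X POST http://localhost:8000/v1/graph/<Post object ID>/comments/<Comment object ID>

# check that the edge exists (404 is returned if it does not)
curl http://localhost:8000/v1/graph/<Post object ID>/comments/<Comment object ID>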

Public Endpoints

None

Config

{
    "port": 8000,  // port that service will listen to
    "log_level": "debug | info | warning",
    "storage": "rethinkdb | cassandra" // storage system, currently supported are RethinkDB and Cassandra
    "cassandra": { // Cassandra config, should be present in case "storage" is equal to "cassandra". For more info on supported parameters please see Cassandra driver config specification. Parameters specified here are passed to the driver as is, without modifications.
        "contactPoints": ["localhost"],
        "keyspace": "eigengraph"
    },
    "rethinkdb": { // RethinkDB config, should be present in case "storage" is equal to "rethinkdb". For more info on supported parameters please see RethinkDBDash driver docs. Parameters specified here are passed to the driver as is, without modifications.
        "host": "localhost",
        "db": "eigengraph"
    },
    "queue": "rethinkdb", // For single instance only! For scalable solutions use kafka.
    "graph": { // graph config section, contains info on all objects, edges, ACLs and validations in the system
        "custom_schemas": {
            "address": {
                    "use": { "type": "string", "enum": ["official", "shipping", "billing", "additional"], "default": "additional" },
                    "label": { "type": "string" },
                    "lines": { "type": "array:string", "min": "1" },
                    "city": { "type": "string" },
                    "state": { "type": "string" },
                    "country": { "type": "string" }
                },
                "human_name": {
                    "use": { "type": "string", "enum": ["usual", "official", "temp", "nickname", "anonymous", "old", "maiden"], "default": "official" },
                    "family": { "type": "string", "min": 1, "max": 512, "required": true },
                    "given": { "type": "string", "min": 1, "max": 512, "required": true },
                    "middle": { "type": "string", "min": 1, "max": 64 }`
                }
        },
        "common_fields": {
                "object_type": { "type": "string", "required": true },
                "created_at": { "type": "date", "required": true },
                "modified_at": { "type": "date", "required": true },
                "deleted_at": { "type": "date" }
        },
        "system_user": {
            "code": "01", // object type code
            "suppress_event": true, // in case "no_event" is present and set to true client-data will not produce any events related to this object
            "back_end_only": true,
            "validations": { // for more info on validations please see Validations subsection
            }
        },
        "session": {
            "code": "02",
            "suppress_event": true,
            "back_end_only": true
            },
            "user": {
                "code": "03",
                "edges": { // edges allowed for this object type
                        "roles": { // here "roles" is edge name
                            "contains": [""] // array of strings with object type names that are allowed to be added to this edge
                        }
                },
            "fields": {
                "name": {
                    "type": "struct",
                    "schema": "human_name",
                    "required": true,`
                    "edit_mode": "E", // this field can take values: "NC | NE | E", NC means that this field can not be set at creation time, NE means that this field can not be changed, E means field can be edited
                    "default": "",
                    "auto_value": ""
                }
            }
            },
            "event": {"code": "04"},
            "job": {"code": "05"},
            "file": {"code": "06"},
            "schedule": {"code": "07", "back_end_only": true},
            "schedule_event": {"code": "08", "back_end_only": true},
            "objects": {
                "secret_organization": "<object ID for the SecretOrganization object, string>"
            }

    }
}

By default the next env vars are provided:

egf_port = 8000
egf_log_level = info
egf_storage = rethinkdb
egf_cassandra = { "contactPoints": ["localhost"], "keyspace": "eigengraph"}
egf_rethinkdb = { "host": "localhost", "db": "eigengraph" }
egf_queue = rethinkdb
egf_kafka = { "hosts": ["localhost:9092"], "client-id": "client-data", "topic": "events" }

Please note that for brevity’s sake we are not showing the full client-data configuration here; see it on GitHub.

An object declaration can contain a “volatile” boolean field. In case it is set to true, objects of this type will be physically removed from the DB upon deletion. Otherwise DELETE requests mark objects with “deleted_at” and objects are not physically removed from the DB.
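For example, if we wanted Comment objects to be physically removed on deletion, the flag could be added to the object declaration like this (an illustrative sketch):

"comment": {
    "code": "13",
    "volatile": true,
    ...
}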

Field Validations

Object field validations can be added to field declarations inside of an object declaration as follows:

"fields": {
    "name": { "type": "struct", "schema": "human_name", "required": true }
}

Another sample is for a hypothetical Account object:

"fields": {
    "user": { "type": "object_id:user" },
    "balance": { "type": "string", "required": true },
    "credit": { "type": "number", "default": 0 },
    "tax_collected": { "type": "number", "default": 0 }
}

Field validations are provided in the “custom_schemas” and “common_fields” fields inside the “graph” section. The “custom_schemas” section contains validations for common reusable data structures. The “common_fields” section provides validations for fields that can be present in all object types. Validations for particular objects are stored in field declarations.

Validation specification for a particular field can contain the following fields:

  1. "type" - required, can be one of the following strings: “string”, “number”, “integer”, “date”, “boolean”, “object_id”, “struct” and “array”.
  2. "validator" - optional, custom validation handler name.
  3. "required" - optional boolean field, defaults to false.
  4. "default" - optional field that contains default value for the field.
  5. "enum" - optional array of strings. If present only values from this array can be assigned to the field.
  6. "min" and "max" - optional fields that can be present for “string”, “array” and “number” typed fields. For “string” these fields restrict string length. For “array” these fields restrict array size, for “number” - min and max values that can be stored.
  7. "schema" - required for fields with type “struct” or “array:struct”. Contains nested field validation specification for the struct. Can also contain a name of one of declared custom schemas (section “custom_schemas”).
  8. "edit_mode" - can take values “NC”, “NE”, “E”. “NC” means field can not be set at creation time. “NE” means that field can not be changed. “E” means that the object is editable. In case “edit_mode” is not present it means that the field can’t be set at creation and also is not editable by users.
  9. "object_types" - array of strings, required in case “type” is equal to “object_id”
  10. "auto_value" - string, can take values: "req.user", "src.<field_name>". "req.user" will set the field with User object corresponding to the currently authenticated user. "src.<field_name>" is handy when an object is created on edge. client-api will take value of a field "<field_name>" and set this value into the field with "auto_value".

For fields with type “object_id” field "object_types" should be provided. In case this field can hold any object type please use "object_types": ["any"].

Fields with type “array” should have additional declaration for the type of entities that can be stored within this array. It should be specified in "schema" field.
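For instance, a field holding a list of structs could be declared as follows (an illustrative sketch reusing the "address" custom schema shown above):

"addresses": {
    "type": "array:struct",
    "schema": "address",
    "min": 1,
    "max": 5,
    "edit_mode": "E"
}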

ACL

ACL related fields can be added on object definition and edge definition levels in “graph” section.

"some_object": {
    "GET": "<optional, comma separated ACL rules for GET method>",
    "POST": "<optional, comma separated ACL rules for POST method>",
    "PUT": "<optional, comma separated ACL rules for PUT method>",
    "DELETE": "<optional, comma separated ACL rules for DELETE method>",
    "edges": {
        "<edge name>": {
            "GET": "<optional, comma separated ACL rules for GET method>",
            "POST": "<optional, comma separated ACL rules for POST /v1/graph/<src>/<edge_name> with new destination object body>",
            "LINK": "<optional, comma separated ACL rules for POST /v1/graph/<src>/<edge_name>/<dst> (create edge for existing objects)>",
            "DELETE": "<optional, comma separated ACL rules for DELETE method>"
        },
        ...
    }
}

Supported ACL rules include "any", "self", "registered_user" and role object type names such as "admin_role" (the values used in the object declarations above); custom ACL rules can also be referenced by name, see Extension Points below.

Extension Points

Custom validation handler

In case built-in validations are not sufficient client can implement custom validation logic for any field of any object. For example, email field may require custom validation.

In order to add custom validation please do the following:

  1. Implement a module that will perform custom validation inside of “controllers/validation/” folder. This module should implement one or several functions with signature: function (val) {} where val is the value of the field. Function should return true in case validation succeeded and false otherwise. Function (or functions, in case single module implements multiple validators) should be exported from the module.
  2. Add a record of the form "<type_name>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the validationRegistry map in "validation/index.js".
  3. In the client-data config, “graph” section, specify <type_name> in the field declaration, in its "validator" field.

Example of a custom validator for "email" field:

function CheckEmail(val) {
    // a regex literal is used so that \S is not lost in string escaping
    let re = /^\S+@\S+$/;
    return re.test(val);
}

module.exports = CheckEmail;

Corresponding validationRegistry map will be:

const validationRegistry = {
    email: require("./check_email")
}

And the field declaration in the object config will look like:

"fields": {
    "user_email": { "type": "string", "validator": "email" }
}

client-api

Internal Endpoints

None

Public Endpoints

Object Operations

In order to get an object use GET <gateway URL>/v1/graph/<object ID>. I will omit <gateway URL> going forward for brevity’s sake.

To create an object use POST /v1/graph with JSON body containing object fields. As a result full object JSON will be returned, including field "id".

To change an object use PUT /v1/graph/<object ID> with a JSON body containing only the fields that need to be modified. Unchanged fields can be sent as well, but it is not required. In order to remove a field from an object completely you need to send a "delete_fields": ["field names", ...] field inside the JSON. We don’t support removing fields from within nested structures at the moment.
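For example, a request that changes one field and removes another from a User object might look like this (a sketch with placeholder values):

curl -X PUT "<gateway URL>/v1/graph/<User object ID>" \
     -H "Content-Type: application/json" \
     -d '{"name": {"given": "John", "family": "Doe"}, "delete_fields": ["date_of_birth"]}'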

To delete an object use DELETE /v1/graph/<object ID>.

Edge Operations

To verify existence of an edge please use GET /v1/graph/<src object ID>/<edge name>/<dst object ID>. This request will return edge’s target object in case such an edge exists. 404 will be returned otherwise.

To get a list of objects that belong to an edge use GET /v1/graph/<src object ID>/<edge name>. The list can be paginated with the "after" and "count" query fields. The request will return JSON of the following format:

{
    "results": [],
    "first": <index or object ID of the first entry, int>,
    "last": <index or object ID of the last entry, int>,
    "count": <total number of objects, int>
}

Please note that "first" and "last" field values can vary depending on the DB that is used for a deployment. For Cassandra these fields will contain object IDs, for RethinkDB they will contain indices of objects within a collection. We recommend to not try interpreting "first" and "last" field values, just use "last" as a parameter for "after" in case next page should be retrieved. In case the returned page is the last one in the collection field “last” will be omitted.

EGF2 supports expansion on edges for:

Nested expansion up to 4 levels deep can be used as follows: GET /v1/graph/<Post object ID>?expand=user{roles{favorites}},comments. {} is used to go one level down in the expansion for the field to the left of the curly braces.

To create a new edge please use POST /v1/graph/<src object ID>/<edge name>/<dst object ID> in case target object already exists, or use POST /v1/graph/<src object ID>/<edge name> with JSON body that contains fields to be used for the new target object. In this case target object and edge will be created simultaneously.

To delete an edge use DELETE /v1/graph/<src object ID>/<edge name>/<dst object ID>.

Search Endpoint

client-api provides a GET /v1/search endpoint for searching. The following parameters are supported:

Results are returned in paginated format, the same as results for getting edges. Full expansion functionality is supported for search requests as well, the same as for regular graph API requests.

ACL rules are applied to the search results in the same way as they are applied to the results of graph API get edge requests.

Miscellaneous

[TO BE IMPLEMENTED IN V0.2.0]

Mobile client applications can use app version endpoint by calling GET /v1/version?app_id=&version=. Server will respond with JSON:

{
    "latest": "<latest available version of the app>",
    "update_mode": "now | period | recommended",
    "force_update_at": "<date>"
}

This endpoint can be called by client apps once per day. The received JSON should be interpreted as follows:

  1. In case “latest” is equal to the currently installed version of the app, the fields “update_mode” and “force_update_at” will not be present. No action is required.
  2. In case “update_mode” is present and equal to “now”, the app should display a message to the user. When the message is read the app should quit.
  3. In case “update_mode” is equal to “period”, the app should display a message informing the user that the app should be updated before the “force_update_at” date. After the message is presented the app should continue as usual.
  4. In case “update_mode” is “recommended”, the app should display a message and then continue as usual.

client-api should use “graph/objects” section of client-data config in order to obtain references to MobileClient objects. These objects should be used in order to answer client requests.

Config

{
    "port": 2019,
    "log_level": "debug | info | warning",
    "auth": "http://localhost:2016",
    "client-data": "<URL pointing to client-data service>",
    "elastic": { // ElasticSearch parameters. Passed to the ElasticSearch driver without modifications.
        "hosts": ["localhost:9200"]
    }
}

Default env vars for this service:

egf_port = 2019
egf_auth = "http://localhost:2016"
egf_client-data = "http://localhost:8000"
egf_elastic = { "hosts": ["localhost:9200"] }

Extension Points

Custom ACL rule

ACL handling covers basic needs with regard to restricting access to objects and edges. In case more elaborate control is needed, a custom ACL handler can be implemented.

In order to create a new custom ACL handler please do the following:

  1. Implement a module that will handle the request inside of "acl/extra/" folder. This module should implement one or several functions with signature: function (user, id, body) {} that will perform the processing of the request. Function should return a promise with the result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<custom ACL name>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the rules map in "acl/rules.js".

Example of a custom ACL handler:

"use stict";

const clientData = required("./client-data");

function reviewOwner(user, id) {
    return clientData.getObject(id).then(review => user === review.from)
}

module.exports = reviewOwner;

Custom endpoint handler

There can be situations when simple processing of a request to the graph API is not sufficient and extended processing should be performed synchronously within the request (not asynchronously in the logic service). For example, client-api may need to set some fields in a newly created or modified object based on fields sent in by a client. Such cases can be accommodated by creating a custom endpoint handler.

Custom endpoint handlers can be assigned to any object type, any edge and any HTTP method. In order to create a new custom handler please do the following:

  1. Implement a module that will handle the request inside of "controllers/extra/" folder. This module should implement one or several functions with signature: function (req) {} that will perform the processing of the request. Function should return a promise with the result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<method, e.g. POST, GET, PUT, DELETE> <object type or edge>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the handleRegistry map in "controllers/extra/index.js".

Example of a custom handler module that creates and removes a "followers" edge for creation and removal of "follows" edge:

"use strict";

const clientData = require("../clientData");

function Follow(req) {
    return clientData.createEdge(req.params.src, req.params.edge_name, req.params.dst)
        .then(follow =>
            clientData.createEdge(req.params.dst, "followers", req.params.src).then(() => follow)
        );
}
module.exports.Follow = Follow;

function Unfollow(req) {
    return clientData.deleteEdge(req.params.src, req.params.edge_name, req.params.dst)
    .then(deleted =>
        clientData.deleteEdge(req.params.dst, "followers", req.params.src).then(() => deleted)
    );
}
module.exports.Unfollow = Unfollow;

sync

In small and scalable modes this service is responsible for updating ES according to the changes happening in the system.

Internal Endpoints

None

Public Endpoints

None

Config

{
    "log_level": "debug | info | warning",
    "elastic": {
        "hosts": ["localhost:9200"],
        "settings": { // ES settings, used as is with ES, no changes made
            "filter": {
                "autocomplete": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 30,
                    "token_chars": ["letter", "digit", "symbol", "punctuation"]
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["autocomplete", "lowercase"]
                },
                "autocomplete_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": "lowercase"
                }
            }
        },
        "indices": {
            "file": { // index name, to be used with search endpoint
                "settings": {}, // ES settings local for this index
                "object_type": "file",
                "index": "file", // ES index name
                "mapping": {
                    "id": {"type": "string", "index": "not_analyzed"},
                    "standalone": {"type": "boolean"},
                    "created_at": {"type": "date"}
                }
            },
            "schedule": { // index name, to be used with search endpoint
                "settings": {}, // ES settings local for this index
                "object_type": "schedule",
                "index": "schedule", // ES index name
                "mapping": {
                    "id": {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    },
    "client-data": "<URL pointing to client-data service>",
    "queue": "kafka | rethinkdb",
    "consumer-group": "sync",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {} // TODO Kafka parameters
}

Default env vars:

egf_log_level = info
egf_client-data = "http://localhost:8000"
egf_queue = rethinkdb
egf_rethinkdb = { "host": "localhost", "port": "28015", "db": "eigengraph", "table": "events", "offsettable": "event_offset" }
egf_kafka = { "hosts": ["localhost:9092"], "client-id": "sync", "topic": "events" }

The “elastic” section contains info on ES indices and global ES settings. It is possible to specify ES settings on the index level using the “settings” field. We take the “settings” content as is and apply it to ES without modifications.

sync supports automatic and custom index processing; for brevity we will use the terms “automatic index” and “custom index” to identify the type of processing going forward.

Adding custom indexes is described in Extension Points section below.

sync will react to events related to an object specified in the "object_type" field for automatic indexes (this field is ignored for custom indexes). The automatic handler presumes that object field names correspond directly to field names in the ES index, i.e. the File.created_at field is mapped into the "created_at" index field. It is possible to override this by providing a "field_name" parameter in a field declaration. This feature is useful for supporting nested structures, for example: "state": {"type": "string", "index": "not_analyzed", "field_name": "address.state"} will populate the ES index field “state” using data from the "address.state" nested object field.
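
As an illustration, a complete automatic index declaration using "field_name" could look like the sketch below (the "customer" index and its fields are hypothetical):

"customer": { // index name, to be used with search endpoint
    "object_type": "customer",
    "index": "customer", // ES index name
    "mapping": {
        "id": {"type": "string", "index": "not_analyzed"},
        "state": {"type": "string", "index": "not_analyzed", "field_name": "address.state"}
    }
}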

Extension Points

Custom Index Handler

There can be situations when automatic ES index handling is not sufficient. For example, if client wants to build an index for an edge, which is currently not supported by automatic handling.

In order to create a new custom index handler please do the following:

  1. Implement a module that will handle events related to the index inside of "extra/" folder (in sync service). This module should implement one or several functions with signature: function (event) {} that will perform the processing of the event. Function should return a promise with operation result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<method, e.g. POST, GET, PUT, DELETE> <object type or edge>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the handleRegistry map in "extra/index.js". Handlers from a single module will usually be specified for several records related to the events of interest for the custom index.

Example of a custom index handler that performs some pre-processing for an organization title:

"use strict";

// "elastic" is assumed to be the sync service's ES helper module; the actual require path may differ
const elastic = require("../elastic");

function processOrgTitle(title) {
    // some custom processing
}

function onPostOrganization(event) {
    var doc = {
        title: processOrgTitle(event.current.title.toLowerCase())
    };

    return elastic.client.index({
        index: elastic.indices.organization.index,
        type: elastic.indices.organization.type,
        id: event.object,
        body: doc
    });
}

module.exports.onPostOrganization = onPostOrganization;
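
The handler would then be registered in "extra/index.js" roughly as follows (the "organization" object type and the "./organization" module file name are assumptions):

// extra/index.js (fragment)
const handleRegistry = {
    "POST organization": require("./organization").onPostOrganization,
    "PUT organization": require("./organization").onPostOrganization
};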

logic

This service contains business logic rules that can and should be executed in asynchronous mode.

Internal Endpoints

None

Public Endpoints

None

Config

{
    "log_level": "debug | info | warning",
    "client-data": "<URL pointing to client-data service>",
    "scheduler": "<URL pointing to scheduler service>", // we presume that logic may have some recurrent tasks that need scheduling
    "queue": "kafka | rethinkdb",
    "consumer-group": "logic",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {} // TODO Kafka parameters
}

Default env vars:

egf_log_level = info
egf_client-data = "http://127.0.0.1:8000/"
egf_queue = rethinkdb
egf_rethinkdb = { "host": "localhost", "port": "28015", "db": "eigengraph", "table": "events", "offsettable": "event_offset" }
egf_kafka = { "hosts": ["localhost:9092"], "client-id": "logic", "topic": "events" }

Extension Points

Logic rules are implemented as handlers in logic service. In order to implement a rule:

  1. Implement a module that will handle events related to the rule inside of "extra/" folder (in logic service). This module should implement one or several functions with signature: function (event) {} that will perform the processing of the event. Function should return a promise with operation result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<method, e.g. POST, GET, PUT, DELETE> <object type or edge>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the handleRegistry map in "extra/index.js". Handlers from a single module will usually be specified for several records related to the events of interest for the rule (see the sketch below).
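
For illustration, a rule that reacts to creation of a "follows" edge by adding the reciprocal "followers" edge asynchronously could look roughly like this (the module name, the require path and the registry key format are assumptions):

"use strict";

// logic/extra/follows.js (sketch)
const clientData = require("./client-data"); // path assumed

function onFollowCreated(event) {
    // for edge events the Event object carries edge.src, edge.dst and edge.name
    return clientData.createEdge(event.edge.dst, "followers", event.edge.src);
}

module.exports.onFollowCreated = onFollowCreated;

// extra/index.js (fragment)
// handleRegistry["POST user/follows"] = require("./follows").onFollowCreated;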

job

This service will execute long and/or heavy tasks asynchronously. It will listen to the system event queue and react to creation of Job objects. Number of simultaneous jobs that can be handled is specified in config option "max_concurrent_jobs".

Internal Endpoints

None

Public Endpoints

None

Config

{
    "log_level": "debug | info | warning",
    "max_concurrent_jobs": 5,
    "client-data": "<URL pointing to client-data service>",
    "queue": "kafka | rethinkdb",
    "consumer-group": "job",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {} // TODO Kafka parameters
}

Extension Points

Jobs are implemented as handlers in the job service. In order to implement a job:

  1. Implement a module that will handle a job inside of "extra/" folder (in job service). This module should implement one or several functions with signature: function (jobCode) {} that will perform the job. Function should return a promise with operation result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<job code>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the handleRegistry map in "extra/index.js". Handlers from a single module can be registered for several job codes (see the sketch below).
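
A minimal job handler sketch, following the signature described above (the "image_resize" job code and module name are hypothetical):

"use strict";

// job/extra/image-resize.js (sketch)
function imageResize(jobCode) {
    // perform the long-running work here and resolve with its result
    return Promise.resolve({"completed": true});
}

module.exports.imageResize = imageResize;

// extra/index.js (fragment)
// handleRegistry["image_resize"] = require("./image-resize").imageResize;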

pusher

pusher is responsible for reacting to events by sending notifications using various mechanisms, be it WebSockets, emails, SMS, native mobile push notifications, etc.

[TO BE SUPPORTED IN V0.2.0]

Client applications can subscribe for notifications based on:

  1. Object ID
  2. Particular edge, represented as a source object ID / edge name pair

In order to subscribe for notifications please send the following JSON via WebSockets connection:

{
    "subscribe": [
        { "object_id": "<string>" },
        { "edge": {
            "source": "<string>",
            "name": "<string>"
        }}
    ]
}

Subscription that is mentioned in the message will be added to the list of subscriptions for this client.

It is also possible to cancel subscription for an object or an edge:

{
    "unsubscribe": [
        { "object_id": "<string>" },
        { "edge": {
            "source": "<string>",
            "name": "<string>"
        }}
    ]
}

Note: In case connection is dropped all subscriptions are lost. When connection is restored client should renew subscriptions.

Internal Endpoints

To send out an email please use POST /v1/internal/send_email with JSON:

{
    "template": "<template ID, string>",
    "to": "email address, string",
    "from": "email address, string",
    "params": {
        // template parameters
    }
}

We currently only support sending emails via SendGrid. SecretOrganization object is used to store SendGrid API token as follows: "sendgrid_api_key": "<key value>".
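
In terms of the SecretOrganization object described in the System Objects section, the stored key would look roughly like this:

{
    "object_type": "secret_organization",
    "secret_keys": [
        {"key": "sendgrid_api_key", "value": "<key value>"}
    ]
}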

Public Endpoints

pusher exposes a WebSocket endpoint /v1/listen. Connection to the endpoint has to be authorized using the Authorization header with a bearer token, e.g. "Authorization: Bearer <token>". All connections are pooled within the service and are available for custom event handlers. We use Primus and Primus Rooms to allow a user to be connected from multiple devices.

Config

{
    "port": 2017,
    "web_socket_port": 2000,
    "email_transport": "sendgrid",
    "log_level": "debug | info | warning",
    "client-data": "<URL pointing to client-data service>",
    "auth": "<URL pointing to auth service>",
    "template_host": "host": "<URL to the host to be used in templates>",
    "ignored_domains": ["<domain name>"], // list of domains for which emails will not be sent, for debug purposes
    "queue": "kafka | rethinkdb",
    "consumer-group": "pusher",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {} // TODO Kafka parameters
}

Default env vars:

egf_port = 2017
egf_web_socket_port = 2000
egf_email_transport = sendgrid
egf_log_level = info
egf_client-data = "http://localhost:8000"
egf_auth = "http://127.0.0.1:2016",
egf_queue = rethinkdb
egf_rethinkdb = { "db": "eigengraph", "table": "events", "offsettable": "event_offset" }
egf_kafka = { "hosts": ["localhost:9092"], "client-id": "pusher", "topic": "events" }

Extension Points

Email Template

Email templates can be added as follows:

  1. Add MJML template file to the "config/templates/mjml" folder. Template variables can be added in the template using double curly brackets: {{<parameter_name>}}.
  2. Add template description to the "config/templates/config.json" file. Specify a path to the MJML template using “template” property. Email subject should be specified using "subject" field. Use "params" field to list template parameters.

"sendEmail" function from "controller/email" module in pusher should be used to send templated emails.

Custom Event Handler

In order to implement a handler:

  1. Implement a module that will handle events related to the handler inside of "extra/" folder (in pusher service). This module should implement one or several functions with signature: function (event) {} that will perform the processing of the event. Processing usually means sending out a notification using some supported transport. Function should return a promise with operation result. Function (or functions, in case single module implements multiple handlers) should be exported from the module.
  2. Add a record of the form "<method, e.g. POST, GET, PUT, DELETE> <object type or edge>": require("./<module_file_name>").<optional, function name in case module exports multiple functions> to the handleRegistry map in "extra/index.js". Handlers from a single module will usually be specified for several records related to the events of interest for the handler.

auth

Internal Endpoints

Services will be able to get session info using GET /v1/internal/session?token=<string, token value>. Auth server will respond with Session object in case it can find one, 404 otherwise.

Public Endpoints

This service is responsible for user related features, e.g. login, logout, password reset, email verification, registration, etc.

“auth” service will handle requests listed below.

To register POST /v1/register the following JSON:

{
    "first_name": "<string>",
    "last_name": "<string>",
    "email": "<string>",
    "date_of_birth": "<string>",
    "password": "<string>"
}

Service will return {"token": "<string>"} JSON in case of success. In other words, user will be authenticated as a result of successful registration.

Server side should check whether the email is already taken. Upon registration the following will happen:

To verify email GET /v1/verify_email?token=<secret verification token>, service will respond with 200 in case of success.

Login using POST /v1/login with the following JSON:

{
    "email": "<email string>",
    "password": "<string>"
}

Server will return {"token": "<string>"} that will be used to access API.

Logout using POST /v1/logout with token specified as usual when accessing protected APIs. Server will respond with 200 in case logout was successful.

To restore password use GET /v1/forgot_password?email="<string>". Server will send out an email with password reset instructions and respond with 200 in case operation was successful. To reset password POST /v1/reset_password with JSON:

{
    "reset_token": "<string>",
    "new_password": "<string>"
}

Service will respond with 200 in case of success.

In order to change password POST /v1/change_password with JSON:

{
    "old_password": "<string>", // not required in case User.no_password = true
    "new_password": "<string>"
}

This call should be authorized (accompanied with a token). Service will respond with 200 in case operation was successful.

To access protected endpoints either (see the example below):

  1. Specify "token"="<string, token value>" in query parameters
  2. Add Authorization: Bearer <string, token value> to request headers
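
For example, a request for the current user can be authorized either way:

GET /v1/graph/me?token=<string, token value>

GET /v1/graph/me
Authorization: Bearer <string, token value>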

In order to resend the email address verification message POST /v1/resend_email_verification. User has to be logged in to use this endpoint. Verification email will be sent to the User.email address. Service will respond with 200 in case of success.

Config

{
    "port": 2016,
    "session_lifetime": 86400,
    "log_level": "debug | info | warning",
    "pusher": "http://localhost:2017",
    "client-data": "<URL pointing to client-data service>",
    "email_from": "<email from which notifications should be sent>",
    "elastic": {
        "hosts": ["localhost:9200"]
    }
}

Default env vars:

egf_port = 2016
egf_session_lifetime = 86400
egf_log_level = "debug"
egf_client-data = "http://127.0.0.1:8000/"
egf_pusher = "http://localhost:2017"
egf_email_from = ""
egf_elastic =  { "hosts": ["localhost:9200"] }

Extension Points

None

file

Service responsible for file management features, e.g. file upload, download, removal, etc.

Service periodically creates a new S3 location to store uploaded files (see the partitioning scheme below). Buckets have the LIST operation disabled. File names are formed using UUIDs.

File uploads are partitioned in S3 as follows:

  1. There is a root bucket for uploads (configurable)
  2. A new bucket is created each week inside the root bucket, uploads are stored in this bucket (e.g. egf-uploads/2015-23 where 23 is the number of the week within a year)

This partitioning allows us to split all uploads more or less evenly without the need to create a huge number of buckets.

Internal Endpoints

I think we have internal

Public Endpoints

file server will handle requests listed below.

In order to upload a file to EGF S3 first call GET /v1/new_image or GET /v1/new_file passing "mime_type", "title" and "kind" in query. The request will return a File object JSON; the link that should be used for uploading the file is in the File.upload_url field.

Client / server interactions (an example of the flow is sketched after the list):

  1. Client sends GET request
  2. Server creates a new File object and sets "mime_type" and "title". It prepares a short-lived S3 link for file uploading, sets it to File.upload_url, and also sets the File.url link.
  3. Server sets File.resizes based on "kind" parameter
  4. Client uploads file using File.upload_url link
  5. Client sets File.uploaded = true and updates the object using regular graph PUT.
  6. Server listens to the File.uploaded = true event and
    • Sets File.url to a permanent S3 GET link, saves the File object
    • Schedules resizing jobs for all File.resizes entries. Jobs will upload resizing results to S3 and update File object
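
A sketch of the flow in terms of requests (the mime type, title and kind values are illustrative; the exact method used for the S3 upload depends on the pre-signed link):

GET /v1/new_image?mime_type=image/jpeg&title=avatar&kind=avatar
// response: File object JSON with "upload_url" and "url" set

// client uploads the raw file bytes to File.upload_url, then marks the upload as finished:
PUT /v1/graph/<File object ID>
{
    "uploaded": true
}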

Config

{
    "standalone_ttl": 24, // time to live for stand alone files, in hours; when elapsed and file was not connected to any other object such file will be deleted
    "port": 2018,
    "auth": "<URL to the auth service>",
    "client-data": "<URL pointing to client-data service>",
    "s3_bucket": "test_bucket",
    "queue": "kafka | rethinkdb",
    "consumer-group": "pusher",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {}, // TODO Kafka parameters
    "elastic": { // ElasticSearch parameters. Passed to the ElasticSearch driver without modifications.
            "hosts": ["localhost:9200"]
        },
    "kinds": {
            "avatar": [
                {"height": 200, "width": 200},
                {"height": 300, "width": 400}
            ],
            "image": [
                {"height": 200, "width": 200},
                {"height": 300, "width": 400}
            ]
    }
}

Default env vars:

egf_port = 2018
egf_auth = "http://127.0.0.1:2016"
egf_client-data = "http://127.0.0.1:8000/"
egf_queue = rethinkdb
egf_rethinkdb = { "host": "localhost", "port": "28015", "db": "eigengraph", "table": "events", "offsettable": "event_offset" }
egf_kafka = { "hosts": ["localhost:9092"], "client-id": "file", "topic": "events" }
egf_elastic = { "hosts": ["localhost:9200"] }

Extension Points

None

scheduler

This service creates ScheduleEvent objects as specified by Schedule objects. Services can create Schedule objects and listen to the system event queue to get notified when scheduled actions should be performed.

Internal Endpoints

None

Public Endpoints

None

Config

{
    "log_level": "debug | info | warning",
    "client-data": "<URL pointing to client-data service>",
    "elastic": { // ElasticSearch parameters. Passed to the ElasticSearch driver without modifications.
            "hosts": ["localhost:9200"]
    },
    "queue": "kafka | rethinkdb",
    "consumer-group": "scheduler",
    "rethinkdb": {
        "host": "localhost",
        "port": "28015",
        "db": "eigengraph",
        "table": "events",
        "offsettable": "event_offset"
    },
    "kafka": {} // TODO Kafka parameters
}

Extension Points

None

Utilities

EGF2 provides a set of convenient utilities to be used by service developers. The list is below:

  1. "auth", "client-data" and "pusher" libraries make working with corresponding services easier
  2. "event-consumer" hides details of working with the system event queue
  3. "option" is a simple library that allows a service to be able to obtain config from a URL specified in “-config” start option.
  4. "search" library facilitates work with ElasticSearch by providing an interface that is consistent with API search endpoint.

Utilities reside in “commons” repository.

Graph API

EGF will provide a graph based API (similar to Facebook) to facilitate web and mobile clients:

  1. GET, POST, PUT, DELETE /v1/graph/<object ID> to work with objects
  2. GET /v1/graph/<object ID>/<edge name> to get paginated list of object’s edges
  3. POST /v1/graph/<object ID 1>/<edge name>/<object ID 2> to create new edge between existing objects
  4. POST /v1/graph/<object ID 1>/<edge name> JSON body of target object to create new target object and an edge in one request
  5. DELETE /v1/graph/<object ID 1>/<edge name>/<object ID 2> to delete an edge
  6. GET /v1/graph/me will return User object for currently authenticated user.
  7. GET /v1/graph/<object ID>/<edge name>/<object ID 2> to get object if edge exists.

Most client app needs will be covered by the Graph API endpoints. In case a separate endpoint is required it will be documented explicitly (e.g. auth related endpoints).

When an object or an edge is requested via GET client apps can use the “expand” option. Expand is a comma separated list of fields and edge names which has the following format: expand=<field1>{<nested field2>,<edge 2>},<edge1>(<count>){<field3>}.
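
For example, a request for a post that also pulls in its creator and the first five comments together with their creators could look like this:

GET /v1/graph/<post object ID>?expand=creator,comments(5){creator}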

System Objects

This part contains descriptions of objects, relations between objects and reusable data structures that are present in all EGF deployments. Objects will be named with bold font, reusable data structures will be named using italic. All objects have "created_at" and "modified_at" fields set by the server side automatically. All objects have an "id" field, not shown here for brevity.

Fields that are not allowed to be changed via user requests will be marked with NE. In case a field cannot be set at creation time it will be marked with NC. The same notation will be used for objects as well. Please note that this restriction only relates to user initiated requests. Server side will still be able to set fields and make other changes.

In case API is consumed with the help of EGF2 mobile client libraries the following keywords should not be used as object types, edge and field names:
“description”, “class” and all names that start with “new”, “abstract”, “assert”, “boolean”, “break”, “byte”, “case”, “catch”, “char”, “class”, “const”, “continue”, “default”, “do”, “double”, “else”, “enum”, “extends”, “final”, “finally”, “float”, “for”, “goto”, “if”, “implements”, “import”, “instanceof”, “int”, “interface”, “long”, “native”, “new”, “package”, “private”, “protected”, “public”, “return”, “short”, “static”, “strictfp”, “super”, “switch”, “synchronized”, “this”, “throw”, “throws”, “transient”, “try”, “void”, “volatile”, “while”, “repeat”.

KeyValue

{
    "key": "<string>",
    "value": "<string>"
}

Field validations:

Field Restrictions
“key” required, string
“value” required, string

HumanName

{
    "use": "usual | official | temp | nickname | anonymous | old | maiden",
    "family": "<string, surname>",
    "given": "<string, first name>",
    "prefix": "<string>",
    "suffix": "<string>",
    "middle": "<string>"
}

Field validations:

Field Restrictions
“use” optional, string, value set “usual | official | temp | nickname | anonymous | old | maiden”. Defaults to “official”.
“family” required, string, length: 1..512
“given” required, string, length: 1..512
“prefix” optional, string, length: 1..64
“suffix” optional, string, length: 1..64
“middle” optional, string, length: 1..64

Event

{
    "object_type": "event",
    "object": "<string, object ID>",
    "user": "<string>",
    "edge": {
        "src": "<string>",
        "dst": "<string>",
        "name": "<string>"
    },
    "method": "<string, POST, PUT, DELETE>",
    "current": {current state of an object or edge},
    "previous": {previous state of an object or edge},
    "crearted_at": <number, unix timestamp>
}

Object Code: Z4

Field validations:

Field Restrictions
“object_type” required, string
“object” optional, either this or “edge” field should be present, object ID
“user” optional, User object ID
“edge” optional, either this or “object” field should be present, struct
“edge.src” required, object ID
“edge.dst” required, object ID
“edge.name” required, string, can be one of a set of values for edge names
“method” required, string, can be “POST”, “DELETE”, “PUT”
“previous” previous state of an object or edge
“current” current state of an object or edge

Job

{
    "object_type": "job", NE
    "user": User, NE
    "job_code": "<string>",
    "input": [KeyValue, ...],
    "output": [KeyValue, ...],
    "completed": Boolean,
    "read": Boolean,
    "delete_at": "<string>"
}

Object Code: Z5

Note: this object is used by “job” service and can be omitted in case “job” service is not deployed.

Field validations:

Field Restrictions
“object_type” required, string
“user” required, User object ID
“job_code” required, string
“input” optional, array of KeyValue
“output” optional, array of KeyValue
“completed” optional, Boolean, defaults to false
“read” optional, Boolean, defaults to false
“delete_at” optional, string, date RFC3339

SystemUser

{
        // TODO: list all system fields
    "service_ids": ["string formatted as <service prefix, e.g. fb, google>:<service user ID>", ]
}

Object Code: Z1

Session

{
    "user": "<string>",
    "token": "<string>",
    "expires_at": "<date>"
}

Object Code: Z2

User

{
    "object_type": "user",
    "name": HumanName,
    "email": "<string>",
    "system": "<string, SystemUser object ID>",
    "verified": Boolean
}

Object Code: Z3

Field Validations:

Field Restrictions
“object_type” required, string
“name” required, HumanName
“email” required, string, email
“system” optional, SystemUser object ID
“verified” optional, Boolean, defaults to false

Edges:

  1. roles - role objects should be named Role, for example CustomerRole, AdminRole.

Note: other fields can be added to this object as necessary.

File

{
    "object_type": "file", NE
    "mime_type": "<string, file type>", NE, NC
    "user": User, NE, NC
    "url": "<URL string>", NE, NC
    "upload_url": "<URL, string>", NE, NC
    "title": "<string>",
    "size": <file size in Kb>, NE, NC
    "dimensions": {
        "height": <int>,
        "width": <int>
    }, NE, NC
    "resizes": [{
        "url": "<URL string>",
        "dimensions": {
            "height": <int>,
            "width": <int>
        }
    }], NE, NC
    "hosted": Boolean, // true in case file is stored on MBP S3, NE, NC
    "uploaded": Boolean,
    "standalone": Boolean, NE, NC
}

Object Code: Z6

Field validations:

Field Restrictions
“object_type” required, string
“mime_type” required, string, we should have a list of file types we want to allow
“user” required, User object ID
“url” required, string, URL format
“upload_url” optional, string, URL format
“title” optional, string, length: 1..256
“size” required if File.hosted = true, int
“dimensions” Struct, required if File.hosted = true and File is an image
“dimensions.height” int, required
“dimensions.width” int, required
“resizes” optional, only available for image files
“hosted” optional, Boolean, defaults to false
“uploaded” optional, Boolean, defaults to false
“standalone” optional, Boolean, defaults to false

Note: This object is used by “file” service and can be omitted in case “file” service is not deployed.

Schedule

{
    "object_type": "schedule", NE
    "listener": "<string, service name>", NE
    "schedule_code": "<string>",
    "time": {"hour": <int>, "minute": <int>, "second": <int>}, NE
    "date": {"year": <int>, "month": <int>, "day": <int>, "day_of_week": <int>}, NE
    "repeat": "daily | weekly | monthly | yearly", NE
}

Object Code: Z7

Field validations:

Field Restrictions
“object_type” required, string
“listener” required, string
“schedule_code” required, string
“time” required, struct
“date” optional, struct
“repeat” optional, string, value set

Note: This object is used by “scheduler” service and can be omitted in case “scheduler” service is not deployed.

ScheduleEvent

{
    "object_type": "schedule_event", NE
    "schedule": Schedule, NE
    "schedule_code": "<string>",
    "listener": "<string, service name>", NE
}

Object Code: Z8

Field validations:

Field Restrictions
“object_type” required, string
“schedule” required, Schedule object ID
“schedule_code” required, string
“listener” required, string

Note: This object is used by “scheduler” service and can be omitted in case “scheduler” service is not deployed.

SecretOrganization

{
    "object_type": "secret_organization",
    "secret_keys": [{"key": "<string>", "value": "<string>"}, ...]
}

Object Code: Z9

Field validations:

Field Restrictions
“object_type” required, string
“secret_keys” optional, array of struct

[TO BE IMPLEMENTED IN V0.2.0]

MobileClient

{
    "object_type": "mobile_client",
    "client_id": "<string>",
    "versions": [
        {
            "index": <int>,
            "version": "<string>",
            "update_mode": "now | period | recommended",
            "force_update_at": "<string, RFC3339>"
        }
    ]
}

Object Code: ZA

Field validations:

Field Restrictions
“object_type” required, string
“client_id” required, string
“versions” optional, array of struct
“versions.version” required, string
“versions.index" required, int
“versions.update_mode” required, string, value set
“versions.force_update_at” optional, string, RFC3339

System ES Indexes

Index Name Subsystem Objects Fields
File file File field: id, filter: standalone, filter: created_at
Schedule scheduler Schedule field: id
User / auth auth User field: email, field: id

Caching

We plan to add Redis caching in the near future.

Queue Solution

Currently Apache Kafka and RethinkDB change feeds are supported. For scalable deployments we strongly recommend using Apache Kafka, as the second option is not horizontally scalable.

Suggested Documentation Format

Each point below represents a section in documentation:

  1. Model - list all object types in separate sections, similar to System Objects section here.
  2. Search - this section can contain a table with all supported ES indexes. The table can have the following columns:
    • Index Name
    • Fields
    • Index Type
    • Description
  3. Jobs - a table with all supported jobs, columns:
    • Task Code
    • Input Parameters
    • Output Parameters
    • Description
  4. Logic Rules - a table with a list of all rules implemented in logic service. Suggested columns:
    • Rule Number
    • Trigger
    • Action
  5. Notifications - this section can contain info on all notifications and notification templates supported in the system. It will contain a list of templates and a table with all notifications supported in the system. Suggested columns:
    • Notification Number
    • Notification Type
    • Trigger
    • Target
    • Description

In general, documentation will contain many more sections than suggested here, depending on the system specifics. At the same time, we believe that virtually any system built with EGF2 will benefit from having the sections suggested above.

Deployment

Config Management and Service Discovery

These two topics are pretty broad. There are different systems that are available in this area which can be utilized with EGF2 with some additional effort. Out of the box, config management and service discovery are organized as follows.

There is a single required parameter for each service to start - “config”. This parameter should hold a URL pointing to a reachable location with a JSON config file for the service. When a service is started it downloads the config file. In case the config file is not reachable the service will die with an error message.

Config files for services can be stored in an S3 bucket or any other convenient location reachable by the services. Services do not check whether the config has changed. In order to apply config changes a sysops person will have to restart the service or the instance that runs it.

We strive to minimize inter service dependencies as a conscious architectural choice. There are some dependencies though, listed in the table below.

Service Talks To
client-api client-data, auth, Search Solution
client-data Data Storage, Queue Solution
auth client-data, pusher, Search Solution
sync Queue Solution, client-data, Search Solution
pusher Queue Solution, client-data
scheduler Queue Solution, client-data, Search Solution
logic Queue Solution, client-data
file Queue Solution, client-data, S3, Search Solution
job Queue Solution, client-data

In case a service should talk to another service (e.g. client-api talks to auth and almost all services talk to client-data) it will support an option in config file that will point to a URL for the required service. We don’t support multiple URLs pointing to a single service at the moment.

In order to make a particular service fault tolerant and scalable we recommend the following AWS based setup:

  1. Create an ELB for the service.
  2. Create ASG for the service and link it with the ELB.
  3. Create a Route53 DNS record for the ELB.
  4. Use Route53 DNS address to connect a service to another.

Advantages of the setup:

  1. ELB will load balance requests to the service.
  2. ASG will provide auto scaling for the service - no need for sysops intervention in case of load spikes.

Small Mode

This part is organized as step-by-step instructions on how to deploy a working infrastructure in AWS. If you have no AWS account, please create one.

Deploy Data storage

In this step RethinkDB and Elasticsearch clusters/instances should be created. In a small deployment it is not necessary to have separate instances for RethinkDB and Elasticsearch. A small general purpose instance is enough for testing; at least a medium-sized instance should be used for real data.
Please consult the original documentation of the service/OS for proper configuration.

Deploy RethinkDB

For service configuration please consult original documentation.

Deploy single instance

This is the simplest option. No sharding, no replication and no fault-tolerance as a result. Enough for testing purposes.

Deploy cluster

As RethinkDB has no tools for cluster configuration, all steps should be performed manually. One of the instances should be chosen as the "master" and the rest must be instructed to "join" it.

Deploy Elasticsearch

For service configuration please consult the original documentation.
EC2 discovery using the cloud-aws plugin is the preferred way for node discovery within the cluster.

Deploy Subsystems

All subsystems can be deployed on the same instance. The instance should have more than 1 GB of RAM. In a low load deployment it can even be a micro instance with swap: a micro instance itself has only 1 GB of RAM, which is not enough to successfully install subsystem dependencies using "npm install" (hence the swap). For subsystem configuration simplicity and to achieve fault tolerance, a smart proxy should be used for forwarding data requests: the so-called client node in case of Elasticsearch and rethinkdb-proxy in case of RethinkDB. Both of them should be deployed on the same instance with the subsystems.

Deploy Single Entry Point

This step finalizes the deployment process. In this example Nginx will be used. As Nginx is used as a simple request router it is possible to run it on the same instance with the subsystems. A sample configuration can be found in the commons repository.

Scalable Mode

TODO

Big Data Mode

TODO

Client Libraries

EGF2 provides a set of client libraries that simplify working with the provided API. Mobile libraries provide the following features:

  1. Graph based API methods
  2. Data caching
  3. Auth endpoints support
  4. Search endpoint support
  5. File operations support
  6. Model generation

iOS

Installation

  1. Clone the latest release of EGF2 from GitHub.
  2. Go to your Xcode project’s “General” settings. Drag EGF2.framework to the “Embedded Binaries” section. Make sure “Copy items if needed” is selected and click “Finish”.
  3. Create a new “Run Script Phase” in your app’s target’s “Build Phases” and paste the following snippet in the script text field: bash "${BUILT_PRODUCTS_DIR}/${FRAMEWORKS_FOLDER_PATH}/EGF2.framework/strip.sh".
  4. Go to your Xcode project’s “Capabilities” settings. Enable “Keychain Sharing”.
  5. For Objective-C project only. Go to your Xcode project’s “Build Settings”. Set “Always Embed Swift Standard Libraries” to “Yes”.

Model Generation

EGF2 model generator can be utilized to create model and other classes that simplify work with the EGF2 back-end.

Before generation please prepare a “settings.json” file in some folder with the following content:

{
    "name": "string, your project name, is used to name Core Data file, among other things",
    "server": "string, back-end server URL",
    "model_prefix": "string, prefix that will be used for your models",
    "excluded_models": ["model type name", ...] // array of strings listing models that should be omitted in generation
}

Get a "client-data/config.json" file from your repository and copy it to the same folder you have your "settings.json" file in. Copy EGF2GEN file to the same folder. Run EGF2GEN.

Model generator is capable of producing Objective C and Swift code. Import generated files into your project.

Error Handling

Almost all methods provided by the EGF2 library end with one of the following common blocks, which return either the result of an operation or an error if something went wrong. You should always check whether an error has occurred and take an appropriate action.

ObjectBlock = (NSObject?, NSError?) -> Void
ObjectsBlock = ([NSObject]?, Int, NSError?) -> Void
Completion = (Any?, NSError?) -> Void

Classes and APIs

EGF2Graph

EGF2Graph is the main class of the EGF2 library. It provides methods for authentication and operations on graph objects.

EGF2Graph Properties

All properties of the main EGF2Graph instance are set during model generation. While there is usually no need to edit these properties, it is still possible to do so, for example in case the URL of the back-end has changed.

Property Description
var serverURL: URL URL of server, for example TODO
var maxPageSize: Int Max size of a page in pagination
var isObjectPaginationMode: Bool true if current pagination mode is object oriented (otherwise index oriented)
var idsWithModelTypes: [String : NSObject.Type] Contains objects’ suffixes with appropriate classes

EGF2Graph Auth Methods

Methods Description
func register(withFirstName firstName: String, lastName: String, email: String, dateOfBirth: Date, password: String, completion: @escaping Completion) Register a new user. Like the ‘login’ method, ‘register’ returns an auth token, so you don’t need to call ‘login’ after ‘register’.
func login(withEmail email: String, password: String, completion: @escaping Completion) Log in an existing user.
func logout(withCompletion completion: @escaping Completion) Logout. EGF2 library will clear all user data even if logout with back-end was not successful.
func change(oldPassword: String, withNewPassword newPassword: String, completion: @escaping Completion) Change password of logged in user.
func restorePassword(withEmail email: String, completion: @escaping Completion) Initiate a restoring process. Send a message with a secret token to the specified email.
func resetPassword(withToken token: String, newPassword: String, completion: @escaping Completion) Reset the password of logged in user. User must use the secret token which was sent before.
func verifyEmail(withToken token: String, completion: @escaping Completion) Verify an email which was used while registering a new user.

EGF2Graph Notification Methods

Methods Description
func notificationObject(forSource source: String) -> Any Create a notification object to be used with NotificationCenter in order to listen for changes in the source object.
func notificationObject(forSource source: String, andEdge edge: String) -> Any Create a notification object to be used with NotificationCenter in order to listen for specified edge changes.

EGF2Graph Graph API Methods

Methods Description
func object(withId id: String, completion: ObjectBlock?) Get object with specific id. If the object has already been cached it will be loaded from cache otherwise it will be loaded from server.
func object(withId id: String, expand: [String], completion: ObjectBlock?) Get object with id and expansion. See more about expansion in Edge Operations section.
func refreshObject(withId id: String, completion: ObjectBlock?) Get object with id from server. In this case cache will not be used for retrieval but will be updated when data has arrived from the server side.
func refreshObject(withId id: String, expand: [String], completion: ObjectBlock?) Get object with id and expansion from server. Retrieved data with all expanded objects will be cached. See more about expansion in Edge Operations section.
func userObject(withCompletion completion: ObjectBlock?) Get User object for currently logged in user.
func createObject(withParameters parameters: [String : Any], completion: ObjectBlock?) Create a new object with specific parameters.
func updateObject(withId id: String, parameters: [String : Any], completion: ObjectBlock?) Update object with id according to values from parameters.
func updateObject(withId id: String, object: NSObject, completion: ObjectBlock?) Update object with specific id according to values from object.
func deleteObject(withId id: String, completion: Completion?) Delete an object with id.
func createObject(withParameters parameters: [String : Any], forSource source: String, onEdge edge: String, completion: ObjectBlock?) Create a new object with specific parameters on a specific edge.
func addObject(withId id: String, forSource source: String, toEdge edge: String, completion: @escaping Completion) Create an edge for an existing object with id.
func deleteObject(withId id: String, forSource source: String, fromEdge edge: String, completion: @escaping Completion) Delete an object with id from an edge.
func doesObject(withId id: String, existForSource source: String, onEdge edge: String, completion: @escaping (Bool, NSError?) -> Swift.Void) Check if an object with id exists on a specific edge.
func objects(forSource source: String, edge: String, completion: ObjectsBlock?)
func objects(forSource source: String, edge: String, after: String?, completion: ObjectsBlock?)
func objects(forSource source: String, edge: String, after: String?, expand: [String], completion: ObjectsBlock?)
func objects(forSource source: String, edge: String, after: String?, expand: [String], count: Int, completion: ObjectsBlock?)
Get edge objects. If edge data was cached it will be loaded from cache otherwise it will be loaded from server.
func refreshObjects(forSource source: String, edge: String, completion: ObjectsBlock?)
func refreshObjects(forSource source: String, edge: String, after: String?, completion: ObjectsBlock?)
func refreshObjects(forSource source: String, edge: String, after: String?, expand: [String], completion: ObjectsBlock?)
func refreshObjects(forSource source: String, edge: String, after: String?, expand: [String], count: Int, completion: ObjectsBlock?)
Get edge objects from back-end, bypassing the cache. Retrieved data will be cached.
func uploadFile(withData data: Data, title: String, mimeType: String, completion: @escaping ObjectBlock) Upload file data to server.
func uploadImage(withData data: Data, title: String, mimeType: String, kind: String, completion: @escaping ObjectBlock) Upload image data to server.
func search(forObject object: String, after: Int, count: Int, expand: [String]? = default, fields: [String]? = default, filters: [String : Any]? = default, range: [String : Any]? = default, sort: [String]? = default, query: String? = default, completion: @escaping ObjectsBlock)
func search(withParameters parameters: EGF2SearchParameters, after: Int, count: Int, completion: @escaping ObjectsBlock)
Search for specific objects according to parameters.

Expand Objects

EGF2 uses a graph oriented approach to modeling data. In practice it means that data is represented as objects and edges, where edges are connections between objects. It is often convenient to be able to not only get an object or an edge from the back-end but to get an expanded portion of the graph. For example, when we get a list of favorite Posts for some user it is beneficial to get Post authors as well. We may also want to get Post comments in the same request. The expansion feature allows us to do this.

Relations between objects in EGF2 are modelled using two concepts:

  1. An edge is a list of objects related to this one
  2. An object property is a connection from this object to another

Every object property is modelled with two properties within the library object: one holds the string object ID, and the other holds a reference to the connected library object.

For example:


class Post: GraphObject {
    var text: String?
    var image: String?      // id of image object
    var imageObject: File?  // imageObject is instance of image object
    var creator: String?        // id of user object
    var creatorObject: User?    // creatorObject is instance of user object
}

So if you want to get just a post object you use the code below:

Graph.object(withId: "<post id>") { (object, error) in
    guard let post = object as? Post else { return }
    // post.image contains id of image object
    // post.imageObject is nil
}

But if you also want to get an image object you use the code below:

Graph.object(withId: "<post id>", expand: ["image"]) { (object, error) in
    guard let post = object as? Post else { return }
    // now post.imageObject contains instance of image object
}

It is possible to expand several object properties of an object at once:

Graph.object(withId: "<post id>", expand: ["image","creator"]) { (object, error) in
    guard let post = object as? Post else { return }
    // post.imageObject contains image object
    // post.userObject contains user object who has created the post
}

You can also specify multi level expand:

Graph.object(withId: "<post id>", expand: ["image{user}"]) { (object, error) in
    guard let post = object as? Post else { return }
    // post.imageObject contains image object
    // post.imageObject.userObject contains user object who has uploaded the image
}

Edges are also expandable. It’s useful when you need to get data from an edge in advance:

Graph.object(withId: "<post id>", expand: ["comments"]) { (object, error) in
    // If there are objects on comments edge they will be downloaded and cached
    // Later you can get them using ‘objects’ method
}

You can specify how many objects should be taken while expanding:

Graph.object(withId: "<post id>", expand: ["comments(5)"]) { (object, error) in
    // The first five comments will be taken from cache or downloaded from server
}

This feature also works for edges:

Graph.objects(forSource: "<user id>", edge: "posts", after:nil, expand:["image"]) { (objects, count, error) in
    guard let posts = objects as? [Post] else { return }
    // use posts
}

Using expand you can get all information you need in one request:

// You want to get all newest posts
// Also you want to know who created these posts (creator) and what image you should show
// Besides you want to show the last comment under each post
let expand = ["image","creator","comments(1)"]
Graph.objects(forSource: "<user id>", edge: "posts", after:nil, expand:expand) { (objects, count, error) in
    guard let posts = objects as? [Post] else { return }
    // use posts
}

Auxiliary NSObject methods

EGF2 iOS framework provides an extension for the NSObject class which contains several useful methods for working with graph objects.

Methods Description
func copyGraphObject() -> Self Make a copy of graph object.
func isEqual(graphObject: NSObject) -> Bool Checking if object is equal to another graph object.
func changesFrom(graphObject: NSObject) -> [String : Any]? Get a dictionary of changed fields in comparison with another object.
Example
Graph.userObject { (object, error) in
    guard let user = object as? User else { return }
    self.currentUser = user
    print(self.currentUser?.name?.given) // Mark
}
let changedUser: User = currentUser.copyGraphObject() // a copy of current user object
changedUser.name?.given = "Tom"


if currentUser.isEqual(graphObject: changedUser) {
    print("Objects are equal")
}
else {
    print("Objects are different")
}
var dictionary = currentUser.changesFrom(graphObject: changedUser)!
print(dictionary)  // ["name": ["given": "Mark"]]
or
var dictionary = changedUser.changesFrom(graphObject: currentUser)!
print(dictionary)  // ["name": ["given": "Tom"]]

You can use objects and dictionary while working with graph objects:

Graph.updateObject(withId: "<user id>", parameters: dictionary) { (_, error) in
}
or
Graph.updateObject(withId: "<user id>", object: changedUser) { (_, error) in
}

Saving Data

By default all changes are saved every time your app receives the following notifications:

  1. UIApplicationDidEnterBackground
  2. UIApplicationWillTerminate

So if your app suddenly crashes all unsaved changes will be lost.

Notification

EGF2 library posts the following notifications when appropriate actions happen:

Objective-C notification names Swift notification names
EGF2NotificationEdgeCreated EGF2EdgeCreated
EGF2NotificationEdgeRemoved EGF2EdgeRemoved
EGF2NotificationEdgeRefreshed EGF2EdgeRefreshed
EGF2NotificationEdgePageLoaded EGF2EdgePageLoaded
EGF2NotificationObjectCreated EGF2ObjectCreated
EGF2NotificationObjectUpdated EGF2ObjectUpdated
EGF2NotificationObjectDeleted EGF2ObjectDeleted
Notification Description
EdgeCreated A new edge was created
EdgeRemoved An existing edge was removed
EdgeRefreshed An edge was refreshed. All cached pages were dropped, the first page downloaded and cached.
EdgePageLoaded The next page for some specific edge has been loaded.
ObjectCreated The object has been created
ObjectUpdated The object has been updated
ObjectDeleted The object has been deleted

UserInfo object of each notification may contain different objects which can be obtained using the following keys:

Key name Description
EGF2EdgeInfoKey Name of edge
EGF2ObjectIdInfoKey Id of current object
EGF2ObjectInfoKey Current object
EGF2EdgeObjectIdInfoKey Id of object on edge
EGF2EdgeObjectsInfoKey Objects on edge
EGF2EdgeObjectsCountInfoKey Count of objects on edge
Example
// Get notification object for specific graph object
let object = Graph.notificationObject(forSource: id)

// Want to get notifications only for an appropriate graph object
// if object == nil then we will get notifications for all updated objects
NotificationCenter.default.addObserver(self, selector: #selector(didUpdateObject(notification:)), name: .EGF2ObjectUpdated, object: object)



func didUpdateObject(notification: NSNotification) {
    guard let user = notification.userInfo?[EGF2ObjectInfoKey] as? User else { return }
    // user - the updated graph object
}

Android

EGF2 client library for Android is implemented using Kotlin and can be used from Java and Kotlin code. Most of the samples in this documentation are given using Kotlin.

Realm is used for persistent caching purposes.

Installation

Please install and use Gradle to compile the framework:

compile "com.eigengraph.egf2:framework:<latest_version>"

EGF2 model generator can be used to create models and other classes that simplify work with the EGF2 back-end. Please find below excerpts from build.gradle with necessary configuration options:

buildscript {
        repositories {
                ...
                maven {
                        url "https://dl.bintray.com/dmitry-shingarev/egf2-android-client"
                }
                ...
        }
        dependencies {
                ...
                classpath 'com.eigengraph.egf2:gradle-plugin:<latest_version>'
                ...
        }
}

apply plugin: 'egf2-generator'

Generation parameters can be set in build.gradle file.

EGF2 {
        url = ""   // back-end server URL, required
        urlPrefix = "" // back-end server prefix, e.g. 'v1/', required
        prefixForModels = ""   // prefix that will be used for your models, required
        source = file("${project.rootDir}/schema/config.json") // path to config.json, required
        modelForFile = "file" // model for the implementation of IEGF2File interface, required
        kinds = ["avatar", "image"] // list of image kinds supported by the back-end, for more info please see file service section, optional
        excludeModels = ["schedule"]     // models that should be omitted in generation, optional
}

Please note that Maven & Ant build systems are not supported.

Model generator creates classes for working with the back-end, a configuration class that implements the IEGF2Config interface, and a class which offers a GsonFactory and implements the IEGF2GsonFactory interface.

Custom Gson deserializers are created for generated model classes. It is possible to create your own custom deserializer. In order to do so please implement IEGF2GsonFactory interface.

Classes and APIs

EGF2

EGF2 is the main class of EGF2 library. It provides methods for authentication and operations on graph objects.

Initialization of the EGF2 library in Java:

EGF2.INSTANCE.builder(getApplicationContext())
.config(<IEGF2Config>) // implementation IEGF2Config interface, eg generated class, required
.gson(<IEGF2GsonFactory>)   // implementation IEGF2GsonFactory interface, eg generated class
.types(<IEGF2MapTypesFactory>)
.dbName(<String>) // the name of the database file cache
.dbKey(<ByteArray>) // encryption key of the database file cache
.version(<long>) // the version of the database file cache
.token(<String>) // token for authentication
.debug(<Boolean>) // debug mode
.build();

and in Kotlin:

EGF2.Builder(applicationContext)
.config(<IEGF2Config>) // implementation IEGF2Config interface, eg generated class, required
.gson(<IEGF2GsonFactory>)   // implementation IEGF2GsonFactory interface, eg generated class
.types(<IEGF2MapTypesFactory>) // implementation IEGF2MapTypesFactory interface, eg generated class
.dbName(<String>) // the name of the database file cache
.dbKey(<ByteArray>) // encryption key of the database file cache
.version(<long>) // the version of the database file cache
.token(<String>) // token for authentication
.debug(<Boolean>) // debug mode
.build()

EGF2 Methods

Methods Description
fun register(body: RegisterModel): Observable Register a new user. As well as the ‘login’ method, ‘register’ returns an auth token, so you don’t need to call ‘login’ after ‘register’.
fun login(body: LoginModel): Observable Log-in an existing user. Returns auth token.
fun verifyEmail(token: String): Observable Verify an email which was used while registering a new user.
fun forgotPassword(email: String): Observable Initiate a password restore process. Back-end sends a message with a secret token to the specified email.
fun resetPassword(body: ResetPasswordModel): Observable Reset the password of logged in user. User must use the secret token which was sent before.
fun changePassword(body: ChangePasswordModel): Observable Change password of logged in user.
fun resendEmailVerification(): Observable Prompts back-end to send another email with user’s email address verification.
fun logout(): Observable Logout. EGF2 library will clear token and cache even if logout with back-end was not successful.
fun getSelfUser(expand: String? = null, useCache: Boolean = true, clazz: Class): Observable Get User object for currently logged in user.
fun getObjectByID(id: String, expand: String? = null, useCache: Boolean = true, clazz: Class): Observable Get object with specific id.
fun getEdgeObjects(id: String, edge: String, after: EGF2Model?, count: Int, expand: String? = null, useCache: Boolean = true, clazz: Class): Observable> Get edge objects.
fun getEdgeObject(idSrc: String, edge: String, idDst: String, expand: String? = null, useCache: Boolean = true, clazz: Class): Observable Get edge object with specific id.
fun createObject(body: Any, clazz: Class): Observable Create a new object.
fun createObjectOnEdge(id: String, edge: String, body: Any, clazz: Class): Observable Create a new object on edge.
fun createEdge(idSrc: String, edge: String, obj: EGF2Model): Observable Create an edge for an existing object with id.
fun updateObject(id: String, body: Any, clazz: Class): Observable Update object.
fun deleteObject(id: String): Observable Delete an object with id.
fun deleteObjectFromEdge(idSrc: String, edge: String, obj: EGF2Model): Observable Delete an object with id from an edge.
fun search(q: String, `object`: String, fields: String, filters: String, sort: String, range: String, expand: String): Observable> Search for objects according to parameters.
fun uploadFile(file: String, mime: String, title: String, clazz: Class): Observable Upload file data to server.
fun uploadImage(file: String, mime: String, title: String, kind: String, clazz: Class): Observable Upload image data to server.
fun clearCache() Clear cache
fun compactCache() Compact cache - performs defragmentation on Realm DB file
fun compactCache() Compact cache - performs defragmentation on Realm DB file
fun isLoggedIn():Boolean Returns true if the user is authorized

EGF2 Properties

DEF_COUNT: Int - Default page size.
MAX_COUNT: Int - Maximum page size.
paginationMode: PAGINATION_MODE - Pagination mode, "index" or "object".
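
To give a sense of how these calls compose, here is a hedged sketch in Kotlin that logs in, loads the current user and then the first page of the User/posts edge. User and Post are assumed generated model classes, the LoginModel constructor arguments mirror the properties listed further below, and the credentials are examples.

// Log in, load the current user, then load the first page of User/posts.
EGF2.login(LoginModel(email = "bob@example.com", password = "secret"))
    .flatMap { EGF2.getSelfUser(clazz = User::class.java) }
    .flatMap { user ->
        EGF2.getEdgeObjects(user.getId(), "posts", null, EGF2.DEF_COUNT, clazz = Post::class.java)
    }
    .subscribe(
        { page -> println("Loaded ${page.count} posts") },
        { error -> println("Request failed: $error") }
    )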

IEGF2Config

Methods:

fun url(): String - Back-end URL.
fun urlPrefix(): String - API URL prefix, e.g. "v1/".
fun defaultCount(): Int - Default page size (for edge objects and search pagination).
fun maxCount(): Int - Maximum page size.
fun paginationMode(): String - Pagination mode.

IEGF2GsonFactory

Methods:

fun create(): Gson - The implementation should register the custom deserializers generated by EGF2 model generation.

IEGF2MapTypesFactory

Methods:

fun create(): HashMap - Implement this method to provide a map from type names to Type objects.

IEGF2File

Methods:

fun getUploadUrl(): String - Get the URL that should be used for file uploading.
fun getId(): String - Get the file ID.
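
For example, the result of an image upload can be treated as an IEGF2File. In the sketch below, File refers to the EGF2 File model class (not java.io.File), and the path, MIME type, title and kind values are placeholders.

EGF2.uploadImage("/sdcard/Pictures/photo.jpg", "image/jpeg", "photo.jpg", "image", File::class.java)
    .subscribe { file -> println("Uploaded image, file id: ${file.getId()}") }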

EGF2Bus

EGF2Bus is a publish/subscribe event bus.

Notifications:

OBJECT_CREATED - A new object was created.
OBJECT_UPDATED - An existing object was updated.
OBJECT_DELETED - An object was deleted.
OBJECT_LOADED - An object was loaded from the back-end.
EDGE_ADDED - A new edge was added.
EDGE_REMOVED - An edge was removed.
EDGE_REFRESHED - Edge data was refreshed and the first page cached.
EDGE_PAGE_LOADED - A new page of edge objects was loaded from the back-end and cached.

Methods:

fun subscribeForObject(event: EVENT, id: String?, onNext: Action1): Subscription - Listen for object events.
fun subscribeForEdge(event: EVENT, id: String, edgeName: String, onNext: Action1): Subscription - Listen for edge events.
fun post(event: EVENT, id: String?, obj: EGF2Model?) - Post an object event.
fun post(event: EVENT, id: String, edgeName: String, edge: EGF2Edge?) - Post an edge event.
fun post(event: EVENT, id: String, edgeName: String, obj: EGF2Model) - Post an event about an edge object.

ObjectEvent

Properties:

event: EVENT - Event type.
id: String? - Event ID.
obj: EGF2Model? - Object reference.

EdgeEvent

Properties:

event: EVENT - Event type.
id: String - Event ID.
edgeName: String - Edge name.
edge: EGF2Edge? - Edge objects.
obj: EGF2Model? - Edge object.
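
To give a sense of how the bus is used, here is a sketch that listens for updates to a single object. The postId variable is a hypothetical object ID, and the exact location of the EVENT enum inside EGF2Bus is an assumption.

val subscription = EGF2Bus.subscribeForObject(EGF2Bus.EVENT.OBJECT_UPDATED, postId) { event ->
    // event is an ObjectEvent; obj holds the updated object, if any
    println("Object ${event.id} was updated: ${event.obj}")
}
// Unsubscribe when the listener is no longer needed, e.g. in onDestroy():
subscription.unsubscribe()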

RegisterModel

Properties:

first_name: String - First name.
last_name: String - Last name.
email: String - Email address.
date_of_birth: String - Date of birth.
password: String - Password.
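
A registration call might then look like the sketch below; the field values are examples, and it is assumed that RegisterModel can be constructed from the properties listed above.

val body = RegisterModel(
    first_name = "Alice",
    last_name = "Smith",
    email = "alice@example.com",
    date_of_birth = "1990-01-01",
    password = "secret"
)
// 'register' returns an auth token, so a separate 'login' call is not needed afterwards.
EGF2.register(body).subscribe { println("Registered and logged in") }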

LoginModel

Properties:

email: String - Email address.
password: String - Password.

ResetPasswordModel

Properties:

reset_token: String - Reset token.
new_password: String - New password.

ChangePasswordModel

Properties:

old_password: String - Old password.
new_password: String - New password.

EGF2Edge

Properties:

count: Int - Total number of objects on this edge.
first: String - Object ID or index of the first object in this page (depends on pagination mode).
last: String - Object ID or index of the last object in this page (depends on pagination mode).
result: List - Edge objects in this page.

EGF2Search

Properties:

count: Int - Total number of objects found.
first: String - Object ID or index of the first object in this page (depends on pagination mode).
last: String - Object ID or index of the last object in this page (depends on pagination mode).
result: List - Found objects in this page.

EGF2Model

Constants:

ME = "me" - Reference to the currently authenticated (self) user.

Methods:

fun getId(): String - Get the object ID.
fun update(): JsonObject - Build a JSON body for updating the object.
fun create(): JsonObject - Build a JSON body for creating a new object.
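
Putting EGF2Edge and EGF2Model together, edge pagination can be sketched as follows: pass the last object of the previously loaded page as the 'after' argument of getEdgeObjects to request the next page. Post is an assumed generated model class, and EGF2Edge is assumed to be generic here.

// Load the next page of a user's posts; passing null as 'after' loads the first page.
fun loadNextPostsPage(userId: String, previousPage: EGF2Edge<Post>?) =
    EGF2.getEdgeObjects(
        userId,
        "posts",
        previousPage?.result?.lastOrNull(),
        EGF2.DEF_COUNT,
        clazz = Post::class.java
    )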

Web

Coming soon, stay tuned!