Skip to content

Getting Started (Work In Progress)

The GraphBLAS is is sparse linear algebra library with three core objects, Matrices, Vectors and Scalars. OneSparse provides Postgres types that let you do sparse algebraic and graph operations on these objects. OneSparse is a binding to the powerful SuiteSparse:GraphBLAS library, which implements a Just-In-Time Compiler to target CPUs and CUDA GPUs without changing any of the algebraic code. The same code that can be written and run on laptops with small datasets, can be run on powerful multi-core GPU systems with no changes.

First we need some data. Run the command \i demo/fixtures.sql from a Docker Demo container to include the data used in this guide:

psql:demo/fixtures.sql:1: NOTICE:  extension "onesparse" already exists, skipping
psql:demo/fixtures.sql:2: NOTICE:  table "ostest" does not exist, skipping
psql:demo/fixtures.sql:21: NOTICE:  table "karate" does not exist, skipping
psql:demo/fixtures.sql:29: NOTICE:  table "mbeacxc" does not exist, skipping
psql:demo/fixtures.sql:41: NOTICE:  table "bcsstk01" does not exist, skipping
psql:demo/fixtures.sql:49: ERROR:  extra data after last expected column
CONTEXT:  COPY bcsstk01, line 1: "0 0  2.83226851852e+06"
psql:demo/fixtures.sql:51: NOTICE:  table "ash219" does not exist, skipping
psql:demo/fixtures.sql:62: NOTICE:  table "fs_183_1" does not exist, skipping
psql:demo/fixtures.sql:79: NOTICE:  table "test_graphs" does not exist, skipping
The examples below will use the Newman/Karate graph the demo test fixtures:

%3 1 1 2 2 1--2 3 3 1--3 4 4 1--4 5 5 1--5 6 6 1--6 7 7 1--7 8 8 1--8 9 9 1--9 11 11 1--11 12 12 1--12 13 13 1--13 14 14 1--14 18 18 1--18 20 20 1--20 22 22 1--22 32 32 1--32 2--3 2--4 2--8 2--14 2--18 2--20 2--22 31 31 2--31 3--4 3--8 3--9 3--14 10 10 3--10 28 28 3--28 29 29 3--29 33 33 3--33 4--8 4--13 4--14 5--7 5--11 6--7 6--11 17 17 6--17 7--17 9--31 9--33 34 34 9--34 14--34 20--34 32--33 32--34 31--33 31--34 10--34 28--34 29--32 29--34 33--34 15 15 15--33 15--34 16 16 16--33 16--34 19 19 19--33 19--34 21 21 21--33 21--34 23 23 23--33 23--34 24 24 24--28 24--33 24--34 26 26 24--26 30 30 24--30 26--32 30--33 30--34 25 25 25--32 25--28 25--26 27 27 27--34 27--30

BFS

The core operation of Graph Algorithms is Breadth First Search, or BFS. In the GraphBLAS, this pattern usually consists of vector matrix multiplication in a loop until some terminal condition is reached.

Min Parent BFS returns a sparse vector of nodes containing the minimum parent id:

CREATE OR REPLACE FUNCTION bfs(
        graph matrix,          -- This is the input graph
        start_node bigint      -- This is the node id to start the BFS
    )
    RETURNS vector             -- will return a vector of (node:minparent)
    LANGUAGE plpgsql AS
    $$
    DECLARE
    bfs_vector vector = vector('int64');  -- The result vector to accumulate ids
    prev_vector vector = vector('int64'); -- temp vector to detect termination
    BEGIN
        bfs_vector = set_element(bfs_vector, start_node, 1); -- set the start node value to 1
        WHILE bfs_vector != prev_vector LOOP  -- keep looping when the result changes
            prev_vector = dup(bfs_vector);    -- dup bfs and assign to previous
            bfs_vector = vxm(
                bfs_vector,          -- Starting from the current frontier
                graph,               -- traverse this graph.
                'any_secondi_int64', -- At any edge value, store the parent node id
                c=>bfs_vector,       -- assign back into bfs_vector
                accum=>'min_int64'   -- accumulate the minimum parent id for collisions
            );
        END LOOP;
    RETURN bfs_vector;
    end;
    $$;
Now run the function passing a graph and a starting point:

%3 1 0 2 0 1--2 3 0 1--3 4 0 1--4 5 0 1--5 6 0 1--6 7 0 1--7 9 0 1--9 14 0 1--14 20 0 1--20 32 0 1--32 8 8 1--8 11 11 1--11 12 12 1--12 13 13 1--13 18 18 1--18 22 22 1--22 2--3 2--4 2--14 2--20 31 0 2--31 2--8 2--18 2--22 3--4 3--9 10 0 3--10 3--14 28 0 3--28 29 0 3--29 33 0 3--33 3--8 4--14 4--8 4--13 5--7 5--11 6--7 6--11 17 17 6--17 7--17 9--31 9--33 34 34 9--34 10--34 14--34 15 0 15--33 15--34 16 0 16--33 16--34 19 0 19--33 19--34 20--34 21 0 21--33 21--34 23 0 23--33 23--34 24 0 26 0 24--26 24--28 30 0 24--30 24--33 24--34 25 0 25--26 25--28 25--32 26--32 27 0 27--30 27--34 28--34 29--32 29--34 30--33 30--34 31--33 31--34 32--33 32--34 33--34

SSSP

The Single Source Shortest Path algorithm is a varation on BFS with a different semiring called min_plus. Instead of taking the minium parent id, the semiring takes the minimum path length from a single source to every node. This implies adding path lengths instead of multiplying them as would happen in the common plus_times semiring. In this example we use one of OneSparse experimental operators @<+< to specify an in place matrix multiplication and accumulation with the min_plus semiring (@<+) and the minimum accumulator < combine to form @<+<.

CREATE OR REPLACE FUNCTION sssp(graph matrix, start_node bigint)
    RETURNS vector LANGUAGE plpgsql AS
    $$
    DECLARE
    sssp_vector vector = vector(type(graph));
    next_vector vector = vector(type(graph));
    BEGIN
        sssp_vector = set_element(sssp_vector, start_node, 1);
        WHILE sssp_vector != next_vector LOOP
            next_vector = dup(sssp_vector);
            sssp_vector = sssp_vector @<+< graph;
        END LOOP;
    RETURN sssp_vector;
    end;
    $$;
%3 1 1 2 2 1--2 3 2 1--3 4 2 1--4 5 2 1--5 6 2 1--6 7 2 1--7 9 2 1--9 14 2 1--14 20 2 1--20 32 2 1--32 8 8 1--8 11 11 1--11 12 12 1--12 13 13 1--13 18 18 1--18 22 22 1--22 2--3 2--4 2--14 2--20 31 3 2--31 2--8 2--18 2--22 3--4 3--9 10 3 3--10 3--14 28 3 3--28 29 3 3--29 33 3 3--33 3--8 4--14 4--8 4--13 5--7 5--11 6--7 6--11 17 17 6--17 7--17 9--31 9--33 34 34 9--34 10--34 14--34 15 4 15--33 15--34 16 4 16--33 16--34 19 4 19--33 19--34 20--34 21 4 21--33 21--34 23 4 23--33 23--34 24 4 26 3 24--26 24--28 30 4 24--30 24--33 24--34 25 3 25--26 25--28 25--32 26--32 27 4 27--30 27--34 28--34 29--32 29--34 30--33 30--34 31--33 31--34 32--33 32--34 33--34

Triangle Centrality

Based on the Algorithm from https://arxiv.org/abs/2105.00110

Triangle Centrality is a node ranking metric for finding nodes in the center of the most triangles. Triangles represent strongly connected nodes and their centrality can convey more importance than node centrality.

CREATE OR REPLACE FUNCTION tc(a matrix)
    RETURNS vector LANGUAGE plpgsql AS
    $$
    DECLARE
    m matrix;
    t matrix;
    y vector;
    k scalar;
    BEGIN
        m = tril(a, -1);
        t = mxm(a, a, 'plus_pair_int32', mask=>m, descr=>'st1');
        y = reduce_cols(t) |+ reduce_rows(t);
        k = reduce_scalar(y);
        RETURN 3 * ((a @ y) |- 2 * (one(t) @ y) |+ y) / k;
    END;
    $$;
%3 1 1.888889 2 0.733333 1--2 3 0.511111 1--3 4 -0.400000 1--4 5 -0.244444 1--5 6 -0.200000 1--6 7 -0.422222 1--7 9 0.155556 1--9 14 -0.666667 1--14 20 -0.311111 1--20 32 1.022222 1--32 8 8 1--8 11 11 1--11 12 12 1--12 13 13 1--13 18 18 1--18 22 22 1--22 2--3 2--4 2--14 2--20 31 0.844444 2--31 2--8 2--18 2--22 3--4 3--9 10 0.577778 3--10 3--14 28 0.533333 3--28 29 0.666667 3--29 33 -0.155556 3--33 3--8 4--14 4--8 4--13 5--7 5--11 6--7 6--11 17 17 6--17 7--17 9--31 9--33 34 34 9--34 10--34 14--34 15 0.644444 15--33 15--34 16 0.644444 16--33 16--34 19 0.644444 19--33 19--34 20--34 21 0.644444 21--33 21--34 23 0.644444 23--33 23--34 24 0.844444 26 0.155556 24--26 24--28 30 0.600000 24--30 24--33 24--34 25 0.133333 25--26 25--28 25--32 26--32 27 0.444444 27--30 27--34 28--34 29--32 29--34 30--33 30--34 31--33 31--34 32--33 32--34 33--34

Page Rank TODO