CHAOSS SortingHat — Coding period

Weeks 1 and 2: June 7th to June 20th, 2021

Over the last month, I have been working on CHAOSS’s SortingHat tool as part of Google Summer of Code. My project focuses on improving the data we capture about organizations in SortingHat (sidenote: the names of tools in this community are a Potterhead’s salvation), specifically their organizational hierarchies. Much of my work over the past two weeks has gone into extending the existing system to capture this information. Now that the first two weeks of the coding period are over, I have summarized the work I have done so far in this short article.

Week 1: Figuring out how to capture organizational hierarchy information

I spent much of the first week discussing implementation details with my mentors (shoutout to Santiago Dueñas, Miguel Ángel Fernández, Eva Millán, and Venu Vardhan Reddy 🔈)! Storing hierarchical information in a relational database is tricky, and we went back and forth on the most efficient way to do it.

The problem

An organization can have any number of groups, and each group can have its own subgroups, which can in turn have subgroups of their own. We did not want to limit how deep these hierarchies can go, so we needed a way to store trees of arbitrary depth without sacrificing performance.
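To see why depth matters for performance, here is a minimal sketch of the simplest way to store a tree: each row records only its parent (an adjacency list). The rows, names and the `fetch_children` helper are hypothetical, and each call to `fetch_children` stands in for one SQL query. A naive traversal needs one lookup per node in the subtree, so the cost grows with the size and depth of the hierarchy.

```python
# Hypothetical rows: (id, name, parent_id). parent_id is None for a top-level group.
ROWS = [
    (1, "Engineering", None),
    (2, "Backend", 1),
    (3, "Frontend", 1),
    (4, "Databases", 2),
    (5, "Replication", 4),
]


def fetch_children(rows, parent_id, stats):
    """Simulate one SQL query: SELECT ... WHERE parent_id = <parent_id>."""
    stats["queries"] += 1
    return [row for row in rows if row[2] == parent_id]


def descendants(rows, group_id, stats):
    """Collect every descendant of group_id, issuing one 'query' per visited node."""
    result = []
    stack = [group_id]
    while stack:
        current = stack.pop()
        for child_id, child_name, _ in fetch_children(rows, current, stats):
            result.append(child_name)
            stack.append(child_id)
    return result


stats = {"queries": 0}
print(sorted(descendants(ROWS, 1, stats)))  # ['Backend', 'Databases', 'Frontend', 'Replication']
print(stats["queries"])                     # 5: one lookup per node in the subtree
```

Fetching a four-group subtree already costs five round trips in this sketch; deeper or wider hierarchies cost proportionally more.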

Solution (v1.0)

We decided to add a model called Group to represent groups within an organization. Each Group object stores the name of the group and the organization it belongs to. It also needs to record where the group sits in the hierarchy, that is, which subgroups it has.

But how do we squeeze a hierarchy into MySQL?

To implement this, we decided to use a library called django-treebeard, which provides three different ways of storing trees: materialized paths, adjacency lists, and nested sets. We picked materialized paths because they offered the best balance of read and write performance for our use case.

(If you want to check out how we arrived at this solution, take a look at https://github.com/chaoss/grimoirelab-sortinghat/issues/541)

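With materialized paths, every row in the table has a path field that treebeard computes when the row is inserted. The path encodes the node's full ancestry: conceptually, a top-level group in an organization gets the path 1, its children get 1.1, 1.2, and so on. (In practice, treebeard encodes each level as a fixed-width step, such as 0001 for a root and 00010001 for its first child, rather than using dots.)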
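Here is a minimal in-memory sketch of the idea, using the dotted notation above rather than treebeard's real encoding. `PathTree`, `Node` and their methods are hypothetical names for illustration; in a Django project, treebeard computes and stores the path for you. The point it shows is that fetching a whole subtree becomes a single prefix match on `path` (like `WHERE path LIKE '1.%'` in SQL) instead of one lookup per level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    path: str  # dotted materialized path, e.g. "1", "1.2", "1.2.1"


@dataclass
class PathTree:
    """Toy table that assigns dotted materialized paths on insert.

    Mimics the idea behind a materialized-path tree, not treebeard's encoding.
    Assumes nodes are never deleted or moved (siblings are numbered by count).
    """
    rows: List[Node] = field(default_factory=list)

    def _children_of(self, path: Optional[str]) -> List[Node]:
        # Direct children share the parent's prefix and have exactly one more segment.
        prefix = "" if path is None else path + "."
        depth = 1 if path is None else path.count(".") + 2
        return [n for n in self.rows
                if n.path.startswith(prefix) and n.path.count(".") + 1 == depth]

    def add(self, name: str, parent: Optional[Node] = None) -> Node:
        # The new node's step is the next free sibling number under its parent.
        step = str(len(self._children_of(None if parent is None else parent.path)) + 1)
        path = step if parent is None else f"{parent.path}.{step}"
        node = Node(name, path)
        self.rows.append(node)
        return node

    def descendants(self, node: Node) -> List[str]:
        # The whole subtree is one prefix match; the trailing "." keeps 1.1 from matching 1.10.
        prefix = node.path + "."
        return [n.name for n in self.rows if n.path.startswith(prefix)]

    def depth(self, node: Node) -> int:
        return node.path.count(".") + 1


tree = PathTree()
eng = tree.add("Engineering")          # path "1"
backend = tree.add("Backend", eng)     # path "1.1"
frontend = tree.add("Frontend", eng)   # path "1.2"
db = tree.add("Databases", backend)    # path "1.1.1"

print(db.path)                # 1.1.1
print(tree.descendants(eng))  # ['Backend', 'Frontend', 'Databases']
print(tree.depth(db))         # 3
```

The dotted form also hints at one reason for treebeard's fixed-width steps: as plain strings, 1.10 sorts before 1.2, whereas zero-padded steps keep lexicographic order consistent with tree order.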

The catch

In the past, there have been cases where groups are not directly linked to an organization or company. People in those groups often work for other companies and might not directly belong to any single organization. So how do we capture such groups?

Solution (v2.0)

Keep regular groups linked to organizations, but also allow Group objects to be created without a link to any organization.
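A minimal sketch of this idea, assuming plain Python dataclasses instead of the real Django models: the organization link simply becomes optional, and `None` marks a standalone group. In Django terms this would roughly correspond to a nullable foreign key (`null=True`). `Organization`, `Group` and `is_standalone` here are hypothetical illustrations, not SortingHat's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Organization:
    name: str


@dataclass
class Group:
    """Hypothetical group record (not SortingHat's actual model).

    organization is optional: None marks a group that is not tied to any
    organization, e.g. a team whose members work for different companies.
    """
    name: str
    organization: Optional[Organization] = None
    parent: Optional["Group"] = None


def is_standalone(group: Group) -> bool:
    """Return True when the group has no organization link."""
    return group.organization is None


example_org = Organization("Example Org")                 # placeholder name
infra = Group("Infrastructure", organization=example_org)
storage = Group("Storage", organization=example_org, parent=infra)
release = Group("Release Team")                          # created without an organization

print([g.name for g in (infra, storage, release) if is_standalone(g)])  # ['Release Team']
```

Both kinds of group share one model, so the hierarchy logic does not need a separate code path for standalone groups.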

Week 2: Coding and raising my first PR

With the implementation details settled, I started coding early last week. I was eager to get my hands on the code and stumbled in a bunch of places, but I eventually got all of it implemented.

Although I had contributed small changes to some CHAOSS tools in the past, raising my first PR as part of GSoC felt daunting. I double-checked my commit messages, triple-checked that I had included everything this PR needed, and went through the contribution guidelines again and again and again…

…and finally clicked “Create a Pull Request”.

A few changes to this PR are still pending. Once those are done, I will move on to my next milestone: writing the GraphQL layer for handling Groups!