Cell budgets

To give cells the autonomy to make decisions about their work, including creating and scheduling some tasks independently without approval, part of the budgeting is delegated to them. The idea is that more of the decisions about internal budgets are taken by someone close to the work, who has all its details in mind. This keeps our efficiency high and allows a better use of our internal budgets.

For this to work, a cell needs a simple way to check that it remains sustainable while it takes these decisions. A cell is only sustainable when enough of the hours worked can be billed to clients, in proportion to the hours that aren't billable. Rather than requiring management approval for each new task or internal project budget, each cell ensures that it doesn't exceed a budget of 1h of internal/unbilled time for every 2.5h the cell bills to clients.
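The 1h-per-2.5h rule can be checked with simple arithmetic. The following is a minimal sketch, not an official tool: the function names and the sample hour figures are hypothetical, and only the 2.5 ratio comes from the text above.

```python
# Ratio from the text: 1h of internal/unbilled budget for every 2.5h billed.
BILLED_HOURS_PER_INTERNAL_HOUR = 2.5


def internal_budget(billable_hours):
    """Return the internal hours a cell may spend, given its billed hours."""
    return billable_hours / BILLED_HOURS_PER_INTERNAL_HOUR


def is_sustainable(billable_hours, internal_hours):
    """Return True if the internal hours spent stay within the budget."""
    return internal_hours <= internal_budget(billable_hours)


# Hypothetical figures: a cell that billed 500h has 200h of internal budget.
print(internal_budget(500))      # 200.0
print(is_sustainable(500, 180))  # True
print(is_sustainable(500, 210))  # False
```

In other words, a cell stays sustainable as long as its internal hours are at most 40% of its billed hours.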

Work is accounted for per Jira project, no matter which cell member logged it. For example:

  • if time is logged on a ticket with the 'SE' prefix, it's counted towards 'Serenity',
  • if time is logged on a non-cell project (e.g. 'MNG'), it's counted only there.

Counting time this way allows cells to ask other cells for help on their tickets without affecting the sustainability of those other cells. Thanks to this, members of a cell that is low on billable hours can find more work without decreasing its sustainability, by reviewing tasks from other cells. However, it also means that the owner of a task should agree before members of other cells log time on it.
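As an illustration of per-project accounting, the sketch below attributes each work log to the project named in the ticket key prefix, ignoring who logged it. The worklog tuple format, the ticket keys, the author names and the `CELL_BY_PROJECT` mapping are assumptions made for this example; only the 'SE' to 'Serenity' pairing and the 'MNG' project come from the text.

```python
from collections import defaultdict

# Hypothetical mapping of Jira project prefixes to cells.
# Only 'SE' -> 'Serenity' is taken from the text.
CELL_BY_PROJECT = {"SE": "Serenity"}


def hours_by_owner(worklogs):
    """Sum logged hours per owning cell (or per project, for non-cell projects).

    Each worklog is a (ticket_key, author, hours) tuple. The author is
    deliberately ignored: only the ticket's project decides where time counts.
    """
    totals = defaultdict(float)
    for ticket_key, _author, hours in worklogs:
        project = ticket_key.split("-", 1)[0]
        owner = CELL_BY_PROJECT.get(project, project)
        totals[owner] += hours
    return dict(totals)


worklogs = [
    ("SE-123", "alice", 2.0),  # logged by a Serenity member
    ("SE-124", "bob", 1.5),    # logged by a member of another cell
    ("MNG-7", "alice", 1.0),   # non-cell project: counted only there
]
print(hours_by_owner(worklogs))  # {'Serenity': 3.5, 'MNG': 1.0}
```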

Task budgets are all-inclusive: as stated in Time logging best practices, all work should be logged. If two people meet for 1h over a task, they log 1h each. Time logged on tasks does influence the epic's budget (and by implication the cell's too), even if it's not strictly binding. When creating tasks or requesting somebody's time, think about where that time will be logged and which budget it will affect. This, along with warning before going over a task's estimate, will help everyone have a more effective and predictable workflow.

List of budgets managed by the cells

Hours logged on epics with a client-approved budget, or on recurring client time budgets, are billable. The rest, such as internal projects, discussions or meetings, isn't. Not all internal/unbilled budgets are the responsibility of cells; the others are handled by management. The ones for which cells are responsible are:

  • Ocim/DevOps: improving our architecture, fixing ongoing issues, developing new features, etc.
    • Xavier still sets the priorities for the Ocim epics schedule and approves the scope of new developments, but the budgets themselves are to be handled by the cells taking the epics on.
  • Instances updates & upgrades: part of the DevOps maintenance work, it includes upgrading all Open edX instances to new stable versions when they are released, along with security updates to all servers.
    • The Open edX upgrade epics will alternate between the cells. Cells will need to take this into account in their annual epics/budgets planning, to include the budget for supervising one upgrade.
    • Note that this responsibility is only for the common parts - Ocim, self-hosted instances, etc. Everything specific to a single instance is the responsibility of the cell that owns it.
    • It is the responsibility of each individual cell to ensure that the upgrade work it is responsible for is done quickly after the release: preferably within 1-2 weeks of the release, and at most within 1-2 months.
  • Prospects: Cells decide how much time to allocate to prospect work - the more prospect work, the more billed hours coming in later on, so this is a way to invest to increase future internal budgets and cell size.
  • Meetings: Time spent in things like sprint planning or all hands.
  • Cell management: Time spent fulfilling the cell management roles & discussing cell-related matters.
  • Contributions: Making contributions to the platform for which we aren’t being paid by clients.
  • Overflow: Time spent on work for clients that exceeds a fixed budget we agreed to for the completion of an epic, and which will thus not be paid by the client.
  • Conference: Time spent preparing and attending the conference.
  • Learning: Time spent learning things outside of the context of specific projects (almost every task requires learning something; that learning is part of the task's scope).

Cell budgets independence

The proportion of time spent on individual budgets can vary per cell - that’s part of the decisions individual cells will be taking, with the decisions and monitoring of the budgets coordinated by the person holding the cell's sustainability role.

Budgets are specific to each cell, and they aren’t influenced by the other cells' budgets. No matter what happens, each cell has 1 hour available for its internal budgets for every 2.5 billable hours logged. To keep accounting and coordination work sane, budgets can't be exchanged between cells.

Firefighting and Cell Budgets

Firefighting tasks can pose a unique challenge when it comes to cell budgets, since in most cases they cannot be delayed. For firefighting tasks, keep the following guidelines in mind:

  • The task is related to our infrastructure:
    • Is the task urgent? e.g. is an instance, or are multiple instances, down? Is there a risk of them going down at any moment?
      • If so, whichever firefighter (FF) encounters it first should begin work immediately, irrespective of budget, cell or sustainability.
    • Is the task less urgent? e.g. a certificate rotation due in a few days.
      • If so, discuss and pass on the task to whichever cell has better sustainability numbers.
  • The task is related to a client:
    • Often these tasks come from the client budget, so sustainability is not a concern.
    • Is the task urgent? e.g. the client instance is down, the client needs to immediately scale their instance, or they have some other urgent request.
      • It should be completed by whichever FF encounters it first, irrespective of cell. This may be harder or impossible for more specialised clients like Yonkers or LX.
    • Is the task less urgent? e.g. the client requested a minor change, fix, etc. that came up in the current sprint and should be completed within it.
      • It should be completed by the FF from the appropriate cell.
  • The task is related to neither a client nor infrastructure:
    • These are rare, but if such a task comes up it should go to whichever cell has the capacity to handle it. If both do, we can take sustainability into account.
    • e.g. One of our XBlocks needs a fix because it is currently breaking on edx-platform:master, or a secret (password etc.) has been leaked and needs to be updated ASAP.

Note that FF tasks are often small, a few hours at most, so they will not greatly impact sustainability when a cell's totals run to many hundreds or even over a thousand hours. Often, the more important thing is to begin work immediately.
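The guidelines above amount to a small decision tree. The sketch below encodes it as a function, purely as a reading aid: the `Kind` enum, the function name and the returned strings are hypothetical and do not describe any existing tooling.

```python
from enum import Enum


class Kind(Enum):
    """What a firefighting task relates to (hypothetical categories)."""
    INFRASTRUCTURE = "infrastructure"
    CLIENT = "client"
    OTHER = "other"


def who_takes_it(kind, urgent):
    """Return who should pick up a firefighting task, per the guidelines."""
    if kind is Kind.INFRASTRUCTURE:
        if urgent:
            # Budget, cell and sustainability are ignored for urgent issues.
            return "first firefighter to encounter it"
        return "cell with better sustainability numbers"
    if kind is Kind.CLIENT:
        if urgent:
            # May not be feasible for more specialised clients.
            return "first firefighter to encounter it"
        return "firefighter from the appropriate cell"
    # Neither client nor infrastructure: capacity first, then sustainability.
    return "cell with capacity (sustainability as tie-breaker)"


print(who_takes_it(Kind.INFRASTRUCTURE, urgent=False))
# cell with better sustainability numbers
```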

Rolling budgets

To account for variations in needs or in the volume of work, budgets are accounted for on a rolling basis. If a cell uses less than its budget one month, it can spend the remainder in the following months; if it exceeds the budget one month, it must make up the difference in the following months. It is recommended to build up a small budget surplus, to be able to weather surprises without having to make cuts every time.
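One way to picture rolling accounting is as a running balance that carries over from month to month. This is a minimal sketch under assumed inputs: the function name, the (billable, internal) tuple format and the monthly figures are hypothetical, and only the 2.5 ratio comes from this page.

```python
def rolling_balance(months, ratio=2.5):
    """Return the running internal-budget balance after each month.

    `months` is a list of (billable_hours, internal_hours) tuples.
    Each month earns billable_hours / ratio of internal budget and spends
    internal_hours; any surplus or deficit carries over to the next month.
    """
    balance = 0.0
    history = []
    for billable, internal in months:
        balance += billable / ratio - internal
        history.append(balance)
    return history


# Hypothetical months: a surplus, then an overspend, then catching up.
months = [(500, 180), (400, 190), (450, 170)]
print(rolling_balance(months))  # [20.0, -10.0, 0.0]
```

A negative balance means the cell has to catch up in the following months; a positive one is the kind of surplus that helps weather surprises.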

The budgets for the next month are recorded in the epic planning spreadsheet, since that directs the time allocation for individual sprints. The epic management role updates the spreadsheet with the expected volumes each sprint; the sustainability role reviews it and is responsible for it.

Management

As always, Xavier or Braden can still make decisions on these topics, and their decisions still take precedence over individual cells’ decisions. But, as with the rest of the decentralized responsibilities, ideally we’ll limit this to as few cases as possible - it’s simply a way to ensure we can efficiently handle issues where self-management isn’t enough.