Cyberweekly #229 - Doing the hard work to make it simple

Oct 29, 2023

When I first joined GDS, there was a poster on the wall that said "Do the hard work to make it simple", and I loved the concept that this group of smart people, which I was now part of, was there to make life as simple and easy as possible for the average citizen.

There seems to be a natural predilection for many people to use complex language, define complicated solutions and to make things sound hard. You can assume that it's because it makes them seem so much smarter, or because they get incentivised for solving complex problems in some way, but in reality I don't think we really know why we do that. What I do know is that it's far more difficult to summarise a concept, problem or solution in a simple way than it is to expose all of the complexity.

It doesn't matter whether you are setting out a product strategy, discussing the risk of a given action, securing users' data or describing a new technology: we love to expose the details, to wave the complexity in people's faces and use that to justify why the solution is worthwhile.

Doing the more complex work needed to produce something that sounds simpler, looks less complex, but enables everyone to engage with the thing in a simple and well defined way is so much more effort, but almost always pays dividends.

There are no shortcuts for this, because if there were it wouldn't be the hard work that makes things simple, but there are some interesting facets that I've noticed over time.

Firstly, users of systems rarely care about all of the bells and whistles. The iPhone was a success partly because Steve Jobs wanted to massively simplify the first solutions put in front of him.

Secondly, leaders need to have a radical focus on simplicity to drive their teams to achieve it. There's a story (which I can't find online now, I think I read in "Working Backwards" or "One Click") that when Amazon was first inventing the "one click purchasing", the team kept coming back with prototypes that required a confirmation click, or a credit card submission phase, and Jeff had to keep sending them away to work out how to make "it all work in a single click".

Thirdly and finally, humans tend to be story driven machines. As Julia points out in her recent talk, asking people why they do a thing will often result in a narrative or story around the decisions that they make. To sell simplicity to users you also need to sell the same story to them.

New talk: Making Hard Things Easy

One way I see people kind of trying to share terrible things that their computers have done to them is by sharing "best practices".
But I really love to hear the stories behind the best practices!
If someone has a strong opinion like "nobody should ever use bash", I want to hear about the story! What did bash do to you? I need to know.
The reason I prefer stories to best practices is if I know the story about how the bash hurt you, I can take that information and decide for myself how I want to proceed.
Maybe I feel like -- the computer did that to you? That's okay, I can deal with that problem, I don't mind.
Or I might instead feel like "oh no, I'm going to do the best practice you recommended, because I do not want that thing to happen to me".
These bash stories are a great example of that: my reaction to them is "okay, I'm going to keep using bash, I'll just use shellcheck and keep my bash scripts pretty simple". But other people see them and decide "wow, I never want to use bash for anything, that's awful, I hate it".
Different people have different reactions to the same stories and that's okay.

Love this from Julia, whose zines on how things work I’ve always loved.

I too love to hear the stories behind opinions. Part of that is that I am very narrative and story driven. I learn, teach and operate best when crafting a narrative because I can see a journey, a destination and a sense of achievement and accomplishment.

But also because I think stories bring the subjectivity back into some of what we do. It moves away from “Always do X” to “When I was in context Z, I did thing X and it solved problems A, B and C” which I find far more useful and reusable.

Working defensively

Another example is IT helpdesk teams. Again, overwhelmed with requests, a ticketing system is typically brought in to manage the flow of work. If you have a problem you first have to fill in a (usually) page-long form and wait for a response. The helpdesk staff mechanically work through the ever-moving backlog. These systems are not always terrible. I’ve been in a few places where I’ve had a basic IT problem, filled in a form, and had it resolved within the hour. My expectation has always been that it would take a few days, so in these cases I’ve made sure to tell the team how impressed I was. Unfortunately this experience isn’t common. Too often this wall of defence only solves the problem locally—life might be better for the team, but it doesn’t produce a tangibly better outcome for the customers. Requests and other interactions still take too long to resolve. No-one but the team is happier, and there is a stalemate. The customers aren’t happy and the team says they’re dealing with things as fast as they can. They may even have numbers to show… something.

Fix the system, don’t just fix the problem in front of you. Why does the problem exist and what is causing it? Sometimes the solution to a problem can ossify the problem and the system itself, making it so that the systemic problem can never be fixed because that would break your solution.

Stories of reaching Staff-plus engineering roles - StaffEng | StaffEng

The transition into Staff Engineer, and its further evolutions like Principal Engineer, remains particularly challenging and undocumented. What are the skills you need to develop to reach Staff Engineer? What skills do you need to succeed after you've reached it? How do most folks reach this role? What can companies do to streamline the path to Staff Engineer? Will you enjoy being a Staff Engineer or toil for years for a role that doesn't suit you?
The StaffEng project aims to collect the stories of folks who are operating in Staff, Principal or Distinguished Engineer roles. How did you get there? What were your lucky breaks? How did you learn to be effective? As more of these stories are collected, I hope to build a dataset that helps folks draw their own map to Staff Engineer.

This is a nice project that is trying to document a set of jobs that didn’t really exist a decade ago.

As organisations increasingly recognise that software development and technology form a core backbone of their business rather than something that can be outsourced, the roles of the internal software engineers have changed over time.

Of course, I love a story. I’m a firm believer that we learn well through narrative, and that we love to hear what worked, what didn’t work and what we should try in our own future, so this is something to watch with interest.

The Product Model at Spotify - Silicon Valley Product Group : Silicon Valley Product Group

Spotify’s leaders and product teams understood early in the journey that a better approach to discovering and delivering product was necessary.
Consequently, substantial investments were made to support the necessary experimentation and provide the product teams access to crucial user behavior data. This included the infrastructure for instrumentation, telemetry, monitoring and reporting. The company also invested heavily in deployment infrastructure, especially for A/B testing, with a dedicated platform product team focused on enabling these live-data tests.
Spotify was also an early advocate of small, frequent, uncoupled releases, and invested in the tools and techniques of continuous delivery.
Since Spotify’s skills in product delivery are fairly well known in the industry, we won’t spend much time on that here. However, it is critical to realize that these investments are what enable Spotify’s empowered product teams to deliver outcomes, and not just output.
This delivery infrastructure paved the way for Discover Weekly and countless other Spotify innovations, large and small.

This is a good overview of the product model at Spotify and how they developed the Discover Weekly tools.

What stood out to me is the amount of internal engineering investment that is needed for this. It’s not something that you see talked about as much, but in order to have high-performing teams you need to ensure that they are provided with high-performing tools.

Optimizing for Taste

Opportunity Cost. This is an economics theory that you’re likely familiar with. If we spend time doing one thing, it means we are not doing another. This is a key principle we apply in all decision making, but is compounded when it comes to testing. If you’re testing something you’ve intentionally decided to add cost, complexity, and time to the initiative. To make matters worse, testing requires you to have a control in place, and practically speaking that means you avoid making multiple changes within the same sphere of influence (for the duration of the test) to maximize correctness. That means you’ve doubled down on the time spent doing one thing vs another.
Alternatively we simply make decisions based on the information at hand and measure the results. This means we unblock future development as quickly as we can, but it doesn’t mean we won’t act on the results. If the results perform negatively, we may need to try something else. When they do, you of course want to understand why, but fortunately there are still many ways to do post-analysis. Cohort analysis is particularly useful in this fashion. This is not to say cohorts are a superior technique, as they are generally going to be less accurate, but it’s usually enough for you to draw conclusions from, and if needed, you can bisect from there. Again, and most importantly, you’re not blocking additional development on the result of the test, which can often take a considerable amount of time, nor are you spending cycles waiting for information before moving on to the next project.

I found myself strongly agreeing with this. When I worked at Government Digital Service, I was always impressed by how much impact just a small amount of real world user testing made to developers’ capabilities. Watching real users use your system educates you as to how people interact with the things you write.

But that can be taken too far, and we can end up with queues of features that we don’t have capacity to test with real users. We need a healthy balance and, as set out in this article, there’s a cost to doing A/B testing or real world user testing. We need to use it where it can make the most impact and rely on our development teams’ experience and judgement for the other calls.
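As a rough illustration of the "ship it, then measure" approach described in the quoted article, here is a minimal sketch of a before/after cohort comparison. Everything in it is an assumption made for illustration: the function name `conversion_by_cohort`, the event shape (signup date plus a converted flag), and the sample data are hypothetical and do not come from the article.

```python
from collections import defaultdict
from datetime import date


def conversion_by_cohort(events, release_day):
    """Split users into 'before' and 'after' cohorts by signup date
    and return the conversion rate of each cohort."""
    totals = defaultdict(lambda: [0, 0])  # cohort -> [converted, total]
    for signup_day, converted in events:
        cohort = "after" if signup_day >= release_day else "before"
        totals[cohort][1] += 1
        if converted:
            totals[cohort][0] += 1
    return {cohort: conv / total for cohort, (conv, total) in totals.items()}


# Hypothetical data: (signup date, did the user convert?)
events = [
    (date(2023, 10, 1), True),
    (date(2023, 10, 2), False),
    (date(2023, 10, 3), False),
    (date(2023, 10, 4), False),
    (date(2023, 10, 10), True),
    (date(2023, 10, 11), True),
    (date(2023, 10, 12), False),
    (date(2023, 10, 13), False),
]

rates = conversion_by_cohort(events, release_day=date(2023, 10, 9))
print(rates)
```

Unlike an A/B test there is no control group here, so anything else that changed around the release date will also show up in the numbers. That is the accuracy trade-off the article accepts in exchange for not blocking further development.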

The Smart-Talk Trap

Most executives know what they should do when their companies get into trouble—when sales slip or customer satisfaction erodes or productivity and quality problems emerge. To plot a course, they can draw on their own experience and insight, their colleagues’ ideas, and the reams of data produced by sophisticated information systems. If that’s not enough, they can tap into the myriad resources that exist outside the walls of their own companies—the 1,700 business books and thousands of articles published every year, the legions of management consultants armed with the latest tools and concepts, the dozens of gurus making the rounds on the speaking circuit. In today’s business world, there is no shortage of know-how.
But all too often, even with all that knowledge floating around, nothing happens. There’s no doing. Yes, some companies are adept at translating ideas into action. A handful are even famous for it, such as General Electric, IDEO Product Development, and AES Corporation. But they are the exceptions. Most organizations have trouble bridging the knowing-doing gap. Brought to a standstill by inertia, their problems fester, their opportunities for growth are lost, and their best employees become frustrated and leave. If the inactivity continues, customers and investors react accordingly and take their money elsewhere.
[…]
That’s exactly what Xerox discovered in the 1980s, when the company’s executives decided that quality improvements were necessary to bring down costs and raise customer satisfaction. Over the next four years, employees at every level attended an almost endless series of meetings and off-site conferences to discuss the quality initiative. About 70,000 employees received six days of training each, and executives created a 92-page book of implementation guidelines.
But all that talking was just hot air. In 1989, a Harvard Business School case study on the project revealed that there had been very little change in the attitudes of Xerox’s managers toward quality. Few concrete decisions had been made to change the quality of the company’s products. Nor had beliefs and behaviors been altered. For instance, only 15% of Xerox employees said they believed that recognition and rewards were based on improvements in quality, and only 13% reported using cost-of-quality analyses in their decision making.

This is a great review of why senior executives value the smart-talker over the do-ers, and what we can do about it.

It talks about 5 things that senior executives who buck the trend do effectively:

Leaders who can do
Simple language and concepts
How questions, not just why questions
Close the loop / gather feedback
Learn from experience

It’s amazing how much of a difference these things can make, from the value of prototyping, to making the data around objectives more visible with OKRs, to focusing on radically simple language and strategies.

Overchoice and How to Avoid it - by Gurwinder - The Prism

According to various polls, people estimate that they spend between 2.5 and 3 hours per day making trivial decisions, such as what to eat for dinner. That’s around 1000 hours, or 40 days, of dithering per year, and it doesn’t include the weightier decisions like where to live or who to marry.
Most of our everyday choices are between similar things; what movie to watch, what brand of toothpaste to buy. Fredkin’s paradox states that the more similar two choices seem, the less the decision should matter, yet the harder it is to choose between them. As a result, we often spend the most time on the decisions that matter least.
This is illustrated by Buridan’s ass, a mythical donkey that finds itself precisely equidistant from two identical bales of hay. The ass tries to make a rational decision as to whether to eat from the left bale or the right, but since there’s no rational reason to prefer either, the donkey wavers until it dies of hunger.
Buridan’s ass illustrates that there’s a cost to weighing options, which can exceed the cost of any of the options. Thus, the choices we make don’t need to be the best; they just have to be worth more than the time spent making them. If we spend less time making decisions, we can spend more time making whatever decision we made work.

Sometimes we just need to make a decision and move on. One of the things that I’ve learned over the years is that it’s rare that a decision cannot be changed later.

There are two important gotchas that I think come out of this. The first is that we’re bad at recognising that. Decision fatigue and an inflated sense of how important each decision is can mean that we spend far too much time torturing ourselves over decisions, and that, particularly when things feel tough, we want to be really sure that we’re making the right decision.

The corollary to this, if you take a more laissez-faire attitude, is that we can sometimes fail to actually commit to a decision. If you take my approach of “We can always change it later”, then you need to ensure that you still commit to taking the decision and raising the bar for changing your mind. Make decisions fast, but actually commit, take the decision and don’t spend every second questioning yourself, otherwise you aren’t getting the value from taking the decision.
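The "around 1000 hours, or 40 days" figure in the quoted piece is easy to sanity-check. This is a small sketch of the arithmetic only, assuming a 365-day year and taking the midpoint of the quoted 2.5 to 3 hour range. The variable and function names are mine.

```python
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24


def yearly_hours(hours_per_day):
    """Hours spent per year at a given daily rate."""
    return hours_per_day * DAYS_PER_YEAR


low = yearly_hours(2.5)   # lower bound of the quoted range
high = yearly_hours(3.0)  # upper bound of the quoted range
midpoint = (low + high) / 2
full_days = midpoint / HOURS_PER_DAY

print(f"{low:.0f} to {high:.0f} hours per year, midpoint {midpoint:.0f}")
print(f"about {full_days:.1f} full days")
```

The midpoint lands just over 1000 hours, or a little under 42 round-the-clock days, so the quoted summary is a fair rounding.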

Ragnar Locker ransomware deploys virtual machine to dodge security – Sophos News

A new ransomware attack method takes defense evasion to a new level—deploying as a full virtual machine on each targeted device to hide the ransomware from view. In a recently detected attack, Ragnar Locker ransomware was deployed inside an Oracle VirtualBox Windows XP virtual machine. The attack payload was a 122 MB installer with a 282 MB virtual image inside—all to conceal a 49 kB ransomware executable.
In the detected attack, the Ragnar Locker actors used a GPO task to execute Microsoft Installer (msiexec.exe), passing parameters to download and silently install a 122 MB crafted, unsigned MSI package from a remote web server. The primary contents of the MSI package were:
* A working installation of an old Oracle VirtualBox hypervisor: actually, Sun xVM VirtualBox version 3.0.4 from August 5, 2009 (Oracle bought Sun Microsystems in 2010).
* A virtual disk image file (VDI) named micro.vdi, an image of a stripped-down version of the Windows XP SP3 operating system, called MicroXP v0.82. The image includes the 49 kB Ragnar Locker ransomware executable.
The install.bat script then goes on to enumerate all local disks, connected removable drives and mapped network drives on the physical machine, so they can be configured to be accessed from within the virtual machine.

This is an interesting evasion technique. This one is generic and requires the infected user to have the permissions and privileges to install and run VirtualBox on the computer itself, which might prove tricky for some attackers. But you can imagine that if this is run on your domain controller, file server or other core system then it would work rather neatly to avoid the normal process inspection that many antivirus and EDR tools use.
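Because the ransomware itself runs inside the guest, the installer step is one of the few places the host can still see something odd. The report describes msiexec being used to download and silently install a package from a remote web server, so a defender-side heuristic over process command lines might look like the sketch below. This is purely illustrative: the function name is hypothetical, the report excerpt does not give the exact command line, and I am assuming the standard msiexec quiet switches (`/q`, `/qn`, `/quiet`) and an `http(s)` URL as the package source.

```python
import re

# Standard msiexec quiet-mode switches, matched as whole tokens.
QUIET_FLAGS = re.compile(r"(?<!\S)/(q|qn|quiet)(?!\S)", re.IGNORECASE)
# Any http(s) URL in the command line (assumed package source).
REMOTE_SOURCE = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_silent_remote_msi(command_line: str) -> bool:
    """Flag msiexec invocations that are both quiet and fetch from a URL."""
    if "msiexec" not in command_line.lower():
        return False
    is_quiet = QUIET_FLAGS.search(command_line) is not None
    is_remote = REMOTE_SOURCE.search(command_line) is not None
    return is_quiet and is_remote


# Hypothetical command lines, not taken from the report.
samples = [
    "msiexec.exe /i http://example.invalid/pkg.msi /qn",
    r"msiexec.exe /i C:\installers\pkg.msi /qn",
    "msiexec.exe /i http://example.invalid/pkg.msi",
]
for sample in samples:
    print(looks_like_silent_remote_msi(sample), sample)
```

A rule like this would be noisy anywhere that legitimately deploys software this way, so treat it as a prompt for investigation and not as a verdict.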

CISA’s newest tool is a free and open logging and protective monitoring solution serving all organizations. Secure your Windows-based equipment today with Logging Made Easy.

Initially created by NCSC and now maintained by CISA, Logging Made Easy is a self-install tutorial for small organizations to gain a basic level of centralized security logging for Windows clients and provide functionality to detect attacks. It's the coming together of multiple free and open software platforms, where LME helps the reader integrate them together to produce an end-to-end logging capability. We also provide some pre-made configuration files and scripts, although there is the option to do it on your own.
Logging Made Easy can:
* Show where administrative commands are being run on enrolled devices
* See who is using which machine
* In conjunction with threat reports, it is possible to query for the presence of an attacker in the form of Tactics, Techniques and Procedures (TTPs)

I was really sad when the UK’s NCSC retired the Logging Made Easy project due to lack of capacity to maintain it, so it’s really nice to see that CISA has picked it up and has now released their next version of it. Having consulted on logging systems, this really is the best basic bar for logging systems, and while it isn’t going to work for complex enterprises, it provides free and simple technical deployment for event capture, forwarding and simple analysis that is far beyond some of the poorer commercial tools that I’ve seen.

Worth watching this project, especially for your smaller projects, companies and systems.

Wait, is cloud bad? - by Forrest Brazeal - Good Tech Things

For us, the cloud represented a hope that things maybe could be better, because they really could not be worse. There’s a reason that even now when I talk to VP types at large enterprises, the first words out of their mouth are often “We want to get out of the data center business.” Because when you have low overall technical competency, the data center business suuuucks.
[…]
Come with me outside the tech-hub bubble into an average midsized US city. Say, Greenville, South Carolina, where I lived and worked for the better part of a decade. There are like five significant employers there that hire IT people. The same people boomerang back and forth between those five companies throughout their career.
Most of these companies are now running on the cloud to one extent or another. As a result, they have finally started to get that local talent pool on board with basic cloud concepts like ephemeral instances and automated deployments.
If you, running one of those companies, make the bold decision to leave and go back to the data center today, what kind of talent is going to come with you? It is not going to be DHH’s brilliant visionaries who see the limitations of cloud. It is going to be the crabby people with pet servers who just want things to go back to the way they were.
Travis Cole, who ran data centers for decades and knows whereof he speaks, says he would choose cloud under these circumstances:
[I]f your team just doesn't have the skillset or desire to do any of the DIY stuff. …[Or] when it's easier to spend money than hire people with the skills to do all this stuff. Also giving teams their own AWS accounts to do stuff in is powerful.
That’s why for the low competence/low growth quadrant I say “choose cloud, but seek help”. Your goal should be to raise the skill level of your people by forcing them into contact with cloud practices and tools. Take away their learned helplessness by giving them throwaway cloud environments they can spin up without asking procurement. Make friends with your cloud TAMs, maybe some good consultants (I know a few), and inject their competence into your team.

Excellent summary of why most firms should be choosing cloud solutions.

The set of skills, capabilities and technical know-how needed to run a data center is a whole investment that your organisation doesn’t need now. It does need to replace those with some people who understand how to set up cloud environments well, manage multiple accounts, put billing restrictions in place, and support development teams and outsourced dev teams in using the cloud efficiently. That might sound like a straight swap, and it is in some cases.

The cloud isn’t necessarily cheaper (although it can be). In fact, in a like-for-like comparison on simple architectures it can be significantly more expensive. What the cloud provides is the ability to be more responsive to changing technical requirements. Whether that means simply autoscaling based on customer demand, or changing architectures as the environment changes, it’s far easier to use cloud systems in a more dynamic way.
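To make "autoscaling based on customer demand" a little more concrete, here is a minimal sketch of a proportional scaling rule, similar in spirit to the target-tracking approach many autoscalers use. It is not any provider's actual algorithm. The function name, the idea of a single load metric per instance, and the default minimum and maximum are all assumptions for illustration.

```python
import math


def desired_instances(current, observed_load, target_load, minimum=1, maximum=20):
    """Scale the instance count in proportion to how far the observed
    per-instance load is from the target, clamped to [minimum, maximum]."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    wanted = math.ceil(current * observed_load / target_load)
    return max(minimum, min(maximum, wanted))


# Hypothetical numbers: load is "requests per second per instance".
print(desired_instances(current=4, observed_load=90, target_load=60))   # scale out
print(desired_instances(current=4, observed_load=30, target_load=60))   # scale in
print(desired_instances(current=4, observed_load=600, target_load=60))  # capped at maximum
```

The maximum is doing real work here: it is the billing restriction mentioned above, expressed as a number. Without it, a traffic spike or a bug turns directly into spend.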

Thanks for reading

Michael

CyberWeekly
