- Company moves at a glacial pace. Every geo outside the US lacks management or decision-making authority for most teams, so nearly every request must go around the globe to get approval from the US.
- Every team is extremely siloed. "SRE" at ServiceNow is about as far from the Google SRE model as possible. It should just be called: "Operations". Teams designing or developing applications or infrastructure are lost in a maze of Hyderabad, impossible to contact. Stuff is written or changed without any thought to alerting or outages, or how it will integrate with other systems.
- Communication in general is the worst I've seen at any company of this size. Teams simply won't respond to emails, ignore their team distribution lists, and ignore high-severity tickets assigned to their queue. Manager tried to reach out to them too, and got no response for over a month, not even an acknowledgement, on a critical production issue. Many critical infrastructure systems are owned by a single person in the US, rather than a team. Any question or issue is met with "we need to wait for xyz to look at it" - sometimes these issues linger for years.
- Project management are sycophants. Any project is always "100% green, everything is going ahead of schedule!". Identifying issues or asking questions is met with fierce opposition. In many cases testing was requested and found numerous issues, and the project launched anyway, ultimately leading to major customer dissatisfaction because advertised features did not work properly at all.
- Extreme emphasis on ITIL, while missing the point. Zealous adherence to ITIL for change management, but a terrible user experience. Working with the archaic and laggy interface for tickets is miserable. Despite this, breaking changes and other regressions were introduced regularly (every week), with zero thought given to rollback or deployment, and happily approved by multiple change advisory board meetings. Oh and all of this was done through the browser. Combined with the lack of communication from other teams and the near-inability to find the developers involved in changes, this meant every 'deployment' was a trash fire.
- Monitoring is so bad. It is unbelievable. Operations handle all alerts, but have no say in how alerts are configured, thresholds etc. Constant noise. This is paired with an illogical and anachronistic mentality from upper SRE management, which wants "a single pane of glass" whereby every single alert goes to SRE first. A huge portion of the workload is simply reassigning tickets to another team, which then dutifully ignores them.
- Every team is so lazy, but so comfortable, that trivial issues take months to solve. Communication is a constant struggle, like trying to get blood out of a stone.
- The pay is far below average.
- Fundamental issues are ignored in favour of over-complicated solutions, and then those solutions only consider 'the happy path'. Failures will explode and cause unknown side effects.
- Hiring calibration is broken. Our team was routinely floored by revelations such as *Principal* or *Senior Staff* engineers who didn't know how to use ssh, didn't know what a segfault was, etc. If we even managed to get them to reply in the first place, that is.
- Management will sell you on the idea that 'our team will make a difference and improve this!'. Well, 2 years later and things only went backwards.
- Poor design choices and horrible technical debt are hitting scaling limitations *hard*. Everything is constantly on fire.