Tales From Tech Support
- That's not a change to the system!
- The Lidless Barrel of Unstable Fission Products
- Standard Operating Procedure
That's not a change to the system! Posted: 02 Nov 2019 04:02 PM PDT

So I work in the government, and during a previous assignment I acted as a program manager (not my primary job skill). We ran multiple systems at a variety of different classifications. For those not used to working in classified environments, they usually involve a cross domain solution (CDS), which allows data to flow from one classification level to another. Our system had expanded, so we needed more channels and rule sets through the CDS. After getting all the approvals needed over many months, it was time to have the manufacturer come out to update their software...

The individual they sent arrived with no background on the project; he literally showed up and asked us what he was supposed to do. So even though we had held weekly tag-ups for almost a year, somehow they sent somebody completely unprepared, without the configurations and data sets needed. We supplied all of this and got to work.

Our system was an operational system with 24/7 run time. No downtime allowed. We had morning and close-of-business (COB) tag-ups with all the team members: what was supposed to be done that day, and then (at COB) what we were doing the next day. We could update the backup units as needed, then do testing when the Ops floor schedule allowed.

Things were moving smoothly, until they weren't. We had some issues, like incorrect IPs or misconfigured ports, but we got through them. All looked good until the CDS went down in the middle of the night and the Ops floor had to switch back to the original hardware. We had to call the manufacturer rep back in before his flight and figure out the cause.

Rule number one on the Ops floor: no changes are made unless the operators are aware of them and agree the mission impact is minimal or nonexistent. Every time we switched between the Ops and backup hardware, I had to poll every stakeholder to make sure we were good to go. Before I gave the team the green light, I'd say, "Wait, let me check with the different Ops floors."

We switched to the updated baseline and everything looked good; they had found some more port issues, and we hoped that had been the problem. At the tag-up I had everyone go through what they did and why. We got to the manufacturer/our prime contractor, and they walked through the fixes and mentioned they had switched back to the original hardware. WHAT?!?! I never gave approval or checked with anybody to make the switch back. They claimed ignorance and said they'd thought it was the plan, but couldn't justify why they had switched without explicit approval from me, as on the previous eight or so times. Now we had mission ops that precluded us from making any changes; no changes allowed on a Friday afternoon.

So the next week rolls around, we get on the updated baseline, and overnight the CDS fails again. Now the shit has hit the fan, and we look like idiots to our customers. At this point I get very involved. After a few days of testing I'm at a loss; it fails every night. I'm an EE with a minor in computer engineering, not a complete idiot when it comes to servers and networking, so I had the manufacturer walk me through everything. Here's what I found: everything worked fine during the day, but not at night. Obvious clue, so what was different? I found out the manufacturer's rep was running a hardening script every night before he left, but not telling anybody about it. I lost my mind when I heard that. What part of "make no changes without my explicit approval" do you not understand? His response: well, hardening really isn't a change.
His hardening script made changes that other cyber-protection scripts noticed during their nightly checks, and those scripts would shut the system down in response. He never mentioned to anybody that he was running the script at the end of the day, and we spent days trying to figure out the problem. Moral of the story: don't trust the unprepared manufacturer rep. And people's definitions of "changes to the system" can be very different.
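The post never says what the rep's script actually touched, so the following is only a hypothetical sketch of the kind of nightly "hardening" pass that would trip a separate integrity or compliance monitor; every service name, file path and rule here is an assumption, not a detail from the story.

```sh
#!/bin/sh
# Hypothetical nightly "hardening" pass (illustrative only; the actual vendor script
# is never described in the post). Every line is a real configuration change, even
# if the person running it insists that "hardening isn't a change".

systemctl disable --now telnet.socket          # turn off a legacy service

# Forbid root logins over SSH and reload the daemon
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
systemctl reload sshd

chmod 600 /etc/shadow                          # tighten file permissions

iptables -A INPUT -p tcp --dport 23 -j DROP    # add a firewall rule
```

Any file-integrity or compliance monitor that baselines service state, sshd_config or the firewall rules would flag exactly these deltas on its next scheduled run, which is the behaviour the story describes.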
The Lidless Barrel of Unstable Fission Products Posted: 02 Nov 2019 06:13 PM PDT

I originally posted this in r/Idontworkherelady, and it got content policed. So I'm trying it here, because I was there to offer technical support - I wrote the bleeding manual - even if my main function was to sell the stuff to people I judged wouldn't be ringing tech support too often.

TL;DR: Prospective customer makes a hopelessly vague request. OP briefly investigates and declines the 'opportunity'. OP and their employer GTFO.

A fair few years before the new and exciting job at XYZ Inc I mentioned in my last post, I learnt my trade somewhere else. Just to make it confusing, we'll call them XYZ Limited. XYZ Ltd did all sorts; basically they were a one-stop shop for industrial fixes. Anything from "I need a motor for my conveyor belt and it goes about this fast" to all the power, transmission and control gear on a machine that makes berets - the hats - start to finish in less than a minute apiece. Berets are knitted, by the way; I never realised that until I saw them being made.

Anyway, you come to XYZ Ltd with your specification - that means you understand your manufacturing process and design the parts that actually interface with the materials and product, and then you tell us how and when they're supposed to move. We engineer that and sell you a kit of parts to do the job, including commissioning if you need it. These jobs might be one-offs, or they might be low-production jobs where the customer called off repeat kits as they sold machines. You get the idea.

My job at XYZ Ltd was to be the technical expert on a particular technology/product range. Enquiries came in, and if they looked like they needed my kind of tech, they landed on my desk. Assuming it was my area, I worked out how to do the job, using other XYZ Ltd products as well if needed, and then went back and quoted the customer. Often it was necessary to see the thing first-hand to really understand it, and going round prospects with the sales reps also helped drum up sales, so my time was split maybe 60/40 between the office and the field.

So one day I'm in the office and a fax lands on my desk. Customer specification, quotation required. "We have released these tender documents to several potential suppliers in your industry, and your best price and a speedy response will be needed for XYZ Ltd to have any chance of getting the order." I'll call the possible customer D-Tech. For Dunce.

This thing runs to seven pages, which isn't a bad sign in itself - the devil is always in the detail, so the more data I have to start with, the better. The customer is building a machine to lift something up from one level to another. We can do that. The first warning sign is the sketch of the mechanism on page one. I can tell just by looking at it that it'll jam solid as soon as you switch it on. My colleagues are all of the same opinion; any teenager who maintained their own bike would have come to the same conclusion. Well, maybe that's just a drawing error - they'll have to correct it anyway to build the thing. I'll just assume that they will, so I can still calculate the loads, forces and equipment needed, specify the control gear, etc. Go ahead and quote the job with a caveat about the mechanics.

Oh no I can't. The remaining six pages contain literally no dimensional information of any kind. The writer has waxed long and lyrical about their own and D-Tech's design capabilities, moral ethos and general attractiveness.
And also about just how important it was that this job be done to the highest possible standards by industry experts such as themselves. Unfortunately, they failed to include what I was actually looking for, including:

- How big/heavy is the object to be lifted?
- How far does it need to be lifted?
- How quickly does this need to be done?
- How often is this going to happen?
- Do we need to hold the thing in place when it gets to the top, or will some other machine take over supporting the weight?
- Is the load coming back down again, or does it get taken away when it gets to the top?
- What accuracy does all this need to be done to?
- What resources are available to power the system, and are there any limits on those?
- What IP rating is needed, and are there any special conditions we need to design for?

Obviously, those questions could easily lead to yet more questions as I learnt the context of the machine design and its application, but you get the idea.

So things aren't looking good, but my employer will be happier if I at least try to get a spec before dumping it in the bin. So I ring D-Tech. I get to speak to the single snottiest receptionist I've ever encountered. They explain to me that this specification was written by their Technical Director personally. They are an Expert in EVERYTHING THEY DO, and there is therefore no possibility whatsoever that the document could be lacking any of the technical information needed by XYZ Ltd to quote. If I don't think I have what I need, that's because I can't do my job properly, and I should either sort myself out or just save time and ask my boss to do it. Probably the latter, because of the need to quote ASAP. No, the receptionist will not be passing either my call or my questions on to the Technical Director; that would be a waste of time. With maturity and hindsight, I now realise I fully agree with this and am even now grateful for their decision, but not for the reasons intimated to me at the time.

I did get one question answered. "What is the load this machine's lifting?" "It's a barrel of nuclear waste, but I don't see why you need to know that. The barrel gets lifted up to have the top welded on."

XYZ Ltd respectfully declined to quote.
Standard Operating Procedure Posted: 02 Nov 2019 09:30 PM PDT

One of my clients was running a hosted server in a data centre that was unfamiliar to me. The software was a typical LAMP (Linux, Apache, MySQL, PHP) stack, and it had been running for nearly a decade. I was contacted via-via (through a chain of contacts) because the original developer had moved on to greener shores.

The first order of business was to get access to the system, which consisted of a collection of domains for several different organisations who were collaborating within the web platform. After spending weeks, yes weeks, getting some form of documentation together, with credentials, host names, DNS entries, hosting providers, the standard stuff, we finally got down to the important stuff.

The first item on the list was: "Why is the server crashing so often?" I said: "Wot?" "Yes, it crashes every few days." So I started digging through the logs and found that it was indeed crashing regularly, about once every two days. It turned out there was a database query that ran regularly and caused the server to run out of memory. Then the OOM Killer (the Out Of Memory Killer) running under Linux would come along and kill the offending process - MySQL. Then the hosting company would notice that MySQL wasn't running and would reboot the server.

To start stabilising the environment, I set up a swapfile and configured a one-minute cron job that told the OOM Killer that MySQL was a priority process and should not be killed. Of course, killing MySQL had had some side effects: there were several corrupt tables, which exacerbated the issue. I managed to repair those.

Backups were another fun experience. The system was supposed to back up to S3, but it would run out of disk space, since it created a backup file that included all the previous backups. The S3 bucket itself was used for both caching and backups, so public and private objects sat in the same bucket. The last actual backup was at least 12 months old.

At this point I had created a new private bucket, got backups running, cleared out some dead wood on the drive (can you say PHP "temp" cache?), and had the system mostly stable. The real work was yet to begin, but at least the system wasn't falling over every few days and running out of disk space whilst making a backup. I still hadn't managed to locate the spurious SQL query that was causing havoc, so I'd turned on query logging to give myself a fighting chance of catching the culprit.

I then had a family member die and had to spend a week away from the office. Of course, this was the time the server chose to crash, again. The hosting company had been contacted by the client, and I managed to log in to see what they were up to. The first thing they did was delete the logs. At that point I terminated their connection and changed the root password. I didn't actually know until then that the hosting company had root access. When asked why on earth they had deleted the logs: "Standard Operating Procedure".

There is more to tell about this particular installation. For example, a database table with more than 700 columns! An installation with 100+ add-ons. Oh, and did I mention that nothing had been updated or patched for seven years?
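The post doesn't give the exact commands, so here is a minimal sketch of the OOM work-around described above, assuming mysqld on a reasonably modern Linux box; the swapfile size and file paths are assumptions, not details from the story.

```sh
#!/bin/sh
# Minimal sketch of the stabilisation steps described above (size and paths are assumptions).

# 1. Add a swapfile so a memory-hungry query degrades performance instead of
#    immediately triggering the OOM killer.
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# 2. Body of the one-minute cron job: mark mysqld as the process the OOM killer
#    should never pick (-1000 means "do not kill").
for pid in $(pidof mysqld); do
    echo -1000 > "/proc/$pid/oom_score_adj"
done
```

Saved as, say, /usr/local/bin/protect-mysql.sh (a hypothetical name), the second half would be driven by a root crontab line such as `* * * * * /usr/local/bin/protect-mysql.sh`.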
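The rebuilt backup routine isn't shown either. A plausible sketch, assuming the site lives under /var/www and the new private bucket is called example-backups (both assumptions), excludes the backup directory from the archive so each run no longer swallows the previous ones, and pushes to a bucket that holds nothing but backups:

```sh
#!/bin/sh
# Hypothetical nightly backup; paths, credential handling and bucket name are assumptions.
BACKUP_DIR=/var/backups/site
STAMP=$(date +%F)
mkdir -p "$BACKUP_DIR"

# Dump the database and archive the web root, explicitly excluding earlier backups
# so the archive doesn't grow to include every previous backup and fill the disk.
mysqldump --all-databases > "$BACKUP_DIR/db-$STAMP.sql"
tar --exclude="$BACKUP_DIR" -czf "$BACKUP_DIR/site-$STAMP.tar.gz" /var/www

# Ship both to a dedicated private bucket, kept separate from the public cache bucket.
aws s3 cp "$BACKUP_DIR/site-$STAMP.tar.gz" "s3://example-backups/$STAMP/"
aws s3 cp "$BACKUP_DIR/db-$STAMP.sql" "s3://example-backups/$STAMP/"
```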
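As for the query logging used to hunt the culprit, MySQL can switch it on at runtime without a restart; a sketch using the slow query log, where the five-second threshold and log path are assumptions:

```sh
# Enable the slow query log on the fly to catch the memory-hungry query.
mysql -u root -p -e "
  SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
  SET GLOBAL long_query_time     = 5;
  SET GLOBAL slow_query_log      = 'ON';
"
```

If the offending query is quick but enormous, dropping long_query_time to 0 logs everything, much like the general query log the author may have used instead.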