
    Monday, November 25, 2019

    Quick-and-dirty solution comes back to bite you! Tech Support

    Quick-and-dirty solution comes back to bite you!

    Posted: 24 Nov 2019 12:19 PM PST

    I work at a helpdesk company with about 50 to 100 customers. I'm mostly involved in projects, either internal or for our customers, but I get my share of T2 cases too, when our interns can't solve them. One of our customers, who sells gardening equipment, contacted us because their two NUCs at their production facility needed to be replaced. Nothing unusual, so we buy two new ones and get them ready. That includes OS updates, remote-control software and some other customer-specific software. Finally we install a VPN client and join them to the customer's domain.

    I set up one of them and one of my bosses sets up the other. Somehow the one he set up is not working properly, or to be more precise, one piece of software doesn't work properly on it. It just crashes when you try to send a customer order out of the software straight to Outlook. Sadly, the software was written by another company for our customer and solely for that purpose at the time. They never planned to sell it to anyone else, as it is too specific to be sold on the open market, so they didn't put too much effort into it.

    We contact that company, but they don't reply to phone calls or e-mails. So we try to figure it out ourselves, but the software doesn't produce any error codes or crash logs. It just loads endlessly and can only be stopped with the help of good old Task Manager. We try several different steps: reinstall the software module, the whole software, the whole PC, we even clone the working one... but to no avail. Still the same symptoms.

    After almost a month we finally get hold of that company, and the reason they haven't been replying is that the one employee who wrote the software single-handedly is no longer with them. They reached out to him, pretty much begging him to help them out on this one. We got his e-mail address and phone number and started communicating. We try some troubleshooting, but he cannot figure it out either. Everything looks like it should. We run some SQL queries from both NUCs and the results come out identical, etc.

    Then he asks me to start the software from cmd and send him the output. And behold, we finally get a clue: some form of exception that the software cannot handle. I don't have the source code, so I couldn't check which part causes the crash. I sent him the exception, and the next morning my boss starts laughing maniacally. The other boss and I look over at him and he just gestures for us to come over and look at his screen, as he cannot talk.

    What the other boss and I see is the developer's response. Apparently (and for whatever reason) his code wants to play a sound whenever an operation completes successfully. And since that NUC doesn't have any device attached that could play the sound, the software crashes. Since I had worked as the internal sysadmin at a software-development company, I somehow couldn't laugh at first. It just pained me...
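
    The post doesn't say what language the application was written in, but purely as an illustration, here is a minimal Java sketch of that failure mode: a hypothetical "play a chime on success" helper that throws on a machine with no audio output device, plus the guard that apparently was never written. Class, method and file names are invented.

        import javax.sound.sampled.AudioInputStream;
        import javax.sound.sampled.AudioSystem;
        import javax.sound.sampled.Clip;
        import javax.sound.sampled.LineUnavailableException;
        import java.io.File;

        public class SuccessChime {

            // Plays a short confirmation sound after an order was sent.
            // On a box with no sound device, AudioSystem.getClip() throws;
            // without the catch blocks below, that exception would bubble up
            // and take the whole application down, exactly as described above.
            public static void playChime(File wavFile) {
                try (AudioInputStream in = AudioSystem.getAudioInputStream(wavFile)) {
                    Clip clip = AudioSystem.getClip(); // fails if no playback line exists
                    clip.open(in);
                    clip.start();
                } catch (LineUnavailableException | IllegalArgumentException e) {
                    // No sound hardware or no matching line: the chime is cosmetic,
                    // so log it and carry on, the order has already gone out.
                    System.err.println("Could not play success sound: " + e.getMessage());
                } catch (Exception e) {
                    // Missing file, unsupported format, I/O error: equally non-fatal.
                    System.err.println("Could not play success sound: " + e.getMessage());
                }
            }
        }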

    After several moments of being speechless and shaking my head, I install a virtual audio device on that NUC. Windows recognizes it and I test the software. It works. I call the people who work on those NUCs to tell them they can use the NUC again, without telling them the cause of their problem. We inform the software company and let them handle the call with our customer's management. Later on we find out that the other NUC has some speakers attached to it for music; that's why that one worked while this one didn't.

    TL;DR

    Dev codes an unnecessary feature into the software and doesn't catch the exception, causing our customer to lose 50% of their productivity.

    submitted by /u/Chilled_IT
    [link] [comments]

    Do you mean we need to actually verify the backup?

    Posted: 25 Nov 2019 02:27 AM PST

    DieselSELdr's tale gave me nightmare flashbacks to a previous employer.

    I was hired in an extremely fast process to replace the previous network admin ($OldGuy), who had been fired on the spot a week before for reasons I never learned.

    After learning backups were also my responsibility I investigated the system. Basically, every Monday morning I would have to get the tapes from the backup system and put them in a secure transit case, which was then sent to offsite storage. Then I'd place new blank tapes in the system so the backups could run over the next weekend.
    One of my first questions to my manager was when the last restore test had been.

    You guessed it: they had never done one under $OldGuy, so I arranged to have one-week-, one-month-, and one-year-old tapes sent back from the storage location. Most were fine, but those for a very critical database, $CriticalDatabase, all turned out not to be blank but to hold the same data... data that was at least two years old, going by the file dates and contents.

    It turned out the backup job backed up a single folder on $CriticalSystem to tape, and that folder was supposed to hold a copy of $CriticalDatabase made on the day of the backup itself.

    I suspected that $OldGuy had 'optimised' the system at some point, so the process that actually copied the database backup into that folder no longer worked, and the backup system was happily writing the same stale data to tape week in, week out.
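
    A pre-flight check along these lines, run right before the tape job, would have caught the stale staging copy years earlier. This is only a sketch in Java; the staging path and the 24-hour threshold are invented for illustration, not taken from the post.

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Path;
        import java.time.Duration;
        import java.time.Instant;

        public class BackupFreshnessCheck {

            public static void main(String[] args) throws IOException {
                // Hypothetical staging location of the database dump that gets written to tape.
                Path stagedDump = Path.of("D:/backup-staging/CriticalDatabase.bak");

                Instant lastModified = Files.getLastModifiedTime(stagedDump).toInstant();
                Duration age = Duration.between(lastModified, Instant.now());

                // If the dump hasn't been refreshed since the last run, something upstream
                // is broken: abort the tape job instead of archiving stale data again.
                if (age.compareTo(Duration.ofHours(24)) > 0) {
                    System.err.println("Staged backup is " + age.toHours()
                            + " hours old; aborting tape job and alerting the admin.");
                    System.exit(1);
                }
                System.out.println("Staged backup is fresh; safe to write to tape.");
            }
        }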

    I changed the backup to copy the actual $CriticalDatabase backup file instead, and after test-restoring the first new weekly backup I scheduled a quarterly restore test to make sure everything kept working.

    The company was extremely lucky that they never had a serious incident in the two years prior, and I had a very grateful manager before my first month's trial period was even over. I ended up working there for a good year, until the new management decided to outsource IT to India; they went under a few years later :/

    submitted by /u/hovercraft_of_eels
    [link] [comments]

    of coffee

    Posted: 24 Nov 2019 11:22 PM PST

    TL;DR: no matter how much you prepare, check EVERYTHING, EVERYWHERE

    3 years ago

    $me: obvious

    $cw: not exactly happy

    $devs: cooperative, just not good at administration

    Financial company, near year's end. For those of you not familiar, this is reporting time, with lots of people going to stress level 11 on a scale of 1 to 5, because everything needs to be done yesterday or else penalties galore. In the reporting department there are three lockers dedicated specifically to storing hard liquor.

    $me: *sees Java is out of date by about 2 major versions and some 50 security updates*

    $me: *builds package to update and clear out old versions*

    $me: *checks the list of software based on Java*

    $me: *checks each and every application on that list to see if it will survive on the new version*

    $me: *tells some developers to get their act together*

    $devs: *deliver new versions of their software, which work flawlessly on the new version*

    $me: *puts new Java into test environment*

    $me: *tests with $devs and key-users. Get an OK*

    $me: *puts new Java into acceptance environment*

    $me: *tests with $devs and key-users. Get an OK*

    $me: *communicates to each and every person involved that we will update Java and that now is the time to come forward with anything they think might be using Java*

    silence

    $me: *starts rolling out to a small group in production, including the key-users*

    all hell breaks loose

    how, wait, what?

    $cw: "You $me, what on earth?"

    $me: "Good question, I have been preparing this from A to Z with all people involved in T and A and now a pilot group in P. I don't understand."

    $cw: "They are reporting issues on $appC, $appF and $appQ"

    WTAF

    $me: "Right, this isn't funny, I will come over and check"

    No points for guessing: the backends in the production environment for those apps WERE NOT YET UPDATED to the versions they had in T/A!

    <NSFW> now what

    Well, it turned out all of the client software was already virtualized (with AppV, if you want to know). All I had to do was package the older version of Java INSIDE the bubble for the $appC, $appF and $appQ clients, and all was well again.
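
    AppV packaging itself is configuration rather than code, but the underlying idea, pinning the runtime an application was validated against and shouting when it differs, is easy to sketch. The following Java snippet is purely illustrative; the version bound and the reuse of the $appC name are assumptions, not taken from the post.

        public class RuntimeGuard {

            // The Java major version this client was actually tested against (illustrative).
            private static final int TESTED_MAJOR_VERSION = 8;

            public static void main(String[] args) {
                String version = System.getProperty("java.version"); // e.g. "1.8.0_231" or "11.0.5"
                int major = parseMajor(version);
                if (major != TESTED_MAJOR_VERSION) {
                    System.err.println("WARNING: $appC was validated on Java " + TESTED_MAJOR_VERSION
                            + " but is running on Java " + major + " (" + version + ").");
                }
                // ... start the actual client here ...
            }

            private static int parseMajor(String version) {
                // Releases before Java 9 report "1.x.y"; newer ones report "x.y.z".
                String[] parts = version.split("\\.");
                int first = Integer.parseInt(parts[0]);
                return first == 1 ? Integer.parseInt(parts[1]) : first;
            }
        }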

    Fall-out: minimal production loss, one year older in a few hours, an actual complete list of software dependent on Java, plus much tighter version control on all applications. Not a bad score.

    Time for New Year's Eve!

    PS: Coffee is sometimes referred to as a cup of Java.

    submitted by /u/evasive2010
    [link] [comments]

    You cabled that wrong (hint: I had cabled it just right!).

    Posted: 24 Nov 2019 01:30 PM PST

    This happened right about twenty years ago (OK, that makes me feel old), when I worked for a company that was contracted to do break-fix and software support for a now-defunct (acquired) UNIX systems company (visible in the sky, usually when the moon is not visible). I was placed on-site with a very important financial services publishing company at their primary data center in New Jersey. With a desk on the raised floor and nothing more than a news-equipped pager to get me through the boredom, I really enjoyed it when tickets came in.

    This story happened when one of a pair of midrange servers in a cluster (AntiMoon-Cluster?) had been acting up for weeks. I was instructed to replace that machine, keeping only the dual-port UWD SCSI HBA, which was cross-connected to a pair of SCSI arrays running "Truth Volume Manager" (TxVM, to keep the brand out of it). Because these arrays were cross-connected to the two hosts in the cluster, each port on the SCSI HBAs had its ID changed accordingly, to avoid a conflict on the default SCSI initiator ID of 7. Since we were not changing the HBA, the IDs would stay intact. I clearly labeled each cable as to its orientation and took copious notes about how everything was connected.

    After reconnecting all of the cables to the new server, the young sysadmin from the customer insisted I had done it wrong. I showed him my labels and my notes, but he disagreed and insisted we change the cables before turning the server on. I refused to make the change, so he went ahead and changed them himself. You can see where this is going... when the SCSI IDs clashed, the TxVM filesystems on the pair of cross-connected arrays were completely and utterly corrupted.

    Escalations ensued. We had a TxVM expert attempt to fix the file systems: no luck. We attempted to locate the most recent backup tapes, which were on the cart as expected, but they were (gasp!) blank! Yes, the interns dutifully swapped the tapes every day, but nobody had ever bothered to check whether anything was on them. The system was dead as a doornail.

    What I have yet to share is the purpose of these machines. They were the primary pagination systems, responsible for the final page layout of the important financial publication. Since the change window was Friday afternoon, we worked furiously through the weekend to get things back online. Finally, by Sunday afternoon, a suitably close replica of the server had been restored from the distant DR site over the WAN. I was told that it came in just an hour before the cutoff; any later and there would not have been a Monday issue of the important financial publication. That would most certainly have been a newsworthy event, as this publication has been printed every weekday for a very long time.

    I suppose, then, it is no surprise that I never, ever saw that young sysadmin again.

    submitted by /u/DieselSELdr
    [link] [comments]
