Electronics: UPS Repair

What do you do when you accidentally cook the microcontroller of a commercial-grade uninterruptible power supply containing hundreds of dollars' worth of power conversion hardware?

Well, you buy another one of course, tear it apart too, and don't do the same dumb thing that broke the first one!

The device I'm talking about is the APC Smart-UPS 1500, model SMT1500I. It's a 230V uninterruptible power supply running on two 12V 17Ah lead-acid batteries, with a 50/60Hz transformer-based inverter and an add-in card slot that can support remote monitoring. These devices have been around in some form or other since the 1970s, but the SMT1500I is the latest model in the series, with an LCD display, improved power efficiency and better monitoring capabilities. It was first released in the early 2010s and is still in production today.

Image credit: APC by Schneider Electric

What happened?

Suffice it to say, I was curious, and mistakes were made involving multimeter probes. There are plenty of headers inside such a device with small gaps between the pins, and multimeter probes slip off those pins quite easily! The main battery voltage rail inside the unit, which sits at about 27.5V under normal conditions, was shorted directly to one of the I/O pins of the microcontroller. The poor chip was cooked instantly.

What can you do?

When this happens, the average consumer or technician doesn't have many options for actually fixing such a device. That's because the microcontroller in question - one of two in the unit - is a PLCC44 chip soldered to the board, with proprietary firmware on it. Once you cook that, there's no going back - and it didn't help that the chip in my particular unit looked to be some custom APC part, with a part number I couldn't find a reference to anywhere.

Long part numbers and a long history

I decided to try pulling the sticker off of the microcontroller. What on earth did I have to lose? Not an awful lot, the thing was already dead. Guess what I found underneath?

Image credit: eBay - liaoxiyuan

The part in question was actually an NXP P89CV51RC2FA - not a custom APC ASIC of some sort. Haha, can't fool me with stickers. Anyone who knows anything about 8051 microcontroller part numbers will recognise that as an 8051 series 8-bit microcontroller. In this case, the "P" indicates the manufacturer, Philips (whose semiconductor division became NXP), and the second "C" indicates 32KB of flash memory. I found, to my disappointment, that it was discontinued at the end of 2011.

An 8051 microcontroller, in a post-2010 UPS? That's a bit old, isn't it? Well. As it turns out, APC have been building these UPSs on 8051 microcontrollers pretty much since the beginning - decades ago - and have continued to use that platform to this day. If it ain't broke, don't fix it, I guess. In this most recent SMT series of UPSs, they have actually added a second microcontroller, a much more modern STM32 ARM Cortex-M3 based chip, which interfaces with the main microcontroller via UART and handles external communications and the LCD front panel of the unit.

But wait, there's more

I managed to get hold of a document from Atmel (now owned by Microchip) that cross-referenced the discontinued NXP part number to an Atmel drop-in replacement - the AT89C51RC2 - which was still in production!

Getting the firmware

Getting a replacement microcontroller is one thing. Soldering it in is quite another. But both of those things are just a matter of logistics. The real problem for me was the firmware.

APC offers firmware upgrade files for this unit, but they're not the sort of thing you can just chuck straight onto the flash of the microcontroller and expect to work! Through a process of research and experimentation, I was able to figure out that the firmware files APC provides are encrypted, and contain firmware for both the STM32 communications processor and the 8051 main processor - the latter being the one I needed. The communications processor runs a bootloader that accepts the file from a host computer over a USB or serial interface, decrypts both its own firmware and the main processor's firmware, and then flashes the main processor. Long story short, it was a dead end. I'm no hacker (well, that's what they all say, isn't it) and didn't want to spend countless hours trying to figure out how APC encrypted their firmware files.

But that's not the only way to get hold of the firmware. Mhuwahahahaha. Well, I probably shouldn't tell you how I actually got hold of it, because it's proprietary, and APC clearly don't want people getting access to it, judging by their encryption of the firmware files. Suffice it to say, every UPS has the firmware stored on it, and I had a second UPS of the same model in my possession...

The repair

I bought a replacement microcontroller, and went through the laborious process of soldering it in - not easy when it has 44 pins and I have only one soldering iron and two hands. I can't say the soldering looked neat. I damaged many pads in my various attempts to solder it on, which means the final result involves many 'bodges'. However, after closely inspecting everything to ensure there were no actual dry joints, I was at least mostly satisfied that it wouldn't fail on me.

I flashed the firmware, this time using standard microcontroller programming tools and the unencrypted firmware image I'd sneakily obtained. Lo and behold, it worked. Well, that is to say, the microcontroller worked. I was lucky that the EEPROM chip, separate from the microcontroller, survived the entire process with its calibration values intact. In the process, however, I'd managed to screw up part of the inverter.

I went through a long and gruelling process of tracking down the problem with the inverter, during which the main 12V SMPS decided to call it quits and pass its 25V input voltage through to the output. This didn't help matters: it fried a good handful of ICs and ASICs on the board. It was like I'd fixed the real problem, but at the same time completely botched the rest of the repair. I was able to replace everything with the help of some Asian eBay stores which somehow had supplies of 'genuine ,high quality ! [sic]' replacements for the custom inverter ASICs. I theorise that they had gotten hold of rejected batches of those ICs from the OEM, because several of the ICs I received were open circuit on all pins, while others worked perfectly. Suffice it to say, I just found one that worked and soldered it in!

With all that done, there was a heroic moment when the thing finally produced a 230V sinusoidal output for the first time in several years. Soon after, it gained enough trust to be plugged into the mains(!) and some time after that I considered it fully fixed.

Corrupted EEPROM

However, there was one thing still wrong. It wouldn't estimate its runtime correctly. These things are supposed to give you the number of minutes for which they could supply battery power at the present load. But it would calculate exactly 168.0 minutes no matter the load on it. Why?

I did some digging, knowing that the estimate was being passed from the main processor to the communications processor and then out to the network card. What I found was that the main processor was actually estimating 9999 minutes - the highest value it could - but the communications processor was suffering from some kind of overflow and displaying something else, clearly not expecting something like that. Normal runtime estimates top out at less than 400 minutes.
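One possible explanation for the 168, offered purely as a guess that happens to fit the numbers: if the runtime were converted to seconds and stored in a 16-bit unsigned field somewhere along the way, 9999 minutes would wrap around to a much smaller value. The sketch below is hypothetical; the function name, the seconds conversion and the 16-bit width are all assumptions, not anything taken from APC's firmware.

```python
# Hypothetical sketch: the names, the seconds conversion and the 16-bit
# width are assumptions for illustration, not details of APC's firmware.
SECONDS_PER_MINUTE = 60
UINT16_MODULUS = 1 << 16  # a 16-bit unsigned field holds 0..65535


def displayed_minutes(reported_minutes: int) -> int:
    """Convert minutes to seconds, wrap to 16 bits, convert back to minutes."""
    seconds = reported_minutes * SECONDS_PER_MINUTE
    wrapped_seconds = seconds % UINT16_MODULUS  # assumed 16-bit wrap-around
    return wrapped_seconds // SECONDS_PER_MINUTE  # integer division drops the fraction


if __name__ == "__main__":
    for minutes in (30, 400, 9999):
        print(minutes, "->", displayed_minutes(minutes))
```

Under those assumptions, 9999 minutes is 599,940 seconds, which wraps to 10,116 seconds, and integer division by 60 gives 168. Values below about 1092 minutes (65,535 seconds) pass through unchanged, which would be consistent with normal estimates never triggering the problem.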

Why? Well, to cut a long story short, I found that the thing thought it had no less than 85 external battery packs connected! I mean, full credit to APC for making something that can, at least in software, support up to 85 external battery packs, but really? It seems the EEPROM did somehow get just a little bit corrupted during my work on the PCB, and that was the result. With that fixed - the number set to zero - the problem was gone.

The best part of 3 years later, the thing still sits on my desk doing exactly what it's supposed to do. Looking on from the outside, you wouldn't have a clue how much I've messed with it!