Chatbox Thread

Right? I often find something on marketplace to sleep on and then when I go to search it out later I end up having to scroll for an hour to find it again, even being very specific in my search terms! It also has a bad habit of giving listings from 1000 miles away even though I set distance to 0-30 miles


Craigslist shot themselves in the foot ending personals, all the weirdos and freaks left and they had all the cool stuff for sale 😆

You can find deals there still, I got my current set of wheels and tires there in 2022 for stupid cheap because they weren’t listed anywhere else.
just save the post, I've got numerous saved folders with different content.
 
I do that, but occasionally I’ll scroll past a listing I’m not interested in in the moment or am looking at something but the app refreshes in the background as I’m doing something else.
 
Probably just dust on the connection in the knobs from sitting unmoved for a long time. Sometimes you can blow air into it or spray contact cleaner slowly around the knobs while turning them to clear up the connections.

I ended up pulling it apart, and cleaning 40 years of dust out of it.
I used DeoxIT on the back of the controls like Grog6 suggested.
It works great! I don’t have any more sound issues, except the backlighting no longer works, so I will have to pull it apart again and check that out. I was considering swapping it with an LED, but I’ll have to do some research on that first.
 
I stumbled upon an article written by some amateur writer on Facebook about the Thunderbird SC.
I was going through the comments and I couldn’t believe how misinformed people are about this platform, other than the stupid comments “Thunderturd” and “Thunderchicken”.
There were a lot of comments about how the SC is on the Fox platform, they were FWD, they had a transaxle, they had cable clutch issues, bottom end issues. The 35th anniversary paint scheme wasn’t factory. 🤦🏻‍♂️
Ugh, if you don’t know what you’re talking about just STFU.

Sometimes, I’ll school them on the MN12 platform, but then it turns into them calling me the idiot. You seriously can’t fix stupid.

This is why I took a social media break, there is just so much toxicity on there.
 
FB is a hot pile of garbage. Even their shorts are terrible with all the embedded ads and ads over the bottom half of the video.

IG, while still a FB / Meta product, is much cleaner. The ads aren't nearly as bad and the commentary section for videos is made (seemingly) to make people not keep a consistent / fluid conversation on purpose, which in turn keeps the comments down. That's my take on IG at least.
 
All I know about IG is what my wife occasionally shows me from the "instagram vs. reality" callouts which show the sheer ridiculousness of what people post. I didn't need more convincing to stay off the platform, but it certainly helps keep me clean!
 
Cars, space stuff, dark humor, auto-racing, and pendejadas are my IG feed.

My wife watches stuff (on that absolute garbage of a platform owned by the Chinese worse than FB) that romanticizes RL and "look at this, we can do this" and I know 100% that it won't be like that.
 
There is a sizable portion of the population that think Facebook is the entire internet and through the day only ever open that app (or TikTok, or IG) and think whatever content they saw… which was made for clicks/clout rather than actual passion (let alone knowledge)… is gospel, and the viewers of it now think they’re experts on the subject, where they can go on to comment on it in other posts. It’s a negative feedback loop of willful ignorance.

I could go further and share my full opinion of what social media has done to civilization since its advent but I’ll save that for the doomsday thread 😆

As for IG, if you tailor your feed it’s fine, but if I wanted to look for pictures of stuff I’m interested in I would find everything I want and more in a Google image search. I guess I can’t comment or engage or know how popular it is though….. oh the humanity!
 
I sold a few so far, this is what I have left that is in nice shape. Most of them were projects that I upgraded. Goal is to sell most of them, going to keep my 600x, t23, t420s and t60 frankenpad.

I gave up on finding that t61 14.1" 4:3 screen assembly, so I put its motherboard into a t60 15" and made a frankenpad

600x- modded with 850mhz mmc2
t420s- i7 2640m and 1080p IPS mod
t61- 15.4" wsxga+ mint original condition with factory HDD and image
(x3) t23- forgot specifics but clean
t23– 1.2GHz, sxga+
T60- frankenpad, mint, 15” with t61mobo
z61t – 14” WXGA+ titanium lid, 2.33GHz

My current project is a T60 with the original box and packaging, replacing a bad motherboard in it. Trying to get it back to mint in box condition
I've given a bit of thought about whether to keep my W520. On one hand, I've upgraded it a lot (including installing a Core i7 that's faster than anything Lenovo offered from the factory). Since I also have dual 9-cell batteries in good condition, dual AC adapters, and the advanced dock, I could get a lot for selling the whole package.

I'm not actively trying to declutter, but I just don't have a use for older hardware once I move on from using it daily. A few months ago, I considered replacing it with a T580 because that and the T480 are considered to be the best of the classic ThinkPads that still have modern-enough hardware. However, I wouldn't travel with a T580 either because I've had a much newer and lighter laptop for that purpose since 2020.
 
I never really understood the exodus from Craigslist to Marketplace. Did people think all of the idiots from Craigslist would just stay there and not migrate over to Marketplace? At least the search function on Craigslist works. I recall numerous times searching the entire country for something specific in maybe 15-20 minutes. On Marketplace it takes me 15-20 minutes just to sift through mounds of crap in one region that has absolutely nothing to do with what I searched for.
IMO, the exodus was driven by how well scammers have automated their bots. Most of what I sell is either niche industrial equipment or niche products. Occasionally I'll need to unload some general stuff (like a 4' round table from a failed startup's breakroom).

- I stopped listing on CL when 9 out of 10 people who contacted me were bots asking if I was real and then asking to prove myself by giving them an email verification code from yahoo/google/whatever. Total BS.

- I still have to deal with some dirtbags on FBMP. Since a lot of what I sell is specific, I do deal with some lowballers but in general nerds are reasonable in negotiations and punctual. It helps that I spell out exactly what I'm selling, link to datasheets when I can, and price reasonably.

The absolute worst items to sell: non-current-gen Apple products (because the buyers are generally a toxic mix of wanting the brand name/style, being cheap, and being uninformed about what they are buying)... and fucking furniture. The trouble it took to sell some lounge chairs from a failed biotech startup's office! There were the people who wanted to nickel and dime me when I was only selling them for $50/ea, and the people asking a million questions about the chairs (no one sat in them, the company died, no I don't know if it was a pet-free/non-smoking office). These are NOT heirloom chairs. Fuck, I still have PTSD from that transaction. I think I ended up donating them for a tax writeoff so I wouldn't have to deal with another flake.
 
Fun fact: in some emerging countries, FB became "the internet" because very early on, FB cut deals with local cellular telcos to make it so that FB browsing did NOT count towards their bandwidth quotas.

Call someone = costs money
Call someone via Messenger = free
Visit a company website = costs money
Visit a company's FB page = free
Research a topic = costs money
"Research a topic" on FB = free

Examples: Kenya, Myanmar, Indonesia, Nigeria, South Africa.
It's crazy.

Americans who get their "news" from FB... well, they have no excuse.


 
I'll just leave this here. Click it to adjust your FB feed.

 
I legitimately wouldn’t use the internet if those were my circumstances!
 
I was on the TP25 excitement wagon when David Hill released the community polls and production of a "retro" model for the 25th anniversary. Sure, I was a little disappointed at the actual product but the "classic" keyboard on the T470 was still way more than Lenovo would have ever given "the fans" without a huge internal fight waged by Hill, so I bought one right away. Then a couple years later when I saw how relatively easy it was to retrofit its top cover and keyboard with the motherboard and QHD screen of a T480 I picked one up towards the end of 2019 and did the mod. I rarely use the Fn keys so the lack of the "we know it can be done but nobody will spill the beans on HOW to do it" EC mod to fix them doesn't bother me. I use it for work and keep it docked at home, and just upgraded it to 802.11be this weekend. I probably could stand to replace the batteries in it since they're now 8 years old and I only get a few hours from them now, but it's a true survivor.

I bought my W520 in 2015 to take over for the T61p I bought in 2007 after it started showing early signs of the infamous nvidia defect. It came with the high-gamut FHD panel but it looked so different to me at the time I decided to shoehorn an IPS panel in its place. It is still used daily, and still runs Windows 7...
 
I remember those issues with the T61p. I was glad that the T500 was essentially the same laptop with those issues ironed out.

My experience with chiclet keyboards has been awful between several MacBook Pros for work, my current Eluktronics gaming laptop, and a couple of wireless keyboards I have around the house, one of which is getting replaced by a mechanical, 75% layout keyboard. I've never spent meaningful time with the ThinkPad keyboards that came after the T420 and T/W520, and that's been the primary reason for my ThinkPad drought. I do think the Lenovo chiclets are much better than other brands, but it's still a chiclet. I can't do any serious typing on a keyboard with limited key travel.

Even now, I could get a clean, used TP25 and mod the heck out of it to make it more modern, but when all is said and done, it would still be far behind every other machine I currently use spec-wise, and I'd rather just invest that time into building desktop PCs. These days, I'd rather go for a Framework 16 laptop if it wasn't so expensive for what you get, despite the excellent modularity that essentially justifies the premium.
 
Nvidia learned how much lead-free solder sucks.
I fixed a couple of laptops for people I worked with. You have to remove the chip and solder, reball the chip, and solder it back.
You need some special equipment: a reballer for that chip, and a reflow oven.
 
Just something to amuse me at work. I set up a testing sandbox for a project I'm working on. Usually with mock payment data, I just use generic amounts like $10.00 or $11.11, but when I saw the UI for creating a payment link in this platform, I decided to have a little bit of fun with it and now I smile every time I see it.

View attachment 17256

I used to add odd references to work stuff: old sci-fi, Monty Python, Trek, xkcd.
One of our projects was code-named for the Matrix, one for Enterprise, lol. Jefferies tube, phase discriminator, and dilithium crystals made it into that design, lol
 
In the nvidia debacle's case it was actually a flaw in the chip itself which was found between the die and substrate/pcb from the nvidia plants, not the bonding between the substrate/pcb and the motherboard/graphics chip.

Here's the (lengthy!) article that I found about it back in 2008.

https://web.archive.org/web/20081102055630/https://www.theinquirer.net/gb/inquirer/news/2008/09/01/why-nvidia-chips-defective said:
This is the first part of a series of three articles getting to the nub of Nvidia's graphics chip woes. The series is the result of months of research conducted by diligent INQhack Charlie Demerjian, despite an in-box stuffed full of abuse. Part two can be found here and Part Three is here.

NVIDIA HAS RECENTLY been saying a lot about how its chips are not bad, and giving people reasons about why the problem is contained. Unfortunately, these disingenuous half-truths don't stand up to an explanation of why this problem is happening.

The problem is extremely complex and defies a simple explanation. It involves multiple poor choices, multiple engineering failures, and likely a few bad accounting choices. This piece could also have been entitled: "More than you ever wanted to know about bumping, and then some: How not to do things". But we will simplify the science and technical details as much as possible to make it accessible, so some things may be oversimplified.

The defective parts appear to make up the entire line-up of Nvidia parts on 65nm and 55nm processes, no exceptions. The question is not whether or not these parts are defective, it is simply the failure rates of each line, with field reports on specific parts hitting up to 40 per cent early life failures. This is obviously not acceptable.

The end result of the failures is that bumps crack between the bump and the substrate on a chip, not on the bump to die side. When this happens to a signal bump, game over for the GPU or MCP. What is a bump, die and substrate? Why is it happening? That is a long and technical story.

First, let's start out with some terminology, illustrated here by the lovely and talented Via CN/Nano chip. As you can see, the total package is about the same area as a US quarter. The most important part is the black square at the centre, that is the die, or the silicon chip itself. The green fibreglass-like part around it is the substrate, a complex multi-layered organic material that routes signals from the pads on top to the pins on the bottom, and serves as an attachment point for the die and various passive components. Those are the little silver things around the edges.

The die itself looks a little rough around the edges, but in reality it is very very angular. It has four corners at 90 degree angles, this one being almost square. Some, like the Intel Atom for example, are much more rectangular. The blurry edges are due to a material called the underfill, it looks like glue seeping from the edges, and serves as mechanical support for the die to substrate bonds and a moisture barrier to protect the bumps.

The part you don't see are the bumps, and they are the most critical part. This type of packaging is called flip-chip because the connectors between the die and the substrate are put on the bottom of the die, and it is flipped over onto the package. The connectors are called bumps, and they are literally little balls of solder. A typical chip that is a little more than a centimetre on a side might have over 1000 bumps on it, so spacing is incredibly small and tolerances amazingly tight.

As you can see, the package is about the same height as a quarter as well, so the vertical tolerances are also pretty slim. The bumps act like pins on a normal chip, they carry signals, power and ground to and from the die. They also are the primary attachment mechanism of the die to the substrate. The precision needed to put these things together should not be underestimated.

Those are the biggest players in our little drama, now let's move on to some basic physics and related science. Chips consume power, and in return they give you heat and a few electrons in the right places, occasionally they also give you a flash of light and smoke as well, but few chips do that twice. Heat is not an intended product, it is a consequence, and has to be carried away or bad things happen.

Modern chips consume electricity in an uneven manner, as different parts of the chip use power at different rates. Sometimes parts of the chip are never used at all for a given workload. If you have a modern GPU and don't game or are smart enough to not run Vista, you will likely never touch the transistors that do all the 3D work. Think about it this way, there are hot spots on the chip as well as cold spots, it is uneven and changing constantly.

Related to this is the fact that the chip uses electricity in a non-uniform manner. Parts that are heavily used pull much more current than idle parts, and once again, those parts change over time. Some bumps may pull a lot of Amps, others may pull very few, and this again changes over time and use. The bumps also have a limited current capacity each, too much and they melt or burn out, so there are far more than are strictly needed to supply the chip with power.

The idea is to make sure no one bump will ever reach the maximum current it can handle. This is done by putting in more power bumps on the die in places that use high power than are needed from an average current point of view. If things are done right, no single bump will ever exceed the maximum current it can deliver.

The Nvidia defective chips use a type of bump called high lead, and are now transitioning to a type called eutectic, see here and here. Eutectic materials have two important properties, they have a low melting point, and all components crystallize at the same temperature. This means they are easier to work with, and form a good solid bond. Eutectic bumps may have lead in them, or they may not, some are gold/tin, others are lead based, it depends on what characteristics you want, and how much you want to pay. It is a property, not a formula.

Most if not all substrates use eutectic pads to attach the bumps to as well. If you use a eutectic pad with a eutectic bump, you get a much better connection than you do if you use a high-lead bump with a eutectic pad. This is reflected in much higher yields, lower assembly costs, and a physically stronger connection as well. At this time, we have no good explanation as to why Nvidia chose to go the high-lead bump on eutectic pad route.

High-lead bumps have a much higher current capacity than eutectic bumps. When power is run through eutectic bumps, you also get an effect called electromigration. This means that some of the materials are essentially pushed around by the current, and you get voids in the bump. These voids lessen the capacity of the bump, and eventually they burn out.

The more current you run through a eutectic bump, the quicker the electromigration. If you keep the current to a reasonable level, the time it takes for this to happen will be so long it isn't worth worrying about. This is why chip vendors say that upping the voltage will shorten the lifespan of parts, it literally does cause them to burn out quicker.
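The "more current, quicker burnout" relationship is usually modelled with Black's equation, which the article does not name. As a rough sketch only: at a fixed temperature the median lifetime scales as the current density to the power of minus n. The exponent n = 2 below is an assumed, commonly used value, not a figure from the article, and `lifetime_ratio` is a hypothetical helper name.

```python
# Sketch of the current-density term in Black's equation:
#   MTTF = A * J**(-n) * exp(Ea / (k * T))
# Temperature is held constant here, so only the J**(-n) factor matters.
# n = 2 is an assumed exponent, not a value taken from the article.

def lifetime_ratio(current_ratio: float, n: float = 2.0) -> float:
    """Relative lifetime after scaling current density by current_ratio."""
    if current_ratio <= 0:
        raise ValueError("current_ratio must be positive")
    return current_ratio ** (-n)


for ratio in (1.0, 1.25, 2.0):
    print(f"current x{ratio}: lifetime x{lifetime_ratio(ratio):.2f}")
```

Under that assumption, doubling the current through a bump cuts its expected electromigration lifetime to a quarter, which is the flavour of trade-off the article is describing.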

On the good side, eutectic bumps are generally more flexible than high lead. This means they are a bit more forgiving to stress. Some forces that would fracture a lead bump may be absorbed by a eutectic one without problems.

Bumps overall are a multi-dimensional trade-off between cost, assembly yield, current capacity and mechanical resilience among other things. To call it a complex mess is being overly kind, package engineering is not for the faint of heart.

FROM BUMP properties, we move on to thermal expansion of materials, and that is another piece to the puzzle. Most materials expand as they warm up. If you have ever seen a mechanic trying to free a stuck bolt, they usually heat the nut with a blowtorch, this expands the nut and loosens it. The same thing happens with the die and substrate. When you turn on a chip, it heats and expands a little. This expansion is not much, but it is measurable. The substrate also heats and expands.

The problem is that the die gets hot, and heats the substrate secondarily. The silicon on the die has one rate of thermal expansion, the substrate has another, basically they get bigger at different rates. To complicate things further, remember the uneven and changing heating bit above? Parts of the die heat up and expand differently from other parts of the die. This changes quite quickly while things are in use.
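To put rough numbers on the mismatch, here is a minimal sketch using the linear expansion formula dL = alpha * L * dT. The silicon coefficient is the commonly quoted value; the substrate coefficient, the 10 mm die edge and the 60 K temperature swing are assumptions picked for illustration, not figures from the article.

```python
# Rough die vs substrate expansion mismatch, dL = alpha * L * dT.
# All inputs are illustrative assumptions, not measurements of any real part.

SILICON_CTE_PPM_PER_K = 2.6     # commonly quoted value for silicon
SUBSTRATE_CTE_PPM_PER_K = 16.0  # assumed value for an organic substrate


def expansion_um(length_mm: float, cte_ppm_per_k: float, delta_t_k: float) -> float:
    """Linear expansion in micrometres for a given length and temperature rise."""
    return length_mm * 1000.0 * cte_ppm_per_k * 1e-6 * delta_t_k


def mismatch_um(length_mm: float, delta_t_k: float) -> float:
    """How much further the substrate grows than the die over the same span."""
    die = expansion_um(length_mm, SILICON_CTE_PPM_PER_K, delta_t_k)
    substrate = expansion_um(length_mm, SUBSTRATE_CTE_PPM_PER_K, delta_t_k)
    return substrate - die


print(f"mismatch across a 10 mm die for a 60 K rise: {mismatch_um(10.0, 60.0):.2f} um")
```

With these assumed inputs the substrate grows about 8 micrometres more than the die across its width. That is tiny, but the bumps have to absorb it on every heat cycle.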

The result? The bumps take a lot of stress, and it changes from second to second. This can be very accurately simulated, and you can engineer bump placement at points of lower thermal expansion and therefore lower stress. If you lose a power bump here and there, the chip will very likely survive, but if you lose a signal bump, game over. This is why bump placement is very important.

Engineering what bumps go where is a very complex process, and is done basically when the chip is laid out, near the end of the development process. You don't do it on a whim, you don't make pretty patterns because they are cool, you do it scientifically to minimise the potential for damage.

Getting back to the stress, it is what makes bumps fracture. Think of the old trick of taking a fork and bending it back and forth. It bends several times, then it breaks. The same thing happens to bumps. Heating leads to stress, aka bending, and then it cools and bends back. Eventually this thermal cycling kills chips.

Once again, if you did your engineering right, this won't happen in any timeframe that matters to mere humans, if it takes ten years of on and off switching to make it happen, once a day power cycling won't matter in our lifetimes. Chip makers tend to engineer for timelines like the ten-year horizon, and are pretty safe in assuming it will live for five years of casual use.

If you recall, high-lead bumps are stiffer than eutectic and more prone to stress fractures. The high-lead-to-eutectic substrate bond is also weaker than a eutectic-to-eutectic bond. What is happening to Nvidia is that the substrate to bump joint is cracking, and the chips die. High lead bumps are a poor choice to use in this application.

One other bit to bring into the mix is underfill. If things were as simple as heat leads to cracking, no chips would work for any length of time. Underfill not only protects the bumps from moisture and contamination, but it also provides mechanical support as well. It is designed to take some of the stress that the bumps take, making them live longer.

Underfill can range from rock hard to soft and squishy, it depends on your application. The harder the underfill, the more mechanical support it provides, and the less stress the bumps take. Simple enough.

That brings us to another material, the Polyamide layer. When chips went to a low-K dielectric material, which is not the same as the high-K gate material, it proved a problem with packaging, bumps and underfill. The solution was to put a polyamide layer, sometimes called a stress layer, to cover the bottom of the chip. This prevents contamination and mechanical damage.

If you pick an underfill that is too soft, it doesn't provide you enough mechanical support for the bumps, they crack and your chip dies an early death. Pick one that is too hard and it rips the polyamide layer off. In the words of one packaging engineer talked to for this article, if you used too hard of an underfill, the chip "wouldn't survive the first heat cycle". The magic is in the middle, you have to pick a bowl of porridge, er, underfill, that is strong enough to provide the support you need, but not so strong as to rip layers off your chip. Like we said, package engineering is not for the faint of heart, but it can make baby bear happy.

That brings us to the billion dollar question, why not simply change bump types to eutectic if they are that much better, which they are, in some ways. The answer is in the current capacity, more specifically average current capacity. We mentioned this earlier, and the idea ties into the hot spots and functional units.

Take a hypothetical simple CPU that has integer and floating point units. If you are doing heavy int. work, the power bumps that supply that part of the chip will be loaded heavily and the FP bumps will not be doing much of anything at all. When FP load gets heavy, the opposite happens.

The layout of the bumps is designed so that neither set will be overloaded at peak times, and in fact won't get all that close to their maximum. To use completely made up numbers, take a bump that has a peak capacity of 1000mA, and for longevity you don't want to exceed 800mA, basically a 20 per cent safety margin.

If the chip TDP divided by the number of bumps, i.e. the average current per bump, is 200mA, there are likely many bumps drawing 100mA and a few under heavily loaded areas that draw 600mA. This draw moves around with the work the chip is doing. Some may never break 100mA, others may be at 600mA for their entire lives. All are well below the 800mA long-life limit, much less the 1000mA max.

The problem with eutectic bumps is that they have a lower current capacity, and the closer you get to it, the worse the problem of electromigration becomes. Let's pick a hypothetical eutectic bump that has a peak capacity of 500mA and the same 20 per cent safety margin, 400mA max for long life. If Nvidia wants to swap in eutectic bumps for the high lead they are using, there is a slight problem, they are well over the current capacity of the new bumps.
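The arithmetic in the last few paragraphs can be checked with a small sketch. It uses only the article's made-up numbers (1000mA and 500mA peaks, a 20 per cent margin, 100mA and 600mA draws); the function names and the sample list of per-bump draws are hypothetical.

```python
# Per-bump current check using the article's made-up numbers.
# Function names and the sample draw list are hypothetical.

def safe_limit_ma(peak_ma: float, margin_pct: float = 20.0) -> float:
    """Long-life current limit: peak capacity minus the safety margin."""
    return peak_ma * (100.0 - margin_pct) / 100.0


def overloaded(draws_ma, peak_ma, margin_pct=20.0):
    """Return the per-bump draws that exceed the long-life limit."""
    limit = safe_limit_ma(peak_ma, margin_pct)
    return [d for d in draws_ma if d > limit]


# Mostly lightly loaded bumps plus one hot spot, averaging 200 mA per bump.
draws = [100, 100, 100, 100, 600]

high_lead_over = overloaded(draws, peak_ma=1000)  # long-life limit 800 mA
eutectic_over = overloaded(draws, peak_ma=500)    # long-life limit 400 mA

print("high lead bumps over limit:", high_lead_over)
print("eutectic bumps over limit:", eutectic_over)
```

With the same layout, no bump is over the limit with high-lead bumps, but the 600mA hot-spot bump is past both the 400mA long-life limit and the 500mA peak of the hypothetical eutectic bump. That is why a straight swap does not work without changing the layout or the power draw.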

If the chip actually powers up without letting the smoke out, the first time you fire up a massive game of Telengard, it will most assuredly go pop. In the rare case that the gods of luck are staring right at you and the thing doesn't fry immediately, electromigration will ensure it has the lifespan of a mayfly, basically worse than the current crop of defective Nvidia chips.

What do you do? You can either cut the power used by the GPU way way down, ie, clock it at a point where no one would ever buy it, or rearrange where the bumps go. The rearrangement is not a trivial thing, and may require moving large parts of the chip around, basically a partial relayout. This is expensive, time consuming, and likely can't be done and validated in the time the chip is on sale for.

The other option is basically just as bad, you need a power plane or power grid on the die. This is a metal layer that distributes power across the die, and it means adding a layer to the chip. That means expense, slightly lower yield, and can have other detrimental effects to power draw and clocking.

All of these things can be dealt with if you see this coming when you start making the GPU. It is pretty painfully obvious that Nvidia didn't, otherwise they wouldn't have used high lead bumps and gotten into the hole that they are in. They have switched to eutectic bumps, but given the way it is being done, and the supplier grumbles we are hearing, it appears to be very poorly planned. It will be interesting to see the lifespan of these new parts.

Here's part 2.

https://web.archive.org/web/20081217131059/http://www.theinquirer.net/inquirer/flame_author/947/1013947/nvidia-should-defective-chips said:
This is the second part of a series of three articles getting to the nub of Nvidia's graphics chip woes. The series is the result of months of research conducted by diligent INQhack Charlie Demerjian, despite an in-box stuffed full of abuse. Part One can be found here and Part Three is here.


GETTING BACK to the underfill, this is probably the key to the problem. There is one more property of underfill called the glassification temperature, Tg for short. Tg is not melting, it is more the temp at which it goes soft and loses most of its structural rigidity. The underfill that Nvidia used, Namics 8439-1, is what's called a low Tg material, and the Hitachi 3730 has a higher Tg.


To be fair to Nvidia, about the time when the G84 and G86s were hitting the market, high Tg underfills were pretty rare and new to the market. Low Tg underfills, such as the Namics material that NV used had been available for a while, and were 'known'. The last thing you want to do is put a high risk part on a new and market untested material, so it looks like they went with the safe choice, low Tg.


If Nvidia did their homework right, the Tg of the material should never be hit, the chip should always run below that temp, and the underfill should provide the mechanical support needed to keep the high lead bumps from fracturing. This is why you engineer, test, retest, simulate, pray a lot, and pick your materials very carefully.


[Figure: Namics 8439-1 underfill temp vs strength curve]


Here is the Tg curve for Namics 8439-1. Let us be the first to say there appears to be nothing, repeat, nothing wrong with this material, it does exactly what it says it does. It starts to lose strength at about 60C and by a little over 80C it has 100 times less rigidity. Think going from hard plastic to jello. What temps do GPUs run at again? What is the Tj (transistor junction temperature) for them? Ooops. Big hundreds of millions of dollar ooopsie right here.
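To put rough numbers on that curve, here is a minimal sketch in Python. It assumes the drop is log-linear between the two points the article gives (full rigidity up to about 60C, 100 times less at about 80C). That shape is an assumption for illustration, not data from the Namics datasheet, and the function and constant names are made up.

```python
# Anchor points taken from the article's description, NOT from the datasheet:
# rigidity starts to fall at about 60C and is 100 times lower by about 80C.
SOFTEN_START_C = 60.0
SOFTEN_END_C = 80.0
TOTAL_DROP = 100.0  # factor by which rigidity falls across the transition


def relative_rigidity(temp_c: float) -> float:
    """Return rigidity as a fraction of its cold value (1.0 = fully stiff)."""
    if temp_c <= SOFTEN_START_C:
        return 1.0
    if temp_c >= SOFTEN_END_C:
        return 1.0 / TOTAL_DROP
    # Assumed log-linear fall: rigidity drops by an equal factor per degree.
    fraction = (temp_c - SOFTEN_START_C) / (SOFTEN_END_C - SOFTEN_START_C)
    return TOTAL_DROP ** (-fraction)


for temp in (50, 60, 70, 80, 90):
    print(f"{temp}C: {relative_rigidity(temp):.3f} of cold rigidity")
```

Under that assumption the underfill is already down to about a tenth of its cold rigidity at 70C, halfway through the transition.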


So, the failure chain happens like this. NV for some unfathomable reason decides to design their chips for high lead bumps, something that was likely decided at the layout phase or before because the bump placement is closely tied to the floorplan. At this point, they are basically stuck with the bump type they chose for the life of the chip.


The next choice was the underfill materials, and again, they chose the known low Tg part that had far less tolerance than the newer to the market high Tg materials. It was a risk vs risk proposition, likely with a lot of cost differences as well. They chose wrong, very wrong. The stiffness of the Namics material might be perfect below the Tg, but once you hit it, it is almost like it isn't there, and the stress transfers to the bumps while they are hot and weak.


Fanbois will cry that their $.23 temp sensor is reading much lower temps than that, so there is no way this could be an issue. Well, the temp sensors on many cards are not on die, much less between the die and the substrate. They are also cheap and notoriously inaccurate. To top it off, they only measure average temp across the chip, not hot and cold spots. If you look at the IR photo in the previous part of this story, you can see that if you move the sensor from the right side to the left, you will get vastly differing readings. In this case, a real current chip, it will vary by as much as 30C depending on placement.


Many people also don't realize that it is easier for heat to travel down through the pins, they are mini-heat pipes, than it is to cross the three thermal barriers (die -> thermal paste -> heat spreader -> thermal paste -> heatsink) to the heatsink. That means those little bumps take a huge thermal pounding, and are usually hotter than the surface of the heat spreader.


To make matters worse, the bumps that are under the hot spots get hotter still. Piling on the pain, they carry the most current, and the hotter things get, the more heat they generate, and the more resistance they usually have.


Could it get worse? Of course it could. Remember thermal stress? The expansion is highest at the point, wait for it, that is hottest. That would be under the hot spots, and it puts the most stress on the bumps that are weakest.


This is why you have to pick your underfill very carefully, you have to relieve as much stress as you can from the bumps. Too little and they go snap, and the chip dies. Too much and you pull the polyimide layer off and the chip dies. Basically, you go as stiff as you dare, then test the hell out of it under the conditions your simulations tell you will be present. Test, test, test, test or dies die.


When the underfill glassifies, it means you are at the hottest point on the die, the bumps that it is protecting are under the most heat, pulling the most current, and under the most thermal stress. If the underfill essentially turns to jello, it is very bad. If you compound that by using bumps that bond poorly to the substrate, it makes things worse. If those bumps are stiffer than the other option, it is worse yet.


Let's go down the checklist for Nvidia. High thermal load? Check. Unforgiving high lead bumps? Check. Eutectic pads? Check. Low Tg underfill? Check. Hot spots that exceed the underfill Tg? Check. If you are thinking this looks bad, you are right, expensive too.


If it was just as simple as the underfill glassifying, the parts would have never made it to market. It is much more complex than that. The problem with thermal stress is that it is somewhat additive, it weakens parts long before they actually break unless it is quite extreme.


An example of extreme thermal stress would be to take a glass cup, preferably non-tempered, and put it in the oven on max. Pull it out and drop it in a bucket of ice water, and voila, instant thermal stress demonstration. Wear eye protection. The thermal stress that the bumps see is much more like the fork example earlier, it gets weaker and weaker with each bend, until snap, black screen.


If you recall, the Nvidia parts are breaking at the bump to substrate connection. This is the weakest point in the chain, and it is where they made the worst possible materials choice. It is not really a surprise that it failed. It is simply shoddy engineering.


So, what can be done by Nvidia at this point? Well, changing to high Tg underfills is a start, as is changing to eutectic bumps. The high Tg underfill option has come down in risk substantially since the G84 and G86 parts were introduced, so that is a no-brainer, and guess what Nvidia did to the G86? And the G92 as well.


The problem of changing bump types is far thornier, but Nvidia is doing that as well. From the intelligence we have been able to gather, Nvidia has not made any power distribution changes to the parts, there is no power grid, no power plane, or no anything to protect the eutectic bumps from electromigration. They may be able to keep them under their current capacity, but by how much?


This is emblematic of the 'pants are on fire' school of engineering, and reports from inside Nvidia confirm that they are in full panic mode over this snafu. With short time horizons to fix a massive batch of defective parts, reliability engineering usually draws the short stick.

Here's the last part.

https://web.archive.org/web/20081217161955/http://www.theinquirer.net/inquirer/news/374/1036374/nv-should said:
This is the third and final part of a series of three articles getting to the nub of Nvidia's graphics chip woes. The series is the result of months of research conducted by diligent INQhack Charlie Demerjian, despite an in-box stuffed full of abuse. Part One can be found here and Part Two is here.


SOURCES CLOSE to Dell say they knew about the problem a year ago, and HP is on record as being aware in November, so there has been about a year to characterise the problem, design a solution and test it. Multiple sources involved with package engineering tell us that this is not nearly enough time to do a proper test regime, much less long-term reliability studies.


This new package and materials set does not appear to have been nearly as carefully vetted as it should have been. It may work but, then again, it may not. If the lack of power distribution changes is accurate, we may very well be reading about Nvidia Defective Chipsgate II in a couple of years.


How widespread is the problem? We told you about G84 and G86s as well as G92 and G94s. From the materials side, it appears that all non-R and non-F lot numbered parts made on the 65nm and 55nm processes are defective. The flaw is a downright idiotic choice of multiple materials coupled with poor chip design and inadequate testing. It is a case of errors compounding errors. They are all defective.


If this is the case, why aren't we seeing more defective desktop parts? That one is easy... thermal stress. It has two components that lead to a bump fracturing: the amount of the stress, that is the hot-to-cold temperature delta, and the number of times the part is powered up and down, that is the heat cycle. Glass cups in the oven would be the amount of stress, the bent fork would be the number of cycles.


Remember back to the Nvidia 8-K, where they announced that "...customer use patterns are contributing factors." By customer usage patterns, they are referring mainly to thermal cycles, but you could also credit them with meaning high temperatures while the GPU is being pushed hard in gaming and the like.


Desktop systems are usually turned on once a day or so. Some people leave them on for weeks at a time, others may turn them on and off a few times in a day. The average desktop probably has about one heat cycle a day.


Laptops, on the other hand, are woken up and put to sleep many times a day. If you take a typical student who wakes up, checks his email, goes to three classes, takes notes, goes to a coffee shop for a bit, goes home, watches a video or two, then goes to sleep, it is not hard to make a case for 10 or more power cycles a day. Every wake up/sleep or hibernate cycle is a heat cycle, so dozens are not out of the question.


The more cycles you put on it, and the more severe they are, the quicker these defective parts will die. A good way to look at it is to assign the lifespan of each critical bump an amount of stress it can take before it cracks. Let's call this number 100AU for Arbitrary Units. If a power on cycle is worth 4AU, and a hardcore gaming session with the CPU OCd to within 1MHz of it crashing is worth 15, you can figure out when it should die. Remember, these are hypothetical numbers... the theory is the point.
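The arithmetic behind that stress budget can be sketched in a few lines of Python. The 100AU budget, 4AU per power cycle and 15AU per gaming session come from the article. The function name, and the assumption that stress simply adds up linearly until the budget is spent, are illustrative only. Like the article's numbers, the resulting "days" are hypothetical and only the ratio between usage patterns means anything.

```python
BUMP_BUDGET_AU = 100.0     # stress a critical bump can absorb before cracking
POWER_CYCLE_AU = 4.0       # cost of one power-on (heat) cycle
GAMING_SESSION_AU = 15.0   # cost of one hard gaming session


def days_until_failure(power_cycles_per_day: float,
                       gaming_sessions_per_day: float = 0.0) -> float:
    """Days until the budget is used up, assuming stress adds up linearly."""
    daily_stress = (power_cycles_per_day * POWER_CYCLE_AU
                    + gaming_sessions_per_day * GAMING_SESSION_AU)
    if daily_stress <= 0:
        return float("inf")  # a part that is never stressed never fails here
    return BUMP_BUDGET_AU / daily_stress


desktop = days_until_failure(power_cycles_per_day=1)
laptop = days_until_failure(power_cycles_per_day=10)
gamer = days_until_failure(power_cycles_per_day=1, gaming_sessions_per_day=1)

print(f"desktop, 1 cycle/day:      {desktop:.1f} hypothetical days")
print(f"laptop, 10 cycles/day:     {laptop:.1f} hypothetical days")
print(f"desktop plus gaming daily: {gamer:.1f} hypothetical days")
print(f"laptop uses up its budget {desktop / laptop:.0f}x faster than the desktop")
```

With one heat cycle a day for the desktop and ten for the laptop, the laptop burns through the same budget ten times faster, which is the article's point about why notebooks fail first.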


When Dell, HP and others announce a BIOS 'fix', the reason it is so humorous is that all they are doing is lowering the amount of thermal stress on the chips when the fan would not normally be on. When the fan is going full tilt without the 'fix', the new 'updated thermal profiles' won't make a difference. When the fans are normally off or on low, the profiles will essentially lessen the stress from a four to a three. It is just there to allow the laptop to live through the warranty period so the companies don't have to pay for the fix. After that, if the defective chips burn out, it isn't their problem. The 'fix' doesn't fix anything at all.


In the end, it comes down to Nvidia screwing up badly on package engineering and testing, then trying as best they can to bury the problem while passing the buck. It appears that every Nvidia 65nm and 55nm part with high lead bumps and/or low Tg underfill is defective, it is just a question of how defective they are, and when they will die.


As far as we are able to tell, contrary to Nvidia's vague statements blaming suppliers, there are no materials defects at work here. Every material they used lived up to the claimed specs, and every material they used would have done the job while kept within the advertised parameters. Nvidia's engineering failures put undue stress on the parts, and several failures compounded to make two generations of defective parts. The suppliers and subcontractors did exactly what they were told, Nvidia just told them to do the wrong thing.


When it started talking about this, Nvidia failed crisis management 101, and the coverup shows it doesn't care about consumers, just its bottom line. NV is doing exactly the wrong thing for the wrong reasons, and the lawyers circling with class action paperwork in hand are going to eat them alive.


The last time you had such a huge batch of defective GPUs, the company that did it swore up and down – just like Nvidia – that there was no problem despite forums filled with evidence to the contrary.


A few weeks later, they turned around and admitted there was a problem, and took a $1.1 Billion charge, placating customers and fending off lawsuits.


You know that as the Xbox360 Red Ring of Death.


I wonder why Nvidia can't be that smart?
 
I remember this well; in 2008 I had just taken Siemens RoHS compliant. Nvidia was already.



Yes, NVIDIA products were compliant with Restriction of Hazardous Substances (RoHS) directives by 2008, largely due to the industry-wide shift toward lead-free solder, which became mandatory in many regions around that time.

However, this transition is significant in NVIDIA's history because the move to RoHS-compliant, lead-free solder coincided with the "bumpgate" scandal, where many 2007-2008 era GPUs (particularly mobile GeForce 8-series) experienced high failure rates.

Key context regarding NVIDIA and RoHS in 2008:

  • Manufacturing Changes: By 2008, NVIDIA was using lead-free solder to meet environmental standards, which required higher temperatures during manufacturing.
  • "Bumpgate" Failure: The new lead-free materials, combined with a faulty packaging material compound used by NVIDIA, caused the solder bumps connecting the chip to its package to fail, leading to widespread premature GPU failures.
  • Resulting Litigation: In September 2008, NVIDIA was sued regarding these faulty notebook GPUs.
  • Financial Impact: In July 2008, NVIDIA took a $200 million charge to cover costs associated with these GPU failures.
While NVIDIA was RoHS compliant in 2008, the material changes involved in that compliance were a contributing factor to the high-profile failure of their graphics chips during that period.

I was learning how to do these type BGA mounted chips.
They sit on solderballs. Lead-free solderballs with <.03% silver tend to crack.
The fix for their problem was to add more silver.
What I did when I resoldered chips was not the die connection; I used lead solderballs to resolder the substrate to the laptop mobo.
I've never resoldered a die.















 
