I found solid state electronics particularly hard to get a handle on, but you have to treat it the way the Romans would have.
Divide and conquer.
A discrete transistor is a special case of the solid state chip: it has only one transistor on it. Unlike a vacuum triode, you will be hard pressed to make a transistor at home. The thermionic valve began as a British invention (Fleming's diode of 1904, although the triode's control grid was added by the American Lee de Forest), and it is not surprising that it is better understood in Britain than the American transistor. Personally I think it is a sad reflection on our nation that if you can't make one in a shed then it can't be a British invention. It does seem to me to be the case, but I digress.
There are layers of thinking in a typical IC and you have to separate them and understand them individually.
At the lowest level you have the transistors and the passive circuit elements. One layer up you have the circuit techniques: things like current mirrors, differential amplifiers and logic cells. Two layers up are functional objects like whole memories, analogue signal conditioning, analogue to digital conversion and arithmetic logic unit functions. Typically these functions are glued together and sequenced with a special kind of functional object, the state machine. Where the state machine is externally programmable, the world of software dawns.
Together, these functions make up the modern wealth of VLSI architectures: video codecs, baseband digital tx/rx chipsets and so on.
Just try to deal with the one transistor first. There are different types. FETs aren't very different from thermionic valves. In much the same way that the grid potential intercepts charges in the vacuum triode, biasing the gate voltage of an FET varies the availability of charge carriers in the channel of the FET, by attraction or repulsion in an electric field.
Bipolar transistors are even easier to think about if you don't care about the accuracy of your model. If a semiconductor diode is a non-return valve, then a transistor is a non-return valve where the flow in the pilot controls the possibility for flow in the main valve. In both cases the most obvious effect is the force needed to push against the spring behind the valve and lift it from the seat, which is the forward voltage drop.
Mathematically, the way to predict transistor behaviour is through a model something like Ebers-Moll. Although it is a good model, actual transistors vary wildly, so the model will never predict any particular transistor very well. By looking at the maths for the model you will see why quite quickly. One parameter (the saturation current) is very small and varies a lot from part to part, and because the relationship is exponential, tiny changes in voltage or temperature dominate the behaviour.
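To put rough numbers on that, here is a minimal sketch of the simplest forward-active form of the model, I_C = I_S (exp(V_BE / V_T) - 1). The function names and the saturation current values are my own illustrative assumptions, not data for any real part.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q, about 26 mV at room temperature."""
    return K_B * temp_k / Q_E


def collector_current(v_be, i_s=1e-14, temp_k=300.0):
    """Simplified forward-active Ebers-Moll: I_C = I_S * (exp(V_BE/V_T) - 1).

    i_s is an assumed, illustrative saturation current in amps.
    """
    return i_s * math.expm1(v_be / thermal_voltage(temp_k))


if __name__ == "__main__":
    # Sweep an assumed spread of saturation currents and a few bias points.
    for i_s in (1e-15, 1e-14, 1e-13):
        for v_be in (0.60, 0.66, 0.72):
            i_c = collector_current(v_be, i_s)
            print(f"I_S={i_s:.0e} A  V_BE={v_be:.2f} V  I_C={i_c:.3e} A")
```

At room temperature an extra 60 mV or so on the base multiplies the collector current roughly tenfold, and a tenfold spread in the saturation current does the same, which is why nobody biases a transistor by setting the base voltage and hoping.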
Very few people have a need to model individual transistors. They naturally vary so much that there is no point. As has previously been described by others, you aggregate them and bring them under control in feedback loops. In the feedback loops, passive components, which can be made accurately, tame and linearise the transistors' parasitic effects.
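The taming effect of feedback can be sketched with the textbook closed-loop gain formula G = A / (1 + A·β), where A is the raw (wildly variable) gain of the transistors and β is the fraction fed back by the passive network. This is an idealised single-loop model, and the names and values are assumptions of mine for illustration.

```python
def closed_loop_gain(open_loop_gain, feedback_fraction):
    """Ideal negative-feedback gain: G = A / (1 + A * beta)."""
    return open_loop_gain / (1.0 + open_loop_gain * feedback_fraction)


if __name__ == "__main__":
    beta = 0.01  # assumed feedback fraction, set by a resistor ratio
    # Let the raw transistor gain vary over a factor of 100.
    for a in (1e4, 1e5, 1e6):
        print(f"A={a:.0e}  closed-loop gain={closed_loop_gain(a, beta):.3f}")
```

With these numbers the raw gain changes a hundredfold while the closed-loop gain stays within about 1% of 1/β = 100, so the accuracy is set by the resistors rather than by the transistors.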
Large single transistors are good for power handling, but they usually have poor gain. Small transistors still consume power in their own right, but can be much faster. You can always get enough gain by cascading amplifier stages. In a given material, the main way to make higher frequency circuits is to use smaller transistors. Very small transistors are hard to package individually, so high frequency circuits are usually packaged into VLSI chips with other parts.
High frequency discrete transistors can be made, but they are usually made in more exotic materials like Gallium Arsenide, where higher electron mobilities can be exploited.
New fields include superconducting technologies where electromagnetism can be used to control conductivity. Indeed, in some cases new types of sensor are possible using exotic physical techniques (Yttrium barium copper oxide for SQUIDs and high temperature superconductors, Cadmium Telluride for x-ray detectors, and Indium Phosphide for millimetre wave).
If you need high power and high frequency, it might just be that vacuum tube technology is what you actually need. In the end the Tevatron and LHC are just enormous vacuum tubes. I'm pretty sure vacuum tubes are still used fairly widely in high power applications like broadcast transmitters, industrial RF heaters and nuclear research.
Not only is it the case that no-one knows it all, no-one can think about all the things they know at once.
It's just divide and conquer, and that is all.