The foundation of physics (and I can practically guarantee this will gain greater recognition over time) lies in information theory. Physically, the universe is no different from a giant computer (ignoring consciousness, which is a different animal entirely).
If you have two detectors which each have an even chance of detecting a photon, then we theoretically can extract one binary value of information for each photon detected (assuming no other predictable correlations are already known and that no external additional references for time are used etc. ... just a bare recording of the sequence of detections for two uncorrelated photon detectors with identical rates).
So if we detect 2 photons, we could record this as two (binary) bits of information. We can do an exhaustive calculation of the raw information content by listing all possible combinations and then summing the probability of each possibility along with the information content each provides.
http://en.wikipedia.org/wiki/Information_entropy
We'll do a quick calculation for 2 bits being detected by 2 detectors, A and B:
AA (prob=1/4)
AB (prob=1/4)
BA (prob=1/4)
BB (prob=1/4)
The average information provided by these 2 detections is the sum of:
-log2(prob)*prob (for every combination)
-log2(1/4)*1/4-log2(1/4)*1/4-log2(1/4)*1/4-log2(1/4)*1/4=2*1/4*4=2 bits of information, as expected
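For anyone who wants to check the arithmetic, here's a minimal Python sketch of the same sum. The function name shannon_entropy and the variable names are just illustrative choices of mine, nothing standard.

```python
from itertools import product
from math import log2

def shannon_entropy(probs):
    """Average information in bits: the sum of -log2(p) * p over all outcomes."""
    return sum(-log2(p) * p for p in probs if p > 0)

# Every ordered sequence of 2 detections on detectors A and B is equally likely.
sequences = ["".join(s) for s in product("AB", repeat=2)]
sequence_probs = [1 / len(sequences)] * len(sequences)

print(sequences)                        # ['AA', 'AB', 'BA', 'BB']
print(shannon_entropy(sequence_probs))  # 2.0
```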
Now if we instead take a "macroscopic" view and just sum the number of events detected at either A or B, we get 3 possible results:
A=0 B=2 (prob=1/4)
A=1 B=1 (prob=1/4+1/4=1/2)
A=2 B=0 (prob=1/4)
Now we calculate how much information we can extract out of this form of measurement and get:
-log2(1/4)*1/4-log2(1/2)*1/2-log2(1/4)*1/4=2*1/4+1*1/2+2*1/4=1 1/2 bits of information
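The same "macroscopic" calculation can be sketched in Python by grouping the four ordered sequences by how many hits detector A got. The names here (count_probs, macro_entropy) are made up for the illustration.

```python
from collections import Counter
from itertools import product
from math import log2

def shannon_entropy(probs):
    """Average information in bits: the sum of -log2(p) * p over all outcomes."""
    return sum(-log2(p) * p for p in probs if p > 0)

# Group the 4 equally likely ordered sequences by the number of A detections.
count_probs = Counter()
for seq in product("AB", repeat=2):
    count_probs[seq.count("A")] += 1 / 4

macro_entropy = shannon_entropy(count_probs.values())

print(dict(count_probs))  # {2: 0.25, 1: 0.5, 0: 0.25}
print(macro_entropy)      # 1.5
```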
Hmmm... we lost half a bit of information, on average, by grouping two detections together. Where did it go?
Well, half the time the A and B detectors each saw one photon. When that happens and we discard the ordering of the events (a single bit of information saying whether A or B came first), we lose one bit. A 50% chance of losing a single bit works out to half a bit lost on average.
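One way to see where the half bit goes, sketched in Python: within each count result the possible orderings are equally likely, so forgetting the order costs log2(number of orderings) bits, weighted by how often that count occurs. The names groups and lost_bits are only for this illustration.

```python
from itertools import product
from math import log2

# Group the ordered sequences by the macroscopic count of A detections.
groups = {}
for seq in product("AB", repeat=2):
    groups.setdefault(seq.count("A"), []).append("".join(seq))

# Each ordered sequence is equally likely, so a group's probability is its
# share of all sequences, and the order hidden inside it is log2(group size).
total = sum(len(members) for members in groups.values())
lost_bits = sum(len(m) / total * log2(len(m)) for m in groups.values())

print(groups)     # {2: ['AA'], 1: ['AB', 'BA'], 0: ['BB']}
print(lost_bits)  # 0.5
```

The 2 bits of the ordered record split into 1.5 bits visible in the counts plus 0.5 bits hidden in the ordering.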
Now when we move on to extremely large scales, this loss of information becomes much greater. For 1000 photons, we should theoretically have a raw information content between these two detectors of 1000 binary values, but instead we end up with a gaussian window of probabilities centered around 500 photons for each detector. I'll just make a rough estimate and say the variation for every 1000 photons is around 500^.5=+/-22 photons in each detector. This would roughly equate to log2(44)~= 5 1/2 bits of information.
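The rough estimate can be compared against an exact sum over the binomial distribution. This is only a sketch that assumes a perfectly fair 50/50 split and independent detections; the variable names are mine. For a fair split the standard deviation is sqrt(1000/4), about 16 counts rather than 22, and the exact information in the counts comes out at roughly 6 bits, so the order of magnitude of the estimate holds.

```python
from math import comb, log2, sqrt

n = 1000  # photons per block, as in the text

# Probability that detector A sees exactly k of the n photons (fair 50/50 split).
probs = [comb(n, k) / 2**n for k in range(n + 1)]

# Information (in bits) carried by the count alone.
count_entropy = sum(-p * log2(p) for p in probs if p > 0)

# Standard deviation of the count for a fair binomial: sqrt(n * 1/2 * 1/2).
std_dev = sqrt(n * 0.5 * 0.5)

print(round(std_dev, 1))        # 15.8
print(round(count_entropy, 2))  # roughly 6 bits, versus 1000 bits for the ordered record
```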
So we theoretically began with around 1000 bits and ended up "seeing" only a little over 5 bits of information. The rest was "wasted" creating a pseudo wave function of probabilities that would mask whatever fundamental processes might exist on that scale (not that we'd have an easy way of extracting them, but using the wave function is unnecessarily restrictive). Now I recognize this doesn't directly correlate to the wave function of a photon, but it's similar.
Now what did we gain by using this block-of-1000 photons interpretation? Well, we can now measure up to 1001 different variations of intensities between the two detectors (with a peak statistical deviation of ~22 counts in the center). We've added the equivalent of a memory by counting photons (memory creates space) and we can imagine something close to a smooth continuum of intensities existing between the two extremes (if you take this to an extreme you can imagine space to be perfectly smooth, featureless and infinitely divisible - a common fallacy in physics).
But in order to gain this, we've lost around 2^1000 potential relationships (in decimal that's around a 300 digit value) that could exist on a finer scale in the data. Analyzing this in detail for fine scale features or novel forces that could exist would very likely be beyond most anything the best supercomputers could currently handle, but we wouldn't necessarily need to analyze every bit if larger scale features were more obvious.
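The sizes involved are easy to check with Python's arbitrary-precision integers. The names microstates and macrostates are just labels I'm using for "ordered records" versus "count results".

```python
microstates = 2**1000    # distinct ordered sequences of 1000 A/B detections
macrostates = 1000 + 1   # distinct values the count on detector A can take

print(len(str(microstates)))  # 302 (decimal digits)
print(macrostates)            # 1001
```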
The above doesn't directly relate to the wave function of photons, but consider that if we already interpret mass as having a wave function, then photons don't necessarily need to also possess a wave function, as we can't see photons without their observations being effectively "filtered" by mass anyway, so it's redundant to assign those features to both mass and photons (and again, a single photon measurement doesn't possess a wave anyway).
Now, as a more direct comment on Good Elf's post, if we go back to a raw information content in photon detections and look at how these can be interpreted as macroscopic measurements, we find that we have to ascribe meanings to the measurements (we don't have innate human senses for individual quantum events and so no innate understanding of where and when they occur).
Let's say that we again have two detectors and we're trying to measure what we assume is a trajectory for a beam of light/photons through space. If we see more photons being detected by one detector, we'll assume the beam is more closely aligned with that detector than the other.
So we begin to take measurements for 1000 photons and find that 997 photons hit detector A and only 3 hit B. Now we know that this beam is aligned strongly with detector A and not B ... but here's the question ... where is detector A? We can just look at it with human eyes and say it's "there", but that tells us little about how human senses determine where an object is and the experiment wouldn't be very repeatable ("there" doesn't provide a high information content).
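To put a number on how well 1000 detections pin down the alignment, here's a small Python sketch. It assumes independent detections and uses the usual binomial standard error, which is only a crude guide when one detector sees as few as 3 hits; the variable names are mine.

```python
from math import sqrt

hits_a, hits_b = 997, 3
n = hits_a + hits_b

# Estimated fraction of the beam landing on detector A.
fraction_a = hits_a / n

# Binomial standard error of that fraction, assuming independent detections.
std_error = sqrt(fraction_a * (1 - fraction_a) / n)

print(fraction_a)           # 0.997
print(round(std_error, 4))  # 0.0017
```

Even this "strong" alignment is only a statistical statement about a ratio of counts; it says nothing yet about where either detector is.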
So we need physically reproducible ways of determining where the emitter, detectors etc. are, and if we're using physics, then our fundamental metric for distances and locations in space is light speed delay. This requires a clock ... but how accurate is a clock? If we're going to measure things using wavelengths of some specific frequency of light, then how can we count wavelengths without a very abundant source of photons to assure we've completely oversampled the wave function? (In other words, if we only had 3 photon detections to work with, we'd have woefully inadequate information to determine whether or not we're at precisely 1000 wavelengths of that specific frequency of light ... 3 photon detections aren't even enough to determine reliably whether or not we've moved a single wavelength.) Clock delays are also statistical - you can't determine precisely how long a delay is when you can't determine precisely when you emitted a photon.
If you're going to try to use the photon detections themselves to simultaneously localize the detectors and the beam alignment, then you have the equivalent of the Uncertainty Principle in whatever a priori assumptions you make as to how significant these measurements are in determining each of these positions (in other words, if you decide half the information is going to determine the location of the detectors, then you're "losing" half the information regarding the beam alignment ... fundamentally you've just extracted correlation values, and you trade off in many ways how significant the correlation is considered to be for measuring each of these positions).
We can go down the list and find that basically every single attachment between an intuitive "human scale" view of the universe and the quantum scale is created via large scale statistical measurements (that are inherently noisy anyway ... people have claimed to measure the speed of light to X billion parts accuracy, yet even the velocity of a single photon of light remains an unknown ... in other words, they've just measured a large enough statistical sample to find that they can get a reproducible delay if someone else wanted to reconstruct the experiment ... ok, yes, I'm being overly critical, but it's amazing how much ignorance is present in many people who've been overly indoctrinated in current dogma, and how often it's overlooked how little is truly understood).
Now here are some of my more personal views of how these two realms are merged:
How does the wavefunction arise in matter? If we take a purely statistical measurement of the mean and standard deviation for a collection of samples in an area that are entirely chaotically localized, we should measure a pure gaussian of the form e^(-ad^2). But we can also generate this from sinusoidal processes of the form e^(-i*ad^2), so a superposition of both complex sinusoidal oscillations as well as purely gaussian characteristics can exist in space. The chaotic gaussian spread is responsible for the gravitational force and the complex sinusoidal is responsible for EMF. The chaotic gaussian field of measurements responsible for detection of mass diffuses over time and expands; this gives rise to the Big Bang (gravity is an expansive force, not an invisible attractive one), the fast inflationary period (every location in space is effectively experiencing an expansion at the rate of the initial conditions of the Big Bang), time dilation (objects moving primarily in one direction have little ability to diffuse or "age" laterally), red shifting (the size of masses doesn't remain constant), Hawking radiation (gravity doesn't "suck" anything into it - a black hole would be expanding chaotically at light speed outward and you wouldn't fall into one, it would come out and hit you instead), gravity (objects diffuse and merge over time), "warped" space, and a statistically constant light speed motion through "spacetime" (gravity can be better understood as a diffusion to space that spreads outward - the Earth is pushing you upward against space diffusing downward, and though this background pressure is much larger than gravity alone, just like we're surrounded by ~15 PSI atmospheric pressure at sea level, we're not crushed by it because it's rather evenly diffused).
Part of the problem with assuming a constant light speed velocity in a vacuum is that this isn't directly verifiable except "locally" (though the concept is still a valuable tool in constructing metrics for spatial distances and hence relative positions). For mass, this means that information needs to be integrated and correlated at less than light speed at some point in space, and whatever fundamental light speed "fabric" of space should exist for light to travel through is only indirectly measurable (for example, it's theoretically not impossible that an observer could be entirely stationary and simply interpreting motion indirectly via photons ... that isn't specific to a light speed space, but having light speed always faster than a mass can travel makes this problem more obvious).
Yes, I've rambled a lot but I hope this gives a better picture of the link between quantum/discrete and the more intuitive macroscale/Euclidean/continuum view of space. There's more to this as, for example, atomic properties arise from quantum mechanics and aren't purely derived from macroscale statistical measurements. I can give some ideas of how inertia and kinetic energy arise but still don't have it down to the point where it's naturally derived from chaos and information theory. The big question to me is over how conscious memories are formed and how relationships between information are physically/consciously correlated - basically I see the relationship as memory constructs space, and memory, in order to extract useful predictive relationships, must determine patterns that can be seen to repeat (knowledge is gained of the past and useful in predicting the future). You should be able to break down all possible predictable relationships as cycles of various lengths, and in a constant light speed space this ends up creating the equivalent of geometrical relationships. Whether or not two such relationships appear to have correlations depends upon whether or not they have relatively prime features between them. Ideally a relationship should be extractable from data alone that automatically implies the functions or computations that should be performed on it (in software terms you shouldn't need an explicit delineation between code and data - it's all just a program and the "instruction set" should be naturally derivable with little if any a priori assumptions made). So my general view is to take raw symbolic information, without any explicit meaning, and extract a natural form of association that acts like a network of relationships between these - this gives the equivalent of a memory that constructs space, and the specific relationships between the objects constructed should end up being extractable via the relative lengths of their pathways. This forms a system of mathematical and logical relationships that give rise to physics, of which the larger scale characteristics are only statistically measured as gravity and time dilation etc. and the smaller integer relationships are seen as quantum properties with discrete, non-continuous properties.
Ok, I've been entirely too abstract and verbose and haven't given enough mathematics behind this, but that's primarily because when I do give mathematical examples, 1) people lose interest, 2) they get confused in the details, and/or 3) I don't have all the pieces of the puzzle mathematically fit together ... yet at least (the biggest ones are over where in the chain atomic properties arise and another is how inertial information is retained ... these both appear related to the wave features of space and I'm working on trying to find a way to have effectively an infinite number of dimensions, each associated with an integer that represents a wavelength, giving the equivalent of resonant filters that determine where an object is located in space ... inertia would be the detection of phase shifts between two dimensions in this case, and atomic and quantum properties should be associated with dimensions with small numeric values - under this view it should be possible to describe every object in the universe by a single number that could be factored into its prime components, though that composite number would be huge for most anything larger than simple chemical compounds, but whether or not two objects can interact should be determinable by whether or not they share any prime factors that allow them to interact coherently in the same space).
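As a toy illustration of that last idea only (not a physical model), here's a Python sketch of the "shared prime factor" test. Representing an object by a small integer, and the function names prime_factors and share_prime_factor, are purely hypothetical choices for the example.

```python
from math import gcd

def prime_factors(n):
    """Prime factorization by trial division; fine for small illustrative numbers."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def share_prime_factor(a, b):
    """True if integers a and b have at least one prime factor in common."""
    return gcd(a, b) > 1

print(prime_factors(360))          # [2, 2, 2, 3, 3, 5]
print(share_prime_factor(12, 35))  # False (12 = 2*2*3, 35 = 5*7)
print(share_prime_factor(12, 21))  # True  (both divisible by 3)
```

The gcd test never needs the full factorization, which matters because factoring the huge composite numbers mentioned above would be impractical while a gcd stays cheap.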