Low-pass filters block higher frequencies and allow lower frequencies through. Averaging is a type of low-pass filter, because it smooths out the sudden changes in the signal [Note: simple averaging won't necessarily produce the best results. I'm certainly not an expert here, but I think you want what's known as a "half-band filter"]. Also, it's not 'bytes' you're dealing with, but 'samples'. A sample might happen to be a byte, or it might be 16-bit (two bytes), or more. Regardless of bit depth, it's just numbers in the end.
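To make the "averaging is a low-pass filter" point concrete, here is a minimal sketch in plain Python. The 10 kHz rate, the 4-sample window, and the two test tones (100 Hz and 4 kHz) are arbitrary values picked for illustration, and the helper names are made up for this example.

```python
import math

def sine(freq_hz, rate_hz, count):
    """`count` samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def moving_average(samples, width):
    """Average each run of `width` consecutive samples (no padding at the ends)."""
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]

def peak(samples):
    """Largest absolute sample value, used here as a rough amplitude measure."""
    return max(abs(s) for s in samples)

RATE = 10_000  # Hz, assumed sample rate for this illustration

# A slow tone passes through almost untouched; a fast tone is strongly reduced.
low_gain = peak(moving_average(sine(100, RATE, 1000), 4))
high_gain = peak(moving_average(sine(4000, RATE, 1000), 4))
print(f"100 Hz gain: {low_gain:.3f}, 4 kHz gain: {high_gain:.3f}")
```

The low tone keeps nearly all of its amplitude while the high tone loses most of it, which is the low-pass behaviour. A plain average has a fairly leaky stopband, though, which is why a properly designed filter is usually preferred.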
So let's say you have analog audio of music playing, and you sample it at 10 kHz. You now have an array of samples. If the only thing you did was naively reinterpret this data as being sampled at 5 kHz instead of 10 kHz, each sample would be treated as being spaced twice as far apart in time, so playback would take twice as long and every frequency would be halved (the pitch drops by one octave).
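A small sketch of that reinterpretation, assuming a 440 Hz test tone (my choice, not part of the question) and a crude zero-crossing pitch estimate. The samples themselves never change; only the rate we claim they were recorded at does.

```python
import math

def tone(freq_hz, rate_hz, seconds):
    """Samples of a sine wave at the given sample rate."""
    count = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

def duration_seconds(samples, rate_hz):
    return len(samples) / rate_hz

def estimate_frequency(samples, rate_hz):
    """Rough pitch estimate: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings / duration_seconds(samples, rate_hz)

samples = tone(440, 10_000, 1.0)  # recorded at 10 kHz

# Same array, two different claims about its sample rate.
as_recorded = (duration_seconds(samples, 10_000), estimate_frequency(samples, 10_000))
reinterpreted = (duration_seconds(samples, 5_000), estimate_frequency(samples, 5_000))

print("at 10 kHz: %.1f s, about %.0f Hz" % as_recorded)
print("at  5 kHz: %.1f s, about %.0f Hz" % reinterpreted)
```

Halving the claimed rate doubles the duration and halves the estimated frequency, without touching a single sample.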
A low-pass filter in combination with downsampling is needed to prevent aliasing. Let's say you have an audio file sampled at 40 kHz. This means that the highest frequency the signal can represent is 20 kHz (see: Nyquist frequency). If you downsample by a factor of two, to 20 kHz, the highest frequency you can represent drops to 10 kHz. But what if the original signal has a frequency of 15 kHz in it somewhere? That frequency will be aliased to a lower frequency (here it folds down to 20 kHz - 15 kHz = 5 kHz), and this will probably be heard as some sort of distortion.
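The aliasing can be checked numerically. This sketch uses the numbers from the example above (40 kHz source, 15 kHz tone) and keeps every other sample with no filtering. The result is sample-for-sample identical to a 5 kHz tone (with inverted sign), so once the samples are dropped there is no way to tell the two apart.

```python
import math

ORIGINAL_RATE = 40_000          # Hz
NEW_RATE = ORIGINAL_RATE // 2   # Hz, after dropping every other sample
TONE_HZ = 15_000                # above the new Nyquist frequency of 10 kHz

original = [math.sin(2 * math.pi * TONE_HZ * n / ORIGINAL_RATE) for n in range(400)]
decimated = original[::2]       # naive downsampling, no low-pass filter

# A tone between the new Nyquist frequency and the new sample rate
# folds down to (new rate - tone frequency).
alias_hz = NEW_RATE - TONE_HZ
alias = [-math.sin(2 * math.pi * alias_hz * n / NEW_RATE)
         for n in range(len(decimated))]

max_difference = max(abs(a - b) for a, b in zip(decimated, alias))
print(f"alias at {alias_hz} Hz, largest sample difference: {max_difference:.2e}")
```

The difference is at floating-point rounding level.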
We don't want this, so we need to remove those high frequencies before we throw away any samples. This is why you should first run the signal through a low-pass filter to get rid of everything above the new, lower Nyquist frequency; then you can downsample (decimate) the signal and reinterpret it as being at the new, lower sampling frequency.
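Here is a rough sketch of the filter-then-decimate order in plain Python. The filter is a generic windowed-sinc FIR with a Hamming window, not a half-band design. The 63 taps, the 8 kHz cutoff (a little below the new 10 kHz Nyquist frequency to leave room for the transition band), and the 1 kHz + 15 kHz test signal are all assumptions chosen for illustration.

```python
import math

def lowpass_taps(cutoff_hz, rate_hz, num_taps=63):
    """Windowed-sinc FIR low-pass (Hamming window), normalised to unity DC gain."""
    fc = cutoff_hz / rate_hz                  # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(samples, taps):
    """Direct convolution, keeping only outputs where the taps fully overlap."""
    width = len(taps)
    return [sum(t * samples[i + width - 1 - k] for k, t in enumerate(taps))
            for i in range(len(samples) - width + 1)]

def amplitude_at(samples, freq_hz, rate_hz):
    """Amplitude of one frequency component, via a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / rate_hz) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / rate_hz) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

RATE, NEW_RATE = 40_000, 20_000
signal = [math.sin(2 * math.pi * 1_000 * n / RATE) +      # wanted 1 kHz tone
          math.sin(2 * math.pi * 15_000 * n / RATE)       # 15 kHz tone that would alias
          for n in range(4062)]

naive = signal[::2][:2000]                                 # decimate only
filtered = fir_filter(signal, lowpass_taps(8_000, RATE))   # low-pass first...
clean = filtered[::2]                                      # ...then decimate

naive_alias = amplitude_at(naive, 5_000, NEW_RATE)
clean_alias = amplitude_at(clean, 5_000, NEW_RATE)
clean_wanted = amplitude_at(clean, 1_000, NEW_RATE)
print(f"5 kHz alias: naive {naive_alias:.3f}, filtered {clean_alias:.4f}")
print(f"1 kHz tone after filtering: {clean_wanted:.3f}")
```

Without the filter, the 15 kHz tone shows up at full strength as a 5 kHz alias. With the filter applied first, the alias is nearly gone and the 1 kHz tone is preserved. In practice you would probably reach for a library routine that does both steps together, such as SciPy's `scipy.signal.decimate`, rather than hand-rolling the filter.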
https://www.izotope.com/en/learn/digital-audio-basics-sample-rate-and-bit-depth.html
https://zone.ni.com/reference/en-XX/help/371325F-01/lvdfdtconcepts/dfd_decimation/