Hello,
There's no limit on code size with the bare metal approach, at least not from the compiler's side. You can run code directly from SDRAM, provided that you have configured the SDRAM controller (EBIU) correctly. However, as Howard implied, it will run slowly without cache.
If you can spare a few hundred KB, I'd recommend starting your programs from u-boot. It already does a lot of the cache configuration and also allows easy updates and flash programming. No more worries about writing lots of bootstrap code, either.
For the float issue: libm and the GCC internal libs for Blackfin take care of that. If you use floats in your code, the software floating point emulation will be silently linked in.
This will of course cost some performance, so if you want to do numerically efficient stuff, you might want to look at the code that uses fixed point formats (libbfdsp, etc.).
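To give an idea of what fixed point code looks like, here is a minimal sketch of Q15 arithmetic in plain C. It is not the libbfdsp API; the names q15_t, q15_sat, q15_mul and q15_add are made up for illustration, and it assumes the compiler does an arithmetic right shift on negative values (GCC does).

```c
#include <stdint.h>

/* Hypothetical type name: 1 sign bit, 15 fraction bits, range [-1, 1). */
typedef int16_t q15_t;

/* Clamp a 32-bit intermediate result to the Q15 range. */
static q15_t q15_sat(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (q15_t)x;
}

/* Multiply two Q15 values: the 32-bit product is Q30, so add half an
 * LSB for rounding and shift back down by 15 bits.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p += 1 << 14;
    return q15_sat(p >> 15);
}

/* Saturating add, so overflow clips instead of wrapping around. */
static q15_t q15_add(q15_t a, q15_t b)
{
    return q15_sat((int32_t)a + (int32_t)b);
}
```

For example, 0.5 is 16384 in Q15, and q15_mul(16384, 16384) gives 8192, which is 0.25. Everything stays in integer operations, so no floating point emulation gets pulled in.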
A comment about newlib: In my opinion this beast is quite bloated and not really nice for embedded stuff. Even a simple printf would eat up all L1 memory in my little test app, so I went for a somewhat selective approach, rewriting some of the functions. See http://www.section5.ch/forum/viewtopic.php?p=133#133 for a link to the shell.tgz code - it demonstrates how to merge "bare metal" with some basic newlib functionality.
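As a rough illustration of the selective approach (this is not taken from shell.tgz), a small integer formatter can stand in for printf("%d") in many cases. The function name fmt_dec is hypothetical, and sending the resulting string to a UART is left out because that part is board specific.

```c
#include <stddef.h>
#include <stdint.h>

/* Format a signed 32-bit value as decimal text into buf.
 * buf must hold at least 12 bytes (sign, 10 digits, NUL).
 * Returns the number of characters written, excluding the NUL. */
static size_t fmt_dec(char *buf, int32_t value)
{
    char tmp[10];               /* digits, least significant first */
    size_t n = 0, len = 0;
    uint32_t u = (uint32_t)value;

    if (value < 0) {
        buf[len++] = '-';
        u = 0u - u;             /* magnitude; also correct for INT32_MIN */
    }
    do {
        tmp[n++] = (char)('0' + (u % 10u));
        u /= 10u;
    } while (u != 0u);

    while (n > 0)               /* copy digits back in reading order */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

A routine like this needs only a handful of instructions and a few bytes of stack, compared to the full format parser and floating point support that a complete printf drags in.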
Since standalone support is a bit of an orphaned thing in the uClinux community, you'll get no "perfect" board support package at this moment, but I guess the shell code should get you going. Just make sure you compile in boards/SRV1.
Hope that answers some of your questions,
- Martin