A million little lines

Joe Gregorio

Cliff Click, chief JVM architect at Azul Systems, as quoted in InfoWorld:

As your program grows in size, the lack of strong typing basically kills your ability to handle a very large program and so you don't find the million-line Perl program

That line in particular has elicited some reactions, including this from chromatic:

Second, the reason that there aren’t many million-line Perl programs is that the people who are capable of writing and managing million-line Perl programs have better ways to organize their projects than glomming a million lines of Java into a single shared-everything instance. That’s setting aside the qualities of encapsulation and abstraction that Java-the-language doesn’t have, preferring instead to push that problem to tool vendors and AbstractFactoryFactoryInjectors which consume vast swaths of XML to get around Java’s static code fetish. I can only imagine how much larger the Java code would be without all of those XML files.

I was always baffled by Java folks' love affair with XML until I realized it was just a crutch to make up for a lack of map and array literals in the language.
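
To make the point about literals concrete, here is a minimal sketch in Python of the sort of wiring that often ends up in an XML file in Java frameworks. The names (`settings`, its keys and values) are hypothetical, chosen only for illustration.

```python
# Hypothetical configuration, written directly with map and list literals.
settings = {
    "database": {"host": "localhost", "port": 5432},
    "handlers": ["auth", "logging", "cache"],
    "debug": False,
}

# The structure is ordinary data, so reading it needs no parser or schema.
port = settings["database"]["port"]
first_handler = settings["handlers"][0]
```

Because the configuration is plain data in the host language, it can be built, inspected, and tested like any other value.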

Another quote from the InfoWorld article:

Java is not the slowpoke of old days and performance now matches or exceeds applications developed in C

I found it interesting that only 13 years after being released, Java is finally on par with C performance, particularly given that some people believe that a garbage-collected language should be faster than a non-garbage-collected language.

The advantage of a statically typed language with a simple grammar (Java yes, C++ no) is that there are many development tools available for:
  • refactoring
  • detecting possible bugs or problems
  • code style checks
Good examples for Java are Eclipse, FindBugs and CheckStyle. This is helpful especially for the larger projects mentioned above with many developers working on it.

Posted by Alexander Klimetschek on 2008-04-01

Alexander,

Sorry, static typing isn't required for any of those things. You seem unaware not only that the capabilities in Eclipse appeared first for non-statically typed languages such as VB and Smalltalk, but also that Eclipse was written by former Smalltalk developers.

This is helpful especially for the larger projects mentioned above with many developers working on it.

That so totally misses the point.

Posted by Joe on 2008-04-01

I'd love to hear why Cliff believes a long program is a sign of a good language. I was taught that long sections of code were a bad code smell. I'm unsure why it's something to be bragged about.

BTW, I'm not sure if your last sentence is sarcasm or not (my sarcasm parser is throwing warnings, but I'm ignoring them).

Posted by Josh Peters on 2008-04-01

Arguing over whether Java or Perl is better is like getting in a debate over whether it's better to eat dog or cat feces. Even worse is trying to debate about static vs. dynamic typing using what are possibly the worst examples of both (ok, C++ vs. PHP would be worse, but not much). You shouldn't be allowed to say the word "type" in public until you've learned a language with decent type inference (ML, Haskell) and a strongly, dynamically typed language (Smalltalk, Lisp).

Posted by Jeff on 2008-04-01

Joe,

Sorry, static typing isn't required for any of those things.

Then tell me tools for Ruby and co. that provide those features. Sure, things can be much quicker during development with a scripting language, but there are certain kinds of problems arising in larger systems that are almost a non-issue with Java but get really ugly with dynamic languages.

That so totally misses the point.

How? Growing complexity needs systematic approaches (to be very abstract). I do like scripting languages for small things (e.g. CLI tools or templating languages) because that is where they really shine. But a fully dynamic language is much harder to handle in a larger project. See the Prototype JavaScript library, for example, which is cool and short in the first place but really messes things up with JavaScript's core objects and because of that often does not work with other JavaScript libraries. Furthermore, I think people using lots of XML for configuration and component wiring in their Java frameworks do it the wrong way. That is not due to the Java language, but due to the simple fact that out of a thousand developers only a handful are excellent. And you see even more of the other 900+ guys in an industry-standard language, because they tend to work for the big companies, whereas the bleeding-edge front has way more smart guys.
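
The kind of clash described here can be sketched in Python. This is only an analogy under stated assumptions: Python's built-in types cannot be patched this way, so a hypothetical `Collection` class stands in for a shared core object, and `load_library_a` and `load_library_b` are made-up stand-ins for two independent libraries.

```python
class Collection:
    """Stand-in for a shared 'core' type that several libraries extend."""

    def __init__(self, items):
        self.items = list(items)


def load_library_a():
    # Hypothetical library A adds `each`, returning a list of results.
    Collection.each = lambda self, fn: [fn(x) for x in self.items]


def load_library_b():
    # Hypothetical library B adds its own `each`, which returns nothing.
    def each(self, fn):
        for x in self.items:
            fn(x)
    Collection.each = each


load_library_a()
before = Collection([1, 2, 3]).each(lambda x: x * 2)

load_library_b()  # silently replaces library A's method
after = Collection([1, 2, 3]).each(lambda x: x * 2)
```

Code written against library A's `each` gets a list in `before` but `None` in `after`, and nothing flags the change at the point where the second library is loaded.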

I found it interesting that only 13 years after being released, Java is finally on par with C performance

It was faster before you heard about it (or even before you wrote in your blog that you had heard about it). Java has been fast since version 1.3, which was released in 2000 (ca. 5 years after 1.0). And since 1.4 it's been even better (2002 = 7 years). Ruby is as old as Java, but is still slow. The new YARV virtual machine introduced in Ruby 1.9 tries, but still fails (up to 15% faster does not match Java's speed in comparison). I don't know about Perl and Python, but I am personally not a fan of them...

Posted by Alexander Klimetschek on 2008-04-01

Jeff,

You shouldn't be allowed to say the word "type" in public until you've learned a language with decent type inference (ML, Haskell) and a strongly, dynamically typed language (Smalltalk, Lisp).

It's true that those languages are great. And maybe a lot better than Java. But there is also the fact that not-so-smart people have to be able to use them. Smalltalk is cool, but it is simply too complicated for most developers (not talking about Lisp or Haskell here...). That's why it didn't become as popular.

Posted by Alexander Klimetschek on 2008-04-01

Ah, yes, I had almost forgotten that there are stupid programmers (despite me spending pretty much every workday of my life cleaning up after them). I guess there is a purpose for Java, after all.

In other news, the price of tea in China remains stable.

Posted by Jeff on 2008-04-01

You have obviously not faced problems where a GOFFowlerUOWPaginatorGobbledyGookFactoryProcessor was passed instead of the GOFFowlerUOWPaginatorGoddledyGookFactoryProcessor object and the isBSCode() method was missing. These are very hard to catch in 10 million line code bases that parse HTTP.

Posted by Sai on 2008-04-01

Alexander,

Then tell me tools for Ruby and co. that provide those features.

No, I told you of tools that provided these features for Smalltalk and VB. You don't get to pick the language in an existence proof.

How? Growing complexity needs systematic approaches (to be very abstract).

Strong typing is not the one and only systematic approach to dealing with complexity. Strong typing doesn't even rise to the level of "systematic", it's a tactic.

Furthermore I think people using lots of XML for configuration and component wiring in their Java frameworks do it the wrong way.

This is like socialists claiming that the Soviet Union would have worked, but it wasn't a pure enough socialist state, or the Ron Paul supporters that declare that free markets will solve any problem, but they haven't so far because the United States isn't a pure enough free market.

It was faster before you heard about it (or even before you wrote in your blog that you had heard about it). Java has been fast since version 1.3, which was released in 2000...

While it's good to always take benchmarks with a grain of salt, I don't see Java being faster than C in the year 2008, 8 years later. On the other hand, D is garbage collected and seems to be holding its own speed-wise.

Posted by Joe on 2008-04-01

Joe,

Strong typing is not the one and only systematic approach to dealing with complexity.

True, design of the system's architecture plays another big role. That's why I said it is helpful for larger projects. I didn't say it's the one and only way to go.

This is like socialists claiming that the Soviet Union would have worked, but it wasn't a pure enough socialist state.

That analogy is quite flawed: I didn't claim that if one used even more XML in Java, it would all work out great.

Sorry, Joe, I really liked reading your blog, mainly about REST things, but I didn't know that you like to create a religious war against Java. Now I am thinking about taking your blog out of my reading list.

Posted by Alexander Klimetschek on 2008-04-01

Fuel to your fire:

and my current favourite:

http://cdsmith.twu.net/types.html

Fwiw, I think this argument is ultimately about maintenance and type systems just aren't a first order effect on maintenance in the real world systems I've seen. Over the last couple of years, I've found Scala/Zope3/Cobra/ECMAScript to be informative. I can now imagine a usable Python that allowed method arguments to be type declared. I can see a usable Java that inferred declaration and return types.

"I was always baffled by Java folks' love affair with XML until I realized it was just a crutch to make up for a lack of map and array literals in the language."

So I think it's fair to say that Java wiring and configuration via XML are dead ends. They'll be replaced by annotations and scripts respectively. When I say dead ends I mean it in the same way eggs are a dead end for Python.

Posted by Bill de hOra on 2008-04-01

glomming a million lines of Java into a single shared-everything instance.

For those of you who missed it, this is the key phrase. If your language/tools/methodology/whatever ends up with you compiling more than 1,000,000 lines of code into a single shared-everything instance then you have to seriously step back and question your language/tools/methodology/whatever. Yes, static typing probably makes a huge difference when dealing with 1,000,000 lines of code. Why aren't you questioning the underlying problem that requires you to manage 1,000,000 lines of code? And why are you still living in an N=1 world?

Posted by Joe on 2008-04-01

I'll answer your challenge; Ruby has refactoring support in RadRails/Aptana and Python has it in PyDev. Both also have syntax checking, unit test support, and warnings of various sorts. Of course Python has code style checks in the language definition, but any regex will give you those.

Adding to the chorus though: refactoring is not the answer; it's not even the question. I rarely do a straight extract method/inline method refactor in Python, because when I extract a method I'm probably doing more at the same time, like switching to generating that function and 15 others like it in a for loop inside the constructor. Steve Yegge has a post where he likens refactoring to pushing dirt around a construction site: "Many companies are faced with multiple million lines of code, and they view it as a simple tools issue, nothing more: lots of dirt that needs to be moved around occasionally." Automated refactoring only pushes dirt around. Eliminating real duplicated code is manual: in Java, you refactor once, then search/replace to get rid of the duplicate calls the refactor couldn't be sure of. Properly used, features like maps, arrays, duck typing, multiple inheritance, and functions as first-class objects mean there's less dirt to push around, and developers can spend their time on the productive refactoring, which can't be automated anyway.
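
A minimal sketch of what "generating that function and others like it in a for loop inside the constructor" might look like in Python. The class `Report`, its `FIELDS` list, and the `get_` naming are hypothetical, invented here for illustration.

```python
class Report:
    """Hypothetical class whose near-identical accessors are generated in a loop."""

    FIELDS = ["title", "author", "date"]  # assumed field names

    def __init__(self, data):
        self._data = data
        for field in self.FIELDS:
            # Bind `field` as a default argument so each function keeps
            # its own field name instead of sharing the loop variable.
            def getter(name=field):
                return self._data.get(name)
            setattr(self, "get_" + field, getter)


report = Report({"title": "A million little lines", "author": "Joe"})
title = report.get_title()
missing = report.get_date()  # no "date" key was supplied
```

Adding another accessor means adding one string to `FIELDS`, which is the kind of duplication removal an automated extract-method refactoring would not produce on its own.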

BTW and for the record: Improperly using those tools will put you in a world of pain.

Posted by Andy on 2008-04-02

Joe: it's true that those refactoring features first appeared in Smalltalk IDEs, but what you get in Eclipse today is on a totally different level than what you have in typical Smalltalks (at least the ones I've seen).

Such tools are easier to build, and easier to build correctly, in a statically typed language; nothing to argue about there, no? I mean, rename method is great, but not if I have to manually check each occurrence of the method's name...
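
The rename problem can be sketched in Python. The classes `LogFile` and `Valve` and the function `release` are hypothetical; the sketch assumes `Valve.close` has just been renamed to `shut` while a shared call site was left alone.

```python
class LogFile:
    def close(self):
        return "log file closed"


class Valve:
    # Suppose this method was just renamed from `close` to `shut`.
    def shut(self):
        return "valve shut"


def release(resource):
    # Nothing here says whether `resource` is a LogFile or a Valve, so a
    # rename tool cannot know if this call should become `resource.shut()`.
    return resource.close()


log_result = release(LogFile())

try:
    release(Valve())
    valve_error = None
except AttributeError as exc:
    # The missed call site only surfaces when this line actually runs.
    valve_error = exc
```

With declared parameter types, a tool could decide at edit time which `close` a call refers to; here the answer depends on what gets passed in at runtime.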

And his point about the 1 million SLOC Java app still needs to be invalidated. There are indeed real-world problems of such complexity that they might end up with 1 million SLOC in any programming language. Even if you discard all that as Java noise - if the equivalently complex Perl program was, say, 300,000 SLOC - where is it?

I think in many applications such 1 million SLOC monsters are broken into parts (rightfully!) which communicate via documents, text files, database entries, etc. But lo and behold, one could argue that those are indeed statically typed messages sent between application parts.

Posted by Martin Probst on 2008-04-02

"If your language/tools/methodology/whatever ends up with you compiling more than 1,000,000 lines of code into a single shared-everything instance then you have to seriously step back and question your language/tools/methodology/whatever."

No argument there. I said as much here:

http://beust.com/weblog/archives/000462.html

but it got lost. Thankfully Reg quoted it:

http://weblog.raganwald.com/2007/09/java-is-right-answer-to-wrong-question.html

I like what Steve Yegge had to say on sheer size:

http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html

IMO, really big codebases need structural support that doesn't exist in programming languages:

http://dirtsimple.org/2007/01/where-zope-leads-python-follows.html

As I said, this hasn't got much to do with type systems.

Posted by Bill de hOra on 2008-04-03
