Saturday, May 28, 2011

Learning Java

Well, I'm finally learning Java.

I've taken half-hearted stabs at it before, but stalled out each time.  I've got several good Java books.

I've written a production Java application at work.  It's somewhat of a mess, but it mostly works.  It transforms some XML.  The XML I get to feed into it is hard to work with.  It has unescaped entities (bare ampersands), which I had to preprocess before the parser would accept it at all.
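For illustration, here is a minimal sketch of that kind of preprocessing. It is not the code from the application; the class name AmpersandFixer and the method name are made up for this example, and it assumes the only problem is an '&' that does not begin something shaped like an entity or character reference.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the production code described above.
public class AmpersandFixer {
    // Matches an '&' that is NOT followed by something shaped like an
    // entity or character reference: &name;  &#123;  &#x1F;
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces each bare '&' with "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<name>Smith & Sons &amp; Co</name>"));
    }
}
```

Note that this only checks the shape of a reference. Something like &foo; with no declaration for foo is left untouched and would still be rejected by a conforming parser.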

I also found that XMLStreamReader.getLocation() (in javax.xml.stream) doesn't really return a useful offset into the document of where an event occurred, but rather where the parser currently is.  Not very useful.

I think the mindset of having a getLocation() method that returns an internal value not interesting to users of the class is a problem that is typical of Java class implementors. I know the IDEs will create getter and setter methods for you almost automatically.  I think for good OO design, you'd be very reluctant to expose an internal value 'location', but it's so common in the Java world just to throw in getters and setters without regard to their use that they just do it.  Really, you might need to have an internal pointer used by the parser, but getLocation() should return an actual pointer to where the event was detected in the stream.
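To see what getLocation() reports, a small experiment along these lines can help. It is only a sketch: the class name LocationDemo and the sample document are invented here, and the numbers you get depend on which StAX implementation is on your classpath, so no expected output is shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class for inspecting StAX location values.
public class LocationDemo {
    // Returns the character offset reported by getLocation() at each
    // START_ELEMENT event. An implementation may report -1 if it is unknown.
    static List<Integer> startElementOffsets(String xml) {
        List<Integer> offsets = new ArrayList<Integer>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    offsets.add(reader.getLocation().getCharacterOffset());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String xml = "<root><item>one</item><item>two</item></root>";
        // Compare these numbers with where each '<' actually sits in the string.
        System.out.println(startElementOffsets(xml));
    }
}
```

If the printed offsets point past the start tags rather than at them, that matches the behavior described above.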

In my Java studies, I've found a lot of great resources on the net.  I think I've found none better than this site, though.  I don't know about you, but I love real world examples in real world programs that I can build and run.  That's what this site has and it has the good examples indexed by language features, which could save me a lot of research.

One thing that really bums me out about Java is all the damn boilerplate.  Some say that a good IDE takes care of that, but really, all the clutter in your typical Java program makes it hard to read.  There should be some kind of pre-processor or macro facility in Java to help deal with this.  We should be able to do something like:

define println System.out.println

in a program and not have to type sop (in NetBeans) to expand that.  Clean, clear source should be a goal, not just a nice-to-have.  Most real programs spend a lot more time in maintenance than in development, and the easier they are to pick up and maintain, the better.
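Java has no macro facility, but static imports (available since Java 5) and a one-line helper method get part of the way there. The sketch below uses a made-up class name, Shorthand, and a helper that would have to be repeated or imported in every file that wants it, so it is a workaround rather than the define wished for above.

```java
import static java.lang.System.out;

// Hypothetical example class; the println helper is not part of the JDK.
public class Shorthand {
    // Thin wrapper so call sites can write println(x) instead of
    // System.out.println(x).
    static void println(Object value) {
        out.println(value);
    }

    public static void main(String[] args) {
        println("Hello from a shorter println");
        out.println("The static import alone already drops the 'System.' prefix");
    }
}
```

Other classes could pick up the helper with `import static Shorthand.println;` if Shorthand lived in a named package, which is about as close as plain Java gets.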

I guess ideally, there'd be support for first-class procedure objects that would allow you to capture a lot of this boilerplate in ways that could be inspected and manipulated by programs. I think something like Clojure might be an interesting thing to investigate. A Lisp that integrates with the JVM might give you everything you need in terms of clarity and simplicity.

One pretty serious indictment of Java is that the implementors found that there was a need for operator overloading, specifically of '+', to allow concise code for building strings from many different data types, but there's no provision for doing your own operator overloading. I could see why the mess that occurred with C++ and operator overloading might have frightened them away, but I think that resorting to features not available to users to make the language more expressive is indicative of a weakness that they should have addressed.
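The asymmetry is easy to show. In the sketch below (the class and method names are made up for the example), '+' happily glues an int and a double into a String, while a library numeric type such as BigDecimal has to fall back on a method call for the same operator.

```java
import java.math.BigDecimal;

// Hypothetical example class contrasting built-in '+' with library types.
public class OperatorDemo {
    // The language overloads '+' for String concatenation with any type.
    static String describe(int count, double ratio) {
        return "count=" + count + ", ratio=" + ratio;
    }

    // User and library types cannot define '+', so addition is a method call.
    static BigDecimal total(BigDecimal a, BigDecimal b) {
        return a.add(b); // "a + b" would not compile
    }

    public static void main(String[] args) {
        System.out.println(describe(3, 0.5));
        System.out.println(total(new BigDecimal("1.10"), new BigDecimal("2.20")));
    }
}
```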

Thus, Ruby developers like to capitalize on their ability to define DSLs (Domain Specific Languages) to express idioms for a particular application area. The best Java developers can do is build APIs, but you are often stuck with huge towers of complexity, with factory methods and the like. Just seems like too much work.

I wish there was more support for the kind of programming I do a lot, programming in the small. I'll need a quick tool to perform set operations on a list of files, on the filenames themselves or the contents of files. I'll need to generate a list of files that is the union of two sets of files, checking if the files are the same and renaming if not, for example. Or, extracting content from a set of files and matching a field found in one set with content found in another. You know, quick and dirty "database" operations with data in files. Database types actually advocate setting up database tables for this kind of thing and doing your selections using SQL, but that seems like overkill a lot of the time. I can run up a Perl or shell script to do this kind of thing all the time. I can be half done before you can type 'public static void main(' (OK, in an IDE, that can be generated for you or is a few keystrokes, but starting up an IDE is a big deal sometimes, too).
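For comparison, here is roughly what the simplest of those jobs, set operations on the filenames in two directories, looks like in plain Java. This is a sketch with a made-up class name, FileSets; it only compares names, not contents, and does no renaming. The amount of ceremony around three one-line set operations is a fair illustration of the complaint.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical command-line tool: java FileSets DIR1 DIR2
public class FileSets {
    // Names of the entries in a directory, sorted; empty if it cannot be listed.
    static Set<String> names(File dir) {
        Set<String> result = new TreeSet<String>();
        String[] entries = dir.list(); // null when dir is not a readable directory
        if (entries != null) {
            result.addAll(Arrays.asList(entries));
        }
        return result;
    }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<String>(a);
        result.addAll(b);
        return result;
    }

    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<String>(a);
        result.retainAll(b);
        return result;
    }

    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<String>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java FileSets DIR1 DIR2");
            return;
        }
        Set<String> first = names(new File(args[0]));
        Set<String> second = names(new File(args[1]));
        System.out.println("in either:     " + union(first, second));
        System.out.println("in both:       " + intersection(first, second));
        System.out.println("only in first: " + difference(first, second));
    }
}
```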
