This doesn't work, and shouldn't work, because AvroKeyInputFormat returns a GenericData$Record. The thing is, it compiles, and you can even assign the first tuple to the variable "first". You get a runtime error only when you try to access a field of MyCustomClass through the tuple (e.g. first._1.getSomeField()).
This behavior sent me on a wild goose chase that took many hours over many weeks to figure out, because I never expected the method to return the wrong type at runtime. If there's a mismatch between what the InputFormat returns and the class I'm trying to load, shouldn't this be a compilation error? At the very least, the runtime error should occur as soon as I assign the tuple to a variable of the wrong type. This is very unexpected behavior.
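For anyone who wants to reproduce the deferred failure without Spark or Avro, here is a minimal plain-Java sketch of what I believe is going on: generic type parameters are erased, so an unchecked cast inside the API is a no-op and the JVM only checks the type at the point where my code uses the value as MyCustomClass. All names here (ErasureDemo, loadFirst, GenericRecord, and this stand-in MyCustomClass) are hypothetical and are not the Spark or Avro classes; Map.Entry stands in for Tuple2.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class ErasureDemo {
    // Hypothetical stand-in for the class I expect to load.
    static class MyCustomClass {
        String getSomeField() { return "some value"; }
    }

    // Hypothetical stand-in for what the input format really produces.
    static class GenericRecord {}

    // Mimics an API that takes the key class as an argument and promises
    // Entry<K, String>, but never checks the object against that class.
    @SuppressWarnings("unchecked")
    static <K> Map.Entry<K, String> loadFirst(Class<K> declaredKeyClass) {
        Object actual = new GenericRecord();
        // Unchecked cast: K is erased, so nothing is verified here.
        return new SimpleEntry<>((K) actual, "value");
    }

    // Returns true if using the key as MyCustomClass throws ClassCastException.
    static boolean fieldAccessThrows(Map.Entry<MyCustomClass, String> entry) {
        try {
            // The compiler inserts the cast to MyCustomClass here, not earlier.
            entry.getKey().getSomeField();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Compiles and runs: the assignment only sees the erased Entry type.
        Map.Entry<MyCustomClass, String> first = loadFirst(MyCustomClass.class);
        System.out.println("assignment succeeded");
        System.out.println("field access throws: " + fieldAccessThrows(first));
    }
}
```

If this sketch matches the real cause, the compiler cannot catch the mismatch, because the relationship between the InputFormat class and the key class is not expressed in the types it checks.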
Moreover, I actually fixed my code and implemented all the required wrapper and custom classes:
JavaPairRDD<MyCustomAvroKey, NullWritable> records =
Tuple2<MyCustomAvroKey, NullWritable> first = records.first();
MyCustomAvroKey customKey = first._1;
But this time I forgot that I had moved the class to another package, so the namespace in the schema file was wrong. And again, at runtime the datum() method of customKey returned a GenericData$Record instead of a MyCustomClass.
Now, I understand that this has to do with the Avro library (the GenericDatumReader class has an "expected" and an "actual" schema, and it falls back to a GenericData$Record if something is wrong with my schema). But does it really make sense for this API to return a different class, one that is not even assignable to my class, when this happens? Why would I ever get a class U from a wrapper class declared to be a Wrapper<T>? It's just confusing, and it makes it so much harder to pinpoint the real problem.
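In the meantime, a workaround I can imagine is to check the datum against the expected class right where it is read, so that the failure happens immediately and says which class actually came back. This is only a sketch in plain Java: Wrapper, checkedDatum, GenericRecord, and this stand-in MyCustomClass are hypothetical and are not part of Avro or Spark.

```java
public class DatumGuard {
    // Hypothetical stand-in for a generic wrapper whose datum() is declared
    // to return T but may hold some other class at runtime.
    static class Wrapper<T> {
        private final Object datum;
        Wrapper(Object datum) { this.datum = datum; }

        @SuppressWarnings("unchecked")
        T datum() { return (T) datum; } // unchecked: not verified at runtime
    }

    static class MyCustomClass {}   // hypothetical expected class
    static class GenericRecord {}   // hypothetical fallback class

    // Fails fast with a descriptive message if the datum is null or is not
    // an instance of the expected class.
    static <T> T checkedDatum(Wrapper<?> wrapper, Class<T> expected) {
        Object datum = wrapper.datum();
        if (!expected.isInstance(datum)) {
            String actual = (datum == null) ? "null" : datum.getClass().getName();
            throw new IllegalStateException(
                "Expected " + expected.getName() + " but got " + actual
                + "; check that the schema namespace matches the class's package");
        }
        return expected.cast(datum);
    }

    public static void main(String[] args) {
        Wrapper<MyCustomClass> customKey = new Wrapper<>(new GenericRecord());
        try {
            checkedDatum(customKey, MyCustomClass.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this does not fix the schema, but it would have pointed me at the namespace mismatch on the first run instead of at an unrelated field access later on.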
As I said, this weird behavior cost me a lot of time. I've been googling this for weeks and am getting the impression that very few Java developers have figured this API out. I posted a question about it on Stack Overflow and got several views and upvotes but no replies (a similar question about loading custom types in Google Dataflow got answered within a couple of days).