The recent arrival of Artificial Intelligence (AI) systems has, unsurprisingly, had a bit of an impact. Pundits have had glorious fun making predictions while carefully ignoring the limits of the technology, doom-mongers have had a very public and rather profitable depression, and so on. The hyperbole is hardly surprising, given the technology could quite clearly have a bit of a massive impact.
But what has this got to do with arts & ego? The thing is, these AIs have to be trained, and they have to be trained on rather a lot of data. There is rather a lot of data out there that can be used for training: it's called the internet. The current crop of AI systems is, in many ways, just a natural language interface on top of the grand mélange that is the internet. The natural language handling is very good indeed, but what a system says is what it has been trained on: the content of the internet. That's why there's such a problem with AIs spouting utter crap, cowardly bigoted hatred, and many other kinds of nonsense: it comes down to the training data used.
People & companies who've posted stuff online are now beginning to realise their content may be respouted by an AI, and they seem to be beginning to panic. I understand why: professional writers, for example, make their money from what they write, so to find other systems reciting their work, or a form of it, without compensation, is unwelcome. It's hardly surprising that, now the AI companies are adding a feature allowing website controllers to say 'don't use our content for training AIs', quite a few sites are doing just that.
I think, though, this reaction is a long-term mistake. An important aspect of good artists of any kind is that they influence others: others steal their style and develop it. By refusing to allow their content to be used as training data, these artists deny the AI, and so anyone who talks with it, that influence. Thus those who use the AI, as they most certainly will, to ask about the arts won't see examples, let alone explorations, of those denied works. Those exploring, those looking for ideas, won't be influenced, inspired, or led by the deniers' work.
Now, of course, it's always a balance between making artwork available and earning a living from it, and everyone has to make their own choice. So I'm not saying those who prevent their content being used as training data are wrong, just that it's not a decision I'm going to make. Admittedly, I don't make a living from my work, but then I've never tried to do so: I'm too much of a computer nerd. I reckon I could have done had I put a proper effort in, but that might just be the ego part of arts & ego making its presence felt.
AI is already being used to prepare documents to be read by others. It looks highly likely that it will be used in education. Imagine children being educated in literature, wanting to discuss poetry with their AI teacher, and the teacher being unable to generalise the discussion to contemporary poets' works. Well, they will have at least one poet, admittedly not a great one: me. They should have access to far better examples, but that's not my gift to give.
To me, the problem with AI is not the training data; it's the centralisation of so much power in so few unrepresentative hands. It's a problem of the structure of the technology economy, and it requires not a 'don't touch my stuff' solution but a legal one. I am content that I live in the EU.
Anyway, the point is this: all the content of this site, arts & ego, can be used as training data for AI. Obviously, any reference to this site’s content, any reuse of it, whether directly or indirectly, should be acknowledged, but that’s not an AI matter, that’s basic politeness. The AI companies wouldn’t want their products to get a reputation for outrageous plagiarism—I hope.