Radar Trends to Watch: August 2023

Artificial Intelligence continues to dominate the news. In the past month, we’ve seen a number of major updates to language models: Claude 2, with its 100,000 token context limit; Llama 2, with (relatively) liberal restrictions on use; and Stable Diffusion XL, a significantly more capable version of Stable Diffusion. Does Claude 2’s huge context really change what the model can do? And what role will open access and open source language models have as commercial applications develop?

Artificial Intelligence

  • Stable Diffusion XL is a new generative model that expands on the abilities of Stable Diffusion. It promises shorter, easier prompts; the ability to render text within images correctly; the ability to be trained on private data; and, of course, higher quality output (see the diffusers sketch after this list). Try it on clipdrop.
  • OpenAI has withdrawn OpenAI Classifier, a tool that was supposed to detect AI-generated text, because it was not accurate enough.
  • ChatGPT has added a new feature called “Custom Instructions.” This feature lets users specify an initial prompt that ChatGPT processes prior to any other user-generated prompts; essentially, it’s a personal “system prompt” (see the system-message sketch after this list). Something to make prompt injection more fun.
  • Qualcomm is working with Facebook/Meta to run Llama 2 on small devices like phones, enabling AI applications to run locally. The distinction between open source and other licenses will prove much less important than the size of the machine on which the model runs.
  • Stability AI has released two new large language models, FreeWilly1 and FreeWilly2, based on LLaMA and Llama 2 respectively. They are called Open Access (as opposed to Open Source) and claim performance similar to GPT-3.5 on some tasks.
  • Chatbot Arena lets chatbots do battle with each other. Users enter prompts, which are sent to two unnamed (randomly chosen?) language models. After the responses have been generated, users can declare a winner, and find out which models have been competing.
  • GPT-4’s ability to generate correct answers to problems may have degraded over the past few months—in particular, its ability to solve mathematical problems and generate correct Python code seems to have suffered. On the other hand, it is more robust against jailbreaking attacks.
  • Facebook/Meta has released Llama 2. While there are fewer restrictions on its use than on other models, it is not open source, despite Facebook’s claims.
  • AutoChain is a lightweight, simpler alternative to LangChain. It allows developers to build complex applications on top of large language models and databases.
  • Elon Musk has announced his new AI company, xAI. Whether this will actually contribute to AI or be another sideshow is anyone’s guess.
  • Anthropic has announced Claude 2, a new version of their large language model. A chat interface is available at claude.ai, and API access is also available (see the API sketch after this list). Claude 2 accepts prompts of up to 100,000 tokens, much larger than other LLMs allow, and can generate output up to “a few thousand tokens” in length.
  • parsel is a framework that helps large language models do a better job on tasks involving hierarchical multi-step reasoning and problem solving.
  • gpt-prompt-engineer is a tool that reads a description of the task you want an AI to perform, plus a number of test cases. It then generates a large number of candidate prompts, tests each prompt against the cases, and rates the results.
  • LlamaIndex is a data framework (sometimes called an “orchestration framework”) that simplifies the process of indexing a user’s data and using that data to build complex prompts for language models (see the quickstart sketch after this list). It can be used with LangChain to build complex AI applications.
  • OpenAI is gradually releasing its Code Interpreter, which allows ChatGPT to execute code that it creates, using data provided by the user, and send output back to the user. Code Interpreter reduces hallucinations, errors, and bad math.
  • Humans can now beat AI at Go by finding and exploiting weaknesses in the AI system’s play, tricking the AI into making serious mistakes.
  • Time for existential questions: Does a single banana exist? Midjourney doesn’t think so. Seriously, this is an excellent article about the difficulty of designing prompts that deliver appropriate results.
  • The Jolly Roger Telephone Company has developed GPT-4-based voicebots that you can hire to answer your phone when telemarketers call. If you want to listen in, the results can be hilarious.
  • Apache Spark now has an English SDK. It goes a step beyond tools like Copilot, allowing you to use English directly when writing code (see the pyspark-ai sketch after this list).
  • Humans may be more likely to believe misinformation generated by AI, possibly because AI-generated text is better structured than most human text. Or maybe because AIs are very good at being convincing.
  • OpenOrca is yet another LLaMA-based open source language model and dataset. Its goal is to reproduce the training data for Microsoft’s Orca, which was trained using chain-of-thought prompts and responses from GPT-4. The claim for both Orca models is that they can reproduce GPT-4’s “reasoning” processes.
  • At its developer summit, Snowflake announced Document AI: natural language queries of collections of unstructured documents. This product is based on Snowflake’s own large language model, not one from an outside AI provider.
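For the Stable Diffusion XL item above, here is what generation looks like through Hugging Face’s diffusers library. This is a minimal sketch, assuming the public stabilityai/stable-diffusion-xl-base-1.0 checkpoint and a CUDA GPU; the clipdrop demo itself needs no code.

```python
# A minimal sketch, assuming the public SDXL base checkpoint on
# Hugging Face and a CUDA GPU; not the clipdrop service itself.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
)
pipe.to("cuda")

# SDXL is supposed to need shorter, plainer prompts than its predecessors.
image = pipe(prompt="a watercolor lighthouse at dawn").images[0]
image.save("lighthouse.png")
```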
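“Custom Instructions” lives in the ChatGPT UI, but the underlying idea is easy to reproduce over the API with a standing system message. A minimal sketch using the 2023-era openai package; the instruction text is invented for illustration.

```python
# Sketch: a personal "system prompt," sent ahead of every user message.
# Uses the 2023-era openai package; the instruction text is invented.
import openai

CUSTOM_INSTRUCTIONS = "Answer tersely. Assume the user is a Python developer."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},  # processed first
        {"role": "user", "content": "How do I read a CSV file?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```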
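For the Claude 2 item: a sketch of API access as Anthropic’s Python SDK looked at launch (a completions-style call; it reads ANTHROPIC_API_KEY from the environment, and the document placeholder is ours).

```python
# Sketch of the launch-era Anthropic SDK; ANTHROPIC_API_KEY must be set.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()
completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=1024,  # output tops out at "a few thousand tokens"
    # The 100,000-token window leaves room for very long documents here.
    prompt=f"{HUMAN_PROMPT} Summarize this contract: <document text> {AI_PROMPT}",
)
print(completion.completion)
```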
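For the LlamaIndex item, the canonical flow circa mid-2023, sketched under the assumption of a local ./data folder of documents and an OpenAI key in the environment.

```python
# A minimal sketch: index your own documents, then query them.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # ingest user data
index = VectorStoreIndex.from_documents(documents)     # build a vector index
query_engine = index.as_query_engine()

# Relevant chunks are retrieved and stitched into the prompt for the LLM.
print(query_engine.query("What does our refund policy say?"))
```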
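And for Spark’s English SDK, a sketch based on the pyspark-ai package as documented at release; the DataFrame and the English instruction are invented.

```python
# Sketch of the pyspark-ai ("English SDK") flow; the data and the
# instruction are invented, and an OpenAI key is assumed in the env.
from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

spark = SparkSession.builder.getOrCreate()
spark_ai = SparkAI()  # defaults to an OpenAI-backed model
spark_ai.activate()   # adds the .ai namespace to DataFrames

df = spark.createDataFrame(
    [("US", 331), ("DE", 83), ("JP", 125)], ["country", "population"]
)
# Plain English instead of a chain of select/filter/orderBy calls:
df.ai.transform("the two most populous countries").show()
```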

Programming

  • “It works on my machine” has become “It works in my container”: This article has some good suggestions about how to avoid a problem that has plagued computer users for decades.
  • Stack Overflow is integrating AI into its products. Stack Overflow for Teams now has a chatbot to help solve technical problems, along with a new GenAI Stack Exchange for discussing generative AI, prompt writing, and related issues.
  • It isn’t news that GitHub can leak private keys and authentication secrets. But a study of the containers available on Docker Hub shows that Docker containers also leak keys and secrets, and many of these keys are in active use (a toy secret-scanning sketch follows this list).
  • Firejail is a Linux tool that can run any process in a private, secure sandbox.
  • Complex and complicated: what’s the difference? It has to do with information, and it’s important to understand in an era of “complex systems.” First in a series.
  • npm-manifest-check is a tool that checks the contents of a package in NPM against the package’s manifest. It is a partial solution to the problem of malicious packages in NPM.
  • Facebook has described their software development platform, much of which they have open sourced. Few developers have to work with software projects this large, but their tools (which include testing frameworks, version control, and a build system) are worth investigating.
  • Polyrhythmix is a command-line program for generating polyrhythmic drum parts. No AI involved.
  • Philip Guo’s “Real-Real-World Programming with ChatGPT” shows what it’s like to use ChatGPT to do a real programming task: what works well, what doesn’t.
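A postscript to the Docker Hub item above: scanning your own images before publishing is straightforward. Here’s a toy sketch; the regexes cover only a few well-known key formats, and the path is hypothetical.

```python
# Toy sketch: grep-style scan of a directory (say, an unpacked container
# filesystem) for a few well-known credential patterns. Not exhaustive.
import re
from pathlib import Path

PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "GitHub token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
}

def scan(root: str) -> None:
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                print(f"{path}: possible {name}")

scan("./image-rootfs")  # hypothetical unpacked image root
```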

Security

  • A research group has found a way to automatically generate attack strings that force large language models to generate harmful content. These attacks work against both open- and closed-source models. It isn’t clear that AI providers can defend against them.
  • The cybercrime syndicate Lazarus Group is running a social engineering attack against JavaScript cryptocurrency developers. Developers are invited to collaborate on a GitHub project that depends on malicious NPM packages.
  • Language models are the next big thing in cybercrime. A large language model called WormGPT has been developed for use by cybercriminals. It is based on GPT-J. WormGPT is available on the dark web along with thousands of stolen ChatGPT credentials.
  • According to research by MITRE, out-of-bounds writes are among the most dangerous security bugs. They are also the most common, consistently sitting at the top of MITRE’s CWE Top 25 list. An easy solution is to use a memory-safe language like Rust.

Web

  • Another web framework? Enhance claims to be HTML-first, with JavaScript only if you need it. The reality may not be that simple, but if nothing else, it’s evidence of growing dissatisfaction with complex and bloated web applications.
  • Another new browser? Arc rethinks the browsing experience with the ability to switch between groups of tabs and customize individual websites.
  • HTMX provides a way of using HTML attributes to build many advanced web page features, including WebSockets and what we used to call Ajax. All the complexity appears to be packaged into one JavaScript library.
  • There is a law office in the Metaverse, along with a fledgling Metaverse Bar Association. It’s a good place for meetings, although lawyers cannot be licensed to practice in the Metaverse.
  • The European Court of Justice (CJEU) has ruled that Meta’s approach to GDPR compliance is illegal. Meta may not use data for anything other than core functionality without explicit, freely-given consent; consent hidden in the terms of use document does not suffice.

Cryptocurrency

  • Google has updated its policy on Android apps to allow apps to offer blockchain-based assets such as NFTs.
  • ChatGPT can be programmed to send Bitcoin payments. As the first commenter points out, this is a fairly simple application of LangChain (see the sketch below), and it’s something that was certainly going to happen. It raises the question: when will we have GPT-based cryptocurrency arbitrage?
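A sketch of the pattern the commenter describes: wrap a payment function as a LangChain tool and let the model decide when to invoke it. The send_bitcoin function here is hypothetical and only prints; a real tool would call a wallet API.

```python
# Hedged sketch using 2023-era LangChain; send_bitcoin is hypothetical.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

def send_bitcoin(args: str) -> str:
    """Pretend to send a payment; a real tool would call a wallet API."""
    print(f"Would send bitcoin: {args}")
    return "payment queued"

tools = [
    Tool(
        name="send_bitcoin",
        func=send_bitcoin,
        description="Send bitcoin. Input: '<address>,<amount_btc>'.",
    )
]

agent = initialize_agent(
    tools,
    ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
agent.run("Pay 0.001 BTC to bc1qexampleaddress for the invoice.")
```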

Biology

  • Google has developed Med-PaLM M, an attempt at building a “generalist” multimodal AI that has been trained for biomedical applications. Med-PaLM M is still a research project, but may represent a step forward in the application of large language models to medicine.

Materials

  • Room-temperature, ambient-pressure superconductors: This claim has met with a lot of skepticism, but as always, it’s best to wait until another team succeeds or fails to duplicate the results. If this research holds up, it’s a huge step forward.
