Discussion OpenAI API Plugin for VoiceAttack

This is a RELEASE, not a discussion - OpenAI Plugin is fully featured and ready to use!


OpenAI API Plugin for VoiceAttack
by SemlerPDX



The OpenAI VoiceAttack Plugin provides a powerful interface between VoiceAttack and the OpenAI API, allowing us to seamlessly incorporate state-of-the-art artificial intelligence capabilities into our VoiceAttack profiles and commands.



I'm so excited to bring the power of true artificial intelligence to VoiceAttack through this plugin, for all the profile and command builders out there interested in working with OpenAI technologies in VoiceAttack! It's easy to assume that now this technology is available, it will be simple to incorporate into existing programs or workflows. In reality, it is a brand new technology, and until some aspects of it become more accessible, working with the OpenAI API directly is a great way to get our foot in the door and start taking advantage of this awesome power right now.

All of the known limitations of these AI models apply here: ChatGPT will at times boldly state incorrect facts with high confidence, and we should always double-check or test its responses. The only difference is that now we can berate it verbally and ask for a correction, which it can speak back to us!




We can use raw text input, dictation text, or captured audio from VoiceAttack as input prompts for ChatGPT, and we can receive responses as a text variable to use as we wish, or have them spoken directly, specifically tailored for text-to-speech in VoiceAttack. We can also perform completion tasks on provided input with options for selecting the GPT model (and more), process audio via transcription or translation into (English) text using OpenAI Whisper, and generate or work with images using OpenAI DALL-E.


- Comprehensive Wiki and Samples for Profile Builders -



This plugin also features OpenAI Moderation to review provided input and return a list of any flagged categories. Lastly, we can use the plugin to upload, list, or delete files for fine-tuning the OpenAI GPT models, or make use of OpenAI Embedding, which returns a string of metadata that can be parsed and used as desired. With this plugin, we can access a wide range of OpenAI functionality with ease, directly from within VoiceAttack.



Find complete details, download link, and documentation on GitHub:
OpenAI Plugin for VoiceAttack




If you enjoy this plugin, check out my AVCS Profiles:


(AVCS CHAT is the first ready-to-use public profile powered by this OpenAI Plugin for VoiceAttack!)
 
An occasional error may occur when contacting the OpenAI API through any application, including any VoiceAttack profiles using the OpenAI Plugin, such as my own AVCS CHAT profile. The GetResponse phase (and the 'thinking' sound, in AVCS CHAT) may seem to loop infinitely, when in fact it is waiting for a return from the OpenAI API. Eventually this may end in no response at all (with any sounds simply stopping), or in users being so confused that they restart VoiceAttack.

I want users to know that they can just press the "Stop" button in VoiceAttack; closing and restarting is not required to end any looping sounds or a seemingly endless wait for a response from the OpenAI API. Users can then immediately try the same input again - though note that any 'continuing session' will have ended due to the use of a "Stop" command, and the contextual memory of recent input/response pairs will be cleared, starting fresh.

The error which would appear in the "openai_errors.log" may look like this:
Code:
==========================================================================
OpenAI Plugin Error at 2023-05-10 9:30:17 AM:
System.Exception: OpenAI Plugin Error: Error at chat/completions
(https://api.openai.com/v1/chat/completions) with HTTP status code: 429. Content: {
  "error": {
    "message": "That model is currently overloaded with other requests. You can retry
 your request, or contact us through our help center at help.openai.com if the error
 persists. (Please include the request ID a123456b7cdef890a12b3c456d789e0f in your
 message.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}

   at OpenAI_VoiceAttack_Plugin.ChatGPT.<Chat>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at OpenAI_VoiceAttack_Plugin.OpenAIplugin.<PluginContext>d__40.MoveNext()
==========================================================================


Note on this 'overload' error from OpenAI API:
In the whole of development and testing, and all of my calls to OpenAI API since I got beta access in December, I have never seen these 'overload' messages resulting in an API call failure. There is no way I could have anticipated it beyond the current exception handling which already occurs, however I also read the OpenAI Discord channels often, and we are NOT the only ones surprised and bothered by this seemingly brand new issue with the OpenAI API. This company has had to scale up faster than any new website in recent history - they went from a hundred thousand users to over ten million in less than two months, and I imagine each month of 2023 that goes by sees more and more tools such this allowing more and more users to access the OpenAI API, so they will need to scale it up accordingly.

We just have to wait out the 'overloads' that happen now and then. Just know that it is not a fault of the plugin systems, the libraries I am using to access the OpenAI API, or anything to do with individual user accounts, and there is nothing we as users can change or do better. This is as good as it gets for now, and it will only get better in time.
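For profile builders making their own calls to the API from scripts, one common way to soften these 429 'overload' errors is to retry with exponential backoff. The plugin itself is C#, so this is just a generic Python sketch of the pattern; the function names and the simulated "overloaded" endpoint are illustrative, not part of the plugin:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a flaky API call, doubling the wait after each failure."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for an HTTP 429 'overloaded' exception
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # exponential backoff with a little random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))

# Simulated endpoint that is "overloaded" twice before succeeding:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP 429: That model is currently overloaded")
    return "response text"

print(call_with_backoff(flaky, base_delay=0.01))  # prints: response text
```

The jitter keeps many clients from retrying in lockstep, which matters when the error is caused by everyone hitting the API at once.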
 
BIG things on the horizon! Now that the plugin has been released, I've had time to circle back to the most advanced feature of the OpenAI Plugin for VoiceAttack: Embeddings


What are embeddings?
Embeddings are a way to represent a body of text as an array of numeric values which capture the meaning and context of that text, allowing for comparisons between different texts. The OpenAI API offers a very fast and very affordable way to get these numeric values for a block of text; the resulting array is called an 'embedding vector'. The length of this vector is always the same: OpenAI Embeddings returns a single vector of 1,536 floats for any text content. With such high dimensionality, comparisons can be made with a high degree of accuracy.
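For the curious, the request behind this is small. Here is a hedged Python sketch of the request body and response shape for the OpenAI embeddings endpoint (field names per the OpenAI API documentation at the time of writing; the response below is a truncated mock, and no network call is made):

```python
import json

# The request body POSTed to https://api.openai.com/v1/embeddings
# (with an "Authorization: Bearer <API key>" header, omitted here):
request_body = json.dumps({
    "model": "text-embedding-ada-002",   # the 1,536-dimension embedding model
    "input": "What is the plugin's wake word?",
})

# A mock of the response shape, truncated for display; the real
# "embedding" list holds 1,536 floats:
mock_response = '{"data": [{"embedding": [0.0023, -0.0091, 0.0154]}]}'
vector = json.loads(mock_response)["data"][0]["embedding"]
print(len(vector))  # prints: 3 (would be 1536 from the real API)
```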


How do we use embeddings?
Consider a database with many entries, where each entry has various data fields - most importantly the text content and the 1,536-float embedding vector generated for that content. When the user asks a question, a system can first get a new embedding vector for that question, and then compare it against every entry on file using a measure called "cosine similarity". Through that, we can discover which blocks of text in the database are most similar to the question that was asked. We can then take any number of those most similar text entries and present THEM to ChatGPT along with the original question, instead of the question alone, and tell it to use the data to produce an appropriate response. By doing this, we can "feed" information to ChatGPT on which to base its response, for situations where it would NOT otherwise know the information (such as a help document, a wiki page, a book of short stories, etc.). Rather than relying on its own knowledge base, it can formulate an organic response using the provided data.

A personalized, local "brain" for our AI chat bots!
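The lookup step described above can be sketched in a few lines of Python. The toy 4-float vectors here stand in for the real 1,536-float OpenAI embedding vectors, and the "database" texts are purely illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 4-dimensional stand-ins for the real 1,536-float OpenAI vectors:
database = {
    "How do I set my API key?":      [0.9, 0.1, 0.0, 0.1],
    "List of voice command phrases": [0.1, 0.8, 0.3, 0.0],
    "Troubleshooting audio input":   [0.0, 0.2, 0.9, 0.2],
}

question_vector = [0.85, 0.15, 0.05, 0.1]  # pretend embedding of the user's question
best = max(database, key=lambda text: cosine_similarity(question_vector, database[text]))
print(best)  # prints: How do I set my API key?
```

The entry (or top few entries) found this way would then be bundled with the original question and sent to ChatGPT.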


What is this? TL;DR:
  • Introduction of local database processing for new user inputs before sending them to ChatGPT.
  • Ability to ignore irrelevant questions and respond using existing knowledge base.
  • Database selection, topic specification, and subject refinement options for ChatGPT context plugin calls.
  • Support for adding, viewing, editing, and removing documents and individual entries in the database, including PDF format.
  • Command Action system to execute commands directly in VoiceAttack when a user input matches a specific entry.
  • Contiguous subject system for reading entries using text-to-speech, with the ability to pause and resume reading from where it left off.

How will this work in OpenAI Plugin for VoiceAttack?
A few new plugin contexts will be added to expand the currently minimal Embeddings context, as well as a handful of additional optional VoiceAttack variables for the ChatGPT context(s). These will allow us to specify that new inputs should be processed against a local database, which would occur just after getting new user input and before sending that input to ChatGPT. When the question is NOT relevant to the similar text content provided alongside it, ChatGPT will ignore that content and answer as it normally would using its existing knowledge base; otherwise, it will use the data to respond to the user input.

Before beginning a ChatGPT context plugin call, we can indicate the database to use, optionally a particular topic contained in that database referring only to a certain set of entries, and also optionally a particular subject of that topic to further refine the specifics of a particular call. By default, when not specified, the entire database would be queried.

Users will be able to add new documents in whole (even in .PDF format), or add individual entries. There will be a system to view, edit, or remove entries as well - individually, all entries, or by topic name, or topic + subject name.

An additional system will allow setting a Command Action value for an entry, and a way to indicate that when a user input matches an entry with such an action set, to execute the command directly in VoiceAttack rather than provide the user input to ChatGPT as a question to be responded to.
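As a rough illustration of how such a Command Action gate might behave - the threshold value, field names, and callbacks here are my own assumptions for the sketch, not the plugin's actual implementation:

```python
# Hypothetical routing: if the best-matching database entry carries a
# Command Action and the match is strong enough, execute the command in
# VoiceAttack instead of asking ChatGPT.
SIMILARITY_THRESHOLD = 0.85  # assumed cutoff, not a value from the plugin

def route_input(user_text, best_entry, similarity, execute_command, ask_chatgpt):
    action = best_entry.get("command_action")
    if action and similarity >= SIMILARITY_THRESHOLD:
        return execute_command(action)          # e.g. a VoiceAttack command name
    return ask_chatgpt(user_text, best_entry)   # normal ChatGPT flow

# Demo with stand-in callbacks:
entry = {"text": "Lower the landing gear", "command_action": "Gear Down"}
result = route_input("gear down please", entry, 0.92,
                     execute_command=lambda a: f"executed: {a}",
                     ask_chatgpt=lambda q, e: "chat response")
print(result)  # prints: executed: Gear Down
```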

Another interesting new system will have a VoiceAttack variable we can set to indicate that Embeddings should treat all entries in a subject as contiguous. Once identified through contextual similarity to the user input, entries can be read using text-to-speech one by one until paused/stopped, at which point the plugin will save the index of where it left off in that topic + subject. This could allow us to feed a document such as a book of short stories into the database, then ask it to read one of them to us, or continue reading from where it last left off - all without contacting the OpenAI API beyond the initial embedding of the user input, used to match and identify the database contents to be read.
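That pause/resume bookkeeping boils down to a saved index. A minimal Python sketch of the idea, with hypothetical class and method names (not the plugin's):

```python
class ContiguousReader:
    """Sketch of the proposed 'continue reading' behavior: remember the
    position where reading stopped in a topic+subject so text-to-speech
    can resume there later."""

    def __init__(self, entries):
        self.entries = entries   # ordered entries of one topic+subject
        self.index = 0           # saved position; survives pause/stop

    def read_next(self):
        """Return the next entry to speak, or None when finished."""
        if self.index >= len(self.entries):
            return None
        entry = self.entries[self.index]
        self.index += 1
        return entry

reader = ContiguousReader(["Story one...", "Story two...", "Story three..."])
first = reader.read_next()    # "Story one..."
# ...user says "stop"; later, "continue reading" picks up where we left off:
second = reader.read_next()   # "Story two..."
print(first, "|", second)
```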


When will this be available?
Because this system will introduce an SQL database layer to the codebase of the OpenAI Plugin for VoiceAttack, it will be a while before I can feel secure adding it to the public branch of this repo on GitHub. I expect it may be late June or into July before everything is ready for prime time, so I intend to introduce an early Beta branch to the GitHub repository. This will allow interested users to begin testing this new system and help out by giving me the feedback I need to ensure performance and functions are consistent across all systems. If all goes well, I should have this Beta branch available in a few weeks, but again, due to the complexity of this refactor, public testing and feedback will be required before I merge it with the Main branch and push this update to everyone.


Pics or it Didn't Happen
So far, I have been testing the loading phase of the database, which occurs once when VoiceAttack is loaded, and the cosine calculations of new embedding vectors against a test set of 25,000 entries, with a goal of optimizing the speed of these functions. For reference, a database of that size would contain about ten 300+ page documents. I have gotten the loading of this massive test database down to just over 8 seconds (from 24 seconds!), and the similarity calculations down to just 0.189 seconds, all achieved by parallelizing these tasks across all CPU* cores on the PC:
[Screenshot: database load and cosine-similarity timing results]

*(I should note that my CPU is an AMD R9 3900X with 12 Cores and 24 Threads which this 'Malcom' VM above has full access to, so I will be keen to discover how optimized this will be on systems with fewer cores/threads)
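The plugin itself is C#, but the chunk-and-merge pattern behind that parallel scan can be sketched in Python. The threads here are purely to illustrate the partitioning (real speedups require true parallel cores, as with the C# Parallel loops), and the entry data is synthetic:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_match_parallel(query, entries, workers=4):
    """Split the entries into chunks, find each chunk's best match on its
    own worker, then merge the per-chunk winners - the same partition/merge
    idea as the plugin's parallel cosine scan."""
    chunk_size = max(1, len(entries) // workers)
    chunks = [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]

    def scan(chunk):
        return max(chunk, key=lambda e: cosine(query, e[1]))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        winners = list(pool.map(scan, chunks))
    return max(winners, key=lambda e: cosine(query, e[1]))

# Synthetic stand-ins for (text, embedding-vector) database entries:
entries = [(f"entry {i}", [math.sin(i), math.cos(i), 1.0]) for i in range(200)]
query = [math.sin(42), math.cos(42), 1.0]  # identical to entry 42's vector
print(best_match_parallel(query, entries)[0])  # prints: entry 42
```

Each worker only touches its own slice of the list, so no locking is needed; the final merge compares just one candidate per chunk.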


Thanks for all the feedback and support so far - hope you all are enjoying the concept of real AI tools in our VoiceAttack profiles & commands as much as I am!!
 
Well, I certainly had high hopes when I wrote that Embeddings announcement above! HAHA!

As it turns out, this Embeddings project is a lot deeper than I had initially anticipated. Due to the depth and scope of converting documents/PDFs into a truly useful Embeddings Database, I have spun that aspect of the project off into its own application, outside this OpenAI API Plugin for VoiceAttack. This will help me keep my head straight as I develop a proper app with an intuitive GUI, completely separate from this codebase. I will still need to refactor the OpenAI Plugin to allow for an optional alternative flow path that processes embedding vectors for a ChatGPT context, along with other minor changes, but there is no point in updating the Plugin until I have completed the application that lets users create their own local Embeddings Databases.

It may be some time before I complete this GUI app on the side, but I am whittling away at it day by day - will be a long wait, but it will be worth it. If a thing is worth doing, it's worth doing well, and I don't do anything if it isn't worth doing. Again, thanks for all the support and coffees, I really appreciate feedback!
:coffee:
 