Last week I was at DFRWS USA in Portland, OR to soak up some DFIR research, participate in some workshops, and congregate with some of the DFIR tribe. I also happen to be there to give a 20 minute presentation on Android Auto & Google Assistant.
Seeing how this was my first presentation I was super nervous and I am absolutely sure it showed (I got zero sleep the night before). I also made the rookie mistake of making WAY more many slides than I had time for; I do not possess that super power that allows some in our discipline to zip through PowerPoint slides at superhuman speeds. The very last slide in the deck had my contact information on it which included the URL for this blog. Unbeknownst to me, several people visited the blog shortly after my presentation and read some of the stuff here. Thank you!
As it turns out, this happened to generate a conversation. On one of the breaks someone came up to me and posed a question about Google Assistant. That question led to other conversations about Assistant, and another question was asked: what happens when a user cancels whatever action they wanted Google Assistant to do when they first invoked it?
I had brought my trusty Pixel 3 test phone with me on this trip for another project I am working on, so I was able to test this question fairly quickly with a pleasantly surprising set of results. The Pixel was running Android Pie with a patch level of February 2019 that had been freshly installed a mere two hours earlier. The phone was not rooted, but did have TWRP (3.3.0) installed, which allowed me to pull the data once I had run my tests.
Consider this scenario: a user not in the car calls on Google Assistant to send a text message to a recipient. Assistant acknowledges, and asks the user to provide the message they want to send. The user dictates the message, and then decides, for whatever reason they do not want to send it. Assistant reads the message back to the user and asks what the user wants to do (send it or not send it). The user indicates they want to cancel the action, and the text message is never sent.
This is the scenario I tested. In order to envoke Google Assistant I used the Assistant button in the right side of the Google Quick Search bar on the Android home screen. My dialogue with Google Assistant went as follows:
Me: OK, Google. Send a message to Josh Hickman
GA: Message to Josh Hickman using SMS. Sure. What’s the message?
Me: This is the test message for Google Assistant, period (to represent punctuation).
GA: I got “This is a test message for Google Assistant.” Do you want to send it or change it?
GA: OK, no problem.
If you have read my blog post on Google Assistant when outside of the car you know where the Google Assistant protobuf files are located, and the information they contain, so I will skip ahead to examining the file that represented this session.
The file header that reports where the protobuf file comes from is the same as before; the “opa” is seen in the red box. However, there is a huge difference with regards to the embedded audio data in this file. See Figure 1.
In the blue box there is a marker for Ogg, a container format that is used to encapsulate audio and video files. In the orange box is a marker for Opus, which is a lossy audio compression codec. It is designed for interactive speech and music transmission over the Internet and is considered to be high quality audio, which makes it prime to send Assistant audio across limited bandwidth connections. Based on this experiment and data in the Oreo image I released a couple of months ago, I believe Google Assistant may be using Opus now instead of the LAME codec. The takeaway here is to just be aware you may see either.
In the green box is the string “Google Speech using libopus.” Libopus is the method by which audio is encoded in Opus. Since this was clearly audio data, I treated it just like the embedded MP3 data I had previously seen in other Google Assistant protobuf files. I carved from the Ogg marker all the way down until I reached a series of 0xFF values just before a BNDL (see the previous Google Assistant posts about BNDL). I saved the file out with no extension and opened it with VLC Player. The following audio came out of my speakers:
“OK, no problem.”
This is the exact behavior I had see before in Google Assistant protobuf files: the file contained the audio of the last thing Google Assistant said to me, so this behavior was the same as before.
However, in this instance my request (to send a message) had not been passed to a different service (the Android Messages app) because I had indicated to Assistant that I did not want to send the message (my “Cancel” command). I continued search the file to see if the rest of my interaction with Google Assistant was present.
Figure 2 shows an area a short way past the embedded audio data. The area in the blue box should be familiar to those who read my previous Google Assistant. The hexadecimal string 0xBAF1C8F803 appears just before the first vocal input (red box) that appears in this protobuf file. The 8-byte string seen in the orange box, while not not exactly what I had seen before, had bytes that were the same (the leading 0x010C and trailing 0x040200). Either way, if you see this, get ready to see the text of some of the user’s vocal input.
So far, this pattern was exactly as I had seen before: what was last during my session with Google Assistant was first in the protobuf file. So I skipped a bit of data because I know the session data that followed dealt with the last part of session. If the pattern holds, that portion of the session will appear again towards the end of the protobuf file.
I navigated to the portion seen in Figure 3. Here I find a 16-byte string which I consider to be a footer for what I call vocal transactions. It marks the end of the data for my “Cancel” command; you can see the string in the blue box. Also in Figure 3 is the 8-byte string that I saw earlier (that acts as a marker for the vocal input) and the text of the vocal input that started the session (“Send a message to Josh Hickman”).
Traveling a bit further finds the two things of interest. The first is data that indicates how the session was started (via pressing the button in the Google Quick Search Box – I see this throughout the files in which I invoked Assistant via the button), which is highlighted in the blue box in Figure 4. Figure 4 also has a timestamp in it (red box). The timestamp is a Unix Epoch timestamp that is stored little endian (0x0000016BFD619312). When decoded, the timestamp is 07/16/2019 at 17:42:38 PDT (-7:00), which can be seen in Figure 5. This is when I started the session.
The next thing I find, just below the timestamp, is a transactional GUID. I believe this GUID is used by Google Assistant to keep vocal input paired with the feedback that the input generates; this helps keep a user’s interaction with Google Assistant conversational. See the red box in Figure 6.
The data in the red box in Figure 7 is interesting and I didn’t realize its significance until I was preparing slides for my presentation at DFRWS. The string 3298i2511e4458bd4fba3 is the Lookup Key associated with the (lone) contact on my test phone, “Josh Hickman;” this key appears in a few places. In the Contacts database (/data/data/com.android.providers.contacts/databases/contact2.db) the key appears in the contacts, view_contacts, view_data, and view_entities tables. It also appears in the Participants table in the Bugle database (/data/data/com.google.android.messages/databases/bugle.db), which is the database for the Android Messages app. See Figures 7, 8, & 9.
There are a few things seen in Figure 10. First is the transactional GUID that was previously seen in Figure 6 (blue box). Just below that is the vocal transaction footer (green box), the 8-byte string that marks vocal input (orange box), and the message I dictated to Google Assistant (red box). See Figure 10.
Figure 11 shows the timestamp in the red box. The string, read little endian, decodes to 07/17/2019 at 17:42:43 PDT, 5 seconds past the first timestamp, which makes sense that I would have dictated the message after having made the request to Google Assistant. The decoded time is seen in Figure 12.
Below there is the transactional GUID (again, previously seen in Figure 6) associated with the original vocal input in the session. Again, I believe this allows Google Assistant to know that this dictated message is associated with the original request (“Send a message to Josh Hickman”). This allows Assistant to be conversational with the user. See the red box in Figure 13.
Scrolling through quite a bit of protobuf data finds the area seen in Figure 14. Here I found the vocal transaction footer (blue box), the 8-byte vocal input marker (orange box) and the vocal input “Cancel” in the red box.
Figure 15 shows the timestamp of the “Cancel;” it decodes to 07/17/2019 at 17:42:57 PDT (-7:00). See Figure 16 for the decoded timestamp.
The final part of this file shows the original transactional GUID again (red box), which associates the “Cancel” with the original request. See Figure 17.
After I looked at this file, I checked my messages on my phone and the message did not appear in the Android Messages app. Just to confirm, I pulled my bugle.db and the message was nowhere to be found. So, based on this, it is safe to say that if I change my mind after having dictated a message to Google Assistant the message will not show up in the database that holds messages. This isn’t surprising as Google Assistant never handed me off to Android Messages in order to transmit the message.
However, and this is the surprising part, the message DOES exist on the device in the protobuf file holding the Google Assistant session data. Granted, I had to go in and manually find the message and the associated timestamp, but it is there. The upside to the manual parsing is there is already some documentation on this file structure to help navigate to the relevant data. 🙂
I also completed this scenario by invoking Google Assistant verbally, and the results were the same. The message was still resident inside of the protobuf file even though it had not been saved to bugle.db.
Hitting the Cancel Button
Next, I tried the same scenario but instead of telling Google Assistant to cancel, I just hit the “Cancel” button in the Google Assistant interface. Some users may be in a hurry to cancel a message and may not want to wait for Assistant to give them an option to cancel, or they are interrupted and may need to cancel the message before sending it.
I ran this test in the Salt Lake City, UT airport, so the time zone was Mountain Daylight Time (MDT or -6:00). The conversation with Google Assistant went as so:
Me: Send a text message to Josh Hickman.
GA: Message to Josh Hickman using SMS. Sure. What’s the message?
Me: This is a test message that I will use to cancel prior to issuing the cancel command.
*I pressed the cancel button in the Google Assistant UI*
Since I’ve already covered the file structure and markers, I will skip those things and get to the relevant data. I will just say the structure and markers are all present.
Figure 18 shows the 8-byte marker indicating the text of the vocal input is coming (orange box) along with the text of the input itself (red box). The timestamp seen in Figure 19 is the correct timestamp based on my notes: 07/18/2019 at 9:25:37 MDT (-6:00).
Just as before the dictated text message request was further up in the file, which makes sense here because the last input I gave Assistant was the dictated text message. Also note that there are variants of the dictated message, each with their own designations (T, X, V, W, & Z). This is probably due to the fact that I was in a noisy airport terminal, and, at the time I dictated the message, there was an announcement going over the public address system. See Figure 21 for the message and its variants, Figure 22 for the timestamp, and Figure 23 for the decoded timestamp.
As I mentioned, I hit the “Cancel” button on the screen as soon as the message was dictated. I watched the message appear in the Google Assistant UI, but I did not give Assistant time to read the message back to me to make sure it had dictated the message correctly. I allowed no feedback whatsoever. Considering this, the nugget I found in Figure 24 was quite the surprise.
In the blue box you can see the message in a java wrapper, but the thing in the red box…well, see for yourself. I canceled the message by pressing the “Cancel” button, and there is a string “Canceled” just below the message. I tried this scenario again by just hitting the “Home” button (instead of the “Cancel” button in the Assistant UI), and I got the same result. The dictated message was present in the protobuf file, but this time the message did not appear in a java wrapper, The “Canceled” ASCII string was just below an empty wrapper. See Figure 25.
So it would appear that an examiner may get some indication a session was canceled prior to Google Assistant getting a chance to either complete the action of sending a message or Google Assistant getting a “Cancel” command. Obviously, there are multiple scenarios in which a user could cancel a session with Google Assistant, but having “Canceled” in the protobuf data is definitely a good indicator. The drawback, though, is there is no indication how the transaction was canceled (e.g. by way of the “Cancel” button or hitting the home button).
An Actual Virtual Assistant Butt Dial
The next scenario I tested involved me simulating what I believe to be Google Assistant’s version of a butt-dial. What would happen if Google Assistant was accidentally invoked? By accidentally I mean by hitting the button in the Quick Search Box by accident, or by saying the hot word without intending to call on Google Assistant. Would Assistant record what the user said? Would it try to take any action even though there was probably no actionable items, or would it freeze and not do anything? Would there be any record of what the user said, or would Assistant realize what was going on, shut itself off, and not generate any protobuf data?
There were two tests here with the difference being in the way I invoked Assistant. One was by button and the other by hot word. Since the results were the same I will show just one set of screen shots, which are from the scenario in which I pressed the Google Assistant button in the Quick Search Bar (right side). I was in my hotel room at DFRWS, so the time zone is Pacific Daylight Time (-7:00) again. The scenario went as such:
*I pressed the button*
Me: Hi, my name is Josh Hickman and I’m here this week at the Digital Forensic Research Workshop. I was here originally…
*Google Assistant interrupts*
GA: You’d like me to call you ‘Josh Hickman and I’m here this week at the digital forensic research Workshop.’ Is that right?
*I ignore the feedback from Google Assistant and continue.*
Me: Anyway, I was here to give a presentation and the presentation went fairly well considerIng the fact that it was my first time here…
*Google Assistant interrupts again*
GA: I found these results.
*Google Assistant presents some search results for addressing anxiety over public speaking…relevant, hilarous, and slightly creepy.*
As before, I will skip file structure and get straight to the point.
The vocal input is in this file. Figure 26 shows the vocal input and a variant of what I said (“I’m” versus “I am”) in the purple boxes. It also shows the 5-byte marker for the first vocal input in a protobuf file (blue box) along with the 8-byte marker that indicates vocal input is forthcoming (orange box).
Just below the area in Figure 26 is the timestamp of the session. The time decodes to 07/17/2019 at 11:51:06 PDT (-7:00). See Figure 27.
Figure 29 shows my vocal input wrapped in the java wrapper.
Interestingly enough, I did not find any data in this file related to the second bit of input Google Assistant received, the fact that Google Assistant performed a search, or what search terms it used (or thought I gave it). I even went out to other protobuf files in the app_session folder to see if a new file was generated. Nothing.
This exercise shows there is yet one more place to check for messages in Android. Traditionally, we have always thought to look for messages in database files. What if the user composed a message using Google Assistant? If the user actually sends the message, the traditional way of thinking still applies. But, what if the user changes their mind prior to actually sending those dictated messages? Are those messages saved to a draft folder or some other temporary location in Messages? No, it is not. In fact, it is not stored any other location that I can find other than the Google Assistant protobuf files (if someone can find them please let me know). The good news is if a message is dictated using Assistant and the user cancels the message, it is possible to recover the message that was dicated but never sent. This could give further insight into the intent of a user and help recover even more messges. It also gives a better picture of how a user actually interacted with their device.
The Google Assistant protobuf files are continuing to suprise me in regards to how much data they contain. At this year’s I/O conference Google annouced speed improvements to Assistant along with their intention to push more of the natural language processing and machine learning functions on to the devices instead of having everything done server-side. This could be advantageous in that more artifacts could be left behind by Assistant, which would give a more wholelistic view of device usage.