Featured

printf("hello, world\n")

Welcome.  After reading other digital forensic blogs over the past couple of years I decided to start my own.  I have gained a lot by reading research done by others, so I thought it would only be right to give back to the digital forensic community.

I work in the government sector, so I will be limiting identifying information about me or my organization.  If you’re able to figure out who I am that’s fine.  Just know that I will actively try to not put anything identifying in here, although that may not be possible at times.  The posts will focus on the forensics.

A bit about me:  I manage a digital forensics lab in an ISO 17025 environment.  In addition to all of the managerial duties, I am also responsible for technical operations of my group and carry a full caseload.  If I am expected to lead, I also better do.  🙂

My vision for this blog will start small.  I hold a full-time job, have a family, and am in the middle of a master’s program.  I am aiming for one post a month, but there may be more or less depending on life.

Again, welcome, and thank you for taking time to read the posts. I welcome feedback, whether it is positive or negative. If you see something that is inaccurate, please let me know so I can correct it. I do not want to proliferate bad information to the DF community.

Google Assistant Butt Dials (aka Accidental & Canceled Invocations)

Last week I was at DFRWS USA in Portland, OR to soak up some DFIR research, participate in some workshops, and congregate with some of the DFIR tribe. I also happened to be there to give a 20-minute presentation on Android Auto & Google Assistant.

Seeing as this was my first presentation, I was super nervous and I am absolutely sure it showed (I got zero sleep the night before). I also made the rookie mistake of making WAY more slides than I had time for; I do not possess that super power that allows some in our discipline to zip through PowerPoint slides at superhuman speeds. The very last slide in the deck had my contact information on it, which included the URL for this blog. Unbeknownst to me, several people visited the blog shortly after my presentation and read some of the stuff here. Thank you!

As it turns out, this happened to generate a conversation. On one of the breaks someone came up to me and posed a question about Google Assistant. That question led to other conversations about Assistant, and another question was asked: what happens when a user cancels whatever action they wanted Google Assistant to do when they first invoked it?

I had brought my trusty Pixel 3 test phone with me on this trip for another project I am working on, so I was able to test this question fairly quickly with a pleasantly surprising set of results. The Pixel was running Android Pie with a patch level of February 2019 that had been freshly installed a mere two hours earlier. The phone was not rooted, but did have TWRP (3.3.0) installed, which allowed me to pull the data once I had run my tests.

The Question

Consider this scenario: a user not in the car calls on Google Assistant to send a text message to a recipient. Assistant acknowledges and asks the user to provide the message they want to send. The user dictates the message and then decides, for whatever reason, that they do not want to send it. Assistant reads the message back to the user and asks what the user wants to do (send it or not send it). The user indicates they want to cancel the action, and the text message is never sent.

This is the scenario I tested. To invoke Google Assistant I used the Assistant button on the right side of the Google Quick Search bar on the Android home screen. My dialogue with Google Assistant went as follows:

Me: OK, Google. Send a message to Josh Hickman

GA: Message to Josh Hickman using SMS. Sure. What’s the message?

Me: This is the test message for Google Assistant, period (to represent punctuation).

GA: I got “This is a test message for Google Assistant.” Do you want to send it or change it?

Me: Cancel.

GA: OK, no problem.

If you have read my blog post on Google Assistant when outside of the car you know where the Google Assistant protobuf files are located, and the information they contain, so I will skip ahead to examining the file that represented this session.

The file header that reports where the protobuf file comes from is the same as before; the “opa” is seen in the red box. However, there is a huge difference with regards to the embedded audio data in this file. See Figure 1.

Figure 1
Figure 1.  Same header, different audio.

In the blue box there is a marker for Ogg, a container format that is used to encapsulate audio and video files. In the orange box is a marker for Opus, a lossy audio compression codec. It is designed for interactive speech and music transmission over the Internet and is considered high-quality audio, which makes it well suited for sending Assistant audio across limited-bandwidth connections. Based on this experiment and data in the Oreo image I released a couple of months ago, I believe Google Assistant may now be using Opus instead of the LAME codec. The takeaway here is to be aware that you may see either.

In the green box is the string "Google Speech using libopus." libopus is the reference library used to encode and decode Opus audio. Since this was clearly audio data, I treated it just like the embedded MP3 data I had previously seen in other Google Assistant protobuf files. I carved from the Ogg marker all the way down until I reached a series of 0xFF values just before a BNDL (see the previous Google Assistant posts about BNDL). I saved the file out with no extension and opened it with VLC Player. The following audio came out of my speakers:

“OK, no problem.”

This is the exact behavior I had seen before in Google Assistant protobuf files: the file contained the audio of the last thing Google Assistant said to me.
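As an aside, the manual carve described above is easy to script. Below is a minimal sketch of that process, assuming a single embedded audio stream that starts at the "OggS" marker and is padded with 0xFF bytes just before the next BNDL; the file names are placeholders, and other protobuf files may be laid out differently.

# carve_opus.py - rough carve of the embedded Ogg Opus audio from a Google
# Assistant protobuf file (sketch; assumes one audio stream per file)
import sys

def carve(in_path, out_path):
    data = open(in_path, "rb").read()

    start = data.find(b"OggS")           # Ogg container marker (blue box in Figure 1)
    if start == -1:
        raise SystemExit("no Ogg marker found")

    end = data.find(b"BNDL", start)      # next BNDL marker after the audio
    if end == -1:
        end = len(data)

    # back up over the run of 0xFF values that sits just before the BNDL
    while end > start and data[end - 1] == 0xFF:
        end -= 1

    with open(out_path, "wb") as f:
        f.write(data[start:end])
    print(f"wrote {end - start} bytes to {out_path}")

if __name__ == "__main__":
    carve(sys.argv[1], sys.argv[2])      # the carved output opens in VLC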

However, in this instance my request (to send a message) had not been passed to a different service (the Android Messages app) because I had indicated to Assistant that I did not want to send the message (my "Cancel" command). I continued searching the file to see if the rest of my interaction with Google Assistant was present.

Figure 2 shows an area a short way past the embedded audio data. The area in the blue box should be familiar to those who read my previous Google Assistant posts. The hexadecimal string 0xBAF1C8F803 appears just before the first vocal input (red box) that appears in this protobuf file. The 8-byte string seen in the orange box, while not exactly what I had seen before, had bytes that were the same (the leading 0x010C and trailing 0x040200). Either way, if you see this, get ready to see the text of some of the user's vocal input.

Figure 2
Figure 2.  What is last is first.

So far, this pattern was exactly as I had seen before: what was last during my session with Google Assistant was first in the protobuf file. So I skipped a bit of data because I knew the session data that followed dealt with the last part of the session. If the pattern holds, that portion of the session will appear again towards the end of the protobuf file.

I navigated to the portion seen in Figure 3. Here I find a 16-byte string which I consider to be a footer for what I call vocal transactions. It marks the end of the data for my “Cancel” command; you can see the string in the blue box. Also in Figure 3 is the 8-byte string that I saw earlier (that acts as a marker for the vocal input) and the text of the vocal input that started the session (“Send a message to Josh Hickman”).

Figure 3
Figure 3.  The end of a transaction and the beginning of another.

Traveling a bit further finds two things of interest. The first is data that indicates how the session was started (via pressing the button in the Google Quick Search Box – I see this throughout the files in which I invoked Assistant via the button), which is highlighted in the blue box in Figure 4. Figure 4 also has a timestamp in it (red box). The timestamp is a Unix Epoch timestamp that is stored little endian (0x0000016BFD619312). When decoded, the timestamp is 07/16/2019 at 17:42:38 PDT (-7:00), which can be seen in Figure 5. This is when I started the session.

Figure 4
Figure 4.  A timestamp and the session start mechanism.
Figure 5
Figure 5.  The decoded timestamp.
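If you want to check the decode yourself, the conversion is just a little-endian read of eight bytes interpreted as milliseconds since the Unix epoch. A quick sketch, assuming the on-disk byte order implied by the little-endian storage (least significant byte first):

# decode the 8-byte little-endian millisecond timestamp from Figure 4
import struct
from datetime import datetime, timezone

raw = bytes.fromhex("129361fd6b010000")   # bytes as laid out on disk
value = struct.unpack("<Q", raw)[0]       # 0x0000016BFD619312 = 1563324158738

dt = datetime.fromtimestamp(value / 1000, tz=timezone.utc)
print(hex(value), dt)                     # 2019-07-17 00:42:38 UTC = 07/16/2019 17:42:38 PDT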

The next thing I find, just below the timestamp, is a transactional GUID. I believe this GUID is used by Google Assistant to keep vocal input paired with the feedback that the input generates; this helps keep a user’s interaction with Google Assistant conversational. See the red box in Figure 6.

Figure 6
Figure 6.  Transactional GUID.

The data in the red box in Figure 7 is interesting and I didn’t realize its significance until I was preparing slides for my presentation at DFRWS. The string 3298i2511e4458bd4fba3 is the Lookup Key associated with the (lone) contact on my test phone, “Josh Hickman;” this key appears in a few places. In the Contacts database (/data/data/com.android.providers.contacts/databases/contacts2.db) the key appears in the contacts, view_contacts, view_data, and view_entities tables. It also appears in the participants table in the Bugle database (/data/data/com.google.android.messages/databases/bugle.db), which is the database for the Android Messages app. See Figures 7, 8, & 9.

Figure 7
Figure 7.  The lookup key in the protobuf file.
Figure 8.PNG
Figure 8.  The participants table entry in the bugle.db.
Figure 9
Figure 9.  A second look at the lookup key in the bugle.db.
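Since the lookup key can surface in several databases, one quick way to confirm where it lives in a particular extraction is to sweep every table and column for the string. The sketch below does that with nothing but the standard library; the key value and database names come from my test data, so substitute your own.

# find_lookup_key.py - search every table/view column of a SQLite database for
# a contact lookup key (sketch; key value and database names are from my test phone)
import sqlite3

KEY = "3298i2511e4458bd4fba3"

def search(db_path, needle):
    con = sqlite3.connect(db_path)
    con.text_factory = lambda b: b.decode(errors="replace")
    names = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')")]
    for name in names:
        cols = [r[1] for r in con.execute(f"PRAGMA table_info('{name}')")]
        for col in cols:
            try:
                hits = con.execute(
                    f'SELECT COUNT(*) FROM "{name}" WHERE "{col}" LIKE ?',
                    (f"%{needle}%",)).fetchone()[0]
            except sqlite3.Error:
                continue                   # skip virtual or otherwise unreadable tables
            if hits:
                print(f"{db_path}: {name}.{col} ({hits} row(s))")
    con.close()

for db in ("contacts2.db", "bugle.db"):
    search(db, KEY)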

There are a few things seen in Figure 10. First is the transactional GUID that was previously seen in Figure 6 (blue box). Just below that is the vocal transaction footer (green box), the 8-byte string that marks vocal input (orange box), and the message I dictated to Google Assistant (red box). See Figure 10.

Figure 10.png
Figure 10.  There is a lot going on here.

Figure 11 shows the timestamp in the red box. The string, read little endian, decodes to 07/16/2019 at 17:42:43 PDT, 5 seconds past the first timestamp, which makes sense: I dictated the message right after making the request to Google Assistant. The decoded time is seen in Figure 12.

Figure 11
Figure 11.  Timestamp for the dictated message.
Figure 12.png
Figure 12.  The decoded timestamp.

Below that is the transactional GUID (previously seen in Figure 6) associated with the original vocal input in the session. Again, I believe this allows Google Assistant to know that this dictated message is associated with the original request (“Send a message to Josh Hickman”). This allows Assistant to be conversational with the user. See the red box in Figure 13.

Figure 13.png
Figure 13.  The same transactional GUID.

Scrolling through quite a bit of protobuf data finds the area seen in Figure 14. Here I found the vocal transaction footer (blue box), the 8-byte vocal input marker (orange box) and the vocal input “Cancel” in the red box.

Figure 14.png
Figure 14.  The last vocal input of the session.

Figure 15 shows the timestamp of the “Cancel;” it decodes to 07/16/2019 at 17:42:57 PDT (-7:00). See Figure 16 for the decoded timestamp.

Figure 15.png
Figure 15.  The “Cancel” timestamp.
Figure 16
Figure 16.  The decoded “Cancel” timestamp.

The final part of this file shows the original transactional GUID again (red box), which associates the “Cancel” with the original request. See Figure 17.

Figure 17
Figure 17.  The original transactional GUID…again.

After I looked at this file, I checked my messages on my phone and the message did not appear in the Android Messages app. Just to confirm, I pulled my bugle.db and the message was nowhere to be found. So, based on this, it is safe to say that if I change my mind after having dictated a message to Google Assistant the message will not show up in the database that holds messages. This isn’t surprising as Google Assistant never handed me off to Android Messages in order to transmit the message.

However, and this is the surprising part, the message DOES exist on the device in the protobuf file holding the Google Assistant session data. Granted, I had to go in and manually find the message and the associated timestamp, but it is there. The upside to the manual parsing is there is already some documentation on this file structure to help navigate to the relevant data. 🙂

I also completed this scenario by invoking Google Assistant verbally, and the results were the same. The message was still resident inside of the protobuf file even though it had not been saved to bugle.db.

Hitting the Cancel Button

Next, I tried the same scenario but instead of telling Google Assistant to cancel, I just hit the “Cancel” button in the Google Assistant interface. Some users may be in a hurry to cancel a message and may not want to wait for Assistant to give them an option to cancel, or they are interrupted and may need to cancel the message before sending it.

I ran this test in the Salt Lake City, UT airport, so the time zone was Mountain Daylight Time (MDT or -6:00). The conversation with Google Assistant went as follows:

Me: Send a text message to Josh Hickman.

GA: Message to Josh Hickman using SMS. Sure. What’s the message?

Me: This is a test message that I will use to cancel prior to issuing the cancel command.

*I pressed the cancel button in the Google Assistant UI*

Since I’ve already covered the file structure and markers, I will skip those things and get to the relevant data. I will just say the structure and markers are all present.

Figure 18 shows the 8-byte marker indicating the text of the vocal input is coming (orange box) along with the text of the input itself (red box). The timestamp seen in Figure 19 is the correct timestamp based on my notes: 07/18/2019 at 9:25:37 MDT (-6:00).

Figure 18.png
Figure 18.  The request.
Figure 19.png
Figure 19.  The timestamp.
Figure 20
Figure 20.  The timestamp decoded.

Just as before, the dictated text message was further up in the file than the request, which makes sense because the last input I gave Assistant was the dictated message. Also note that there are variants of the dictated message, each with its own designation (T, X, V, W, & Z). This is probably because I was in a noisy airport terminal and, at the time I dictated the message, there was an announcement going over the public address system. See Figure 21 for the message and its variants, Figure 22 for the timestamp, and Figure 23 for the decoded timestamp.

Figure 21
Figure 21.  The dictated message with variants.
Figure 22.png
Figure 22.  The timestamp.
Figure 23.png
Figure 23.  The decoded timestamp.

As I mentioned, I hit the “Cancel” button on the screen as soon as the message was dictated. I watched the message appear in the Google Assistant UI, but I did not give Assistant time to read the message back to me to make sure it had dictated the message correctly. I allowed no feedback whatsoever. Considering this, the nugget I found in Figure 24 was quite the surprise.

Figure 24
Figure 24.  The canceled message.

In the blue box you can see the message in a java wrapper, but the thing in the red box…well, see for yourself. I canceled the message by pressing the “Cancel” button, and there is a string “Canceled” just below the message. I tried this scenario again by just hitting the “Home” button (instead of the “Cancel” button in the Assistant UI), and I got the same result. The dictated message was present in the protobuf file, but this time the message did not appear in a java wrapper; the “Canceled” ASCII string was just below an empty wrapper. See Figure 25.

Figure 25
Figure 25.  Canceled.  Again.

So it would appear that an examiner may get some indication a session was canceled before Google Assistant either completed the action (sending the message) or received a “Cancel” command. Obviously, there are multiple scenarios in which a user could cancel a session with Google Assistant, but having “Canceled” in the protobuf data is definitely a good indicator. The drawback, though, is there is no indication of how the transaction was canceled (e.g. by way of the “Cancel” button or hitting the home button).
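For triage purposes, sweeping the session protobuf files for that ASCII string is a quick way to flag sessions that may have been canceled. A minimal sketch, assuming you have already pulled the app_session folder from the device:

# scan_canceled.py - flag Google Assistant session files containing the ASCII
# "Canceled" marker (triage sketch; point it at a pulled app_session folder)
import sys
from pathlib import Path

def scan(folder):
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        offset = data.find(b"Canceled")
        if offset != -1:
            print(f"{path.name}: 'Canceled' at offset {offset:#x}")

if __name__ == "__main__":
    scan(sys.argv[1])        # e.g. python scan_canceled.py ./app_session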

An Actual Virtual Assistant Butt Dial

The next scenario I tested involved me simulating what I believe to be Google Assistant’s version of a butt-dial. What would happen if Google Assistant was accidentally invoked? By accidentally I mean by hitting the button in the Quick Search Box by accident, or by saying the hot word without intending to call on Google Assistant. Would Assistant record what the user said? Would it try to take any action even though there were probably no actionable items, or would it freeze and not do anything? Would there be any record of what the user said, or would Assistant realize what was going on, shut itself off, and not generate any protobuf data?

There were two tests here, with the difference being the way I invoked Assistant: one by button and the other by hot word. Since the results were the same I will show just one set of screenshots, which are from the scenario in which I pressed the Google Assistant button in the Quick Search Bar (right side). I was in my hotel room at DFRWS, so the time zone is Pacific Daylight Time (-7:00) again. The scenario went as follows:

*I pressed the button*

Me: Hi, my name is Josh Hickman and I’m here this week at the Digital Forensic Research Workshop. I was here originally…

*Google Assistant interrupts*

GA: You’d like me to call you ‘Josh Hickman and I’m here this week at the digital forensic research Workshop.’ Is that right?

*I ignore the feedback from Google Assistant and continue.*

Me: Anyway, I was here to give a presentation and the presentation went fairly well considering the fact that it was my first time here…

*Google Assistant interrupts again*

GA: I found these results.

*Google Assistant presents some search results for addressing anxiety over public speaking…relevant, hilarious, and slightly creepy.*

As before, I will skip file structure and get straight to the point.

The vocal input is in this file. Figure 26 shows the vocal input and a variant of what I said (“I’m” versus “I am”) in the purple boxes. It also shows the 5-byte marker for the first vocal input in a protobuf file (blue box) along with the 8-byte marker that indicates vocal input is forthcoming (orange box).

Figure 26.png
Figure 26.  The usual suspects.

Just below the area in Figure 26 is the timestamp of the session. The time decodes to 07/17/2019 at 11:51:06 PDT (-7:00). See Figure 27.

Figure 27.png
Figure 27.  Timestamp.
Figure 28
Figure 28.  Decoded timestamp.

Figure 29 shows my vocal input wrapped in the java wrapper.

Figure 29.png
Figure 29.  My initial vocal input, wrapped.

Interestingly enough, I did not find any data in this file related to the second bit of input Google Assistant received, the fact that Google Assistant performed a search, or what search terms it used (or thought I gave it). I even went out to other protobuf files in the app_session folder to see if a new file was generated. Nothing.

Conclusion

This exercise shows there is yet one more place to check for messages in Android. Traditionally, we have always thought to look for messages in database files. What if the user composed a message using Google Assistant? If the user actually sends the message, the traditional way of thinking still applies. But what if the user changes their mind prior to actually sending a dictated message? Is that message saved to a draft folder or some other temporary location in Messages? No, it is not. In fact, it is not stored in any other location that I can find other than the Google Assistant protobuf files (if someone finds it elsewhere, please let me know). The good news is that if a message is dictated using Assistant and the user cancels it, it is possible to recover the message that was dictated but never sent. This could give further insight into the intent of a user and help recover even more messages. It also gives a better picture of how a user actually interacted with their device.

The Google Assistant protobuf files continue to surprise me with how much data they contain. At this year’s I/O conference Google announced speed improvements to Assistant along with their intention to push more of the natural language processing and machine learning functions onto the devices instead of having everything done server-side. This could be advantageous in that more artifacts could be left behind by Assistant, which would give a more holistic view of device usage.

Me(n)tal Health in DFIR – It’s Kind of a Big Deal

When I initially started this blog I set a modest goal of making one post a month with the understanding that sometimes life will happen and take priority. Well, life is happening for me this month: an imminent house move, an upcoming presentation at DFRWS USA, the GCFE, and several cases at work have kept me extremely busy. With all that going on there has been absolutely zero time for any research. Being the stubborn person I am, though, I couldn’t NOT post something, so here we are. Fortunately, there are no screenshots this month. 🙂

A few days ago I was cruising around the DFIR Discord channel when someone asked an important question: how are examiners/investigators who are exposed to child sexual exploitation material (i.e. child pornography) given mental health support, if any? The few replies that came in were all over the place. Some responses indicated they received zero support, others got what I would consider partial support, and one responder indicated they got a lot of support.

Why?

I have the unfortunate experience of being exposed to this material at my current job assignment, and have been for several years now due to past job assignments. No one wants to see it, be around it, or be around individuals who willingly seek out this material. This material doesn’t magically appear out of thin air; it has to be created, which means a child has to be sexually exploited. This is against the law. Period.

Viewing these acts is…terrible.

In addition to the social implications, there is a societal need to investigate people who possess, distribute, and create this material. These investigations are mentally taxing because the material is tough to look at, plain and simple. But, the investigations have to be done. There is no way around it. The well-being of a child is at stake.

The subject matter of these investigations requires a special kind of person to do them. I cannot tell you how many times I have had seasoned investigators say to me “I don’t know how you do it. I would jump across the table and kill them.” The thing is, I believe they would do just that. Investigators/examiners are human, and, just like everyone else, we are all wired differently. Certain things may trigger a severe emotional response in one investigator/examiner and not in another. Investigators/examiners who do these types of investigations/examinations have to have a particular mindset. Having done all kinds of criminal investigations and examinations for various criminal offenses, I can tell you, for example, that there is a difference in mindset between dealing with a homicide suspect and an individual who peddles in this material.

Investigators/examiners who are exposed to this material have to keep severe emotional responses in check in order to remain professional and do their job, and it takes a lot of mettle to do this. That mental effort, along with being repeatedly exposed to this material, takes a toll on the mind and the heart. I have seen colleagues crumble under the mental and emotional stress caused by these investigations/examinations, and walk away from investigations/digital forensics. I even had a co-worker take their own life.

And the need for mental fortitude doesn’t just apply to law enforcement investigators/examiners. The private sector has its own set of stressors that take a mental and physical toll on DFIR personnel who operate in that arena. Long hours, being away from family/friends, conflicting priorities, deadlines, and employer/peer expectations can all introduce stress and cause the mind to buckle and suffer.

And, if you think the non-law enforcement DFIR people don’t see some disturbing material, you are wrong. Digital devices act as a sort of safe for the mind (in addition to being the bicycle Steve Jobs liked to talk about), so people will store valuable things in them. Sometimes these valuable things carry a (negative) social stigma, and the owner wants to keep them secret, afraid that someone will find out. DFIR practitioners who operate in the private/non-law enforcement sector will find this stuff, and while it may not be unlawful to possess the material, it may still be disturbing, so viewing it takes a toll.

I will add that this discussion also applies to those who conduct forensic audio/video examinations. Our team does those exams, too. We have the unfortunate experience, at times, of watching/listening to a person die or be seriously injured or maimed. Audio/video examinations are some of the toughest we do because we actually see/hear the event.

It Doesn’t Have To Be This Way

There have been a few DFIR blog posts published in the past few months that have addressed burnout/mental health in our discipline, so I am not going to re-hash what they have said. They are good articles, and DFIR folks should read them. If you are interested, they are:

Christa Miller (Forensic Focus) – Burnout in DFIR (And Beyond)

Brett Shavers – Only Race Cars Should Burnout

Thom Langford – Drowning, Not Waving

If you are struggling, seek help. Just know that you are not the only one, and there are resources out there to help you, including others in the DFIR community; generally speaking, we are a supportive bunch. Even if your employer doesn’t offer support, the DFIR community will.

One of the responses I saw in the Discord channel indicated that there is a negative connotation around seeking out help for mental health. I understand that because I have worked in environments where expressing mental/emotional distress was seen as a sign of weakness among peers and supervisors. However, I was fortunate enough to find my way into an environment where mental health is taken seriously and when people were in distress (expressed or not), peers and supervisors listened and took action to help. The few responses I saw made me think environments like mine are the exception and not the rule. I hope I am wrong.

The thing is, it doesn’t have to be that way.

What To Do?

I am not a health professional, so I don’t know the answer to the question or if there even IS an answer.

However, I do know mental health is important, in both DFIR and non-DFIR careers. Even for those of us DFIR’ers who are not exposed to child sexual exploitation material on a regular basis, the other major stressors I previously mentioned can have a negative impact on mental health (see Thom’s article above). Our minds are subjected to so much that it would make sense to have someone check on them from time to time.

To use Brett Shavers’ car analogy, it would be silly not to take your car in for a maintenance checkup after an extended period of use. Why would you not give your mind the same checkup by someone who is licensed to do so? We do that for our physical bodies (most of us do, anyway), so why not for the mind? Our minds and bodies are symbiotic, just like the systems in a car; a change in one can affect the other, good or bad. If your mind starts to break down due to ongoing mental stress, it can have a negative impact on your physical health, job performance, personal habits, and interpersonal relationships, just like a breakdown in one system of a car can negatively impact other systems and drag down overall performance.

I have been in supportive environments, and am now responsible for not only maintaining that type of environment, but looking after team members’ well-being. Their families have entrusted my organization with their well-being, and my organization has delegated that responsibility to me. Those of you who supervise a DFIR team have the same responsibility, whether you realize it or not. Sure, one more thing to be responsible for, but guess what. You are in THE seat, and this is extremely important.

For those of you who are not supervisors, you should be looking out for your colleagues, and that includes your supervisor. I have tried to establish a relationship with my fellow team members that encourages free flowing communication, regardless of whether it is positive or negative, and I have experienced both. I would like to think they would come to me if they noticed a change in my behavior.

Again, I am not a health professional, and I am not sure there is a one-size-fits-all answer to how an organization effectively deals with mental health issues in DFIR. That being said, I thought I would share what my organization does to try to keep a healthy environment for its DF examiners (we have no incident response function). What we do may or may not work for other organizations, but I do want to show that it can be done.

An Example

The first thing, and I think this is probably the most important, is that we have agency buy-in. If we did not have support from our administration, the rest of what we do would not happen. They fully support what we do and they recognize that happy employees are not only productive employees, but employees who are more likely to stay than to leave. What does that support entail? Well, they provide the funding and approve the policies. Without those two things, it would be impossible to do anything. Again, this applies to my organization, which happens to be 400-ish strong (only three of us are DF). If your agency is small and not very bureaucratic, you may have an easier time with this.

Policies. Some may roll their eyes at them, despise them, or completely ignore them. Regardless of your feelings toward them, they work for the purposes here. Our policy requires….requires…that our examiners see a licensed psychologist at least once a year, and the organization pays for the visit. (Update: this is separate from the employee assistance program, or EAP). Having this in the policy puts the agency on the hook, so to speak, and my organization is completely OK with that. Again, they fully support the mission and the employees who carry out that mission. Making the visit mandatory in a policy also insulates it (somewhat) from the budget shortfalls we encounter from time to time.

If a DF employee requests to go to see a licensed psychologist after/before their annual visit because they feel they are struggling, we send them, and the organization pays for it, no questions asked. Any examination (regardless of what it is for) can suddenly hit an examiner the wrong way at the wrong time and have a detrimental effect on their mental health. We realize that, thus we do not tell the employee “Can’t this wait until your scheduled visit?” No, we send them as quickly as we can get an appointment. Again, this is separate from EAP.

Along those same lines, we also realize that an examination may not have a contemporaneous emotional effect, and that it can take a while for the emotional distress to manifest itself to the point the examiner realizes there is a problem, or others notice a change. Again, this is why we do not lock them in to a set schedule.

There is a second part of this. Sometimes we carry our work home with us. If we are struggling at work, we can carry that home with us, and that can start to wear on our family members/significant others who live in the home with us. Our policy allows for a DF spouse to go see a licensed psychologist, too. They may need help helping the examiner cope, or they may need to offload what the examiner offloads on them. Just like the examiner, the spouse can go multiple times if needed, and, the agency pays for it.

Meet Our Lady

img_0176

Who in DFIR doesn’t like dog pictures? Well, this isn’t just any random dog. Meet Lady. She is the therapy K-9 that is attached to our team. Lady is considered a working K-9, just like a K-9 who detects narcotics or explosives, so the usual rules apply to her (e.g. no people food). She is considered an employee; she has an identification badge, a uniform, and an entry in the employee directory.

Just like other working dogs, Lady lives with her handler, who is a member of our DF team. She is a part of our family, and we treat her as such.

Lady came to us by way of the Paws and Stripes program at the Brevard County, Florida Sheriff’s Office. I will not get into the specifics of that program; just know she came to us after having undergone four months of training at the program site. We have a separate policy that addresses Lady. It addresses things such as her medical care, food, lodging, grooming, appearance, the person who is responsible for Lady (her handler), and certification requirements. Just as an example, my organization pays for all food and medical care for as long as she is able to serve in her official capacity. In the event she is not able to serve, she retires from service. The Director of my organization has the final say-so about with whom she retires, but, in keeping with standards, she would probably retire with her handler. Once that occurs, the handler absorbs the cost of food, but my organization will continue to pay for Lady’s medical care until her death. We believe Lady is around 2 years old (she was rescued from a shelter), so we plan on her being with us for a LONG time.

Lady is a certified therapy K-9, and is certified through the Alliance of Therapy Dogs. You can read more about that organization and the certification requirements here.

In my opinion, this K-9 program is money well spent. The mental health benefits Lady provides are really incalculable, not only to the DF examiners but to the organization as a whole. For the DF examiners she can be a pleasant distraction; whether it’s taking her out to potty or just tossing a ball or frisbee, she can provide a short, necessary, and welcome distraction from tough examinations. Lady is intuitive, too. She can sense if someone is having a hard time, and she will happily apply a wet nose to a leg or hand to get your attention, which gets you out from behind your workstation and not thinking about your exam.

The budget for Lady is modest compared to other costs in my organization. We budget around $1600 (USD) per year, but we have yet to come close to tapping that whole pot of money. If we were to lose an examiner due to mental health issues, we would have to spend time recruiting and hiring (my hourly salary plus the others involved) and training (DFIR training is not cheap) a replacement. From a financial perspective, Lady is “spend a little money up front, save a lot of money later.” By investing in Lady, we invest in the mental health of our examiners.

Here’s a picture of Lady hard at work….or not. I promise she has beds scattered all throughout our work areas (along with toys).

And here is a picture of her when she visited a medical facility over the holidays (periodic therapy visits outside of work are a requirement of her certification).

And the last one (I feel like a parent). One of our team members rides his motorcycle into work when the weather is nice. Lady randomly hopped up there one afternoon (she wasn’t allowed to ride on the bike).

From a supervisory standpoint there are a couple of things that I do to help with mental health. A small thing is rotating examination types. In other words, if an examiner has had a tough examination, I will assign a not-so-tough subject matter examination after that (“not-so-tough,” of course, is subjective). For example, if an examiner had a child sexual exploitation examination, I try to assign something other than a child sexual exploitation examination for their next exam or two. Sometimes our case queue will not allow for this, but I am monitoring what exam types they are working and doing what I can from that angle.

Another small thing that I do is leave my door open as much as I can, i.e. I have an open door policy. Usually every morning the team stops by the office, coffee in hand, and has a seat. We discuss current examinations and any issues that have arisen during those examinations. A lot of times we are trading ideas on ways to overcome those issues. We also discuss other ancillary subjects and non-work related matters, too. I appreciate that communication and exchange of ideas, and I typically learn something from those discussions. I will note that this is not a required meeting…it just happens, and it may happen again, spontaneously, throughout the workday.

While I am invested in and appreciative of our daily discussions, these discussions also serve another purpose: I get a chance to observe the team. Is there any change in their mood or behavior that I can detect? Have they said anything that gives me cause for concern? Are they passively expressing some type of emotional distress? Does any change I detect coincide with a current or recent examination they have conducted? I am looking and listening for these things. As I mentioned before, their families have lent them to the citizens of our state via our organization to deal with some of the toughest subject matters in the criminal justice system. I would be remiss if I didn’t take their well-being to heart.

We try to go out for a team dinner, off-site, after hours every so often. The team usually leaves a little early and heads to the location, and I stay behind for a bit and meet them. We’ll discuss a few work-related matters and then we officially go off the clock. Work is done, and so is our discussion of it. I will say that schedules have been all over the place as of late so we are a bit off schedule.  This happens.  

Encouraging team members to not feel bad when taking time off from work is something I have noticed that I have to do every so often. I usually have to do this when something unexpected arises and causes a team member to request leave on short notice. Life happens…to all of us…at some point during our career. Whether you work in DFIR or not, things will happen outside of your work that will require you to divert your focus and energy from your work to that thing, whatever it is. Diverting like that requires time away from work, and that’s ok. That’s what paid time off (PTO) is for.

Conclusion

I hope readers find this helpful.  Mental health in our field is an important subject, and it is one that I don’t think gets talked about enough.  If you have any questions about our program or anything else, please feel free to reach out; I am responsive to communication through the site. 

Mental health is something that impacts all of us in DFIR.  It is important that we recognize that and to take steps to foster environments in which mental health is taken seriously and not dismissed.

Take care of yourselves, and each other.

Two Snaps and a Twist – An In-Depth (and Updated) Look at Snapchat on Android

There is an update to this post. It can be found after the ‘Conclusion’ section.

I was recently tasked with examining a two-year-old Android-based phone which required an in-depth look at Snapchat. One of the things that I found most striking (and frustrating) during this examination was the lack of a modern, in-depth analysis of the Android version of the application beyond the tcspahn.db file, which, by the way, doesn’t exist anymore, and the /cache folder, which isn’t really used anymore (as far as I can tell). I found a bunch of things that discussed decoding encrypted media files, but this information was years old (Snapchat 5.x). I own the second edition of Learning Android Forensics by Skulkin, Tyndall, and Tamma, and while this book is great, I couldn’t find where they listed the version of Snapchat they examined or the version of Android they were using; what I found during my research for this post did not really match what was written in their book. A lot of things have changed.

Googling didn’t seem to help either; I just kept unearthing the older research. The closest I got was a great blog post by John Walther that examined Snapchat 10.4.0.54 on Android Marshmallow. Some of John’s post lined up with what I was seeing, while other parts did not.

WHAT’S THE BIG DEAL?

Snapchat averages 190 million daily users, which is more than half of the U.S. population, and those 190 million people send three billion snaps (pictures/videos) daily. Personally, I have the app installed on my phone, but it rarely sees any usage. Most of the time I use it on my kid, who likes the filters that alter his voice or require that he stick out his tongue. He is particularly fond of the recent hot dog filter.

One of the appealing things about Snapchat is that direct messages (DMs) and snaps disappear after they’re opened. While the app can certainly be used to send silly, ephemeral pictures or videos, some people find a way to twist the app for their own nefarious purposes.

There has been plenty written in the past about how some traces of activity are actually recoverable, but, again, nothing recent. I was surprised to find that there was actually more activity-related data left behind than I thought.

Before we get started, just a few things to note (as usual). First, my test data was generated using a Pixel 3 running Android 9.0 (Pie) with a patch level of February 2019. Second, the version of Snapchat I tested is 10.57.0.0, which was the most current version as of 05/22/2019. Third, while the phone was not rooted, it did have TWRP, version 3.3.0-0, installed. Extracting the data was straightforward as I had the Android SDK Platform tools installed on my laptop. I booted into TWRP and then ran the following from the command line:

adb pull /data/data/com.snapchat.android

That’s it. The pull command dropped the entire folder in the same path as where the platform tools resided.

As part of this testing, I extracted the com.snapchat.android folder five different times over a period of 8 days as I wanted to see what stuck around versus what did not. I believe it is also important to understand the volatility of the data that is provided in this app. I think understanding the volatility will help investigators in the field and examiners understand exactly how much time, if any, they have before the data they are seeking is no longer available.

I will add that I tested two tools to see what they could extract: Axiom (version 3.0) and Cellebrite (UFED 4PC 7.18 and Physical Analyzer 7.19). Both tools failed to extract (parsing not included) any Snapchat data. I am not sure if this is a symptom of these tools (I hope not) or my phone. Regardless, both tools extracted nothing.

TWO SNAPS AND…SOME CHANGE

So, what’s changed? Quite a bit as far as I can tell. The storage location of where some of the data that we typically seek has changed. There are enough changes that I will not cover every single file/folder in Snapchat. I will just focus on those things that I think may be important for examiners and/or investigators.

One thing has not changed: the timestamp format. Unless otherwise noted, all timestamps discussed are in Unix Epoch.

The first thing I noticed is that the root level has some new additions (along with some familiar faces). The folders that appear to be new are “app_textures”, “lib”, and “no_backup.” See Figure 1.

Figure 1. Root level of the com.snapchat.android folder.

The first folder that may be of interest is one that has been of interest to forensicators and investigators since the beginning: “databases.” The first database of interest is “main.db.” This database replaces tcspahn.db as it now contains a majority of user data (again, tcspahn.db does not exist anymore). There is quite a bit in here, but I will highlight a few tables. The first table is “Feed.” See Figure 2.

Figure 2. The Feed.

This table contains the last action taken in the app. Specifically, the parties involved in that action (seen in Figure 2), what the action was, and when the action was taken (Figure 3). In Figure 4 you can even see which party did what. The column “lastReadTimestamp” is the absolute last action, and the column “lastReader” shows who did that action. In this instance, I had sent a chat message from Fake Account 1 (“thisisdfir”) to Fake Account 2 (“hickdawg957”) and had taken a screenshot of the conversation using Fake Account 1. Fake Account 2 then opened the message.

Figure 3. Last action.
Figure 4. Who did what?

The second table is “Friend.” This table contains anyone who may be my friend. The table contains the other party’s username, user ID, display name, the date/time I added that person as a friend (column “addedTimestamp”), and the date/time the other person added me as a friend (column “reverseAddedTimestamp”). Also seen are any emojis that may be assigned to my friends. See Figures 5, 6, and 7.

Figure 5. Username, User ID, & Display Name.
Figure 6. Friendmojis (emojis added to my friends).
Figure 7. Timestamps for when I added friends and when they added me.

Note that the timestamps are for when I originally added the friend/the friend added me. The timestamps here translate back to dates in November of 2018, which is when I originally created the accounts during the creation of my Android Nougat image.

One additional note here. Since everyone is friends with the “Team Snapchat” account, the value for that entry in the “addedTimestamp” column is a good indicator of when the account you’re examining was created.
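Putting that tip into a query is straightforward. The sketch below lists friends in the order they were added, with the timestamps converted from what appear to be epoch milliseconds; the “username” column name is my assumption, so verify it against your copy of the Friend table.

# account_creation.py - approximate account creation time from the Friend table
# in main.db (sketch; the "username" column name and the epoch-milliseconds
# assumption should be verified against your own data)
import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect("main.db")
rows = con.execute(
    "SELECT username, addedTimestamp FROM Friend ORDER BY addedTimestamp")
for username, added in rows:
    when = datetime.fromtimestamp(added / 1000, tz=timezone.utc)
    print(f"{when:%Y-%m-%d %H:%M:%S %Z}  {username}")
# the Team Snapchat row (normally the earliest) approximates account creation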

The next table is a biggie: Messages. I will say that I had some difficulty actually capturing data in this table. The first two attempts involved sending a few messages back and forth, letting the phone sit for 10 or so minutes, and then extracting the data. In each of those instances, absolutely NO data was left behind in this table.

In order to actually capture the data, I had to leave the phone plugged in to the laptop, send some messages, screenshot the conversation quickly, and then boot into TWRP, which all happened in under two minutes’ time. If Snapchat is deleting the messages from this table that quickly, they will be extremely hard to capture in the future.

Figure 8 is a screenshot of my conversation (all occurred on 05/30/2019) taken with Fake Account 1 (on the test phone) and Figure 9 shows the table entries. The messages on 05/30/2019 start on Row 6.

Figure 8. A screenshot of the conversation.
Figure 9. Table entries of the conversation.

The columns “timestamp” and “seenTimestamp” are self-explanatory. The column “senderId” is the “id” column from the Friend table. Fake Account 1 (thisisdfir) is senderId 2 and Fake Account 2 (hickdawg957) is senderId 1. The column “feedRowId” tells you who the conversation participants are (beyond the sender). The values link back to the “id” column in the Feed table previously discussed. In this instance, the participants in the conversation are hickdawg957 and thisisdfir.
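To pull those relationships together in one place, the sketch below joins Messages to Friend on the sender and converts the timestamps. The table and column names are the ones quoted above, and the “username” column is again an assumption, so adjust to match your copy of the schema.

# messages_overview.py - list chat metadata by joining Messages to Friend in
# main.db (sketch; verify table/column names against your own extraction)
import sqlite3
from datetime import datetime, timezone

def ms(ts):
    return datetime.fromtimestamp(ts / 1000, tz=timezone.utc) if ts else None

con = sqlite3.connect("main.db")
rows = con.execute("""
    SELECT m.timestamp, m.seenTimestamp, f.username, m.type, m.feedRowId
    FROM Messages AS m
    LEFT JOIN Friend AS f ON f.id = m.senderId
    ORDER BY m.timestamp
""")
for sent, seen, sender, mtype, feed in rows:
    print(f"{ms(sent)}  sender={sender or '?'}  type={mtype}  "
          f"feedRowId={feed}  seen={ms(seen) or 'n/a'}")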

In case you missed it, Figure 8 actually has two saved messages between these two accounts from December of 2018. Information about those saved messages appears in Rows 1 and 2 in the table. Again, these are relics from previous activity and were not generated during this testing. This is an interesting find as I had completely wiped and reinstalled Android multiple times on this device since those messages were sent, which leads me to speculate these messages may be saved server-side.

In Figure 10, the “type” column is seen. This column shows the type of message that was transmitted. There are three “snap” entries here, but, based on the timestamps, these are not snaps that I sent or received during this testing.

Figure 10. The “types” of messages.

After the “type” column there are a lot of NULL values in a bunch of columns, but you eventually get to the message content, which is seen in Figure 11. Message content is stored as blob data. You’ll also notice there is a column “savedStates.” I am not sure exactly what the entries in the cells are referring to, but they line up with the saved messages.

Figure 11. Message (blob) content.

In Figure 12, I bring up one of the messages that I recently sent.

Figure 12. A sample message.

The next table is “Snaps.” This table is volatile, to say the least. The first data extraction I performed was on 05/22/2019 around 19:00. However, I took multiple pictures and sent multiple snaps on 05/21/2019 around lunch time and the following morning on 05/22/2019. Overall, I sent eight snaps (pictures only) during this time. Figure 13 shows what I captured during my first data extraction.

Figure 13. I appear to be missing some snaps.

Of the eight snaps that I sent, only six appear in the table. The first two entries in the table pre-date when I started the testing (on 05/21/2019), so those entries are out (they came from Team Snapchat). The first timestamp is from the first snap I sent on 05/22/2019 at 08:24. The two snaps from 05/21/2019 are not here. So, within 24 hours, the data about those snaps had been purged.

On 05/25/2019 I conducted another data extraction after having received a snap and sending two snaps. Figure 14 shows the results.

Figure 14. A day’s worth of snaps.

The entries seen in Figure 13 (save the first two) are gone, but there are two entries there for the snaps I sent. However, there is no entry for the snap I received. I checked all of the tables and there was nothing. I received the snap at 15:18 that day, and performed the extraction at 15:51. Now, I don’t know for sure that a received snap would have been logged. I am sure, however, that it was not there. There may be more testing needed here.

Figure 15 shows the next table, “SendToLastSnapRecipients.” This table shows the user ID of the person I last sent a snap to in the “key” column, and the time at which I sent said snap.

Figure 15. The last snap recipient.

MEMORIES

During the entire testing period I took a total of 13 pictures. Of those 13, I saved 10 of them to “Memories.” Memories is Snapchat’s internal gallery, separate from the phone’s Photos app. After taking a picture and creating an overlay (if desired), you can choose to save the picture, which places it in Memories. If you were to decide to save the picture to your Photos app, Snapchat will allow you to export a copy of the picture (or video).

And here is a plus for examiners/investigators: items placed in Memories are stored server-side. I tested this by signing into Fake Account 1 from an iOS device, and guess what…all of the items I placed in Memories on the Pixel 3 appeared on the iOS device.

Memories can be accessed by swiping up from the bottom of the screen. Figure 16 shows the Snapchat screen after having taken a photo but before snapping (sending) it. Pressing the area in the blue box (bottom left) saves the photo (or video) to Memories. The area in the red box (upper right) are the overlay tools.

Figure 16. The Snapchat screen.

Figure 17 shows the pictures I have in my Memories. Notice that there are only 9 pictures (not 10). More on that in a moment.

Figure 17. My memories. It looks like I am short one picture.

The database memories.db stores relevant information about files that have been saved to Memories. The first table of interest is “memories_entry.” This table contains an “id,” the “snap_id,” and the date the snap was created. There are two columns regarding the time: “created_time” and “latest_created_time.” In Figure 18 there is a few seconds’ difference between the values in some cells of the two columns, but there are also a few that have the same value. Where there are differences, they are negligible.

There is also a column titled “is_private” (seen in Figure 19). This column refers to the My Eyes Only (MEO) feature, which I will discuss shortly. For now, just know that the value of 1 indicates “yes.”

Figure 18. Memories entries.
Figure 19. My Eyes Only status.

(FOR) MY EYES ONLY

I have been seeing a lot of listserv inquiries as of late regarding MEO. Cellebrite recently added support for MEO file recovery in Android as of Physical Analyzer 7.19 (iOS to follow), and, after digging around in the memories database, I can see why this would be an issue.

MEO allows a user to protect pictures or videos with a passcode; this passcode is separate from the user’s password for their Snapchat account. A user can opt to use a 4-digit passcode, or a custom alphanumeric passcode. Once a user indicates they want to place a media file in MEO, that file is moved out of the Memories area into MEO (it isn’t copied to MEO).

MEO is basically a private part of Memories. So, just like everything else in Memories, MEO items are also stored server-side. I confirmed this when I signed in to Fake Account 1 from the iOS device; the picture I saved to MEO on the Pixel 3 appeared in MEO on the iOS device. The passcode was the same, too. Snapchat says if a user forgets the passcode to MEO, they cannot help recover it. I’m not sure how true that is, but who knows.

If you recall, I placed 10 pictures in Memories, but Figure 17 only showed 9 pictures. That is because I moved one picture to MEO. Figure 20 shows my MEO gallery.

Figure 20. MEO gallery.

In the memories database, the table “memories_meo_confidential” contains entries about files that have been placed in MEO. See Figure 21.

Figure 21. MEO table in the memories database.

This table contains a “user_id,” the hashed passcode, a “master_key,” and the initialization vector (“iv”). The “master_key” and “initialization vector” are both stored in base64. And, the passcode….well, it has been hashed using bcrypt (ugh). I will add that Cellebrite reports Physical Analyzer 7.19 does have support for accessing MEO files, and, while I did have access to 7.19, I was not able to tell if it was able to access my MEO file since it failed to extract any Snapchat data.
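If the user chose the 4-digit option, the keyspace is small enough that trying all 10,000 codes against the stored hash is feasible, even with bcrypt’s deliberately slow design. A sketch of that idea, assuming the value pulled from the passcode column is a standard bcrypt string (this covers only the 4-digit case, not custom alphanumeric passcodes):

# meo_passcode.py - try all 4-digit passcodes against the bcrypt hash stored in
# memories_meo_confidential (sketch; expect it to take a while, since bcrypt is
# slow on purpose, and a custom alphanumeric passcode will not be found this way)
import bcrypt                      # third-party: pip install bcrypt

def crack(stored_hash: str):
    hashed = stored_hash.encode()
    for code in range(10000):
        candidate = f"{code:04d}".encode()
        if bcrypt.checkpw(candidate, hashed):
            return candidate.decode()
    return None

# usage (the hash below is a placeholder, not a value from my test data):
# print(crack("$2a$10$..."))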

The “user_id” is interesting: “dummy.” I have no idea what that is referring to, and I could not find it anywhere else in the data I extracted.

The next table is “memories_media.” This table does have a few tidbits of interesting data: another “id,” the size of the file (“size”), and what type of file it is (“format”). Since all of my Memories are pictures, all of the cells show “image_jpeg.” See Figures 22 and 23.

Figure 22. “memories_media.”
Figure 23. “memories_media,” part 2.

The next table is “memories_snap.” This table has a lot of information about my pictures, and brings together data from the other tables in this database. Figure 24 shows a column “media_id,” which corresponds to the “id” in the “memories_media” table discussed earlier. There are also “creation_time” and “time_zone_id” columns. See Figure 24.

Figure 24. id, media_id, creation_time, and time zone.

Figure 25 shows the width and height of the pictures. Also note the column “duration.” The value is 3.0 for each picture. I would be willing to bet that number could be higher or lower if the media were videos.

Figure 25 also shows the “memories_entry_id,” which corresponds to the “id” column in the “memories_entry” table. There is also a column for “has_location.” Each of the pictures I placed in Memories has location data associated with it (more on that in a moment).

Figure 25. Picture size, another id, and a location indicator.

Figure 26 is interesting as I have not been able to find the values in the “external_id” or “copy_from_snap_id” columns anywhere.

Figure 26. No clue here.

The data seen in Figure 27 could be very helpful in situations where an examiner/investigator thinks there may be multiple devices in play. The column “snap_create_user_agent” contains information on what version of Snapchat created the snap, along with the Android version and, in my case, my phone model.

Figure 27. Very helpful.

The column “snap_capture_time” is the time I originally took the picture and not the time I sent the snap.

Figure 28 shows information about the thumbnail associated with each entry.

Figure 28. Thumbnail information.

Figure 29 is just like Figure 27 in its level of value. It contains the latitude and longitude of the device when the picture was taken. I plotted each of these entries and I will say that the coordinates are accurate to within +/- 10 feet. I know the GPS capabilities of every device are different, so just be aware that your mileage may vary.

Figure 29. GPS coordinates!!

Figure 29 also has the column “overlay_size.” This is a good indication of whether a user has placed an overlay in the picture/video. Overlays are things that are placed in a photo/video after it has been captured. Figure 30 shows an example of an overlay (in the red box). The overlay here is caption text.

Figure 30. An overlay example.

If the value in the overlay_size column is NULL that is a good indication that no overlay was created.

Figure 31 shows the “media_key” and “media_iv,” both of which are in base64. Figure 32 shows the “encrypted_media_key” and “encrypted_media_iv” values. As you can see there is only one entry that has values for these columns; that entry is the picture I placed in MEO.

Figure 31. More base64.
Figure 32. Encrypted stuff.

The next table that may be of interest is “memories_remote_operation.” This shows all of the activity taken within Memories. In the “operation” column, you can see where I added the 10 pictures to Memories (ADD_SNAP_ENTRY_OPERATION). The 11th entry, “UPDATE_PRIVATE_ENTRY_OPERATION,” is where I moved a picture into MEO. See Figure 33.

Figure 33. Remote operations.

The column “serialized_operation” stores information about the operation that was performed. The data appears to be stored in JSON format. The cell contains a lot of the same data that was seen in the “memories_snap” table. I won’t expand it here, but DB Browser for SQLite does a good job of presenting it.

Figure 34 shows a better view of the column plus the “created_timestamp” column. This is the time when the operation in the entry was performed.

Figure 34. JSON and a timestamp for the operation.

Figure 35 contains the “target_entry” column. The values in this column refer to the “id” column in the “memories_entry” table.

Figure 35. Operation targets.

To understand the next database, journal, I first have to explain some additional file structure of the com.snapchat.android folder. If you recall all the way back to Figure 1, there was a folder labeled “files.” Entering that folder reveals the items seen in Figure 36. Figure 37 shows the contents of the “file_manager” folder.

Figure 36. “Files” structure.
Figure 37. file_manager.

The first folder of interest here is “media_package_thumb,” the contents of which can be seen in Figure 38.

Figure 38. Thumbnails?

Examining the first file here in hex finds a familiar header: 0xFF D8 FF E0…yoya. These things are actually JPEGs. So, I opened a command line in the folder, typed ren *.* *.jpg and BAM: pictures! See Figure 39.

Figure 39. Pictures!

Notice there are a few duplicates here. However, there are some pictures here that were not saved to Memories and were not saved anywhere else. As an example, see the picture in Figure 40.

Figure 40. A non-saved, non-screenshot picture.

Figure 40 is a picture of the front of my employer’s building. For documentation purposes, I put a text overlay in the picture with the date/time I took it (to accompany my notes). I then snapped this picture to Fake Account 2, but did not save it to Memories, did not save it to my Photos app, and did not screenshot it. However, here it is, complete with the overlay. Now, while this isn’t the original picture (it is a thumbnail) it can still be very useful; one would need to examine the “snap” table in the main database to see if there was any activity around the MAC times for the thumbnail.

The next folder of interest is the “memories_media” folder. See Figure 41.

Figure 41. Hmm…

There are 10 items here. These are also JPEGs. I performed the same operation here as I did in the “media_package_thumb” folder and got the results seen in Figure 42.

Figure 42. My Memories, sans overlays.

These are the photographs I placed in Memories, but the caption overlays are missing. The picture that is in MEO is also here (the file starting with F5FC6BB…). Additionally, these are high-resolution pictures.

You may be asking yourself “What happened to the caption overlays?” I’m glad you asked. They are stored in the “memories_overlay” folder. See Figure 43.

Figure 43. My caption overlays.

Just like the previous two folders, these are actually JPEGs. I performed the rename function, and got the results seen in Figure 44. Figure 45 shows the overlay previously seen in Figure 30.

Figure 44. Overlays.
Figure 45. The Megaman overlay from Figure 30.

The folder “memories_thumbnail” is the same as the others, except it contains just the files in Memories (with the overlays). For brevity’s sake, I will just say the methodology to get the pictures to render is the same as before. Just be aware that while I just have pictures in my Memories, a user could put videos in there, too, so you could have a mixture of media. If you do a mass-renaming, and a file does not render, the file extension is probably wrong, so adjust the file extension(s) accordingly.
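If you would rather not rename blindly, a small script can sniff the file signature first and pick the extension for you. This is a rough sketch that only knows about JPEG and MP4 headers (which covered my test data); other containers would need their own signatures added, and, as always, work on a copy rather than the original evidence.

import os

def sniff_extension(path):
    # Guess an extension from the first few bytes of the file.
    with open(path, "rb") as f:
        header = f.read(12)
    if header.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if header[4:8] == b"ftyp":      # ISO base media (MP4-style) container
        return ".mp4"
    return None

target_dir = "memories_thumbnail"   # adjust to the folder being examined
for name in os.listdir(target_dir):
    full = os.path.join(target_dir, name)
    if not os.path.isfile(full):
        continue
    ext = sniff_extension(full)
    if ext and not name.lower().endswith(ext):
        os.rename(full, full + ext)
        print("renamed", name, "->", name + ext)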

Now that we have discussed those file folders, let’s get back to the journal database. This database keeps track of everything in the “file_manager” directory, including those things we just discussed. Figure 46 shows the top level of the database’s entries.

Figure 46. First entries in the journal database.

If I filter the “key” column using the term “package” from the “media_package_thumb” folder (the “media_package_thumb.0” files) I get the results seen in Figure 47.

Figure 47. Filtered results.

The values in the “key” column are the file names for the 21 files seen in Figure 38. The values seen in the “last_update_time” column are the timestamps for when I took the pictures. This is a method by which examiners/investigators could potentially recover snaps that have been deleted.
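A quick way to replicate that filter outside of a database viewer is shown below. The table name (“journal”) and the treatment of “last_update_time” as Unix epoch milliseconds are assumptions on my part; the “key” and “last_update_time” column names are the ones visible in the database.

import sqlite3

# The "journal" file/table names and the millisecond interpretation are
# assumptions for this sketch; confirm both against your copy of the database.
con = sqlite3.connect("journal")
query = """
SELECT key,
       datetime(last_update_time / 1000, 'unixepoch') AS updated_utc
FROM journal
WHERE key LIKE '%package%'
ORDER BY last_update_time;
"""
for key, updated_utc in con.execute(query):
    print(updated_utc, key)
con.close()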

WHAT ELSE IS THERE?

As it turns out, there are a few more, non-database artifacts left behind which are located in the “shared_prefs” folder seen in Figure 1. The contents can be seen in Figure 48.

Figure 48. shared_prefs contents.

The first file is identity_persistent_store.xml seen in Figure 49. The file contains the timestamp for when Snapchat was installed on the device (INSTALL_ON_DEVICE_TIMESTAMP), when the first logon occurred on the device (FIRST_LOGGED_IN_ON_DEVICE_TIMESTAMP), and the last user to logon to the device (LAST_LOGGED_IN_USERNAME).

Figure 49. identity_persistent_store.xml.
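These shared_prefs files are plain Android preference XML, so the values are easy to pull out programmatically. The sketch below assumes the standard layout (long values stored as an attribute, the username stored as element text) and that the timestamps are Unix epoch milliseconds; both held true for my data, but verify against your own extraction.

import xml.etree.ElementTree as ET
from datetime import datetime, timezone

tree = ET.parse("identity_persistent_store.xml")
for elem in tree.getroot():
    name = elem.get("name")
    if name in ("INSTALL_ON_DEVICE_TIMESTAMP", "FIRST_LOGGED_IN_ON_DEVICE_TIMESTAMP"):
        millis = int(elem.get("value"))          # assumed to be epoch milliseconds
        print(name, datetime.fromtimestamp(millis / 1000, tz=timezone.utc))
    elif name == "LAST_LOGGED_IN_USERNAME":
        print(name, elem.text)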

Figure 50 shows the file LoginSignupStore.xml. It contains the username that is currently logged in.

Figure 50. Who is logged in?

The file user_session_shared_pref.xml has quite a bit of account data in it, and is seen in Figure 51. For starters, it contains the display name (key_display_name), the username (key_username), and the phone number associated with the account (key_phone).

The value “key_created_timestamp” is notable. This time stamp converts to November 29, 2018 at 15:13:34 (EST). Based on my notes, this was around the time I established Fake Account 1, which was used in the creation of the Nougat image. This might be a good indicator of when the account was established, although you could always get that data by serving Snapchat with legal process.

Rounding it out is the “key_user_id” (seen in the Friends table of the main database) and the email associated with the account (key_email).

Figure 51. user_session_shared_pref.xml

CONCLUSION

Snapchat’s reputation precedes it. I have been in a few situations where examiners/investigators automatically threw up their hands and gave up after having been told that potential evidence was generated by or contained in Snapchat. They wouldn’t even try. I will say that while I always have (and will) try to examine anything regardless of what the general consensus is, I did share a bit of others’ skepticism about the ability to recover much data from Snapchat. However, this exercise has shown me that there is plenty of useful data left behind by Snapchat that can give a good look into its usage.

Update

Alexis Brignoni over at Initialization Vectors noticed that I failed to address something in this post. First, thanks to him for reading and contacting me. 🙂 Second, he noticed that I did not address Cellebrite Physical Analyzer’s (v 7.19) and Axiom’s (v 3.0) ability to parse my test Snapchat data (I addressed the extraction portion only).

We both ran the test data against both tools and found both failed to parse any of the databases. Testing found that while Cellebrite found the pictures I describe in this post, it did not apply the correct MAC times to them (from the journal.db). Axiom failed to parse the databases and failed to identify any of the pictures.

This is not in any way, shape, or form a knock on, or an attempt to single out, these two tools; these are just the tools to which I happen to have access. These tools work, and I use them regularly. The vendors do a great job of keeping up with the latest developments in both the apps and the operating systems. Sometimes, though, app developers make a hard turn all of a sudden, and it takes time for the vendors to update their tools. Doing so requires R&D and quality control via testing, which can take a while depending on the complexity of the update.

However, this exercise does bring to light an important lesson in our discipline, one that bears repeating: test and know the limitations of your tools. Knowing the limitations allows you to know when you may be missing data/getting errant readings. Being able to compensate for any shortcomings and manually examine the data is a necessary skillset in our discipline.

Thank you, Alexis, for the catch and the assist!

Ridin’ With Apple CarPlay

I have been picking on Google lately.  In fact, all of my blog posts thus far have focused on Google things.  Earlier this year I wrote a blog about Android Auto, Google’s solution for unifying telematic user interfaces (UIs), and in it I mentioned that I am a daily CarPlay driver.  So, in the interest of being fair, I thought I would pick on Apple for a bit and take a look under the hood of CarPlay, Apple’s foray into automotive telematics.

Worldwide, 62 different auto manufacturers make over 500 models that support CarPlay.  Additionally, 6 after-market radio manufacturers (think Pioneer, Kenwood, Clarion, etc.) support CarPlay.  In comparison, 41 auto manufacturers (again, over 500 models – this is an increase since my earlier post) and 19 after-market radio manufacturers support Android Auto.  CarPlay runs on iPhone 5 and later.  It has been a part of iOS since its arrival (in iOS 7.1), so there is no additional app to download (unlike Android Auto).  A driver simply plugs the phone into the car (or wirelessly pairs it if the car supports it) and drives off; a wired connection negates the need for a Bluetooth connection.  The toughest thing about CarPlay setup is deciding how to arrange the apps on the home screen.

In roughly 5 years’ time CarPlay support has grown from 3 to 62 different auto manufacturers.  I can remember shopping for my 2009 Honda (in 2012) and not seeing anything mentioned about hands-free options.  Nowadays, support for CarPlay is a feature item in a lot of car sales advertisements.  With more and more states enacting distracted driving legislation, I believe using these hands-free systems will eventually become mandatory.

Before we get started, let’s take a look at CarPlay’s history.

Looking in the Rearview Mirror

The concept of using an iOS device in a car goes back further than most people realize.  In 2010 BMW announced support for iPod Out, which allowed a driver to use their iPod via an infotainment console in select BMW & Mini models.

iPod Out-1
Figure 1.  iPod Out.  The great-grandparent of CarPlay.

iPod Out-2
Figure 2.  iPod Out (Playback).

The iPod connected to the car via the 30-pin to USB cable, and it would project a UI to the screen in the car.  iPod Out was baked in to iOS 4, so the iPhone 3G, 3GS, 4, and the 2nd and 3rd generation iPod Touches all supported it.  While BMW was the only manufacturer to support iPod Out, any auto manufacturer could have supported it; however, it just wasn’t widely advertised or adopted.

In 2012 Siri Eyes Free was announced at WWDC as part of iOS 6.  Siri Eyes Free would allow a user to summon Siri (then a year old in iOS) via buttons on a steering wheel and issue any command that one could normally issue to Siri.  This differed from iPod Out in that there was no need for a wired-connection.  The car and iOS device (probably a phone at this point) utilized Bluetooth to communicate.  The upside to Siri Eyes Free, beyond the obvious safety feature, was that it could work with any in-car system that could utilize the correct version of the Bluetooth Hands-Free Profile (HFP).  No infotainment center/screen was necessary since it did not need to project a UI.  A handful of auto manufacturers signed on, but widespread uptake was still absent.

At the 2013 WWDC Siri Eyes Free morphed in to iOS in the Car, which was part of iOS 7.  iOS in the Car can be thought of as the parent of CarPlay, and closely resembles what we have today.  There were, however, some aesthetic differences, which can be seen below.

HomeScreen
Figure 3.  Apple’s Eddy Cue presenting iOS in the Car (Home screen).

iOS-in-the-Car-integration-Chevy-Spark-MyLink-720x340
Figure 4.  Phone call in iOS in the Car.

dims
FIgure 5.  Music playback in iOS in the Car.

Screen Shot 2013-06-10 at 12.59.52 PM
Figure 6.  Getting directions.

Screen Shot 2013-06-10 at 2.09.12 PM
Figure 7.  Navigation in iOS in the Car.

iOS in the Car needed a wired connection to the vehicle, or so was the general thought at the time.  During the iOS 7 beta, switches were found indicating that iOS in the Car could, potentially, operate over a wireless connection, and there was even mention of it possibly leveraging AirPlay (more on that later in this post).  Unfortunately, iOS in the Car was not present when iOS 7 was initially released.

The following spring Apple presented CarPlay, and it was later released in iOS 7.1.  At launch there were three auto manufacturers that supported it:  Ferrari, Mercedes-Benz, and Volvo.  Personally, I cannot afford cars from any of those companies, so I am glad more manufacturers have added support.

CarPlay has changed very little since its release.  iOS 9 brought wireless pairing capabilities to car models that could support it, iOS 10.3 added recently used apps to the upper left part of the screen, and iOS 12 opened up CarPlay to third party navigation applications (e.g. Google Maps and Waze).  Otherwise, CarPlay’s functionality has stayed the same.

With the history lesson now over, there are a couple of things to mention.  First, this research was conducted using my personal phone, an iPhone XS (model A1920) running iOS 12.2 (build 16E227).  So, while I do have data sets, I will not be posting them online as I did with the Android Auto data.  If you are interested in the test data, contact me through the blog site and we’ll talk.

Second, at least one of the files discussed (the cache file in the locationd path) is in a protected area of iPhone, so there are two ways you can get to it:  jailbreaking iPhone or using a “key” with a color intermediate between black and white. The Springboard and audio data should be present in an iTunes backup or in an extraction from your favorite mobile forensic tool.

Let’s have a look around.

Test Drive

I have been using CarPlay for the past two and a half years.  A majority of that time was with an after-market radio from Pioneer (installed in a 2009 Honda), and the last six months have been with a factory-installed display unit in a 2019 Nissan.  One thing I discovered is that there are some slight aesthetic differences in how each auto manufacturer/after-market radio manufacturer visually implements CarPlay, so your visual mileage may vary.  However, the functionality is the same across the board.  CarPlay works just like iPhone.

Figure 8 shows the home screen of CarPlay.

IMG_0769 2
Figure 8.  CarPlay’s home screen.

The home screen looks and operates just like iPhone, which was probably the idea.  Apple did not want users to have a large learning curve when trying to use CarPlay.  Each icon represents an app, and the apps are arranged in rows and columns.  Unlike iPhone, creating folders is not an option, so it is easy to have multiple home screens. The icons are large enough to where not much fine motor skill is necessary to press one, which means you probably won’t be hunting for or pressing the wrong app icon very often.

The button in the orange box is the home button.  It is persistent across the UI, and it works like the iPhone home button:  press it while anywhere and you are taken back to the home screen.  The area in the blue box indicates there are two home screens available, and the area in the red box shows the most recently used apps.

Most of the apps should be familiar to iPhone users, but there is one that is not seen on iPhone:  the Now Playing app.  This thing is not actually an app…it can be thought of more like a shortcut.  Pressing it will bring up whatever app currently has control of the virtual sound interface of CoreAudio (i.e. whatever app is currently playing or last played audio if that app is suspended in iPhone’s background).

Swiping left shows my second home screen (Figure 9).  The area in the red box is the OEM app.  If I were to press it, I would exit the CarPlay UI and return to Nissan Connect (Nissan’s telematic system); however, CarPlay would still be running in the background.  The OEM app icon will change depending on the auto maker.  So, for example, if you were driving a Honda, this icon would be different.

IMG_0771 1.jpg
Figure 9.  The second batch of apps on the second home screen.

A user can arrange the apps any way they choose and there are two ways of doing this, both of which are like iPhone.  The first way is to press and hold an app on the car display unit, and then drag it to its desired location.  The second way is done from the screen seen in Figure 10.

IMG_0801.JPG
Figure 10.  CarPlay settings screen.

The screen in Figure 10 can be found on iPhone by navigating to Settings > General > CarPlay and selecting the CarPlay unit (or units – you can have multiple)…mine is “NissanConnect.”  Moving apps around is the same here as it is on the display unit (instructions are present midway down the screen).  Apps that have a minus sign badge can be removed from the CarPlay home screen.  When an app is removed it is relegated to the area just below the CarPlay screen; in Figure 10 that area holds the MLB AtBat app, AudioBooks (iBooks), and WhatsApp.  If I wanted to add any relegated apps back to the CarPlay home screen I could do so by pushing the plus sign badge.  Some apps cannot be relegated:  Phone, Messages, Maps, Now Playing, Music, and the OEM app.  Everything else can be relegated.

One thing to note here:  iOS considers the car to be a USB accessory, so CarPlay does have to abide by the USB Restricted Mode setting on iPhone (if enabled).  This is regardless of whether the Allow CarPlay While Locked toggle switch is set to the on position.

The following screenshots show music playback (Figure 11), navigation (Figure 12), and podcast playback (Figure 13).

IMG_0796.PNG
Figure 11.  Music playback.

IMG_0782.PNG
Figure 12.  Navigation in CarPlay.

IMG_0794.PNG
Figure 13.  Podcast playback.

Messages in CarPlay is a stripped-down version of Messages on iPhone.  The app will display a list of conversations (see Figure 14), but it will not display text of the conversations (Apple obviously doesn’t want a driver reading while driving).  Instead, Siri is used for both reading and dictating messages.

IMG_0792.jpg
Figure 14.  Messages conversation list.

Phone is seen in Figure 15; specifically, the Favorites tab.  The tabs at the top of the screen mirror those seen on the bottom of the Phone app on iPhone (Favorites, Recents, Contacts, Keypad, and Voicemail), and they look just like they do on iPhone.

IMG_0790
Figure 15.  Phone favorites.

IMG_0805
Figure 16.  The keypad in Phone.

If I receive a phone call, I can answer it in two ways:  pressing the green accept button (seen in Figure 17) or pushing the telephone button on my steering wheel.  Answering the call changes the screen to the one seen in Figure 18.  Some of the items in Figure 18 look similar to those seen in iOS in the Car (Figure 4).

IMG_0807
Figure 17.  An incoming call.

IMG_0809
Figure 18.  An active phone call.

Most apps will appear like those pictured above, although, there may be some slight visual/functional differences depending on the app’s purpose, and, again, there may be some further visual differences depending on what car or after-market radio you are using.

Speaking of purpose, CarPlay is designed to do three things:  voice communication, audio playback, and navigation.  These things can be done fairly well through CarPlay, and done safely, which, I believe, is the main purpose.  Obviously, some popular apps, such as Twitter or Facebook, don’t work well in a car, so I don’t expect true social media apps to be in CarPlay any time soon if at all (I could be wrong).

Now that we have had a tour, let’s take a look under the hood and see what artifacts, if any, can be found.

Under the Hood

After snooping around in iOS for a bit, I came to the realization that CarPlay is forensically similar to Android Auto:  it merely projects the apps that can work with it onto the car’s display unit, so the individual apps contain a majority of the user-generated data.  Also like Android Auto, CarPlay does leave behind some artifacts that may be valuable to forensic examiners/investigators, and, just like any other artifacts an examiner may find, these can be used in conjunction with other data sources to get a holistic picture of a device.

One of the first artifacts that I found is the cache.plist file under locationd.  It can be found in the private > var > root > Library > Caches > locationd path.  cache.plist contains the times of last connect and last disconnect.  I did not expect to find connection times in the cache file of the location daemon, so this was a pleasant surprise.  See Figure 19.

LastVehicleConnection.jpg
Figure 19.  Last connect and last disconnect times.

There are actually three timestamps here, two of which I have identified.  The timestamp in the red box is the last time I connected to my car. It is stored in CF Absolute Time (aka Mac Absolute Time), which is the number of seconds since January 1, 2001 00:00:00 UTC.  The time, 576763615.86389804, converts to April 12, 2019 at 8:06:56 AM (EDT).  I had stopped at my favorite coffee shop on the way to work and when I hopped back in the car, I plugged in my iPhone and CarPlay initialized.  See Figure 20.

LastConnectTime
Figure 20.  Time of last connect.

The time stamp in the green box, just under the string CarKit NissanConnect, is a bit deceptive.  It is the time I disconnected from my car.  Decoding it converts it to April 12, 2019 at 8:26:18 AM (EDT).  Here, I disconnected from my car, walked into work, and badged in at 8:27:14 AM (EDT).  See Figure 21.

LastDisconnectTime
Figure 21.  Time of last disconnect.

The time in the middle, 576764725.40157998, is just under a minute before the timestamp in the green box.  Based on my notes, it is the time I stopped playback on a podcast that I was listening to at the time I parked.  I also checked KnowledgeC.db (via DB Browser for SQLite) and found an entry in it for “Cached Locations,” with the GPS coordinates being where I parked in my employer’s parking lot.  Whether the middle timestamp represents the time the last action was taken in CarPlay is a good question and requires more testing.
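For anyone who wants to do the conversion by hand, CF Absolute Time is just seconds added to the 2001 epoch.  A quick sketch using the two values identified above (output is in UTC, so subtract four hours for EDT):

from datetime import datetime, timedelta, timezone

CF_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def cf_absolute_to_datetime(seconds):
    # CF Absolute Time: seconds since 2001-01-01 00:00:00 UTC.
    return CF_EPOCH + timedelta(seconds=seconds)

for value in (576763615.86389804, 576764725.40157998):
    print(value, "->", cf_absolute_to_datetime(value))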

The next file of interest here is the com.apple.carplay.plist file.  It can be found by navigating to the private > var > mobile > Library > Preferences path.  See Figure 22.

CarPlay-Plist
Figure 22.  carplay.plist

The area in the red box is of interest.  Here the name of the car that was paired is seen (NissanConnect) along with a GUID.  The fact that the term “pairings” (plural) is there along with a GUID leads me to believe that multiple cars can be paired with the same iPhone, but I wasn’t able to test this as I am the only person I know that has a CarPlay capable car.  Remember the GUID because it is seen again in discussing the next artifact.  For now, see Figure 23.

IMG_0802.JPG
Figure 23.  Main CarPlay setting page in iOS.

Figure 23 shows the settings page just above the one seen in Figure 10.  I show this merely to show that my car is labeled “NissanConnect.”

The next file is 10310139-130B-44F2-A862-7095C7AAE059-CarDisplayIconState.plist.  It can be found in the private > var > mobile > Library > Springboard path.  The first part of the file name should look familiar…it is the GUID seen in the com.apple.carplay.plist file.  This file describes the layout of the home screen (or screens if you have more than one).  I found other files in the same path with the CarDisplayIconState string in their file names, but with different GUIDs, which causes me to further speculate that multiple CarPlay units can be synced with one iPhone.  See Figure 24.

IconList-Plist-1
Figure 24.  CarPlay Display Icon State.

The areas in the red and blue boxes represent my home screens.  The top-level Item in the red box, Item 0, represents my first home screen, and the sub-item numbers represent the location of each icon on the first home screen.  See Figure 25 for the translation.

IMG_0769
Figure 25.  Home screen # 1 layout.

The area in the blue box in Figure 24 represents my second home screen, and, again, the sub-item numbers represent the location of each icon on the screen.  See Figure 26 for the translation.

IMG_0771
Figure 26.  Home screen # 2 layout.

The entry below the blue box in Figure 24 is labeled “metadata.”  Figure 27 shows it in an expanded format.

IconList-Plist-2
Figure 27.  Icon state “metadata.”

The areas in the green and purple boxes indicate that the OEM app icon is displayed, and that it is “Nissan” (seen in Figure 26).  The areas in the orange and blue boxes describe the app icon layout (four columns and two rows).  The area in the red box is labeled “hiddenIcons,” and refers to the relegated apps previously seen in Figure 10.  As it turns out, the item numbers also describe their positions.  See Figure 28.

IMG_0801
Figure 28.  Hidden icon layout.
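If you want to dump one of these icon-state files outside of a plist viewer, Python’s plistlib reads the binary format directly.  I am not asserting key names beyond what is visible in the figures (the per-screen arrays, plus the metadata and hiddenIcons entries), so this sketch simply walks whatever structure is present and prints it:

import plistlib

with open("10310139-130B-44F2-A862-7095C7AAE059-CarDisplayIconState.plist", "rb") as f:
    icon_state = plistlib.load(f)

def walk(node, depth=0):
    # Recursively print dictionaries and arrays; nested arrays end up reading
    # as screens and icon positions.
    indent = "  " * depth
    if isinstance(node, dict):
        for key, value in node.items():
            print(indent + str(key) + ":")
            walk(value, depth + 1)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            print(indent + "[" + str(i) + "]")
            walk(value, depth + 1)
    else:
        print(indent + str(node))

walk(icon_state)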

Notice that this file did not describe the location of the most recently used apps in CarPlay (the area in the upper left portion of the display screen).  That information is described in com.apple.springboard, which is found in the same path.  See Figure 29.

RecentlyUsedLayout
Figure 29.  Springboard and most recently used apps.

Just like the app icon layout previously discussed, the item numbers for each most recently used app translate to positions on the display screen.  See Figure 30 for the translation.

IMG_0769 1
Figure 30.  Most recently used apps positions.

The next file is the com.apple.celestial.plist, which is found in the private > var > mobile > Library > Preferences path.  This file had a bunch of data in it, but there are three values in this file that are relevant to CarPlay.  See Figure 31.

Celestial.JPG
Figure 31.  Celestial.

The string in the green box represents the app that had last played audio within CarPlay prior to iPhone being disconnected from the car.  The area in the blue box is self-explanatory (I had stopped my podcast when I parked my car).  The item in the red box is interesting.  I had been playing a podcast when I parked the car and had stopped playback.  Before I disconnected my iPhone, I brought the Music app to the foreground, but did not have it play any music, so it never took control of the virtual sound interface in CoreAudio.  By doing this, the string in the red box was generated.  Just to confirm, I tested this scenario a second time, but did not bring the Music app to the foreground; the value nowPlayingAppDisplayIDUponCarPlayDisconnect was not present in the second plist file.  I am sure this key has some operational value, although I am not sure what that value is.  If anyone has any idea, please let me know.

As I mentioned earlier in this post, Siri does a lot of the heavy lifting in CarPlay because Apple doesn’t want you messing with your phone while you’re driving.  So, I decided to look for anything Siri-related, and I did find one thing…although I will say that this is probably not exclusive to CarPlay; I think it may be present regardless of whether the session occurs in CarPlay or not (more testing needed).  In the path private > var > mobile > Library > Assistant there is a plist file named PreviousConversation (there is no file extension, but the file header indicates it is a bplist).  Let me provide some context.

When I pick up my child from daycare in the afternoons, I will ask Siri to send a message, via CarPlay, to my spouse indicating that my child and I are on the way home, and she usually acknowledges.  The afternoon before I extracted the data from my iPhone (04/11/2019), I had done just that, and, after a delay, my spouse had replied “Ok.”

PreviousConversation contains the last conversation I had with Siri during this session.  When the message came in, I tapped the notification at the top of the CarPlay screen, which triggered Siri.  The session went like so:

Siri:                 “[Spouse’s name] said Ok.  Would you like to reply?”

Me:                  “No.”

Siri:                 “Ok.”

See Figure 32.

IncomingMessage.JPG
FIgure 32.  Session with Siri.

The area in the red box is the name of the sender, in this case, my spouse’s (redacted) name.  The orange box was spoken by Siri, and the blue box is the actual iMessage I received from my spouse.  The purple box is what was read to me, minus the actual iMessage.  Siri’s inquiry (about my desire to reply) is seen in Figure 33.

WouldYouLikeToReply.PNG
Figure 33.  Would you like to reply?

Figure 34 contains the values for the message sender (my spouse).  Inside the red box, the field “data” contains the iMessage identifier…in this case, my spouse’s phone number.  The field “displayText” is my spouse’s name (presumably pulled from my Contacts list).  Figure 35 has the message recipient information:  me.

MessageSender.PNG
Figure 34.  Message sender.

MessageRecipient.PNG
Figure 35.  Message recipient (me) plus timestamp.

Figure 35 also has the timestamp of when the message was received (orange box), along with my spouse’s chat identifier (blue box).

Siri-OK.PNG
Figure 36.  Siri’s response.

Figure 36 shows Siri’s last response to me before the session ended.

Interesting note:  this plist file had other noteworthy data in it.  One thing that I noticed is that each possible response to the inquiry “Would you like to reply?” had an entry in here:  “Call” (the message sender), “Yes” (I’d like to reply), and “No” (I would not like to reply).  It might be a good research project for someone.  🙂

The next artifact actually comes from a file previously discussed:  com.apple.celestial.plist.  While examining this file I found something interesting that bears mentioning in this post.  My iPhone has never been paired via Bluetooth with my 2019 Nissan.  When I purchased the car, I immediately started using CarPlay, so there has been no need to use Bluetooth (other than testing Android Auto).  Under the endointTypeInfo key I found the area seen in Figure 37.

CarBT.jpg
Figure 37.  What is this doing here?

The keys in the red box contain the Bluetooth MAC address for my car.  I double-checked my Bluetooth settings on the phone and the car, and the car Bluetooth radio was turned off, but the phone’s radio was on (due to my AppleWatch).  So, how does my iPhone have the Bluetooth MAC address for my car?  I do have a theory, so stay with me for just a second.  See Figure 38.

IMG_0814
Figure 38.  AirPlay indicator.

Figure 38 shows the home screen of my iPhone while CarPlay is running.  Notice that the AirPlay/Bluetooth indicator is enabled (red box).  Based on some great reverse engineering, it was found that any device that uses the AirPlay service will use its MAC address in order to identify itself (deviceid).  Now, see Figure 39.

AudioInterfaces
Figure 39. Virtual Audio Interfaces for AirPlay and CarPlay.

Figure 39 shows two files, both of which are in the Library > Audio > Plugins > HAL path.  The file on the left is the info.plist file for the Halogen driver (the virtual audio interface) for AirPlay and the file on the right is the info.plist file for the Halogen driver for CarPlay.  The plug-in identifiers for each (both starting with EEA5773D) are the same.  My theory is that CarPlay may be utilizing AirPlay protocols in order to function, at least for audio.  I know this is a stretch as those of us that use AirPlay know that it typically is done over a wireless connection, but I think there is a small argument to be made here.  Obviously, this requires more research and testing, and it is beyond the scope of this post.

Conclusion

CarPlay is Apple’s attempt at (safely) getting into your car.  It provides a singular screen experience between iPhone and the car, and it encourages safe driving.  While a majority of the user-generated artifacts are kept by the individual apps that are used, there are artifacts specific to CarPlay that are left behind.  The app icon layout, time last connected and disconnected, and last used app can all be found in these artifacts.  There are also some ancillary artifacts that may also be useful to examiners/investigators.

It has been a long time since I really dug around in iOS, and I saw a lot of interesting things that I think would be great to research, so I may be picking on Apple again in the near future.

Android Pie (9.0) Image Is Available. Come Get A Piece!

Continuing the series of Android images I have created, I’d like to announce that an Android Pie (9.0) image is now available for download.   Unfortunately, I had to retire the LG Nexus 5X (it topped out at Oreo), so this time I used a Google Pixel 3.  The image contains user-populated data within the stock Android apps and 24 non-stock apps.  It includes some new, privacy-centered messaging apps:  Wickr Me, Silent Phone, and Dust.

As with the Nougat and Oreo images, this one includes robust documentation; however, there are some differences in the files being made available.  First, there is no .ufd file.  Second, there is no takeout data.  It appeared, based on the traffic for the last two images, there was little interest, so I did not get takeout data this time.  If enough interest is expressed, I will get it.

Third…and this is a biggie…there are multiple files.  The first file, sda.bin (contained within sda.7z), is an image of the entire phone.   This file contains all of the partitions of the phone in an unencrypted format…except for the /data partition (i.e. sda21 or Partition 21), which is encrypted. I tried every method I could think of to get a completely unencrypted image, but was unable to do so.  I suspect the Titan M chip may have something to do with this but I need to study the phone and Android Pie further to confirm or disprove.  Regardless, I am including this file so the partition layout and the unencrypted areas can be studied and examined.  I will say there are some differences between Pie’s partition layout and the layout of previous flavors of Android.

The sda.bin file is 64 GBs in size (11 GB compressed), so make sure you have enough room for the decompressed file.

The second file, Google Pixel 3.tar, is the unencrypted /data partition.  Combined with the sda.bin file, you have a complete picture of the phone.

And finally, there is a folder called “Messages,” which contains two Excel spreadsheets that have MMS and SMS messages from the Messages app.  There were way too many messages for me to type out in the documentation this time, so I just exported them to spreadsheets.  I can confirm that both spreadsheets are accurate.

This image is freely available to anyone who wants it for training, education, testing, or research.

Once Android Q gets further along in beta I will begin work on an image for it, so, for the time being, this will be it. 🙂

Please note the images and related materials are hosted by Digital Corpora.  You can find everything here.

Google Search Bar & Search Term History – Are You Finding Everything?

Search history.  It is an excellent way to peer into someone’s mind and see what they are thinking at a particular moment in time.  In a court room, search history can be used to show intent (mens rea).  There are plenty of examples where search history has been used in court to establish a defendant’s intent.  Probably the most gruesome was the New York City Cannibal Cop trial, where prosecutors used the accused’s search history against him.  Of course, there is a fine line between intent and protected speech under the First Amendment.

Over the past month and a half I have published a couple of blog posts dealing with Google Assistant and some of the artifacts it leaves behind, which you can find here and here.  While poking around I found additional artifacts present in the same area that have nothing to do with Google Assistant:  search terms.

I was surprised and I wasn’t; after all, the folder where this data was found had “search” in the title (com.google.android.googlequicksearchbox).  The surprising thing about these search terms is that they are unique to this particular area in Android; they do not appear anywhere else, so it is possible that you or I (or both) could have been missing pertinent artifacts in our examinations (I have missed something).  Conducting a search via this method can trigger Google Chrome to go to a particular location on the Internet, but the term used to conduct the search is missing from the usual spot in the History.db file in Chrome.

My background research on the Google Search Bar (as it is now known) found that this feature may not be used as much as, say, the search/URL bar inside Chrome.  In fact, there are numerous tutorials online that show a user how to remove the Google Search Bar from Android’s Home Screen, presumably to make more space for home screen icons.  I will say, however, that while creating two Android images (Nougat and Oreo), having that search bar there was handy, so I can’t figure out why people wouldn’t use it more.  But, I digress…

Before I get started there are a few things to note.  First, the data for this post comes from two different flavors of Android:  Nougat (7.1.2) and Oreo (8.1).  The images can be found here and here, respectively.  Second, the device used for each image was the same (LG Nexus 5X), and it was rooted both times using TWRP and Magisk.  Third, I will not provide a file structure breakdown here as I did in the Google Assistant blog posts.  This post will focus on the pertinent contents along with content markers within the binarypb files.  I found the binarypb files related to Google Search Bar activity to contain way more protobuff data than those from Google Assistant, so a file structure breakdown is impractical.

Finally, I thought it might be a good idea to give some historical context about this feature by taking a trip down memory lane.

A Quick Background

Back in 2009 Google introduced what, at the time, it called Quick Search Box for Android, which arrived with Android 1.6 (Donut).  It was designed as a place a user could go to type a word or phrase and search not only the local device but also the Internet.  Developers could adjust their apps to expose services and content to Quick Search Box so returned results would include their app.  The neat thing about this feature was that it was contextually/location aware, so, for example, I could type the word “weather” and it would display the weather conditions for my current location.  All of this could occur without the need for another app on the phone (depending on the search).

QSB-Doughnut

Google Quick Search Box – circa 2009.

Searching.png

Showtimes…which one do you want?

Prior to Google Assistant, Quick Search Box had a vocal input feature (the microphone icon) that could execute commands (e.g. call Mike’s mobile) and that was about it.  Compared to today this seems archaic, but, at the time, it was cutting edge.

VocalInput.png

Yes, I’m listening.

Fast forward three years to 2012’s Jelly Bean (4.1).  By that time Quick Search Box (QSB) had been replaced by Google Now, Google’s search and prediction service.  If we were doing Ancestry.com or 23andMe, Google Now would definitely be a genetic relative of Google Search Bar/Google Assistant.  The resemblance is uncanny.

android_41_jelly_bean_ss_08_verge_300.jpg

Mom, is that you?  Google Now in Jelly Bean

The following year, Kit Kat allowed a device to start listening for the hotword “Ok, Google.”  The next big iteration was Now on Tap in 2015’s Marshmallow (6.x), and, with the arrival of Oreo (8.x) we have what we now know today as Google Assistant and the Google Search Bar (GSB).   Recently in Android Pie (9.x) GSB moved from the top part of the home screen to the bottom.

old-navbar-1080x1920

Google Search Bar/Google Assistant at the bottom in Android Pie (9.x).

As of the Fall of 2018 Nougat and Oreo accounted for over half of the total Android install base.  Since I had access to images of both flavors and conducted research on both, the following discussion covers both.  There were a few differences between the two systems, which I will note, but, overall, there was no major divergence.

To understand where GSB lives and the data available, let’s review…

Review Time

GSB and Google Assistant are roommates in both Nougat and Oreo; they both reside in the /data/data directory in the folder com.google.android.googlequicksearchbox.  See Figure 1.

galisting

Figure 1.  GSB & Google Assistant’s home in Android.

This folder holds data about searches that are done from GSB along with vocal input generated by interacting with Google Assistant.  The folder has the usual suspect folders along with several others.  See Figure 2 for the folder listings.

galisting-infile

Figure 2.  Folder listing inside of the googlequicksearchbox folder.

The folder of interest here is app_session.  This folder has a great deal of data, but just looking at what is here one would not suspect anything.  The folder contains several binarypb files, which are binary protocol buffer files.  These files are Google’s home-grown, XML-ish rival to JSON files.  They contain data that is relevant to how a user interacts with their device via Google Assistant and GSB.    See Figure 3.

Figure 3.PNG

Figure 3.  binarypb file (Nougat).

A good deal of the overall structure of these binarypb files differ from those generated by Google Assistant.  I found the GSB binarypb files not easy to read compared to the Google Assistant files.  However, the concept is similar:  there are markers that allow an examiner to quickly locate and identify the pertinent data.

Down in the Weeds

To start, I chose 18551.binarypb in the Nougat image (7.1.2).  This search occurred on 11/30/2018 at 03:55 PM (EST).  The search was conducted while the phone was sitting on my desk in front of me, unlocked and displaying the home screen.  The term I typed in to the GSB was “dfir.”  I was presented with a few choices, and then chose the option that took me to the “AboutDFIR” website via Google Chrome.  The beginning of the file appears in Figure 4.

Figure 4.PNG

Figure 4.  Oh hello!

While not a complete match, this structure is slightly similar to that of the Google Assistant binarypb files.  The big takeaway here is the “search” in the blue box.  This is what this file represents/where the request is coming from.  The BNDLs in the red boxes are familiar to those who have read the Google Assistant posts.  While BNDLs are scattered throughout these files, it is difficult to determine where the individual transactions occur within the binarypb files, thus I will ignore them for the remainder of the post.

Scrolling down a bit finds the first area of interest seen in Figure 5.

Figure 5.PNG

Figure 5.  This looks familar.

In the Google Assistant files, there was an 8-byte string that appeared just before each vocal input.  Here there is a 4-byte string (0x40404004 – green box) that appears before the search term (purple box).  Also present is a time stamp in Unix Epoch Time format (red box).  The string, 0x97C3676667010000, is read little endian and converted to decimal.  Here, that value is 1543611335575.

Figure 6.PNG

Figure 6.  The results of the decimal conversion.

This time is the time I conducted the search from GSB on the home screen.
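If you run across one of these 8-byte values and want to sanity-check it, the conversion is straightforward: read the bytes little endian and treat the result as milliseconds since the Unix epoch.  A quick sketch using the value above:

import struct
from datetime import datetime, timezone

raw = bytes.fromhex("97C3676667010000")   # the 8 bytes as they appear in the file
millis = struct.unpack("<Q", raw)[0]      # little-endian unsigned 64-bit integer
print(millis)                             # 1543611335575
print(datetime.fromtimestamp(millis / 1000, tz=timezone.utc))  # 2018-11-30 20:55:35 UTC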

Down further is the area seen in Figure 7.   The bit in the orange box looks like the Java wrappers in the Google Assistant files.  The string webj and.gsa.widget.text* search dfir and.gsa.widget.text has my search term “dfir” wrapped in two instances of the string “and.gsa.widget.text.”  Based on Android naming schemas, I believe this to be “Android Google Search Assistant Widget” with text.  This is speculation on my part as I haven’t been able to find anything that confirms or denies it.

Figure 7.PNG

Figure 7.  More search information.

The 4-byte string (green box), my search term (purple box), and the time stamp (red box) are all here.  Additionally, there is the string in the blue box.  This 5-byte string, 0xBAF1C8F803, is something also seen in the Google Assistant files.  In those files, this string appeared just prior to the first vocal input in a binarypb file, regardless of when, chronologically, it occurred during the session (remember, the last thing chronologically in the session was the first thing in those binarypb files).  Here, this string occurs at the second appearance of the search term.

Traveling further, I find the area depicted in Figure 8.  This area of the file is very similar to that of the Google Assistant files.

Figure 8.PNG

Figure 8.  A familar layout.

The 16-byte string ending in 0x12 in the blue box is one that was seen in the Google Assistant files.  In those files I postulated this string marked the end of a vocal transaction.  Here, it appears to be doing the same thing.  Just after that, a BNDL appears, then the 4-byte string in the green box, and finally my “dfir” search term (purple box).  Just below this area, in Figure 9, there is a string “android.search.extra.EVENT_ID” and what appears to be some type of identifier (orange box).  Just below that, is the same time stamp from before (red box).

Figure 9.PNG

Figure 9.  An identifier.

I am including Figure 10 just to show a similarity between GSB and Google Assistant files.  In Google Assistant, there was a 16-byte string at the end of the file that looked like the one shown in Figure 8, but it ended in 0x18 instead of 0x12.  In GSB files, that string is not fully present; part of it is, but not all of it (see the red box).  What is present is the and.gsa.d.ssc. string (blue box), which was also present in Google Assistant files.

Figure 10.PNG

Figure 10.  The end (?).

The next file I chose was 33572.binarypb.  This search occurred on 12/04/2018 at 08:48 AM (EST).  The search was conducted while the phone was sitting on my desk in front of me, unlocked and displaying the home screen.  The term I typed in to the GSB was “nist cfreds.”  I was presented with a few choices, and then chose the option that took me to NIST’s CFReDS Project website via Google Chrome.  The beginning of the file appears in Figure 11.

Figure 11.PNG

Figure 11.  Looks the same.

This looks just about the same as Figure 4.  As before, the pertinent piece is the “search” in the blue box.  Traveling past a lot of protobuf data, I arrive at the area shown in Figure 12.

Figure 12.PNG

Figure 12.  The same, but not.

Other than the search term (purple box) and time stamp (red box) this looks just like Figure 5.  The time stamp converts to decimal 1543931294855 (Unix Epoch Time).  See Figure 13.

Figure 13.PNG

Figure 13.  Looks right.

As before, this was the time that I had conducted the search in GSB.

Figure 14 recycles what was seen in Figure 7.

Figure 14.PNG

Figure 14.  Same as Figure 7.

Figure 15 is a repeat of what was seen in Figures 8 and 9.

Figure 15.PNG

Figure 15.  Same as Figures 8 & 9.

While I am not showing it here, just know that the end of this file looks the same as the first (seen in Figure 10).

In both instances, after having received a set of results, I chose ones that I knew would trigger Google Chrome, so I thought there would be some traces of my activities there.  I started looking at the History.db file, which shows a great deal of Google Chrome activity.  If you aren’t familiar, you can find it in the data\com.android.chrome\app_chrome\Default folder.  I used ol’ trusty DB Browser for SQLite (version 3.10.1) to view the contents.

As it turns out, I was partially correct.

Figure 16 shows the table “keyword_search_terms” in the History.db file.

Figure 16.PNG

Figure 16.  Something(s) is missing.

This table shows search terms used in Google Chrome.  The term shown, “george hw bush,” is a search that I conducted via Chrome on 12/01/2018 at 08:35 AM (EST).  The terms I typed in to GSB to conduct my searches, “dfir” and “nist cfreds,” do not appear.  However, viewing the table “urls,” a table that shows the browsing history for my test Google account, you can see when I went to the AboutDFIR and CFReDS Project websites.  See Figures 17 and 18.

Figure 17

Figure 17.  My visit to About DFIR.

Figure 18.PNG

Figure 18.  My visit to NIST’s CFReDS.

The column “last_visit_time” stores the time of last visit to the site seen in the “url” column.  The times are stored in Google Chrome Time (aka WebKit time), which is a 64-bit value in microseconds since 01/01/1601 at 00:00 (UTC).  Figure 19 shows the time I visited AboutDFIR and Figure 20 shows the time I visited CFReDS.

Figure 19

Figure 19.  Time of my visit to About DFIR.

Figure 20

Figure 20.  Time of my visit to NIST’s CFReDS.
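The WebKit-to-human conversion can be done right inside the query by dividing down to seconds and subtracting the 11,644,473,600-second offset between 1601 and 1970.  A sketch is below; the urls table and last_visit_time column are the ones shown above, and the path to the History.db file will vary by extraction.

import sqlite3

con = sqlite3.connect("History.db")
query = """
SELECT url,
       datetime(last_visit_time / 1000000 - 11644473600, 'unixepoch') AS last_visit_utc
FROM urls
ORDER BY last_visit_time DESC;
"""
for url, last_visit_utc in con.execute(query):
    print(last_visit_utc, url)
con.close()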

I finished searching the Chrome directory and did not find any traces of the search terms I was looking for, so I went back over to the GSB directory and looked there (other than the binarypb files).  Still nothing.  In fact, I did not find any trace of the search terms other than in the binarypb files.  As a last-ditch effort, I ran a raw keyword search across the entire Nougat image, and still did not find anything.

This could potentially be a problem.  Could it be that we are missing parts of the search history in Android?  The History.db file is a great and easy place to look and I am certain the vendors are parsing that file, but are the tool vendors looking at and parsing the binarypb files, too?

As I previously mentioned, I also had access to an Oreo image, so I loaded that one up and navigated to the com.google.android.googlequicksearchbox\app_session folder.  Figure 21 shows the file listing.

Figure 21.PNG

Figure 21.  File listing for Oreo.

The file I chose here was 26719.binarypb.  This search occurred on 02/02/2019 at 08:48 PM (EST).  The search was conducted while the phone was sitting in front of me, unlocked and displaying the home screen.  The term I typed in to the GSB was “apple macintosh classic.”  I was presented with a few choices but took no action beyond that.  Figure 22 shows the beginning of the file in which the “search” string can be seen in the blue box.

Figure 22.PNG

Figure 22.  Top of the new file.

Figure 23 shows an area just about identical to that seen in Nougat (Figures 5 and 12).  My search term can be seen in the purple box and a time stamp in the red box.  The time stamp converts to decimal 1549158503573 (Unix Epoch Time).  The results can be seen in Figure 24.

Figure 23.PNG

Figure 23.  An old friend.

Figure 24

Figure 24.  Time when I searched for “apple macintosh classic.”

Figure 23 does show a spot where Oreo differs from Nougat.  The 4-byte string in the green box that appears just before the search term, 0x50404004, is different.  In Nougat, the first byte is 0x40, and here it is 0x50.  A small change, but a change, nonetheless.

Figure 25 shows a few things that appeared in Nougat (Figures 7 & 14).

Figure 25

Figure 25.  The same as Figures 7 & 14.

As seen, the search term is in the purple box, the search term is wrapped in the orange box, the 4-byte string appears in the green box, and the 5-byte string seen in the Nougat and the Google Assistant files is present (blue box).

Figure 26 shows the same objects as those in the Nougat files (Figures 8, 9, & 15).  The 16-byte string ending in 0x12, the 4-byte string (green box), my search term (purple box), some type of identifier (orange box), and the time stamp (red box).

Figure 26.PNG

Figure 26.  Looks familar…again.

While not depicted in this post, the end of the file looks identical to those seen in the Nougat files.

Just like before, I traveled to the History.db file to look at the “keyword_search_terms” table to see if I could find any artifacts left behind.  See Figure 27.

Figure 27.PNG

Figure 27.  Something is missing…again.

My search term, “apple macintosh classic,” is missing.  Again.  I looked back at the rest of the GSB directory and struck out.  Again.  I then ran a raw keyword search against the entire image.  Nothing.  Again.

Out of curiosity, I decided to try two popular forensic tools to see if they would find these search terms.  The first tool I tried was Cellebrite Physical Analyzer (Version 7.15.1.1).  I ran both images through PA, and the only search terms I saw (in the parsed data area of PA) were the ones that were present in Figures 16 & 27; these terms were pulled from the “keyword_search_terms” table in the History.db file.  I ran a search across both images (from the PA search bar) using the keywords “dfir,” “cfreds,” and “apple macintosh classic.”  The only returned hits were the ones from the “urls” table in the History.db file of the Nougat image; the search term in the Oreo image (“apple macintosh classic”) did not show up at all.

Next, I tried Internet Evidence Finder (Version 6.23.1.15677).  The Returned Artifacts found the same ones Physical Analyzer did and from the same location but did not find the search terms from GSB.

So, two tools that have a good footprint in the digital forensic community missed my search terms from GSB.  My intention here is not to speak ill of either Cellebrite or Magnet Forensics, but to show that our tools may not be getting everything that is available (the vendors can’t research everything).  It is repeated often in our discipline, but it does bear repeating here:  always test your tools.

There is a silver lining here, though.  Just to check, I examined my Google Takeout data, and, as it turns out, these searches were present in what was provided by Google.

Conclusion

Search terms and search history are great evidence.  They provide insight in to a user’s mindset and can be compelling evidence in a court room, civil or criminal.  Google Search Bar provides users a quick and convenient way to conduct searches from their home screen without opening any apps.  These convenient searches can be spontaneous and, thus, dangerous; a user could conduct a search without much thought given to the consequences or how it may look to third parties.  The spontaneity can be very revealing.

Two major/popular forensic tools did not locate the search terms from Google Search Bar, so it is possible examiners are missing search terms/history.  I will be the first to admit, now that I know this, that I have probably missed a search term or two.  If you think a user conducted a search and you’re not seeing the search term(s) in the usual spot, try the area discussed in this post.

And remember:  Always.  Test.  Your.  Tools.

Update

A few days after this blog post was published, I had a chance to test Cellebrite Physical Analyzer, version 7.16.0.93.  This version does parse the .binarypb files, although you will get multiple entries for the same search, and some entries may have different timestamps.  So, caveat emptor; it will be up to you/the investigator/both of you to determine which is accurate.

I also have had some time to discuss this subject further with Phil Moore (This Week in 4n6), who has done a bit of work with protobuf files (Spotify and the KnowledgeC database).  The thought was to run the .binarypb files through Google’s protoc.exe (found here) and then try to decode the respective fields.  Theoretically, this would make it slightly easier than manually cruising through the hexadecimal and decoding the time by hand.  To test this, I ran the file 26719.binarypb through protoc.exe.  You can see the results for yourself in Figures 28, 29, and 30, with particular attention being paid to Figure 29.

Figure 28

Figure 28. Beginning of protoc output.

 

Figure 29

Figure 29.  Middle part of the protoc output (spaces added for readability).

 

Figure 30

Figure 30.  Footer of the protoc output.

In Figure 28 the “search” string is identified nicely, so a user could easily see that this represents a search, but you can also see there is a bunch of nonsensical data grouped in octets (protoc prints non-printable bytes as three-digit octal escapes).  These octets represent the data in the .binarypb file, but how they line up with the hexadecimal/ASCII values is anyone’s guess.  It is my understanding that there is a bit of educated guessing that occurs when attempting to decode this type of data.  Since protobuf data is serialized and the programmers have carte blanche in determining what key/value pairs exist, the octets could represent anything.

That being said, the lone educated guess I have is that the octet 377 represents 0xFF.  I counted the number of 377’s backwards from the end of the octal time (described below) and found that they matched (24 – there were 24 0xFF’s that preceded the time stamp seen in Figure 23).  Again, speculation on my part.

Figure 29 is the middle of the output (I added spaces for readability).  The area in the red box, as discovered by Phil, is believed to be the timestamp, but in an octal (base-8) format…sneaky, Google.  The question mark at the end of the string lines up with the question mark seen at the end of each timestamp seen in the figures of this article.  The area in the green box shows the first half of the Java wrapper that was discussed and seen in Figure 25.  The orange box contains the search string and the last half of the Java wrapper.
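Phil’s octal observation can be sanity-checked with a few lines of Python.  The sketch below is mine, not his: it undoes the C-style octal escapes that protoc prints and reads the result as eight little-endian bytes holding milliseconds since the Unix epoch.  The escaped string is a hypothetical example built from the value 1548697343077 (which appears elsewhere in this post), not a byte-for-byte copy of Figure 29.

# Sketch: undo protoc's octal escapes and read the result as a little-endian
# millisecond Unix epoch timestamp.  The escaped string below is a hypothetical
# example, not a copy of Figure 29.
from datetime import datetime, timezone

assert 0o377 == 0xFF  # octal 377 really is 0xFF, as guessed above

escaped = r"e\024\216\225h\001\000\000"   # 0x65 0x14 0x8E 0x95 0x68 0x01 0x00 0x00

# Round-trip through unicode_escape to undo the C-style \ooo escapes.
raw = escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")

millis = int.from_bytes(raw, "little")
print(millis)                                                   # 1548697343077
print(datetime.fromtimestamp(millis / 1000, tz=timezone.utc))   # 2019-01-28 17:42:23.077000+00:00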

Figure 30 shows the end of the protoc output with the and.gsa.d.ssc.16 string.

So, while there is not an open-source method of parsing this data as of this writing, Cellebrite, as previously mentioned, has baked this parsing into the latest version of Physical Analyzer; just take care to determine which timestamp(s) are accurate.

OK Computer…er…Google. Dissecting Google Assistant (Part Deux)

NoDisassemble

In part two of this article I will be looking at Google Assistant artifacts that are generated when using a device outside of the car (non-Android Auto). Since this post is a continuation of the first, I will dispense with the usual pleasantries, and jump right into things.  If you have not read Part 1 of this post (dealing with Google Assistant artifacts generated when using Google Assistant via Android Auto), at least read the last portion, which you can do here.  The data (the phone extraction) discussed in both posts can be found here.  Just know that this part will not be as long as the first, and will, eventually, compare the Google Assistant artifacts generated in Android Auto to those generated just using the device.

If you don’t feel like clicking over, let’s recap:

A Slight Review

Google Assistant resides in the /data/data directory.  The folder is com.google.android.googlequicksearchbox.  See Figure 1.

galisting

Figure 1.  Google Assistant’s home in Android.

This folder also holds data about searches that are done from the Quick Search Box that resides at the top of my home screen (in Oreo).  The folder has the usual suspect folders along with several others.  See Figure 2 for the folder listings.

galisting-infile

Figure 2.  Folder listing inside of the googlequicksearchbox folder.

The folder of interest here is app_session.  This folder has a great deal of data, but just looking at what is here one would not suspect anything.  The folder contains several binarypb files, which I have learned, after having done additional research, are binary protocol buffer files.  These files are Google’s home-grown, XML-ish rival to JSON files.  They contain data that is relevant to how a user interacts with their device via Google Assistant.  See Figure 3.

binarypbs

Figure 3.  binarypb files.

Each binarypb file here represents a “session,” which I define as each time Google Assistant was invoked.  Based on my notes, I know when I summoned Google Assistant, how I summoned it, and what I did when I summoned it.  By comparing my notes to the MAC times associated with each binarypb file I identified the applicable files for actions taken inside of the car (via Android Auto) and those taken outside of the car.
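If you want to do the same kind of triage, the sketch below is one way to line the files up with their modification times.  It assumes the app_session folder was exported to a local path (hypothetical here) and that the extraction method preserved the original timestamps, which is not a given.

# Sketch: list binarypb session files with sizes and modification times.
# "app_session" is a hypothetical local copy of
# /data/data/com.google.android.googlequicksearchbox/app_session; whether the
# original MAC times survive depends on how the data was exported.
from datetime import datetime, timezone
from pathlib import Path

session_dir = Path("app_session")

for entry in sorted(session_dir.glob("*.binarypb")):
    info = entry.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    print(f"{entry.name:<20} {info.st_size:>8} bytes  modified {modified:%Y-%m-%d %H:%M:%S %Z}")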

During my examination of the binarypb files that were created during sessions inside of the car, I found similarities between each file, which are as follows:

  1. Each binarypb file will start by telling you where the request is coming from (car_assistant).
  2. What is last chronologically is first in the binarypb file.  Usually, this is Google Assistant’s response (MP3 file) to a vocal input just before being handed off to whatever service (e.g. Maps) you were trying to use.  The timestamp associated with this is also at the beginning of the file.
  3. A session can be broken down into micro-sessions, which I call vocal transactions.
  4. Vocal transactions have a visible line of demarcation by way of the 16-byte string ending in 0x12.
  5. A BNDL starts a vocal transaction, but also further divides the vocal transaction into small chunks.
  6. The first vocal input in the binarypb file is marked by a 5-byte string: 0xBAF1C8F803, regardless of when, chronologically, it occurred in the session.
  7. Each vocal input is marked by an 8-byte string.  While the 5-byte string appears only with the first vocal input in the binarypb file (along with the 8-byte string), the 8-byte string appears just prior to each and every vocal input in the file.
  8. When Google Assistant doesn’t think it understands you, it generates different variations of what you said…candidates…and then selects the one it thinks you said.
  9. In sessions where Google Assistant needs to keep things tidy, it will assign an identifier. There does not appear to be any consistency (as far as I can tell) as to the format of these identifiers.
  10. The end of the final vocal transaction is marked by a 16-byte string ending in 0x18.

Visually, sessions via Android Auto can be seen in Figure 4, and vocal transactions can be seen in Figure 5.

img_0075

Figure 4.  Visual representation of a session.

 

img_0074

Figure 5.  Visual representation of vocal transactions.

One additional note here.  I was contacted by a reader via Twitter and asked about adding byte offsets to Figures 4 and 5.  Unfortunately, the byte offsets beyond the header are never consistent.  This is because the requests are always different, and, as a result, Google Assistant’s responses (whether vocal, by action, or both) are always different.  I think the thing to keep in mind here is that there is a structure and there are some markers to help examiners locate this data.
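Since the offsets float but the markers do not, a simple byte-level scan is one way to get your bearings in a session file.  The sketch below searches for the markers described in this post and in Part 1 (BNDL, the 5-byte 0xBAF1C8F803 string, the velvet:query_state:search_result_id string, and the and.gsa.d.ssc trailer) and prints where they sit.  The marker values come from my Pixel 3 test data; whether they hold on other devices or app versions is an open question.

# Sketch: report the offsets of the session-file markers described in this post.
# Treat the marker values as a starting point, not a specification.
import re
import sys

MARKERS = {
    "BNDL": b"BNDL",
    "5-byte first-vocal-input marker": bytes.fromhex("BAF1C8F803"),
    "end-of-vocal-transaction string": b"velvet:query_state:search_result_id",
    "session trailer": b"and.gsa.d.ssc",
}

def scan(path):
    with open(path, "rb") as f:
        data = f.read()
    hits = []
    for label, marker in MARKERS.items():
        for match in re.finditer(re.escape(marker), data):
            hits.append((match.start(), label))
    for offset, label in sorted(hits):
        print(f"0x{offset:08X}  {label}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else "13099.binarypb")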

A Deep Dive

To start, I chose 13099.binarypb.  This session occurred on 01/28/2019 at 12:41 PM (EST) and involved reading new text messages and dictating a response.  The session was initiated by “Ok, Google” with the phone sitting on my desk in front of me while the phone was unlocked and displaying the home screen.  The session went like this:

First Dialogue

Figure 6 shows the top of the binarypb file.  In the blue box is something familiar:  the 0x155951 hex value at offset 0x10.  This string was also present in the binarypb files generated while inside the car (via Android Auto).  In the orange box “opa” appears.  This string appears at the top of each binarypb file generated as a result of using Google Assistant outside of the car.  I suspect (based on other data seen in these files) that this is a reference to the Opa programming language.  This would make sense as I see references to Java, too, which is used throughout Android.  Additionally, Opa is aimed at both client-side and server-side operations (Node.js on the server and JavaScript on the client side).  Again, this is speculation on my part, but the circumstantial evidence is strong.

Figure 6

Figure 6. Top of 13099.binarypb.

In the red boxes are the oh-so-familiar “BNDL’s.” In the green box the string “com.google.android.googlequicksearchbox” is seen.  This is the folder in which the Quick Search Box resides, along with the session files for Google Assistant.

Just below the area in Figure 6 is the area in Figure 7.  There are a couple of BNDL’s in this area, along with the strings in the orange box, “TRIGGERED_BY” and “CONVERSATION_DELTA.”  These appear to indicate that this part of the file was caused by a change in the conversation between Google Assistant and me.  See Figure 7.

Figure 7

Figure 7. A change in conversation triggered this vocal transaction

The area in the blue box is interesting as it is a string that is repeated throughout this session.  I suspect…loosely…this is some type of identifier, and the string below it (in Figure 8) is some type of token.

Figure 8.PNG

Figure 8.  ID with possible token…?

I will stop here for a second.  There was a noticeable absence at the top of this file:  there was no MP3 data here.  A quick scan of this entire file finds no MP3 data at all.  Determining whether this is unique to this particular file or a systemic trend will require examining other files (later in this article).

After the area in Figure 8 there was quite a bit of protocol buffer data.  Eventually, I arrived at the area depicted in Figure 9.  In it you can see the identifier from Figure 7 (blue box), a bit more data, and then a time stamp (red box).  The value is 0x65148E9568010000, which, when read little endian is 1548697343077 (Unix Epoch Time).  Figure 10 shows the outcome using DCode.
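DCode is not strictly necessary for this step.  Assuming (as above) that the eight bytes hold a little-endian count of milliseconds since the Unix epoch, a couple of lines of Python do the same conversion:

# Sketch: decode the 8-byte value from Figure 9 as a little-endian
# millisecond Unix epoch timestamp (same assumption used with DCode).
from datetime import datetime, timezone

raw = bytes.fromhex("65148E9568010000")   # bytes exactly as they appear in the file
millis = int.from_bytes(raw, "little")    # 1548697343077
print(millis)
print(datetime.fromtimestamp(millis / 1000, tz=timezone.utc))  # 2019-01-28 17:42:23 UTC (12:42:23 EST)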

Figure 9.PNG

Figure 9.  Identifier and Unix Epoch Time time stamp.

 

Figure 10.PNG

Figure 10. Time stamp from Figure 9.

The time stamp here is about a minute ahead of when I initiated the session.  Remember what I said about the last thing chronologically being the first thing in the file?  I suspect the last thing I said to Google Assistant will be the first vocal input data I see.  See Figure 11.

Figure 11.PNG

Figure 11.  Last vocal input of the session.

There is one bit of familiar data in here.  If you read the first part of this article you will know that the string in the blue box (0xBAF1C8F803) appeared just before the first vocal input in a binarypb file, which is usually the last vocal input data of the session.  It did not appear anywhere else within the file.  It appears here, too, in a session outside of the car.

In the orange box is what appears to be some Java data indicating where this session started:  “hotword.”  The hotword is the trigger phrase for Google Assistant, which, for me, is “Ok, Google.”  The 8-byte string in the green box (0x010C404000040200) is consistent throughout the file (save one location – discussed later), and, as suspected, the purple box contains the last vocal input I provided to Google Assistant.  A BNDL appears at the end in the red box.

Figure 12 shows some familiar data (from Figures 7 & 8):  TRIGGERED_BY, CONVERSATION_DELTA, the identifier (blue box) and what I believe to be some token (red box).  Note that the suspected token here matches that seen in Figure 8.

Figure 12

Figure 12.  A rehash of Figures 7 & 8.

 

Figure 13.PNG

Figure 13.  The identifier again and another time stamp.

After some more protocol buffer data I find the area in Figure 13.  It looks the same as the area shown in Figure 9, and the time stamp is the same.

Figure 14 is a somewhat recycled view of what was seen in Figure 11, but with a twist.  The Java data which seems to indicate where the query came from wraps the vocal input (“no”); see the orange box.  A BNDL is also present.

Figure 14.PNG

Figure 14.  Vocal input with a Java wrapper.

Also seen in Figure 14 is another time stamp in the red box.  Read little endian, it is decimal 1548697279859.  As before, I used DCode to convert this from Unix Epoch Time to 01/28/2019 at 12:41:19 (EST).  This is the time I originally invoked Google Assistant.

Figure 15 shows some more data, and the end of the vocal transaction (see my Part 1 post).  This is marked by the velvet:query_state:search_result_id string (purple box) and the 16-byte hex value of 0x00000006000000000000000000000012 (orange box).  The string and accompanying hex value are the same ones seen in the binarypb files generated by interaction with Google Assistant via Android Auto.

Figure 15

Figure 15.  Data marking the end of the vocal transaction.

Figure 16 shows the start of a new vocal transaction.  The BNDL (seen at the bottom of Figure 15, but not marked) is in the red box.  Just below it is the 8-byte string in the green box.  Note that the last byte is 0x10 and not 0x00 as seen in Figure 11.  My vocal input appears in the purple box; this input is what started the session.  Just below it is another BNDL.  See Figure 16.

Figure 16.PNG

Figure 16.  New vocal transaction.

The items below the BNDL are interesting.  The orange box is something previously seen in this file:  TRIGGERED_BY.  However, the item in the blue box is new.  The string is QUERY_FROM_HOMESCREEN, which is exactly what the phone was displaying when I invoked Google Assistant.  The phone was on, unlocked, and I used the hotword to invoke Google Assistant, which leads me to the string in the brown box: “INITIAL_QUERY.”  The phrase “read my new text messages” was my original request.  This area seems to imply that my phrase was the initial query and that it was made from the home screen.  Obviously, there is plenty more testing that needs to be done to confirm this, but it is a good hypothesis.

Figure 17.PNG

Figure 17.  A time stamp and a “provider.”

In Figure 17 there is a time stamp (red box):  the decimal value is 1548697279878 (Unix Epoch Time) and the actual time is 01/28/2019 at 12:41:19 (EST).  Again, this is the time Google Assistant was invoked.  The portion in the blue box, while not a complete match, is data that is similar to data seen in Android Auto.  I highlighted the whole box, but the area of interest is voiceinteraction.hotword.HotwordAudioProvider /34.  In the Android Auto version, the related string was projection.gearhead.provider /mic /mic.  In the Part 1 post, I indicated that the /mic /mic string indicated where the vocal input was coming from (my in-car microphone used via Android Auto).  Here I believe this string indicates the origin of the Google Assistant invocation is via the hotword, although I am not completely sure about the /34.

The area in the blue box in Figure 18 is new.  I have tried to find what the data in the box means or its significance, and I have been unable to do so.  In addition to searching the Google developer forums, I pulled the phone’s properties over ADB in order to see if I could determine if the data was referring to the on-board microphone and speaker (ear piece), but the list of returned items did not have any of this data.  At this point I have no idea what it means.  If someone knows, please contact me and I will add it to this article and give full credit.
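For anyone who wants to repeat that check, pulling the property list over ADB is just adb shell getprop.  The sketch below wraps that call in Python and filters for anything audio- or mic-related; it assumes adb is on the PATH and the test device is connected and authorized.

# Sketch: pull the device property list over ADB and filter it.
# Assumes adb is on the PATH and the device is connected/authorized.
import subprocess

def getprop():
    """Return the device's system properties as a dict."""
    output = subprocess.run(
        ["adb", "shell", "getprop"],
        capture_output=True, text=True, check=True,
    ).stdout
    props = {}
    for line in output.splitlines():
        # Lines look like: [ro.product.model]: [Pixel 3]
        if "]: [" in line:
            key, value = line.split("]: [", 1)
            props[key.strip("[")] = value.rstrip("]")
    return props

if __name__ == "__main__":
    for key, value in getprop().items():
        if "audio" in key or "mic" in key:
            print(key, "=", value)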

Figure 18-1-1

Figure 18.  Something new.

I had to scroll through some more protocol buffer data to arrive at the area in Figure 18-1.  There are several things here:  the velvet:query_state:search_result_id with the accompanying 16-byte string ending in 0x12 (brown boxes), BNDLs (red boxes), the 8-byte string just prior to my vocal input (green box), my vocal input (purple box), the TRIGGERED_BY and CONVERSATION_DELTA strings (orange box – my response “yes” was a result of a change in the conversation), and the identifier that I had seen earlier in the file (blue box).  Note that while the string in the green box matches the string seen in Figure 11, it differs from the one seen in Figure 16.  The string in Figure 16 ends in 0x10, whereas the string here and the one in Figure 11 both end in 0x00.

Figure 18

Figure 18-1.  The end of one vocal transaction and the beginning of another.

Just past the identifier seen in Figure 18-1, there was another string that I suspect is a token.  This string starts out the same as the one seen in Figures 8 and 12, but it does differ.  See Figure 19.

Figure 19.PNG

Figure 19.  A new “token.”

Scrolling through more protocol buffer data finds the area seen in Figure 20.  Here I find another time stamp (red box).  The decoding methodology is the same as before, and it resulted in a time stamp of 01/28/2019 at 12:41:42 (EST).  This would have been around the time I indicated (by saying “yes”) that I wanted to reply to the text messages Google Assistant had read to me.  Additionally, the Java string appears (orange box), and the end of the vocal transaction is seen with the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x12 (blue boxes).

Figure 20

Figure 20.  The end of another vocal transaction.

Figure 21 has my dictated message in it (purple box), along with some familiar data, and a familiar format.

Figure 21

Figure 21.  A familiar face.

At the top is a BNDL (red box), the 8-byte string ending in 0x00 (green box), another BNDL (red box), the TRIGGERED_BY, CONVERSATION_DELTA strings (orange box), and the identifier again (blue box).  In Figure 22 another “token” is found (red box).  This is the same one as seen in Figure 19.

Figure 22.PNG

Figure 22.  Another “token.”

Yet more protocol buffer data, and yet more scrolling takes me to the area in Figure 23.  In the red box is another time stamp.  In decimal it is 1548697307562 (Unix Epoch Time), which converts to 01/28/2019 at 12:41:47 (EST).  This would have been around the time I dictated my message to Google Assistant.  The identifier also appears at the foot of the protocol buffer data (blue box).

Figure 23.PNG

Figure 23.  Another time stamp.

Figure 24 shows the same data as in Figure 20:  the end of a vocal transaction.  The orange box contains the Java data, and the blue box contains the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x12.

Figure 24

Figure 24.  End of another vocal transaction.

Beyond my vocal input (purple box), the area seen in Figure 25 is the same as those seen in Figures 18-1 & 21.  I even marked them the same… BNDL (red box), the 8-byte string ending in 0x00 (green box), another BNDL (red box), the TRIGGERED_BY, CONVERSATION_DELTA strings (orange box), and the identifier again (blue box).

Figure 25

Figure 25.  The top of another vocal transaction.

Figure 26 shows an area after some protocol buffer data that trailed the identifier in Figure 25.  The notable thing here is the time stamp in the red box.  It is decimal 1548697321442 (Unix Epoch Time), which translates to 01/28/2019 at 12:42:01 (EST).  This would have lined up with when I sent the dictated text message.

Figure 26.PNG

Figure 26.  Time stamp from “No.”

Figure 27 shows the end of the vocal transaction here.  In the orange box is the Java data, with the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x12 in the blue box.

Figure 27.PNG

Figure 27.  The end of a vocal transaction.

Figure 28 looks just like Figures 18-1, 21 & 25.  The only difference here is my vocal input (“no”).  This was the last thing I said to Google Assistant in this session, so I expect this last portion of the file (save the very end) to look similar to the top of the file.

Figure 28

Figure 28.  Look familiar?

Figure 29 contains a time stamp (red box), which appears after a bit of protocol buffer data.  It is decimal 1548697343077 (Unix Epoch Time), which converts to 12:42:23 (EST).  This is the same time stamp encountered in this session file seen in Figure 9.

Figure 29.PNG

Figure 29.  The last/first time stamp.

Figure 30 shows the end of the session file with the orange box showing the usual Java data.  The end of this file, as it turns out, looks very similar to end of session files generated via Android Auto.  Three things are present here that are also present in the end of the Android Auto session files.  First, the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x18 in the blue box.  Second, the 9-byte string, 0x01B29CF4AE04120A10 in the purple box. Third, the string “and.gsa.d.ssc.” is present in the red box.

Figure 30

Figure 30.  A familiar ending.

So, right away I see quite a few similarities between this session file and the ones generated by Android Auto.  In order to have some consistency between these files and those from Android Auto, the next file I examined involved me asking for directions to my favorite coffee joint.

The next file I examined was 13128.binarypb.  This session occurred on 01/28/2019 at 12:43 PM (EST) and involved asking for directions to my favorite coffee joint.  The session was initiated by “Ok, Google” with the phone sitting on my desk in front of me, unlocked, and displaying the home screen.  The session went like this:

Second Dialogue

The screen switched over to Google Maps and gave me the route and ETA.  I did not choose anything and exited Maps.

The top of 13128.binarypb looks identical to 13099.binarypb (Figure 6).  See Figure 31.

Figure 31

Figure 31.  A familiar sight.

The gang is all here.  The string 0x155951 (blue box), “opa” (orange box), com.google.android.googlequicksearchbox (green box), and a couple of BNDL’s (red box).

While no data of interest resides here, I am including Figure 32 just to show that the top of 13128 is just like 13099.

Figure 32.PNG

Figure 32.  Nothing to see here.

A quick note here: this file is just like 13099 in that there is no MP3 data at the beginning of the file. As before, I scanned the rest of the file and found no MP3 data at all. So, this is a definite difference between the Android Auto and non-Android Auto session files.

Figure 33 is something I had seen in the previous file (see Figure 16), but further down.  The blue and orange boxes contain the TRIGGERED_BY and QUERY_FROM_HOMESCREEN strings, respectively.  Just like my previous session, this session was started with the phone on, unlocked, and by using the hotword to invoke Google Assistant, which leads me to the string in the red box: “INITIAL_QUERY.”  This area seems to imply that whatever vocal input is about to show up is the phrase that was the initial query and that it was made from the home screen.

Figure 33.PNG

Figure 33.  Query From Home Screen, Triggered By, Launched On, & Initial Inquiry.

Figure 34 looks almost identical to Figure 17.  The red box contains a time stamp, which is decimal 1548697419294 (Unix Epoch Time).  When converted it is 01/28/2019 at 12:43:39 (EST).  The blue box contains the string voiceinteraction.hotword.HotwordAudioProvider /49.  The /49 is different from the one seen in Figure 17, though (/34).  Again, I am not sure what this is referring to, and I think it warrants more testing.

Figure 34.PNG

Figure 34.  The query source and a time stamp.

Scrolling down just a hair finds the area in Figure 35.  The orange box contains Java data we have seen before, but with a small twist.  The string is webj and.opa.hotword* search and.opa.hotword, with the twist being “search” in the middle.  As seen in the first file, it’s almost as if the term in the middle is being wrapped (my “no” was wrapped the same way, as seen in Figure 14).

Figure 35.PNG

Figure 35.  Something old and something old.

The area in the red box is the same data seen in Figure 18.

Figure 36 also contains some familiar faces.  My vocal input is in the purple box, and the 5-byte string (blue box) that usually appears with the first vocal input of the session, 0xBAF1C8F803, is here.

Figure 36.PNG

Figure 36.  The first vocal input of the session.

An 8-byte string previously seen in 13099 is also here (see Figure 16).  Note that this string ends in 0x10.  In 13099 all of the 8-byte strings, save one, ended in 0x00.  The one that did end in 0x10 appeared with the first vocal input of the session (“read my new text messages”).  Here, we see the string ending in 0x10 with the only vocal input of the session.  I hypothesize that the 0x10 appears before the first vocal input of the session, with any additional vocal input appearing with the 8-byte string ending in 0x00.  More research is needed to confirm this, which is beyond the scope of this article, but a quick way to check other session files is sketched below.
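Here is the quick check I have in mind, as a rough sketch.  It takes the 7-byte prefix of the 0x010C404000040200 string reported above for 13099.binarypb and prints the byte that follows each occurrence across a folder of session files.  The prefix is an assumption pulled from my own test data, so do not expect it to hold universally.

# Sketch: check which byte (0x00 or 0x10) follows the 8-byte marker's 7-byte
# prefix across session files.  The prefix comes from the 0x010C404000040200
# string seen in my 13099.binarypb; it may not hold on other devices/versions.
from pathlib import Path

PREFIX = bytes.fromhex("010C4040000402")

def trailing_bytes(path):
    """Return the byte that follows each occurrence of the prefix."""
    data = path.read_bytes()
    tails, start = [], 0
    while True:
        idx = data.find(PREFIX, start)
        if idx == -1 or idx + len(PREFIX) >= len(data):
            break
        tails.append(data[idx + len(PREFIX)])
        start = idx + 1
    return tails

for session in sorted(Path("app_session").glob("*.binarypb")):
    tails = ", ".join(f"0x{b:02X}" for b in trailing_bytes(session))
    print(f"{session.name}: {tails or 'prefix not found'}")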

Figures 37 and 38 show the same data as seen in Figures 33 and 34.

Figure 37.PNG

Figure 37.  Same ol’ same ol’.

 

Figure 38.PNG

Figure 38.  Same ol’, part deux.

Figure 39 shows the mysterious string with the speaker id (red box) and Figure 40 shows my vocal input inside of a Java wrapper (orange box), which is similar to what was seen in 13099 (Figure 14).

Figure 39

Figure 39.  Speaker identifier?

 

Figure 40.PNG

Figure 40.  A Java wrapper and a time stamp.

The time stamp seen in Figure 40 is the same as the other time stamps seen in this session file except for the first byte.  In the other time stamps that byte is 0x1E, whereas the byte seen here is 0x08; this causes the decimal value to shift from 1548697419294 to 1548697419272.  Regardless, the time here is the same:  01/28/2019 at 12:43:39 PM (EST).  Only the millisecond values differ:  294 versus 272, respectively.

Figure 41 shows the end of the vocal transaction, which is marked by the  velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x12 in the blue box.

Figure 41.PNG

Figure 41.  The end of the vocal transaction.

The start of a new vocal transaction is seen in Figure 42.  The 8-byte value seen in the green box ends with 0x10, which keeps in line with my theory discussed earlier in this article.  My vocal input (the only input of the session) is seen in the purple box.  A BNDL is seen at the start of the transaction (red box) with another one at the end (red box).

Figure 42.PNG

Figure 42.  The start of another vocal transaction.

In the interest of brevity, I will say that the next bit of the session file is composed of what is seen in Figures 37, 38, and 39 (in that order).  The time stamp is even the same as the one seen in Figure 38.  The next area is the last part of the session file as seen in Figure 43.

Figure 46

Figure 43.  The end!

If Figure 43 looks familiar to you, that is because it is.  I color coded the boxes the same way as I did in Figure 30.  Everything that was there is here:  the Java data (orange box), the velvet:query_state:search_result_id and the accompanying 16-byte string ending in 0x18 in the blue box, the 9-byte string 0x01B29CF4AE04120A10 in the purple box, and the string “and.gsa.d.ssc.” in the red box.

So What Changed…If Anything?

At the beginning of this article I reviewed some consistencies between the Android Auto session files I examined.  After examining the non-Android Auto files, I thought it would be beneficial to revisit those consistencies to see what, if anything, changed.  The original statements are in italics, while the status here is just below each item.

  • Each binarypb file will start by telling you where the request is coming from (car_assistant).

This is still correct except “car_assistant” is replaced by “opa” and “googlequicksearchbox.”

  • What is last chronologically is first in the binarypb file. Usually, this is Google Assistant’s response (MP3 file) to a vocal input just before being handed off to whatever service (e.g. Maps) you were trying to use.  The timestamp associated with this is also at the beginning of the file.

This is still correct, minus the part about the MP3 data.

  • A session can be broken down into micro-sessions, which I call vocal transactions.

This is still correct.

  • Vocal transactions have a visible line of demarcation by way of the 16-byte string ending in 0x12.

This is still correct.

  • A BNDL starts a vocal transaction, but also further divides the vocal transaction into small chunks.

This is still correct.

  • The first vocal input in the binarypb file is marked by a 5-byte string: 0xBAF1C8F803, regardless of when, chronologically, it occurred in the session.

This is still correct.

  • Each vocal input is marked by an 8-byte string.  While the 5-byte string appears only with the first vocal input in the binarypb file (along with the 8-byte string), the 8-byte string appears just prior to each and every vocal input in the file.

Eh…sorta.  While the values of the 8 bytes change between Android Auto and non-Android Auto sessions, there is still a consistent 8-byte string in each.  Further, the last byte of the 8-byte string in the non-Android Auto version varies depending on whether or not the vocal input is chronologically the first input of the session.

  • When Google Assistant doesn’t think it understands you, it generates different variations of what you said…candidates…and then selects the one it thinks you said.

Unknown.  Because I was in a quiet environment and was near the phone, Google Assistant didn’t seem to have any trouble understanding what I was saying.  It would be interesting to see what would happen if I introduced some background noise.

  • In sessions where Google Assistant needs to keep things tidy, it will assign an identifier. There does not appear to be any consistency (as far as I can tell) as to the format of these identifiers.

This is correct.  In the 13099 file, there were multiple things happening, so an identifier with something that resembled a token was present.

  • The end of the final vocal transaction is marked by a 16-byte string ending in 0x18.

Still correct.

For those of you who are visual learners, I am adding some diagrams at the end that show the overall, generalized structure of both a session and a vocal transaction.  See Figures 44 and 45, respectively.

OutOfCar-SessionFile.JPG

Figure 44.  Session file.

 

OutOfCar-VocalTransaction

Figure 45.  Vocal transaction.

Conclusion

There is way more work to do here in order to really understand Google Assistant.  Phil Moore, of This Week in 4n6 fame, mentioned Part 1 of this article recently on the This Month in 4N6 podcast, and he made a very accurate statement:  Google Assistant is relatively under-researched.  I concur.  When I was researching this project, I found nothing via Google’s developer forums, and very little outside of those forums.  There just isn’t a whole lot of understanding about how Google Assistant behaves and what it leaves behind on a device.

Google Assistant works with any device that is capable of running Lollipop (5.0) or higher; globally, that is a huge install base!  Additionally, Google Assistant can run on iOS, which adds to the install base and is a whole other batch of research.  Outside of handsets there are the Google Home speakers, on which there has been some research, Android TVs, Google Home hubs, smart watches, and Pixelbooks/Pixel Slates.  Google is making a push in the virtual assistant space, and is going all in with Google Assistant (see Duplex).  With all of these devices capable of running Google Assistant, it is imperative that practitioners learn how it behaves and what artifacts it leaves behind.