How to Fail an AI Startup

Slava Smirnov returned to tell us about the cruel underbelly of AI startups for the creator economy.

Introduction

This could have been a fancy story about corporations killing startups, but the truth is more brutal. You can find your users, get a message from Blizzard during the closed beta, have competitors ask about your company valuation before your product is even released, and feel like everything is boiling hard. None of this guarantees anything, and the screwups are all yours. Many people talk about their successes; as a species, we are prone to survivorship bias and learn from those who've made it. I'm no different, but besides learning from them, we can learn from those who've failed miserably. I'm gonna talk about a failed company with a neural-networks product and what I've learned. Here's the story of machine learning for creators, plus some remarks on the creator economy. Let's go!

Once Upon a Time

I've been building digital solutions for the creative industries for 12 years: 8 years in advertising and marketing, 4 years in game tech. Four years ago I pushed myself hard into machine learning and learned Python, math, etc. After leaving an AI startup for gamers 1.5 years ago, I started digging into AI motion capture from video.

On the one hand, most professional content is produced by the hard work of hundreds of thousands of modelers, animators, developers, and other great folks out there. It takes months to get those 5 seconds of a best-looking shot or one artistic movement in a game. In short, it's a hell of a job. With this much manual labor, any solution that makes content creators' work faster or cheaper (while keeping control over the artistry) is set to succeed. Unsurprisingly, as humanity consumes more content and distribution becomes more personalized, the professional content industry is growing rapidly, by 15-20% per year. Tens of thousands of newcomers are entering the industry and facing time-consuming techniques and hard-to-get skills. The entry barrier is the highest it has been in the last 30 years. Accordingly, all major 3D packages and engines are doing a lot of work to lower it. Say, Epic Games, even before going public, is buying small tech companies every other day. They just want that "make it feel great" button inside Unreal Engine to power all the crucial pieces of their metaverses.

On the other hand, machine learning has come a long way from academia to the creator economy. Solutions that help authors produce content faster or cheaper with machine learning and AI are emerging. As of 2021, computer vision (these days largely powered by machine learning) could take an image or frame containing a person and output that person's skeletal keypoint coordinates in 3D. So I looked around, asked my animator friends, and to my surprise, there was pretty much no one significant building such a product for the CG and game industries. We gotta do it, it should work:

  • there will be more and more content
  • everyone has a body and a camera
  • neural networks are not that easy to cook

So I found a co-founder, Gleb, who not only knows how to cook neural networks but specializes in computer vision, with some great publications at well-known international machine learning conferences. And not just that: he turned out to be deeply passionate about empowering human creativity. Bingo! We gotta build democratized motion capture with AI.

So we started building it: grabbed some data here, pulled some neural networks there, learned to export the output to 3D software packages, made a video about our software, and built a landing page. We came to the respected 80 Level community and talked about how machine learning is powering the next wave of solutions for creators. We showed the video and invited readers to collaborate. In 3 days we received hundreds of applications from all over the world, ranging from promising young talents to well-established professionals, both independent and employed at studios working on, say, the next Star Wars and Guardians of the Galaxy titles. It seemed we'd hit the jackpot! A light version of product-market fit, as they say. We even thought about raising venture capital but turned the idea down, being highly self-confident and backed by such a community response. Surely, a fancy startup deserves a fancy name. That's when CPTR.tech appeared.

Here came the collabs and pilots. We started to see what worked for our users and what didn't. First and foremost, we were right that users were comfortable with the strict setup we required: a mounted camera, the full body within the frame, etc. At the same time, early users were ok dealing with all the inconveniences: minor errors in skeleton keypoints here and there, manual interaction with us. They didn't just want the software, they badly needed it, so they kept reminding us that we'd promised to send them something. Quite fast we ran into an issue: when users applied our software's results to their 3D models, something strange happened. With the bare skeleton everything seemed to work fine, yet when they put their 3D models on our skeleton animations, the models' vertices got awkwardly screwed up. So we started to look deeper. It turned out our software output keypoint locations and constructed the bones gracefully, but bone rotations were missing from our neural networks' predictions. Two keypoints only fix a bone's direction, not how it is twisted around its own axis, and skinning deforms a mesh according to each joint's full rotation. For such real-time software to be really useful, it needs bone rotations. We thought their absence would limit the usefulness of the software a lot, and we decided to fix that before the public release. To this day, I wonder whether that decision was one of our biggest mistakes.
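Why keypoints alone weren't enough can be shown with a little geometry. The sketch below is a minimal illustration in plain Python, not CPTR.tech code: the forearm coordinates and the single vertex rigidly bound to the bone are made-up values, and Rodrigues' rotation formula stands in for whatever a real 3D package does internally. Twisting the bone about its own axis leaves both keypoints where they were, but moves the skinned vertex.

```python
import math

def rotate_about_axis(v, axis, angle):
    """Rotate vector v about an axis through the origin by `angle` radians
    (Rodrigues' rotation formula)."""
    n = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / n for a in axis)  # unit rotation axis
    c, s = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    return tuple(
        vi * c + cvi * s + ki * k_dot_v * (1.0 - c)
        for vi, cvi, ki in zip(v, k_cross_v, k)
    )

# Hypothetical forearm in metres. The elbow sits at the origin, so rotating
# points about an axis through the origin means rotating about the elbow.
elbow = (0.0, 0.0, 0.0)
wrist = (0.3, 0.0, 0.0)
# Hypothetical skin vertex on the forearm surface, rigidly bound to this bone.
skin_vertex = (0.15, 0.05, 0.0)

bone_axis = tuple(w - e for w, e in zip(wrist, elbow))
twist = math.radians(90)  # twist the forearm about its own axis

wrist_twisted = rotate_about_axis(wrist, bone_axis, twist)
vertex_twisted = rotate_about_axis(skin_vertex, bone_axis, twist)

# The keypoints a pose network predicts are unchanged by the twist...
print("wrist moved by:", math.dist(wrist, wrist_twisted))
# ...but a vertex driven by the bone's rotation moves noticeably.
print("vertex moved by:", math.dist(skin_vertex, vertex_twisted))
```

Any twist angle gives the same elbow and wrist positions, so a network that predicts only keypoints leaves the rotation undetermined, and the mesh has to guess.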

At this point, we went into Our-Own-All-Mighty-AI-Research mode to get the bone rotations. We built our own neural network to learn bone rotations from the predicted 3D keypoints. It didn't work well. We tried to make a neural network learn keypoints and rotations from 2D keypoints. We tried some parametric body models to get rotations from the fitted shape. Well, kinda, yeah, but the quality and speed issues were quite significant. We even went all-in on generating synthetic data with other neural networks.

One Doesn't Simply Go and Launch

Three months passed. The rotations task was still open. My co-founder Gleb started losing touch with reality from rolling multi-dimensional matrices and running countless experiments. Collab applications kept stacking up, pilots were ongoing, journalists were interested in covering us. Through the pilots, we kept learning useful things about the industry: obvious to those who have done this daily for years, yet useful for us. For example, there's no single standard skeleton for biped characters, since every work context for animators and non-animators is slightly different. Plus, every 3D software package has its own default skeleton. Or, say, the ability to get a person's body shape from a video isn't gaining traction, since that shape can't easily be applied to a professional's custom 3D models. Most users obviously need to skin their own models to their own custom skeletons. Or, say, real-time preview is crucial for mid-size and corporate users because they got used to such features with pricey traditional solutions like motion capture studios and suits. The indie segment, meanwhile, is happy with whatever, but it's really hard to build a business on it.

Alright, pilots are ok for verifying the core hypothesis, but it was time to package the product and release it publicly. The initial thought was: ok, something is working, everyone is happy with the quality, and nobody even rolls their eyes when we talk money. So while we were finishing up bone rotations (a kinda ML-based solver), we still had to build the app itself: design and build the user interface, design and build the backend, work out the business model, connect billing, draft some marketing channels. So I called in my fellow full-stack developer Nik, one of the best backend developers I've had the chance to work with. So it was the three of us now. Nik did a tremendous job in 1.5 months: production code, auto-scaling, tests, infrastructure, monitoring. It seemed ok to release; everything worked like a charm. Except there were still no stable bone rotations for a full-body skeleton. Alright, it needs more time; ok, let's solve an even harder industry task: hand capture (finger tracking). You give it a video, it gives you back finger motion in 3D, in real time, within Blender, Maya, and UE. Finger bone rotations worked out for us with ease. And once full-body bone rotations emerged, we would switch them on in the backend and in marketing.

And here came heaven's blessing: we received $100K in support from Nvidia and Amazon Web Services. Around the same time, Epic Games US sent us an email. You know, the one you don't wanna miss. One of the hardware startups making motion capture suits called us to chat. You know, just to chat, right.

Meanwhile, we stumbled upon a fresh public repo from Google called MediaPipe. It lets third-party developers run computer vision tasks that involve a person (e.g., hair segmentation, eye tracking, etc.). Beyond that, MediaPipe can track 3D keypoints of a person in a video. So we noticed it and checked it out. Well, it kinda works. We talked with the guys. Valentin, Ming Yong, and the team are perfectly aware of the repo's current limitations (say, the speed/quality tradeoff, no bone rotations for now, and many more). But here comes the best part: they give it away for free so that third-party developers can build their own services. Can you feel it already?

So we released hand motion capture (finger tracking). I started pinging the market with the solution, and the market was like "yeah, cool, we'll have a look". VR folks politely said "cool, but we have it in the Oculus SDK". Animation studios were like "we're ok with what we have now". The website traffic was there, people were trialing, but the billing dashboard wasn't showing a hockey stick.

That was a miss. In fact, the number of collab requests for finger tracking had been an order of magnitude lower than for the body, but amid the excitement over the first reaction to body motion capture, I missed that. Meanwhile, body bone rotations still weren't working well: we'd tried everything, it wasn't working, and that was it.

So What?

One day, a partner came up with the idea of not learning bone rotations with neural networks but solving for them analytically. It still came with some requirements on input video quality, but after 2-3 weeks something seemed to be working. We started repackaging the solution and took a closer look at this MediaPipe thing. Our first reaction was: the resulting capture quality is the lowest we've seen, the neural network's inference speed is sometimes not good enough for real time, and this library isn't about content production anyway. Yay, nothing to worry about. Then we noticed their community was boiling, so we looked closer.
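The article doesn't describe the analytic solver itself. One common way to solve a bone's rotation analytically, sketched here as an assumption rather than the method CPTR.tech used, is to split it into a swing and a twist. The swing is the shortest-arc rotation taking the rest-pose bone direction onto the direction between two predicted keypoints; the twist about the bone axis is recovered from a third, off-axis keypoint (for a forearm, something like a knuckle on the thumb side of the hand). The function names, the (w, x, y, z) quaternion convention, and all coordinates are illustrative choices.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def swing_quaternion(rest_dir, target_dir):
    """Shortest-arc unit quaternion (w, x, y, z) rotating rest_dir onto target_dir."""
    a, b = normalize(rest_dir), normalize(target_dir)
    d = dot(a, b)
    if d < -1.0 + 1e-9:
        # Opposite directions: any 180-degree turn about an axis perpendicular to a works.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        x, y, z = normalize(cross(a, helper))
        return (0.0, x, y, z)
    # Half-angle trick: (1 + cos t, sin t * axis), normalised, equals (cos t/2, sin t/2 * axis).
    x, y, z = cross(a, b)
    w = 1.0 + d
    n = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / n, x / n, y / n, z / n)

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, u = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(u, v))
    return tuple(vi + w * ti + ci for vi, ti, ci in zip(v, t, cross(u, t)))

def twist_angle(q_swing, bone_dir, rest_ref, observed_ref):
    """Signed angle (radians) about bone_dir taking the swung rest-pose reference
    vector onto the observed one, both projected onto the plane perpendicular to the bone."""
    k = normalize(bone_dir)

    def project(v):
        d = dot(v, k)
        return tuple(vi - d * ki for vi, ki in zip(v, k))

    p1 = project(rotate(q_swing, rest_ref))
    p2 = project(observed_ref)
    return math.atan2(dot(k, cross(p1, p2)), dot(p1, p2))

# Hypothetical rest pose: the forearm points along +x, and the vector from the
# wrist toward a hypothetical thumb-side keypoint points along +y.
rest_forearm = (1.0, 0.0, 0.0)
rest_thumb_side = (0.0, 1.0, 0.0)

# Hypothetical predicted 3D keypoints (metres).
elbow, wrist, thumb_knuckle = (0.2, 1.3, 0.0), (0.2, 1.0, 0.1), (0.25, 0.95, 0.1)

observed_forearm = tuple(w - e for w, e in zip(wrist, elbow))
observed_thumb_side = tuple(t - w for t, w in zip(thumb_knuckle, wrist))

q = swing_quaternion(rest_forearm, observed_forearm)
angle = twist_angle(q, observed_forearm, rest_thumb_side, observed_thumb_side)
print("swing quaternion:", q)
print("twist (degrees):", math.degrees(angle))
```

In a real rig this would run per bone down the hierarchy, with each direction expressed in its parent's space. Because the twist hinges on a single extra keypoint, noise in the predictions goes straight into the rotation, which is plausibly one reason such a solver still needs reasonably clean input video.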

First, this is a pose estimation library targeted at third-party developers. Roughly speaking, it lets anyone with no prior understanding of neural networks develop apps with neural networks. No knowledge of machine learning? No worries, we've got you covered: we already have pre-trained neural networks, and we even have a set of APIs for every programming language and platform you might need.

Second, it runs on the client's machine. No need to build expensive cloud infrastructure to support multiple connections, scaling, etc. Everything should just work on the end user's machine. Are you getting it?

It took me a while, but finally it hit me hard. Combine two factors: the lowest possible barrier to entry and the end user's machine. Within the next 1-1.5 years, every major 3D software application will have 5 to 10 such plugins (priced at $5 or even free). We already see some appearing. An even more likely scenario: this simply becomes a built-in feature of 3D software. Now imagine a small startup competing with dozens of free solutions or with built-in features. It's like, I don't know, putting a 9-to-5 wooden ice cream kiosk next to a 24/7 shopping mall. One can certainly do it with cutting-edge neural-networks-based ice cream, but it's not a sustainable, fast-growing business, which is what a startup is by definition. Meaning it won't grow 2x per year, meaning there'll be difficulties with funding, meaning we won't keep pace, meaning the startup game is lost.

So CPTR.tech's first-mover advantage is over. This is it: say goodbye to subscription-based SaaS. Say goodbye to the seaside house.

Now let's rewind and try to draw some broader lessons from the story.

Something Went Wrong

Most truly AI-based products are powered by research, which in turn is funded by either academic institutions or corporations. Research, by its nature, is published openly. And it's exactly this kind of research that powers all those clickbait headlines like "AI is gonna kill {put_your_most_favourable_job_here}". By "published openly" I mean: here's a link to "fully" reproducible code that anyone (well, with enough technical proficiency) can just grab and run on their laptop. Sure, there's a huge engineering gap between research and a product, but it's manageable.

In the research community, everyone is crazy about quality metrics. In fact, those are often measured on only a subset of real-world, in-the-wild data. The main differences between these works lie in 2 components:

  • Datasets (the collection of data the neural network is trained on)
  • The architecture of the neural network (in theory, a set of tricks and techniques to train a network with the desired operational parameters; in practice, whatever is currently fashionable in architectures)

Now let's look at the other side. In most business domains, the market's main inefficiency (an opportunity for startups) lies in the speed, quality, or cost of the current way of doing things in the real-world economy. Say, making spoons by hammer and hand is costly, so why don't we build a machine to produce spoons? No soul, for sure, but they'll be much quicker and cheaper to make, with fewer mistakes.

So this quality/cost/speed problem usually isn't solved by the speed of the neural network (1 ms vs. 1 second per frame). Nor is it solved by computational power, which directly eats into an AI company's funds. For AI to be efficient for the consumer in most domains, the key lies in the network's quality metrics. While network architecture, one of the two components of quality, is freely available through public research, the other component, data, determines the network's quality the most. Thus, as an AI product category matures, the best solution is the one built on the best data. And the best data is hard to get: it's either pricey or too well-moated.

Now, let's look back at the market. Who has the resources to get the best data? Right, corporations. And what do corporations do? Right, they either fiercely guard their data or let third parties put it to use through pre-built neural networks. These third parties in turn build apps and experiences for end users on existing platforms (App Store, Google Play) and, most importantly, on new ones (Oculus Quest Store and others).

For example, real-time hand and finger tracking research is a pure by-product of the computer vision research being done to power various AR glasses. If you want AR glasses to extend reality, you first have to understand, in real time, what's happening in the frame, and that starts with hands. Facebook's glasses are rumored to launch in late 2021, signaling the start of Facebook's version of the metaverse. Look at what Apple and Snap are doing with AR, where they're headed, and why.

Plus, don't forget that research shows no signs of slowing down. Most probably, your sub-field gets a new paper every month or two that does the same thing as the previous one, but with improved speed and quality. Sometimes, though, the quality metrics get boosted dramatically, and the paper is called a breakthrough for the sub-field. Within half a year, it becomes the de facto default in the research community. And the cycle repeats.

Occasionally, fundamental breakthroughs occur, mostly in the form of new neural network architectures. The last time, it happened with so-called transformers.

So we have:

  • great speed of AI research
  • corporations powering 3rd party devs
  • low quality of public datasets, high cost of generating non-public datasets

Combined, these three factors lead to the following: new research and pre-trained libraries that power the next generation of solutions in your sub-field are already released faster than you can build a sustainable, fast-growing product with a moat against the competition (i.e., a startup). A startup cannot compete with the world's research community. That's why building a software-as-a-service business powered only by the quality of its neural networks no longer makes sense.

Alright, fella, that's a bold claim. What's the catch? After all, it's been that way for ages. Technologies have always replaced one another (cars vs. horses, frameworks vs. pure HTML, etc.), but previously it took decades; now, with AI, it happens in 1-2 years. That's why SaaS AI products are extremely hard to build: you've done good research, maybe even a breakthrough, and you've built a software architecture around its characteristics, hired people, set up the cloud and a business model, and even free traffic is there. Boom, in 6 months the research community produces another paper or a pre-trained library appears, which (thanks to running in real time or a tremendous improvement in quality) slowly kills you. Not in the blink of an eye, since you already have paying clients, but softly.

Now here comes the right question: what kind of startup with AI under the hood is worth building, and in which subfield? Let's think it through for a moment. Right, in a perfect world, there's no such subfield.

Ok, so what do we do? We're not talking about a 2-month freelance gig; we're talking about years of dedicated work. As I see it, there are 2 options:

  • you either build a product powered by non-obvious, hard-to-get-even-with-corporate-money data
  • or you build one that helps people who work with AI day to day. Remember the shovels and the gold rush.

If you know nothing about AI shovels, or you want to go with the first option and have unique data representative enough for the whole market, I'd suggest building it right now or selling your data right away. In 1.5-2 years, this data won't be worth anything, as research at universities and corporations is accelerating to the moon.

A Startup? Not Yet, Just an Experience

Brief summary:

  1. This is conventional wisdom in modern software development, but it's still important: no need to chase the perfect initial release. It's better to release a buggy product, but to do it fast. That way, we would have checked early whether we were hitting an actual pain point in the market and wouldn't have spent too much time on something possibly worthless. You truly cannot miss product-market fit. If you hit the pain (early adopters are trying to get their hands on the product and pay for it), the next question comes up: does the product address it well enough to jump to the mainstream segment? If you miss the target on the pain assumption, or if the product isn't good enough, you'll get signals about that as well. Most importantly, speed is still a startup's true advantage for winning.
  2. We should have gotten funded early on. One can somehow launch a product on pure belief in a bright future and some savings, but later on you need money to eat and travel, everyone has families and loved ones, and savings are finite. Without funding, an AI company can't afford unpredictable AI research, and not doing it means the company doesn't own proprietary tech, which leaves you with limited advantages and dramatically reduces your M&A valuation. Btw, external funding gives you another crucial thing: obligations. And those sometimes move you even faster than your own savings.
  3. Do not underestimate corporations. Most likely they aren't doing the same thing you do, but their work will affect yours, if only because they are seen and heard more widely. It's worth the effort to get familiar with what they do and, more importantly, where they are headed and why. Because if you battle them 1-on-1, you lose. There's no way around it. Moreover, if a corporation has started digging somewhere, I'd argue it's already too late to dig the same way. Can we outplay them by not playing head-to-head? For starters, it's worth spending time to get your own datasets that huge tech companies can't easily obtain. I know, it's not obvious, but an AI product will hit that wall anyway. It's better to think it through in advance.
  4. Last but not least, it's worth spending time evaluating the market, like in a textbook: the number of customers, potential revenue, and all those things. Then it's quite quick to understand whether you're building a lifestyle business, a VC-backed startup, or an acqui-hire story.

Alright, does all this give you a chance to build a great company? A startup that grows 3x per year and has a huge market ahead? A useful and exciting new product with tremendous prospects within a company? An honest answer: it all comes down to whether you learn from others' mistakes. We've learned from our own, and we're moving forward. Stay tuned!

Slava Smirnov, Software Developer
