Original Source: Cyber Zen Heart
From GitHub, a must-have for developers, to young unicorns like Scale and Cohere, to this year's high-profile Character.ai, while researching outstanding companies in the field of artificial intelligence I kept seeing two names among the early-stage investors: Nat (Nathaniel) Friedman and Daniel Gross.
Since 2017, Nat and Daniel have also been investing together in artificial intelligence through an organization they founded called AI Grant. From its beginnings as a fund for academic research grants to its present form as an early-stage venture fund, AI Grant's operating and investment model has given me a great deal of inspiration on the question of how an early-stage AI investor can be most helpful to founding teams. Here, I would like to share Nat and Daniel's stories of growth and investing with you.
Enjoy!
**Einstein was a patent clerk working in Bern. He had many ideas that everyone thought were crazy.**
**But often, it is the "outsiders" who have the freshest and best ideas. **
**Our goal is to find and fund them. **
Open Source Pioneer Nat Friedman
「It's hard to imagine myself doing something other than founding a startup. But you never know. I'm open to anything.」
On his personal website, Nat (Nathaniel) Friedman's self-introduction includes this line: **"I started using the Internet in 1991, and the Internet is my real hometown"** - and he is not exaggerating.
Born in 1977, Nat Friedman began learning software development at the age of six. In 1991, Linus Torvalds, a young Finn on the other side of the ocean, publicly released Linux. Nat, a small-town Virginia boy who had just started going online, quickly discovered it and, with a sharp mind and boundless curiosity, became a well-known hacker in the Linux community. The open source community thus became both the starting point of his career and the foundation of his closest friendships.
In 1999, Nat graduated from MIT, where he studied computer science and mathematics. At 22, he held one firm belief: he only wanted to work in open source. **So, even though he was penniless, he turned down every job offer and secretly lived on a smelly old red sofa in the common room of an MIT dorm, because the Internet connection there was fast.**
Fortunately, Tim Ney of the Free Software Foundation reached out in time and wrote Nat a check for $350, telling him to do whatever he wanted with it - so what did Nat actually do with the money?
Back in 1996, as a college freshman, Nat met Miguel de Icaza on LinuxNet, an IRC [1] network. Miguel was a Mexican youth who had dropped out of a mathematics program to devote himself to free software development. In the summer of 1997, while interning at Microsoft, Nat met Miguel in person when Miguel came to interview for a position on the Internet Explorer Unix team. However, Miguel could not get a work visa because he had no university degree, so he never joined Microsoft; instead, in August of that year he co-founded the open source project GNOME with friends. In April 1999, on the eve of his graduation, Nat proposed to Miguel that they start a company to continue GNOME's development, but they had no money - **Tim's check helped a lot.**
In October 1999, Nat and Miguel jointly established International GNOME Support (later known as Helix Code) to develop GNOME's infrastructure and applications. Eventually, the company was renamed Ximian and was acquired by Novell in August 2003.
Reflecting on their first entrepreneurial experience, Nat and Miguel wrote:
"Ximian was made up of like-minded friends. We started the company without any entrepreneurial, management, or business experience. We learned on the job and were advised by friends who believed in our purpose and cared about our mission; 90% of Ximian's employees were open source community contributors and people we met through mail or IRC; having no management experience meant we made every textbook management mistake possible, but all of our friends and employees supported us."
After joining Novell, Nat was responsible for all of the company's Linux-related projects and served as CTO for open source, freeing more than 6,000 employees from the shackles of Windows and the Office suite and moving them to the open source SUSE and OpenOffice. In 2007, Nat moved to Munich and started the SUSE Studio project; he left the company after the product launched in 2009, newly married, at "a natural break point and time to find something new".
Judging from this experience, the working model of a big company could not contain Nat's creative soul. David Majda, who joined SUSE shortly after Nat's departure, blogged: "The application looks modern, is visually pleasing, and is easy to use. It doesn't look like a product from a Linux company at all; it looks more like a startup's product. SUSE's uncanny ability to create such a first-class application convinced me to join the company and eventually the SUSE Studio team. ... After joining SUSE, I was curious what the secret sauce behind the product was, and very quickly stumbled upon Nat Friedman's name: the whole project was clearly his idea. He convinced management, put together a team of the best developers he could find, ran it like a startup, and built the product over two years. Remember, this is a big company, with Novell's big-corporate-style managers on one side and hardcore hackers from the Linux community on the other - not an easy thing."
Before setting off on a round-the-world trip with his new wife in 2010, Nat said, "When our travels are over, my next step will probably be to start a company in the U.S. **It's hard to imagine myself doing something other than founding a startup** - but I don't know exactly what I would do; I'm open to anything."
The trip did not last long - Novell was acquired by Attachmate at the end of April 2011, and the team led by Miguel was disbanded in early May. Nat returned to the United States, and half a month later, on May 17, the two, with their remarkable execution, teamed up again to found Xamarin and carry on Mono, the open source cross-platform SDK project Attachmate had abandoned. In 2016, the company was acquired by Microsoft for approximately $500 million.
After joining Microsoft, Nat still thought of himself as an "entrepreneur". At first he planned to leave after a year or two and spend his time on "side projects" that interested him. During his tenure, in addition to launching the AI Grant project described in detail later in this article, he also founded California YIMBY (Yes In My Back Yard), an organization dedicated to solving California's housing shortage.
**But it didn't take long for him to discover that Microsoft's new CEO, Satya Nadella, was a leader worth learning from - an open-minded manager always reaching for something higher.** In 2017, Nat sent Satya an email proposing that Microsoft acquire GitHub. Even though Nat had only recently joined, a week later Satya gave him full authority over what would become the largest developer-related acquisition in history and one of Microsoft's largest acquisitions at the time.
In 2018, Microsoft acquired GitHub amid widespread doubts and appointed Nat, the person "developers felt most at ease with", as CEO. On his first day in office, Nat wrote: "I will not ask for your trust, but I am committed to earning it." The results did not disappoint: he kept the platform independent and neutral, earning a good reputation in the developer community; he created the star product GitHub Copilot, which continued to expand GitHub's influence; and he acquired six companies including npm, Semmle, Dependabot, and Pull Panda, driving healthy growth in revenue and users - ultimately handing Microsoft a strong report card.
Among these, the building of GitHub Copilot deserves special mention - it is also the best illustration of the efficient collaboration between Satya and Nat.
On June 11, 2020, OpenAI released GPT-3. Nat was stunned by its capabilities and the demos built on it, and decided to do something with the model right away. But at the time, he didn't know what it could be used for. Fortunately, the far-sighted Satya had already established a partnership with OpenAI, which gave Nat ample room to explore amid the uncertainty.
Soon, Nat found several outstanding developers in the GitHub community. Starting from the question "how do you build a product around a model that often makes mistakes?", the group explored two directions: chatbots and code generation. Two months later, they found that applying GPT-3 directly to chat fell short - such a large model meant too much latency, and it was hard for users to genuinely enjoy talking to it. Copilot, whose development began in February of the following year, took a different approach: like a "little assistant" sitting on the user's shoulder, solving problems alongside them, popping up from time to time to patch code or even generate a complete function. Like a slot machine that keeps paying out, it was both useful and a little addictive. On June 29, 2021, GitHub Copilot was officially released, and it has been loved by millions of programmers ever since.
On November 3, 2021, Nat sent an email to the GitHub team: **"I am moving on to my next adventure: providing support, advice, and investment to founders and developers who are using technology to create the future and seize some big opportunities"** - and thus became a full-time investor.
**Nat Friedman's life credo**

**As humans, we have the right (perhaps even the moral responsibility) to reshape the universe to our liking**
- Technology, which is really knowledge, makes this possible
- We should probably try to raise the ceiling, not the floor

**Enthusiasm matters!**
- It is much easier to work on something that interests you
- Perhaps for this reason, it can be easier to do big things than small things
- Progress requires energy as an input

**Moving fast matters**
- More frequent contact with reality means we learn more per unit of time
- Moving fast keeps us focused on what matters; there is no time for nonsense
- "Slow is fake"
- A week is 2% of a year; time is the denominator

**The Efficient Market Hypothesis is a lie**
- At best it is a very lossy heuristic
- The best things in life happen where the EMH is wrong
- In many cases it is more accurate to model the world as 500 people than as 8 billion
- "Most people are other people"

**We know less than we think**
- The replication crisis is not an exception
- Many of the things we believe are wrong
- We often don't even ask the right questions

**The cultural ban on micromanagement is harmful**
- Great individuals should be fully empowered to exercise their judgment
- The goal is not to avoid mistakes; the goal is to reach an exceptional level of excellence in some dimension
- The downsides are worth it

**Small teams are better**
- Faster decisions, fewer meetings, more fun
- No need to split up work for political reasons
- No room for mediocrity (and you can pay more!)
- Large projects are intellectually easier than they appear
- Many tech companies are 2-10x overstaffed

**Where do we get our dopamine?**
- The answer predicts our behavior
- It is better to get dopamine from improving our ideas than from having them validated
- Getting dopamine from "making things happen" is fine

**We can do more than we imagine**
- We are bound by invisible traditions
- The laws of physics are the only limit
Genius Daniel Gross
「The most surprising part of the experience was how much it meant for someone to believe in me.」
Daniel Gross was undoubtedly a gifted teenager.
Born in 1991, the year Nat Friedman first went online, Daniel spent the first eighteen years of his life in Jerusalem, until he graduated from high school. In his hometown, Daniel always saw himself as an "outlier": he had few friends and little enthusiasm for life. Programming was the exception - **the one thing he loved, because in the world of programming he had maximum freedom to do whatever he wanted; the only limit was his imagination.**
In 2009, after graduating from high school, Daniel was admitted to Bnei David Academy, a well-known pre-military academy in Israel, but he still did not find friends with similar interests, or a goal for his own life. Not long after, Daniel's father forwarded him an article about Y Combinator (YC), the Silicon Valley startup program. What mattered was that the 18-year-old realized YC might be the gathering place of "outsiders" he had been looking for. So, in a deserted Israeli army camp, with an old Nokia phone and a bulky laptop, he completed the YC application, and with it unlocked a "whirlwind life journey".
In 2010, Daniel passed the YC interview and arrived in Silicon Valley *(because "evading" military service violated Israeli law, Daniel has never returned to Israel since)*. He founded a company called Greplin, which built a personal-assistant application - and it was at YC Demo Day that Nat first noticed this unusual young man.
Subsequently, Greplin raised two rounds from top firms such as Sequoia Capital, changed its name to Cue, and was acquired by Apple for about 40 million US dollars in 2013. Daniel then became a director at Apple, responsible for machine learning and search - and he had only just turned 23; it all happened that quickly.
From being the youngest founder backed by YC and Sequoia to being acquired by Apple, Daniel came to firmly believe that the first step toward success is to find a community of "outsiders", and the second is to find the "outsiders" who dare to bet on unknowns, that is, early-stage investors. **So Daniel began exploring early-stage investing himself. Since 2013, as an individual investor he has backed Uber, GitHub, Coinbase, Instacart, Opendoor, Airtable, Figma, Gusto, Notion, Cruise, and other companies - a brilliant report card indeed.**
But Daniel only officially set out on the road of early-stage investing in January 2017, when he resigned from his other positions and returned to YC as a partner, not only investing in artificial intelligence but also bringing AI technology into the organization itself. In July of the same year, he joined Nat Friedman to co-lead the AI Grant project; in August 2018, he left YC and founded Pioneer, which aims to help underdogs from all over the world start projects quickly and to find more "Lost Einsteins" [2].
Self-Reflection Skills by Daniel Gross
The most important skill we can develop is a natural curiosity about ourselves. Once we build the habit of constant self-reflection, we develop an appreciation and gratitude for both good and bad experiences. I want to discuss two aspects: interaction with others and interaction with yourself.
**INTERACTIONS WITH OTHER PEOPLE**
**Our goal in life should not be to win any one particular game, but the sum of all games.** To do this, we need to be good at working with others: we can't be too sharp-elbowed, or we'll never be invited back onto the team; and we can't be too timid, or we'll never produce anything.
When we interact with our environment, we produce output. We say things, and people form opinions about what we say. Some people are insensitive to others' reactions, which is a serious mistake: others are laying valuable "training data" right in front of us. If we don't retrain our model on the input of the crowd, we will never converge toward the truth; we will remain the people who are too loud or not loud enough, and the social group will stop giving us opportunities to cooperate, because it predicts that we won't contribute.
If we want to keep being invited back to play, we need to be likable players.
**Interaction with Yourself**
We all have long- and short-term goals we want to achieve. There are days when we feel great, our minds are clear, and we find ourselves making good progress, and there are days that are terrible; we all have those days. The trick is to treat each day as an opportunity to learn. Whenever we feel we are not productive enough, ask: why? What did we do wrong? Was it something about lunch? Did someone say something annoying? Did we get bad news?
Make sure you learn from your successes, not just your failures. What are the common ingredients of a good day? Good sleep? Good weather? If weather is a factor, should we move somewhere sunnier? And so on.
It is worth noting that environmental factors sometimes have delayed feedback loops. For example, I find that what I eat affects my mood about 96 hours later, so make sure your data-collection window is wide enough.
Another factor that has helped me a lot is meditation. Meditation is like installing a debugger in your brain. It lets us inspect values immediately, and even change them, instead of just watching our code (our mind) go wrong.
I now "grade" my day every night, trying to dissect what went well and what didn't. I'm fascinated by this practice because I can see my own progress. I hope everyone forces this habit on themselves for a few weeks and comes to love it; then we become addicted to self-improvement.
**Over-reflection**
An extreme form of self-improvement is what some people call a "chip on your shoulder", and I suffer from it. I can be overly self-critical. For example, I ran the New York Marathon, and when I crossed the finish line my first reaction was: "I should have run faster." I always feel I should be doing better.
This is a dangerous fuel. It can push me forward, but left unchecked it makes it hard for me to be happy. If you share this trait, force yourself to celebrate success. We tend to underinvest in creating happy memories: when something good happens, take time to celebrate; do something weird and funny so you remember it; add a room to your memory palace. Finally, make sure you surround yourself with supportive friends and family who help you unwind.
2017 to 2022 - "Distributed Artificial Intelligence Laboratory"
In March 2016, AlphaGo defeated the top human Go player Lee Sedol. 2017 then became known as the "year of deep learning frameworks": research, products, startups, and investment in artificial intelligence were more active than ever, and the influential paper "Attention Is All You Need" was published that year. However, the technology of the time was generally still far from generating real commercial and social value, and basic academic research, for all its apparent variety of directions, was in practice inward-looking and detached from the ground.
On April 12, 2017, aigrant.org launched amid both the enthusiasm and the problems of the industry.
In the beginning, Nat ran the entire AI Grant project by himself. His idea was very straightforward: **like Tim, give people who were "sleeping on a smelly old red sofa", as he once had, a chance to realize their dreams.**
Applying to the program was simple: fill out a form and, if shortlisted, receive a $5,000 grant *(initially limited to five recipients)* to conduct research related to open source AI technologies. The form took only a few minutes to complete. This was AI Grant version 1.0.
So, why choose "open source artificial intelligence technology"? Nat at the time was convinced of two things:
First, open source is the foundation of countless products and ideas, which begin with creators getting free code over the Internet. Before open source became widespread, the first step in creating something new meant building or buying baseline infrastructure; as new open source projects kept emerging, that entry price kept falling toward zero;
Second, artificial intelligence will be the basis for countless new products, ideas, and companies. From automobiles to medicine to finance to education, AI will drive a huge wave of innovation across industries. Combined with the first point, open source AI technology will lower the cost of entry, allowing more people, perhaps anyone, to participate (though they will still need to pay for GPUs).
But as for what counts as artificial intelligence, and what counts as research related to AI technology, Nat kept an open mind: anything that felt like AI, or contributed to the field, qualified. Just as we cannot precisely define what an "AI-native product" is today, no one could define what counted as "artificial intelligence" back then.
Regarding the review criteria, Nat specifically mentioned two:
1. Smart people with interesting ideas that are useful to the world;
Apparently Tim and Nat aren't the only ones willing to fund young people's futures.
Six days after the AI Grant program was announced, Ann Miura-Ko, founding partner of the early-stage fund Floodgate, who also teaches at Stanford, joined and provided five additional places. She hoped the program would help her find "prime mover" types: people who start with open source projects and will go on to explore in different directions, perhaps even start companies.
Just three days before the application deadline for the first round, tech companies joined:
Microsoft would provide each of these ten grantees with $1,000 in Azure credits, redeemable for NVIDIA Tesla K80 virtual machines;
FloydHub would provide 250 hours of NVIDIA Tesla K80 hosting, Scale would provide a $1,000 manual data-labeling credit, and CrowdFlower would provide a $5,000 manual data-labeling credit.
This was not just an increase in the value of the grants; the support itself became more practical, more easily distributed, and more diverse.
The first AI Grant call was a great success. Nat received nearly 500 applications from 50 countries. More than 20 professional volunteers screened projects with him, and a month later ten recipients were selected.
In June 2017, Daniel Gross, who had been exploring how to invest in "unknown outsiders", officially joined Nat, became a partner on the project, and repositioned it as a "distributed artificial intelligence laboratory". AI Grant then went through another iteration:
Funding from technology companies providing infrastructure increased. On top of the previous round, Google replaced Microsoft and would provide each grantee with $20,000 in virtual machine service credits;
Participation in the network increased. In addition to the two donors, Andrej Karpathy, then director of artificial intelligence at Tesla, together with researchers from Google, formed the AI Grant expert group. Meanwhile, as mentioned earlier, many professional volunteers applied to join Nat's screening team; they too became part of the AI Grant network, working with the experts to help the funded researchers;
And, despite the addition of the early-stage fund CRV, the initial cash payout was reduced. For an early project, learning to plan resources and "do big things with little money" matters a lot - $2,500 was the start-up capital most researchers needed at the time.
Since then, AI Grant's funding model has kept iterating on this foundation. Like the founders Pioneer invests in, the backgrounds of those funded here are extremely diverse, from Africa to the United States, from high school students to researchers - although, when the artificial intelligence industry went through a cold spell, grants had to be handed out only sporadically.
As of 2022, AI Grant had funded more than 50 researchers, 36 of whom received full cash funding after two rounds of screening. Many of them went on to create their own companies, two of which became unicorns: Cohere, a large language model company currently valued at $2.2 billion, and Cresta, an intelligent call center company valued at $1.6 billion. Another, Helia, a real-time video data processing company, was acquired by Scale.
Russell Kaplan, the founder of Helia, was in the first batch of AI Grant recipients. At the time, he was about to graduate from Stanford and was studying the use of natural language to guide reinforcement learning. He built and open sourced a faster-learning deep reinforcement learning agent that beat most other methods on Montezuma's Revenge [3]. After graduation, he first joined Tesla, where he built HydraNet, Tesla's core vision model, a large-scale multi-task neural network. Less than two years later, however, he co-founded the computer vision company Helia with Ashwin Sreenivas from Palantir and Daniel Berrios from Goldman Sachs, aiming at real-time processing of video data; it was sold to Scale at the end of the following year.
The founders of Cohere, Aidan Gomez and Ivan Zhang, funded in the second batch that same year, were alumni of the University of Toronto. Their research project at the time was very hard-core - using generative adversarial networks for password cracking - which made them stand out among the more than 1,000 applicants. With AI Grant's support, the two established For.ai to pursue related research. Two years later, Aidan, who had just begun a Ph.D. at Oxford (completed in 2023), and Ivan, who had dropped out of the University of Toronto, co-founded Cohere. For.ai is now Cohere For AI, the non-profit research lab inside Cohere *(incidentally, just a day after this year's AI Grant Batch 1 members were announced, Cohere For AI launched its own AI research grant program)*.
Zayd Enam, a Pakistani immigrant and one of the last to receive funding, had tried an Internet healthcare startup in his hometown at the age of 16. Not long after receiving the grant, he dropped out of his Stanford Ph.D. and founded Cresta together with Tim Shi, who had just finished his own Ph.D. and spent a year at OpenAI.
2022-Present - Shift to "Early Stage Venture Funds"
In 2022, the artificial intelligence boom arrived again. Unlike last time, academic research in the field was already rich and varied, while the corresponding user experience and product innovation had only just begun.
In an interview, Nat said: "Daniel and I spent a few years playing with GPT models and were blown away by their capabilities. I was very lucky to get to design and release GitHub Copilot. After that, I expected a wave of new products, because surely more people would go through the same process, discover that GPT-3 could do a lot of incredible things, and think about whether that capability could be added to different products. But it didn't happen. So in the late summer and early fall of 2022, we started asking ourselves: where did everyone go? That's why we restarted AI Grant, calling on developers to act."
On August 31, 2022, AI Grant was relaunched with a much more "generous" hand: each recipient would receive a cash investment of $250,000. It is worth mentioning that although Nat has partnered with many technology companies, the Microsoft Azure cloud computing credits are the one benefit he has pushed for throughout - **from "open source artificial intelligence technology" to "AI-native products", from researchers to entrepreneurs, the cost of GPUs is always unavoidable.**
2022 Nat Friedman Promotional Twitter Image
2023 Nat Friedman Promotional Twitter Image
In fact, starting in 2020, although Nat and Daniel still appear in various companies' investor lists as individual investors, they had quietly raised a venture capital fund, C2 Investments, totaling about $1.1 billion, along with two smaller funds, CTRY and ND2100, totaling approximately $142 million, and through them they invest in startups related to artificial intelligence and infrastructure. As part of the pair's investment strategy, AI Grant has also formally completed its transformation from a non-profit organization into a venture capital vehicle, committed to investing in even earlier AI-native products.
As early-stage investors in the artificial intelligence vertical, Nat and Daniel take a pragmatic approach, and they have put great effort into building and supporting infrastructure:
At the beginning of 2023, Nat built nat.dev, a platform that aggregates almost all common language models on the market, making it easy to try and compare different language models;
In June 2023, Nat and Daniel acquired 2,512 NVIDIA H100 GPUs (worth about $100 million, roughly half the size of NVIDIA's in-house supercomputer) to form the Andromeda Cluster, which they open to the startups they invest in - meaning these small startups get access to computing resources that normally only well-funded larger companies can afford.
**A basic question: how do Nat and Daniel define, and how do they screen, AI-native products?**
As an important reference, the AI Grant website answers as follows: "Any product that utilizes artificial intelligence models in a useful or interesting way. In particular, we are looking for technical and pragmatic founders who can build great products. If you're excited about building something other people enjoy using, and understand that building something new is 1% idea and 99% iteration, then we want to support you."
"Any product that leverages an AI model in a useful or interesting way" — again, the duo kept it open. In fact, although there is no clear scope, from their interviews, project investment, members of AI Grant Batch 1 and even the "Vesuvius Challenge" previously launched, their preferences for artificial intelligence products and even their attitudes towards technology use can be seen One spot.
C2 Investments' Invested Companies
▌2017|Retool 🦄️
Location - San Francisco, USA
Direction - no-code building tools for commercial software in enterprises
Founder - David Hsu, BSc in Philosophy and Computing from Oxford University in 2017
Investment time - 2017 (followed on in 5 consecutive rounds through 2022)
Other Investors - Patrick Collison, John Collison, Elad Gil, YC, Sequoia, etc.
▌2022|Keen
Location - San Francisco, USA
Orientation - Artificial General Intelligence (AGI)
Founder - John Carmack, co-founded id Software in 1990, lead programmer on Commander Keen, Wolfenstein 3D, Doom, Quake and their sequels; joined Oculus in 2013, CTO
Timeframe for investment - 2022
Other Investors - Patrick Collison, Tobi Lutke, Sequoia, Capital Factory
▌2022|ElevenLabs
Location - London, UK
Directions - Voice Cloning and Generation
Founder - Piotr Dabkowski, graduated from Oxford University with a bachelor's degree in engineering in 2016 and from Cambridge University with a master's degree in computer science in 2017; he was a software engineer at Google Zurich before leaving to start the business in 2022. Mati Staniszewski, graduated from the Department of Mathematics at Imperial College London; he was a deployment strategist at Palantir before leaving to start the business in 2022
Time of investment - 2023
Other Investors - a16z, SVA, Guillermo Rauch, etc.
▌2023|Lexica
Location - San Francisco, USA
Direction - Image search and generation tools
Founder - Sharif Shameem, graduated from the University of Maryland in 2019 and founded the P2P cloud gaming company Vectordash the same year; in 2022 founded Debuild, a language-model-driven low-code tools company
Time of investment - 2022
Other Investors - AI Grant
26 members of AI Grant Batch 1
Batch 1 member companies not only cover diverse product directions, but their founders also have diverse backgrounds, ranging from recent college graduates (Flair, WOMBO) to experienced serial entrepreneurs (Replicate, Chroma). Most of these outstanding products have been introduced in previous newsletters. For reasons of length, we will not describe each company in detail here, and only list a brief introduction and URL:
infrastructure
Replicate - Cloud infrastructure for machine learning models
🔗
Chroma - Open source embedding database (more colloquially, programmable memory)
🔗
Application layer
🔠 Text
Perplexity - Search Tool
🔗
ValueBase - Asset Valuation Modeling Tool for Government
🔗
Sameday - Appointment Scheduling Tool for Marketers
🔗
Ghostwrite - Automated email writing tool
🔗
Samaya AI - Knowledge Discovery Platform for Financial Services
🔗
Forefront - Enterprise Chatbot
🔗
Dust - Assistant for teamwork
🔗
Circle Labs - Discord contact generation
🔗 (The website is extremely crude, but I really like it!!)
🎨 Vision
Lexica.art - Image search and generation tool
🔗
Recraft - vector graphics and 3D model generation tool
🔗
Flair - Tool for branding content design (mainly product and model graphics)
🔗
Poly - Texture Generation Tool
🔗
WOMBO - Lip Sync video generation tool for consumers
🔗
Sieve - Video processing, understanding and search API cloud platform
🔗
Vizcom - Engineering/design drawing generation tool
🔗
Secret Weapons - Video tools for the film industry
🔗
Pixelcut - Product Photo Generator
🔗
AniML - NeRF-based product video generation tool
🔗
💻 Code
Cursor - code editing tool
🔗
Rowy - Low-code backend
🔗
🎙️ Voice
Play.ht - Speech generation and cloning
🔗
♾️ Multimodal and more
Animato (Call Annie) - video chat with virtual characters
🔗
Brich - Automation of Call Center Operations in High Compliance Industries
🔗
Minion.ai - Automated browser assistant* (product not released yet)*
🔗
Vesuvius Challenge - Artificial Intelligence for Human Civilization
If the venture funds and AI Grant are Nat and Daniel's bets on using artificial intelligence to advance the business world, then the Vesuvius Challenge, which the two launched in March 2023 together with Brent Seales, a professor of computer science at the University of Kentucky and a co-founder of scrollprize.org, is their exploration of using artificial intelligence to advance human civilization.
The challenge asks entrants to read two unopened scrolls (the Herculaneum papyri) that were carbonized and buried under 20 meters of earth and volcanic ash in the eruption of Mount Vesuvius in AD 79 - undoubtedly a daunting task. The competition builds on work already done by the third sponsor, Brent Seales: back in 2015, he and his team used X-ray tomography and computer vision to "read" the carbonized En-Gedi scroll found near the Dead Sea in Israel, revealing the biblical text it contains without ever opening it. Reading the Herculaneum papyri is harder still: unlike the denser ink used in the En-Gedi scroll, the Herculaneum ink is carbon-based, and so is the papyrus, so the two show almost no contrast under X-rays.
At the same time, the implications of this task for the study of human history are just as enormous: if we could fully unroll and read the 1,814 surviving scrolls and fragments, the body of ancient literature available to humanity might more than double. However, many of them have already been damaged by ill-advised attempts to unroll them; apart from some Greek philosophical scrolls, which took an Italian monk decades of painstaking work to unroll and piece together, more than 600 scrolls remain unopened.
According to the competition's official website, the $1 million grand prize will be awarded to the first team to make the text of one of the fully scanned scrolls readable by the 11:59 p.m. deadline: the results must be presented as images in which the text is visible and clear, accompanied by a detailed technical description showing that the solution is reproducible and feasible.
The competition also has an additional clause: "reduce hallucinations." If there is any risk that the team's model produces hallucinated results, the team must explain how this risk was mitigated in practice and justify why they are confident the results are real.
**This is undoubtedly a great opportunity to use new technology to unlock humanity's ancient secrets.** Beyond the many contestants, the growing donations also show the enthusiasm of all kinds of people for using new technology to advance human civilization: within a few days of the task being announced, nearly 20 entrepreneurs, investors, and anonymous individuals, including the Stripe founders the Collison brothers, Shopify founder Tobi Lutke, and WordPress founder Matt Mullenweg, joined the list of donors, and the competition's prize money quadrupled 👇
It is also worth noting that even in this competition, Nat and Daniel champion the open source spirit rooted in their hearts. "All of the organizers of the Vesuvius Challenge are strong believers in open source and incremental progress. We want to encourage building in the open and benefiting the community as a whole - something that is often inhibited in a competition," the official website reads. The competition also features three additional open source prizes worth $2,000.
In an interview with Ben Thompson in March 2023, on the development cycle of AI-native products, Nat made this point: **thanks to mature network infrastructure, AI products will spread twice as fast as the previous generation of Internet products, but we still need time to figure out what truly AI-native products look like, rather than just improving existing workflows and software.** Some of his more interesting specific remarks follow *(I don't think anyone can predict the future two or more years out; this is just for reference)*:
Even if researchers stopped here and no longer iterated or added capabilities, we would still need five to ten years to digest the capabilities of GPT-4 and other advanced models and turn them into products. There are so many variations, workflows, and user experiences that need to be invented, reinvented, or recombined; we are just scratching the surface, trying to bundle these capabilities into existing products.
The operating system needs to be rebuilt around the capabilities of artificial intelligence. Different startups have demonstrated that AI can have different, almost surreal capabilities, and we could rebuild the entire computing platform within ten years. The current state of the field is that researchers are at the frontier; there is still a great deal of digestion to be done at the commercial level, and that process is hard to accelerate.
The capabilities of artificial intelligence will not stop here; they will keep developing, and the trend of the past two years will likely continue. These are large steps of progress. So even if we do arrive at native product designs for the AI capabilities of 2023, we may find ourselves dealing with completely different capabilities and tools in 2024 - this is a whole new wave of technology, and products will need time to digest it.
Based on this, he also raised a question: if the ground (the infrastructure) we stand on keeps shifting rapidly, where should we place our bets?
His answer was this: **to really excel in artificial intelligence today, you must understand it more deeply.** Unlike a decade or more ago, starting a company is now commonplace, and Silicon Valley's selection effect is weakening. With more people in the pool, it is even harder to choose a good direction, especially one as popular as artificial intelligence; for any entrepreneur, starting up in this direction will be harder still.
But don't be too pessimistic - **when did we really realize that the Internet had become an industry? After the bubble burst.**
As mentioned at the beginning, Nat Friedman and Daniel Gross's "investment experiment" has given me a great deal of inspiration and motivation. So, starting from August 1st, together with my boss Yusen, my colleagues in operations and marketing, and our AWS partners, I have launched our own AI Grant. Although the support we can offer today falls far short of Nat and Daniel's example, we hope to grow together with the Chinese developer community - the introduction and registration details are at the end of the article.
Einstein was a patent clerk in Bern
With ideas many thought were crazy
Outsiders often have the weirdest and best ideas
Our goal is to find and fund them.
[1] "IRC" literally translates to Internet Relay Chat, which is an application-layer protocol that is mainly used for group chat, but it can also be used for person-to-person chat.
[2] The term "Lost Einstein" comes from Raj Chetty's study: Despite scoring the same on intelligence tests in early childhood, kids from high-income (top 1%) families were just as likely to be inventors as those from below-middle-income families Ten times as much as a child. "Lost Einstein" was used by Raj to refer to a low-income genius who could have done great things if given the opportunity in the right way.
[3] "Montezuma's Revenge" is an Atari game that represents a broad class of challenging real-world problems known as "hard-exploration problems" in environments with sparse feedback, That is, artificial intelligence models/agents need to learn complex tasks and pass levels through few or deceptive feedback, so it is regarded as a challenge of reinforcement learning.
Editor: Du Wei, Chen Ping
This paper provides a comprehensive introduction to the construction, potential applications, and evaluation of agents based on large language models (LLMs), which is of great value for understanding the development of this field and for inspiring future research.
In today's AI era, autonomous agents are considered a promising path toward artificial general intelligence (AGI). The so-called autonomous agent is one capable of completing tasks through autonomous planning and instructions. In early development paradigms, the policy function that determines an agent's actions was dominated by heuristics and gradually refined through interaction with the environment.
However, in unconstrained open-domain environments, it is often difficult for autonomous agents to act with human-level proficiency.
With the great success of large language models (LLMs) in recent years, they have shown the potential to attain human-like intelligence. Thanks to their powerful capabilities, LLMs are increasingly used as the core coordinators in creating autonomous agents, and a variety of AI agents have emerged one after another. By mimicking human-like decision-making processes, these agents offer a viable path toward more complex and adaptable AI systems.
*A list of LLM-based autonomous agents, including tool agents, simulated agents, general agents, and domain agents. *
At this stage, a holistic analysis of the emerging LLM-based autonomous agents is very important, both for fully understanding the state of development of this field and for inspiring future research.
In this paper, researchers from the Gaoling School of Artificial Intelligence at Renmin University of China present a comprehensive survey of LLM-based autonomous agents, focusing on three aspects: their construction, application, and evaluation.
Paper address:
For the construction of agents, they propose a unified framework consisting of four parts: a profiling module to represent the agent's attributes, a memory module to store historical information, a planning module to formulate future action strategies, and an action module to execute the planned decisions. After introducing these typical agent modules, the researchers also summarize commonly used fine-tuning strategies for enhancing agents' adaptability to different application scenarios.
The researchers then outline potential applications of autonomous agents, exploring how they could benefit the fields of social sciences, natural sciences, and engineering. Finally, evaluation methods for autonomous agents are discussed, including subjective and objective evaluation strategies. The figure below shows the overall structure of the article.
Source:
Construction of autonomous agents based on LLM
To make LLM-based autonomous agents more effective, two aspects need to be considered: first, what architecture should be designed so that the agent can make better use of the LLM; second, how to learn the parameters effectively.
Agent architecture design: the paper proposes a unified framework that summarizes the architectures proposed in previous studies. The overall structure, shown in Figure 2, consists of a profiling module, a memory module, a planning module, and an action module.
In summary, the profiling module identifies what role the agent plays; the memory and planning modules place the agent in a dynamic environment, enabling it to recall past behaviors and plan future actions; and the action module translates the agent's decisions into concrete outputs. Among these modules, the profiling module affects the memory and planning modules, and these three together affect the action module.
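To make the relationships among the four modules more concrete, here is a minimal, hypothetical Python sketch of how such an agent might be wired together. The class and function names are my own illustration, not the paper's; `llm` is assumed to be any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass, field

@dataclass
class Profile:
    """Profiling module: describes who the agent is."""
    role: str = "senior Python programmer"

    def to_prompt(self) -> str:
        return f"You are a {self.role}."

@dataclass
class Memory:
    """Memory module: stores past observations and actions."""
    records: list = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.records.append(entry)

    def recall(self, k: int = 5) -> str:
        return "\n".join(self.records[-k:])  # naive recency-based recall

class Planner:
    """Planning module: asks the LLM to break a task into steps."""
    def __init__(self, llm):
        self.llm = llm

    def plan(self, profile: Profile, memory: Memory, task: str) -> list[str]:
        prompt = (f"{profile.to_prompt()}\n"
                  f"Relevant memory:\n{memory.recall()}\n"
                  f"Task: {task}\nList the steps, one per line.")
        return self.llm(prompt).splitlines()

class Actor:
    """Action module: turns each planned step into a concrete output."""
    def __init__(self, llm):
        self.llm = llm

    def act(self, step: str) -> str:
        return self.llm(f"Carry out this step and return the result: {step}")

def run_agent(llm, task: str) -> list[str]:
    """Profile and memory feed planning; planning feeds action, as described above."""
    profile, memory = Profile(), Memory()
    planner, actor = Planner(llm), Actor(llm)
    results = []
    for step in planner.plan(profile, memory, task):
        result = actor.act(step)
        memory.add(f"step: {step} -> {result}")  # memory informs later planning
        results.append(result)
    return results
```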
Profiling Module
Autonomous agents perform tasks by taking on specific roles, such as programmer, teacher, or domain expert. The profiling module indicates what the agent's role is; this information is usually written into the input prompt to influence the LLM's behavior. In existing work, three strategies are commonly used to generate agent profiles: handcrafted methods, LLM-generation methods, and dataset alignment methods.
Memory module
The memory module plays a very important role in the construction of AI agents. It stores information perceived from the environment and uses these recorded memories to support the agent's future actions. The memory module helps the agent accumulate experience, evolve itself, and complete tasks in a more consistent, reasonable, and effective manner.
Planning Module
When humans face a complex task, they first break it down into simple subtasks and then solve each subtask one by one. The planning module endows the LLM-based agent with the thinking and planning abilities needed to solve complex tasks, making the agent more capable, powerful, and reliable. The article presents two types of planning: planning without feedback and planning with feedback.
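As a rough illustration of the difference between the two, here is a hedged sketch of planning with feedback: after each step, the environment's observation is fed back into the prompt so the remaining plan can be revised. The names are illustrative assumptions, not the paper's API; `env` stands for any callable that executes an action and returns an observation.

```python
def plan_with_feedback(llm, env, task: str, max_steps: int = 10) -> list[str]:
    """Iteratively plan one step at a time, revising after each observation."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Task: {task}\n"
            "History so far:\n" + "\n".join(history) + "\n"
            "Propose the single next action, or reply DONE if finished."
        )
        action = llm(prompt).strip()
        if action == "DONE":
            break
        observation = env(action)  # environment feedback closes the loop
        history.append(f"action: {action} | observation: {observation}")
    return history
```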
Action Module
The action module transforms the agent's decisions into concrete outputs. It interacts directly with the environment and determines how effectively the agent completes its tasks. This section is organized around action goals, action strategies, action space, and action impact.
In addition to the above 4 parts, this chapter also introduces the learning strategies of the agent, including learning from examples, learning from environmental feedback, and learning from interactive human feedback.
Table 1 lists the correspondence between previous work and our taxonomy:
LLM-based autonomous agent application
This chapter explores the transformative impact of LLM-based autonomous agents in three distinct fields: social sciences, natural sciences, and engineering.
For example, LLM-based agents can be used to design and optimize complex structures such as buildings, bridges, dams, and roads. Researchers have previously proposed an interactive framework in which human architects and AI agents work together to build structures in a 3D simulation environment. The interactive agent can understand natural language instructions, place components, ask for clarification, and incorporate human feedback, showing the potential of human-machine collaboration in engineering design.
In computer science and software engineering, LLM-based agents offer the potential to automate coding, testing, debugging, and documentation generation. Researchers have proposed ChatDev, an end-to-end framework in which multiple agents communicate and collaborate through natural language dialogue to complete the software development life cycle; ToolBench can be used for tasks such as code auto-completion and code recommendation; and MetaGPT can play the roles of product manager, architect, project manager, and engineer, internally supervising code generation to improve the quality of the final output code.
The following table shows representative applications of LLM-based autonomous agents:
Evaluation of LLM-Based Autonomous Agents
This article introduces two commonly used evaluation strategies: subjective evaluation and objective evaluation.
Subjective evaluation refers to humans assessing the capabilities of LLM-based agents through means such as interaction and scoring. In such cases, the evaluators are often recruited through crowdsourcing platforms; some researchers, however, consider crowd workers unreliable because of differences in individual ability, and therefore also use expert annotation for evaluation.
In addition, some current studies use LLM agents themselves as subjective evaluators. In the ChemCrow study, for example, EvaluatorGPT assesses experimental results by assigning a score that considers both the successful completion of the task and the accuracy of the underlying thought process. Another example is ChatEval, which assembles an LLM-based multi-agent referee team to evaluate models' generated results through debate.
Objective evaluation refers to the use of quantitative metrics to assess the capabilities of LLM-based autonomous agents; it has several advantages over subjective evaluation. This section reviews and synthesizes objective evaluation methods from the perspectives of metrics, strategies, and benchmarks.
In practice, the two kinds of evaluation can be combined.
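For intuition only, here is a small sketch of how the two strategies might be combined: an objective metric (task success rate against a programmatic check) alongside an LLM acting as a subjective judge. The prompt wording and the 1-5 scale are assumptions for illustration, not a method from the paper.

```python
def success_rate(agent, tasks, check) -> float:
    """Objective evaluation: fraction of tasks whose output passes a programmatic check."""
    passed = sum(1 for t in tasks if check(t, agent(t)))
    return passed / len(tasks)

def llm_judge_score(judge_llm, task: str, output: str) -> int:
    """Subjective evaluation: an LLM assigns a 1-5 quality rating."""
    prompt = (f"Task: {task}\nAgent output: {output}\n"
              "Rate the quality from 1 (poor) to 5 (excellent). Reply with the number only.")
    return int(judge_llm(prompt).strip())
```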
Table 3 summarizes the correspondence between previous work and these evaluation strategies:
For more information, please refer to the original paper.
Original source: Xinzhiyuan
Recently, IBM launched a brand-new 14nm analog AI chip that is up to 14 times more energy efficient than leading GPUs, enough to give the H100 a run for its money.
Paper address:
Currently, the biggest obstacle to the development of generative AI is its astonishing power consumption. The resources AI demands cannot keep growing sustainably.
IBM, for its part, has been researching ways to reshape AI computing. One of its results is analog in-memory computing, or analog AI, which reduces energy consumption by borrowing a key feature of how neural networks run in biological brains.
This approach minimizes the time and effort we spend on computation.
Is Nvidia's monopoly about to be subverted?
IBM's latest blueprint for the future of AI: analog AI chips 14 times more energy efficient
According to a report by Insider, Dylan Patel, chief analyst at the semiconductor research firm SemiAnalysis, estimated that ChatGPT costs more than $700,000 a day to operate.
ChatGPT needs a great deal of computing power to generate answers to user prompts, and most of that cost comes from expensive servers.
In the future, the cost of training models and running the infrastructure will only keep climbing.
In a paper published in Nature, IBM says the new chip can ease the burden on companies building and operating generative AI products such as Midjourney or GPT-4 by cutting energy consumption.
These analog chips are built differently from digital chips: they can manipulate analog signals and understand gradations between 0 and 1, whereas digital chips work only with distinct binary signals.
IBM's new approach is analog in-memory computing, or analog AI for short. It reduces energy consumption by exploiting a key feature of how neural networks operate in biological brains.
In the brains of humans and other animals, the strength (or "weight") of synapses determines the communication between neurons.
For analog AI systems, IBM stores these synaptic weights in the conductance values of nanoscale resistive memory devices (such as phase-change memory, PCM) and uses the laws of circuits to perform the multiply-accumulate (MAC) operations that dominate DNNs, reducing the need to constantly shuttle data between memory and processor.
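As a rough intuition for the idea (my own simplified model, not IBM's actual circuit), the sketch below encodes weights as conductances, applies the input as voltages, and relies on Ohm's law plus Kirchhoff's current law so that the summed bit-line currents directly give the multiply-accumulate result; a small noise term stands in for imperfect device programming.

```python
import numpy as np

def analog_mvm(weights: np.ndarray, x: np.ndarray, g_max: float = 25e-6,
               noise_std: float = 0.02) -> np.ndarray:
    """Idealized analog matrix-vector multiply on a PCM crossbar.

    Signed weights are mapped to differential conductance pairs (G+ - G-),
    inputs become voltages, and each output is the summed bit-line current.
    """
    w_scale = np.abs(weights).max() or 1.0
    g_pos = np.clip(weights, 0, None) / w_scale * g_max   # positive part
    g_neg = np.clip(-weights, 0, None) / w_scale * g_max  # negative part
    # Device-to-device variation: conductances are only approximately programmed.
    g_pos *= 1 + noise_std * np.random.randn(*g_pos.shape)
    g_neg *= 1 + noise_std * np.random.randn(*g_neg.shape)
    currents = (g_pos - g_neg) @ x      # Ohm's law + current summation
    return currents * w_scale / g_max   # rescale back to weight units

# Example: a 256x256 layer computed "in memory" in one analog step.
W = np.random.randn(256, 256).astype(np.float32)
x = np.random.randn(256).astype(np.float32)
print(np.allclose(analog_mvm(W, x, noise_std=0.0), W @ x, atol=1e-4))  # exact when noise-free
```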
Now powering many generative AI platforms are Nvidia's H100 and A100.
However, if IBM iterates on the chip prototype and successfully pushes it to the mass market, this new chip may very well replace Nvidia as a new mainstay.
This 14nm analog AI chip integrates 35 million phase-change memory devices and can encode models with up to 17 million parameters.
Moreover, the chip mimics the way the human brain works, with the microchip performing calculations directly in memory.
The chip's system can achieve efficient speech recognition and transcription, with an accuracy close to that of digital hardware.
The chip is roughly 14 times more energy efficient, and earlier simulations suggest that this kind of hardware could reach 40 to 140 times the energy efficiency of today's leading GPUs.
PCM crossbar array, programming and digital signal processing
The generative AI revolution has only just begun. Deep neural networks (DNNs) have revolutionized the field of AI, gaining prominence with the development of foundation models and generative AI.
However, running these models on traditional digital computing architectures limits their performance and energy efficiency.
While progress has been made in developing hardware for AI inference, many of these architectures physically separate the memory and the processing units.
This means that AI models are typically stored in discrete memory locations, and computing tasks require constantly shuttling data between the memory and the processing units. This process significantly slows down computation and limits the maximum energy efficiency that can be achieved.
Performance characteristics of PCM devices, using phase configuration and conductance to store synaptic weights in analog form
IBM's phase-change memory (PCM)-based artificial intelligence acceleration chip gets rid of this limitation.
Phase-change memory (PCM) integrates computation and storage, performing matrix-vector multiplication directly in memory and avoiding the data-transfer problem.
At the same time, IBM's analog AI chip achieves efficient AI inference acceleration through this hardware-level integration of computing and storage, an important advance in the field.
To bring the concept of analog AI to life, two key challenges need to be overcome:
The computational precision of the memory array must be comparable to that of existing digital systems;
The memory array must interface seamlessly with the other digital computing units and the digital communication fabric on the analog AI chip.
IBM fabricates the phase-change-memory-based AI accelerator chip at its Albany NanoTech facility.
The chip consists of 64 analog in-memory compute cores, each containing a 256×256 crossbar array of synaptic unit cells.
In addition, compact time-based analog-to-digital converters are integrated on the chip to convert between the analog and digital worlds.
The lightweight digital processing unit in the chip can also perform simple nonlinear neuron activation functions and scaling operations.
Each core can be thought of as a tile that can perform matrix-vector multiplication and other operations associated with a layer (such as a convolutional layer) of a deep neural network (DNN) model.
The weight matrix is encoded into the analog conductance values of the PCM devices and stored on-chip.
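For intuition only, here is a small sketch (my own illustration, not IBM's mapping toolchain) of how a layer whose weight matrix is larger than a single 256×256 crossbar could be partitioned into per-core tiles, with each core contributing a partial product that is summed digitally.

```python
import numpy as np

TILE = 256  # each analog core holds a 256x256 crossbar of PCM unit cells

def split_into_tiles(W: np.ndarray) -> dict[tuple[int, int], np.ndarray]:
    """Partition a weight matrix into TILE x TILE blocks, one per core."""
    rows = int(np.ceil(W.shape[0] / TILE))
    cols = int(np.ceil(W.shape[1] / TILE))
    tiles = {}
    for r in range(rows):
        for c in range(cols):
            tiles[(r, c)] = W[r*TILE:(r+1)*TILE, c*TILE:(c+1)*TILE]
    return tiles

def tiled_mvm(tiles, W_shape, x: np.ndarray) -> np.ndarray:
    """Each core computes its tile's partial product; partial sums are added digitally."""
    y = np.zeros(W_shape[0], dtype=x.dtype)
    for (r, c), tile in tiles.items():
        x_slice = x[c*TILE:c*TILE + tile.shape[1]]
        y[r*TILE:r*TILE + tile.shape[0]] += tile @ x_slice  # idealized per-core analog MVM
    return y

# Example: a 512x768 layer spans 2x3 cores of the 64-core chip.
W = np.random.randn(512, 768).astype(np.float32)
x = np.random.randn(768).astype(np.float32)
tiles = split_into_tiles(W)
print(len(tiles), np.allclose(tiled_mvm(tiles, W.shape, x), W @ x, atol=1e-3))
```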
A global digital processing unit is integrated in the middle of the chip's core array to perform operations more complex than matrix-vector multiplication, which is critical for executing certain types of neural networks (such as LSTMs).
Digital communication paths are integrated on-chip between all cores and global digital processing units for data transfer between cores and between cores and global units.
a: electronic design automation snapshot and chip micrograph, you can see 64 cores and 5616 pads
b: Schematic diagram of the different components of the chip, including 64 cores, 8 global digital processing units, and data links between cores
c: Structure of a single PCM-based in-memory computing core
d: The structure of the global digital processing unit for LSTM related calculations
Using the chip, IBM conducted a comprehensive study on the computational accuracy of analog memory computing and achieved an accuracy of 92.81% on the CIFAR-10 image dataset.
a: ResNet-9 network structure for CIFAR-10
b: the way to map this network onto the chip
c: hardware-implemented CIFAR-10 test accuracy
This is the highest accuracy reported so far for a chip using similar technology.
IBM also seamlessly combines analog in-memory computing with multiple digital processing units and digital communication structures.
The chip's 8-bit input-output matrix multiplication has a unit area throughput of 400 GOPS/mm2, which is more than 15 times higher than previous multi-core memory computing chips based on resistive memory, while achieving considerable energy efficiency.
For a character-prediction task and an image-caption-generation task, IBM compared results measured on the hardware against other methods, showing the network structures, weight programming, and measured results for these tasks running on the analog AI chip.
LSTM measurements for character prediction
LSTM Network Measurements for Image Annotation Generation
weight programming process
**Is Nvidia's moat bottomless?**
Is Nvidia's monopoly so easy to break?
Naveen Rao is a neuroscientist turned tech entrepreneur who tried to compete with Nvidia, the world's leading maker of artificial intelligence chips.
"Everyone is developing on Nvidia," Rao said. "If you want to launch new hardware, you have to catch up and compete with Nvidia."
Rao worked on chips designed to replace Nvidia’s GPUs at a start-up acquired by Intel, but after leaving Intel, he used Nvidia’s chips in MosaicML, a software startup he led.
Rao said that Nvidia has not only opened a huge gap over other products at the chip level, but has also differentiated itself beyond the chip by building a large community of AI programmers who have long used the company's technology to innovate.
For more than a decade, Nvidia has built an almost unassailable lead in producing chips that can perform complex AI tasks such as image, facial and speech recognition, as well as generate text for chatbots such as ChatGPT.
The once-industry upstart was able to achieve dominance in AI chipmaking because it recognized trends in AI early on, custom-built chips for those tasks, and developed critical software that facilitated AI development.
Since then, Nvidia co-founder and CEO Jensen Huang has been raising the bar for Nvidia.
This makes Nvidia a one-stop supplier for AI development.
While Google, Amazon, Meta, IBM and others also make AI chips, Nvidia currently accounts for more than 70% of AI chip sales, according to research firm Omdia.
In June of this year, Nvidia's market value exceeded $1 trillion, making it the world's most valuable chip maker.
"Customers will wait 18 months to buy Nvidia systems instead of buying off-the-shelf chips from startups or other competitors. It's incredible," FuturumGroup analysts said.
NVIDIA, reshaping computing methods
Jensen Huang co-founded Nvidia in 1993, making chips that render images in video games. Standard microprocessors at the time were good at performing complex calculations in sequence, but Nvidia produced GPUs that could handle multiple simple tasks simultaneously.
In 2006, Jensen Huang took the process a step further. He released a software technology called CUDA that helps GPUs be programmed for new tasks, transforming GPUs from single-purpose chips into more general-purpose chips that can take on other jobs in fields like physics and chemistry simulations.
In 2012, researchers used GPUs to achieve human-like accuracy in tasks such as identifying cats in images, a major breakthrough and a precursor to recent developments such as generating images from text prompts.
The effort, which Nvidia estimates cost more than $30 billion over a decade, makes Nvidia more than just a parts supplier. In addition to collaborating with top scientists and start-ups, the company has assembled a team that is directly involved in AI activities such as creating and training language models.
In addition, the needs of practitioners led Nvidia to develop multiple layers of key software beyond CUDA, which also included libraries of hundreds of lines of pre-built code.
On the hardware side, Nvidia has earned a reputation for consistently delivering faster chips every two or three years. In 2017, Nvidia began tuning GPUs to handle specific AI calculations.
Last September, Nvidia announced it was producing a new chip called the H100, which had been improved to handle so-called Transformer operations. Such calculations are proving to be the basis of services such as ChatGPT, which Huang called generative artificial intelligence’s “iPhone moment.”
Today, only if other manufacturers' products can compete head-on with Nvidia's GPUs will it be possible to break Nvidia's current hold on AI computing power.
Could IBM's analog AI chip be the one to do it?
Original Source: Light Cone Intelligence
Author: Liu Yuqi
Image source: Generated by Unbounded AI
It may be hard for you to imagine that in a space without a display screen or a mouse, you can complete a 5,000-word article with just a pair of AR glasses and a pocket host.
That's right: on August 26, at the 2023 Rokid Jungle new product launch, such a scene actually played out. At the event, Rokid released Rokid AR Studio, a consumer-grade OST (optical see-through) personal spatial computing platform, consisting of two hardware products, Rokid Max Pro (4,999 yuan) and Rokid Station Pro (3,999 yuan).
Zhu Mingming, founder and CEO of Rokid, said at the press conference: "Spatial computing can be more naturally integrated into daily life and work, and let Rokid AR Studio become your first spatial computer."
This is very different from how people have perceived AR glasses in the past. Until now, AR glasses have been "locked" into entertainment, surviving on the two pillar industries of film and games, but Rokid AR Studio has truly become a personal productivity tool: work scenarios such as IM software, writing articles, writing code, and searching for information can all be completed on the new hardware.
**The expansion of usage scenarios allows AR devices to shift from marginalized scenarios to more practical use values. When consumers are willing to pay, the entire AR industry chain will enter the positive cycle of the consumer market. **
Zhu Mingming, a self-described "socially anxious" boss, is a thorough product and technology perfectionist. He internally killed two versions of the first product-design draft, which almost drove the product team crazy. But when the product team quietly showed him the design they had finished anyway, Zhu Mingming immediately ordered all resources to be devoted to it. "I only care about one statistic: user time spent. At present, our real users' usage time is close to an hour and a half, and weekly retention exceeds 20%. If we get this right, users will grow naturally."
**The accumulated number of users has reached the million level, which also means that the AR industry has entered the second stage of software system and ecological construction. In recent years, more and more system vendors, application software vendors, and content vendors have joined the construction of the AR ecosystem. **
"A group of lunatics, a dream, ten years."
As Zhu Mingming said, it took Rokid 10 years to go from entertainment scenes to productivity tools. Behind this is not only a leap in thinking, but also a big step forward from hardware technology to software technology, and even the entire industry chain. Apple and Rokid have started the second stage of the AR competition, and the competition in the industry is also accelerating.
At the entire launch event, the most surprising thing was not the 76g body of the Rokid Max Pro, but the fact that with only a single camera it can handle SLAM (spatial positioning), micro-gesture interaction, first-person-view sharing, visual positioning (VPS) and other integrated interaction capabilities.
After experiencing physical interaction (handle), voice interaction, and gesture interaction, AR/VR devices are developing toward eye tracking and the current multi-sensory fusion interaction solution.
However, the interaction of multi-sensory integration has higher requirements for hardware. In addition to meeting the basic needs, it is also necessary to capture user actions and gestures from all directions and from multiple angles in order to accurately complete the interaction.
**How difficult is it to achieve SLAM interaction with a single camera?**
Visual SLAM consists of two modules: Tracking, which localizes the camera against known 3D point positions, and Mapping, which updates the positions of those 3D points. Whichever module or method is involved, "monocular" means there is only one camera, at a fixed position and a fixed angle, which poses great challenges for recognition range, tracking speed and accuracy.
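For a sense of what the tracking half involves, here is a minimal two-frame monocular visual-odometry sketch using OpenCV's standard feature-matching and pose-recovery calls. The intrinsic matrix K is an assumed example value, and this is illustrative textbook code, not Rokid's algorithm.

```python
import cv2
import numpy as np

# Assumed camera intrinsics (focal lengths and principal point), for illustration only.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

def relative_pose(prev_gray, curr_gray):
    """Estimate rotation R and scale-free translation t between two grayscale frames."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # With a single camera the translation scale is unobservable; only its direction is recovered.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```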
"The industry believes that monocular SLAM is unbelievable and difficult to achieve," Zhu Mingming jokingly said, "This may also be an affirmation of Rokid."
At present, the few AR glasses on the market with spatial interaction carry at least three cameras to support the algorithms. **The difference in visual approach has also formed two camps: VST (video see-through), represented by Apple, and OST (optical see-through), represented by Rokid.**
Taking the Apple Vision Pro as an example again, it "stacks" 12 cameras to achieve fast positioning capture, high-precision panoramic perception and precise tracking, and uses VST to present the outside world on the headset's screens: the cameras film in real time so the wearer can see what is around them.
However, the method of stacking hardware for interaction has increased the cost and doubled the price at the same time, which has caused two major landing problems: the weight of the machine and the difficulty of mass production. This is the fundamental reason why Apple Vision Pro is priced at $3,499 and will not be mass-produced until 2024.
The OST solution that Rokid insists on has its own technical barriers: the pipeline design is complex, the display's field of view is limited, and optical components are costly, so Rokid can only rely on technical breakthroughs to bring the stacked costs down.
So how is the monocular SLAM that the industry considers "unbelievable" actually done? After the event, Lightcone Intelligence had an in-depth exchange with Zhu Mingming and found that Rokid's "unique trick" is to use AI algorithms to break through the limits of the hardware.
Zhu Mingming explained that although monocular SLAM technology has existed for a long time, it had never been applied to AR glasses; the front camera of a mobile phone uses similar technology, and the only difference is the algorithm.
From AI to AR looks like a leap but is really an integration. It is precisely because of Rokid's accumulation in the AI field in recent years, through multi-dimensional visual algorithm models including visual positioning and enhancement, digital-human technology, 2D/3D gesture recognition and OCR, that AI can land in specific scenarios.
For example, the AR visual positioning and enhancement function is designed to overcome the limits of the monocular setup: by constructing a centimeter-level visual map, virtual information can be accurately superimposed and fused onto the real world, achieving high-precision 3D reconstruction of objects and scenes.
Wang Junjie, vice president of Rokid and head of the XR center, said: "Spatial positioning is based on SLAM technology, and then stable and natural interaction can be performed in space. It takes 1 to 2 seconds to quickly initialize through the algorithm to establish a mapping space."
On the market, most devices still use binocular solutions, but binocular fusion also has many problems. In addition to the cost of adding an extra camera, it is also necessary to continuously use algorithms to fit the data of the two cameras in real time. This leads to more complex issues.
From this point of view, if the monocular solution can be carried out smoothly, Rokid will take the lead in stepping on a technological trend. Previously, Rokid was also the industry's first manufacturer of Station hosts. The solution of separating glasses and hosts has been proven to be the optimal solution for industry experience.
In addition, for gesture recognition Rokid adopts a micro-gesture interaction mode: you can click and select with a pinch of your fingers, or switch the interface or content you are browsing by moving your hand left or right. Simple, logically defined gestures such as pinch and slide feel more natural and are quicker to pick up.
According to our on-site test, Rokid supports bare-handed spatial interaction with both hands. Its gesture-recognition algorithm handles complex conditions such as horizontal/spatial-axis rotation and bright or dim light, recognizes many types of gestures, and is precise: the overall recognition rate is about 90%, with millisecond-level response and a claimed 99% reliability guarantee.
According to Rokid, its monocular 3D gesture algorithm, built on deep learning and a large amount of experimental data, can reconstruct hand-pose parameters in real time on the mobile terminal, including hand 6DoF, hand-joint 6DoF and hand-mesh information, providing a good algorithmic basis for AR gesture interaction.
At present, Rokid's gesture recognition can realize a variety of operations in 3D space, including point, pinch, grasp, hold, drag, pull, etc., which can fully meet the needs of AR interactive applications. For example, put on the Rokid Max Pro, stretch out your hand, and open your palm in front of your eyes to call out the menu.
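As a toy illustration of how a pinch gesture can be derived from tracked hand keypoints, the sketch below assumes the common 21-landmark hand layout (thumb tip at index 4, index fingertip at index 8) and a hypothetical distance threshold; it is not Rokid's algorithm.

```python
import numpy as np

PINCH_THRESHOLD = 0.03  # assumed threshold in the keypoints' coordinate units

def is_pinching(landmarks: np.ndarray) -> bool:
    """landmarks: (21, 3) array of 3D hand keypoints from a hand-tracking model."""
    thumb_tip, index_tip = landmarks[4], landmarks[8]
    # A pinch is detected when thumb tip and index fingertip come close enough together.
    return float(np.linalg.norm(thumb_tip - index_tip)) < PINCH_THRESHOLD
```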
Of course, to support such a complex algorithmic stack, the hero behind the scenes is not just the camera; it is also closely tied to the computing power and performance of the "brain", the Rokid Station Pro.
** For a long time, the entire VR/AR industry has had an impossible triangle of "computing power, comfort, and price". Devices with higher computing power tend to be heavier and more expensive, and lightweight devices with high comfort cannot meet the needs of use. **
Judging from the actual situation, there is no "perfect" solution at present. The mainstream manufacturers are trying to find a balance between the two. There are two types of mainstream solutions in the current market: one is represented by Apple. The display and computing are integrated, and the battery is externally connected; the other is the display and computing split design represented by Rokid.
Apple's integrated design packs two micro-OLED screens, multiple cameras, sensors, speakers and other components into one unit, which is more efficient for display and computation, but it also increases the weight of the headset itself, leaving no choice but to connect the battery externally.
The split design that Rokid insists on maximizes wearability: compared with the Vision Pro's 454g, the 76g glasses weigh about the same as ordinary glasses. At the same time, the host's computing power is less constrained by space, which also avoids, to a certain extent, the discomfort caused by heat dissipation.
**In general, the split-type route can achieve the two-way ultimate development of the portability of glasses and the computing power of the host, and is also more flexible. The iteration of computing power and the technical route of glasses can be carried out asynchronously. **
Based on the split design, Rokid Station Pro upgrades the computing power to create an all-in-one terminal integrating computing, imaging, communication and other functions; it can also be called a "productivity" super-terminal.
According to Lightcone Intelligence, the Rokid Station Pro is equipped with a Qualcomm Snapdragon XR2+, 12GB RAM + 128GB ROM, and supports Wi-Fi 6/6E and BT 5.1. With better heat dissipation and higher performance, it can achieve centimeter-level 6DoF tracking accuracy and extremely low MTP (motion-to-photon) latency.
According to public information, the Snapdragon XR2+ is Qualcomm's latest flagship XR platform, delivering a 50% improvement in battery life and a 30% improvement in thermal performance, enabling richer, more immersive experiences in a smaller and thinner device. The platform also introduces a new image-processing pipeline that can achieve latency below 10 milliseconds and enables full-color video see-through MR experiences.
Judging from Lightcone Intelligence's on-site experience, whether watching movies, playing games, or calling up the keyboard for productivity work, and especially under the high-frequency interaction of games, the display's smoothness and responsiveness felt silky.
It is worth mentioning that the core algorithm on most devices today is still 3DoF (three degrees of freedom), meaning the device can only detect how the head rotates and cannot detect spatial displacement of the head forward, backward, left or right.
The 6DoF algorithm adopted by the upgraded Station Pro can detect not only the change of viewing angle caused by head rotation but also the displacement caused by body movement: up, down, forward, backward, left and right.
The value of this upgrade shows up in the player's freedom of movement. For example, when fighting zombies under a 3DoF algorithm, the shooting range is a fixed angle in front of you; after the upgrade, zombies can come from 360 degrees, and the bodily sensation of turning around to face the zombies behind you is something the former simply cannot deliver.
In other words, not only is the computing power higher and the experience smoother, but the expansion of the computing power space has also brought about a huge difference in the sense of body.
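A minimal sketch of the difference, assuming a rotation matrix R and a translation vector t: under 3DoF only R is tracked, while under 6DoF the pose also includes t, so physically walking through the space changes what is rendered. Illustrative only.

```python
import numpy as np

def apply_3dof(R: np.ndarray, point: np.ndarray) -> np.ndarray:
    """3DoF: only head rotation is tracked, so world points are merely rotated."""
    return R @ point

def apply_6dof(R: np.ndarray, t: np.ndarray, point: np.ndarray) -> np.ndarray:
    """6DoF: rotation plus translation, so moving the body also shifts the view."""
    return R @ point + t
```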
Said Bakadir, senior director of XR product management at Qualcomm Technologies, said: "The first-generation Snapdragon XR2+ platform is the best choice to enable the next generation of XR experiences. Qualcomm Technologies provides the industry-leading platform for Rokid Station Pro, supporting it to create Its own unique AR application ecosystem."
Of course, the reason Apple's iPhone has dominated the phone market for so many years is not only its hardware but also its system and ecosystem; the barriers built by cultivating user habits through the software system are often stronger than the hardware itself.
**This is part of the reason for Rokid's self-developed AR space operating system - YodaOS-Master, but not the whole reason. **
At the Rokid Open Day in March this year, Rokid officially launched YodaOS-Master and released the "AR space creation platform" Lingjing, which lets everyone create AR content in 3D space and take part, thoroughly lowering the threshold of AR creation and letting the ecosystem's potential energy explode.
**If monocular SLAM, 3D gesture recognition, Snapdragon XR2+ and the Lingjing platform are all sharp blades, then YodaOS-Master is the self-developed system that lets these moves be unleashed.**
To put it simply, Rokid is taking a road that no one has ever traveled, and Rokid's philosophy is "software defines everything". All software needs to be carried and provided by the system in order to exert its value.
Focusing on perception, understanding, interaction, presentation, collaboration and digital creation, YodaOS-Master has made huge upgrades in chip optimization, hardware design, software architecture, AR algorithms and creation tools; it may currently be the most complete spatial operating system for the AR era.
At the launch event, Rokid also demonstrated the openness and convenience its self-developed system brings. One obvious example: based on the self-developed system and the Snapdragon XR2+ platform, Rokid has built a multi-task parallel mode that breaks the previous single-task constraint, so chatting, writing code and viewing documents can happen at the same time, making full use of the large screen in space and maximizing productivity.
**Another extremely innovative case is that Rokid has redefined spatial search on top of its self-developed system.** Zhu Mingming explained that this breaks the old way of presenting search results: results no longer sit on a two-dimensional plane but exist in three-dimensional space. "The results most relevant to your question are closest to you, and somewhat relevant results sit on secondary pages: the farther away, the less relevant. Of course, you can also swipe away earlier results and dynamically select the ones you want."
In this way, the sense of the future is instantly full, and it also shows the essential difference from the first-stage AR equipment.
**It is clear that the open ecosystem of the AR industry has entered its second stage. Apple and Rokid are vying not only in hardware but also in industry system software and ecosystem. Through the co-creation of hardware, algorithms, software ecosystem, developers, users and platforms, AR will move into a second stage of rapid development within a fully open ecosystem.**
Shi Wenfeng, chief engineer of Rokid's system R&D, said: "The YodaOS-Master operating system wraps Rokid's core technologies, such as voice recognition, gesture recognition and SLAM, into system services through a service-oriented approach, and provides a variety of client SDKs so developers can work efficiently. For example, the SDK for Unity allows Unity developers (developer application channel: ar.rokid.com) to quickly build on Rokid's core technology."
From hardware to software, from system to ecology, Rokid's development path is a bit like Apple in the Jobs era.
"The AR industry is just before dawn," Zhu Mingming said.
Original source: Xinzhiyuan
Image source: Generated by Unbounded AI
Only 2 days after its release, Code Llama once again ignited the revolution of AI coding.
Remember the mysterious "Unnatural Code Llama" that appeared in Meta's Code Llama paper and could roughly match GPT-4?
The well-known researcher Sebastian explained in his blog:
It is a version of Code Llama-Python 34B fine-tuned on 15,000 unnatural-language instructions.
By burying this detail deep in the paper, Meta seems to be hinting to the open source community that Code Llama has great potential, so go fine-tune it!
So just now, WizardCoder 34B, fine-tuned from Code Llama, directly beat GPT-4 on the HumanEval benchmark.
Specifically, WizardCoder reached 73.2% pass@1, beating the March version of GPT-4 (67%).
In addition, the performance of WizardCoder 34B exceeds the latest version GPT-3.5, and Claude 2.
The programming model WizardCoder was released in June by Microsoft and Hong Kong Baptist University. A fine-tuned 13B/7B version is said to be coming soon.
According to Jim Fan, a top scientist at Nvidia, this is basically an open version of "Unnatural Code Llama".
While the benchmark numbers look good, HumanEval only tests a narrow distribution and can be overfitted; testing on data from natural scenarios is what really matters, and coding benchmarks need a major upgrade.
## **A mysterious version of Code Llama was born? **
On Friday, Meta officially open-sourced three versions of Code Llama.
In the HumanEval and MBPP benchmark results, many people noticed a version not mentioned in Meta's official release: Unnatural Code Llama.
This mysterious version achieved 62.2% pass@1 on HumanEval.
The fine-tuned WizardCoder 34B released today reaches 73.2% pass@1 on HumanEval.
According to the introduction, WizardCoder 34B is a fine-tuned version of the Code Llama model using the synthetic dataset Evol-Instruct.
The following is a visualization of the performance comparison with all open source and closed source models.
In the comparison with OpenAI's models, the researchers pointed out that GPT-4 and ChatGPT-3.5 each have two HumanEval results:
The figures in OpenAI's official GPT-4 report (2023/03/15) are 67.0% and 48.1% respectively, while the researchers' own tests with the latest API (2023/08/26) give 82.0% and 72.5%.
In addition, the researchers stress that this performance result is 100% reproducible!
A demo of WizardCoder 34B is open for anyone to test it out.
It has been pointed out that overfitting to public leaderboards is one of the main reasons open source models struggle in practice. One example is WizardCoder's data preparation, which uses HumanEval pass@1 scores to decide whether to evolve the dataset further; optimizing only against the test set defeats the purpose of the test set.
Also, just yesterday, researchers at Phind fine-tuned Code Llama-34B to beat GPT-4 on the HumanEval evaluation.
ChatGPT vs. Code Llama
How does Code Llama perform in actual coding tasks?
A netizen ran a comparative test of GPT-3.5 and Code Llama Instruct-34B, using the Code Llama 34B access provided by Perplexity.AI.
He fed the same 8 coding tasks to the two models and compared the quality of the code they generated.
The result: GPT-3.5 won 8:5.
The following are the specific test results.
Question 1
Use Python to accomplish this task: given two strings word1 and word2, merge the strings by adding letters in alternating order, starting with word1. If one string is longer than the other, append the extra letters to the end of the merged string.
Finally, output the merged string.
For example:
Input: word1 = "abc", word2 = "pqr" Output: "apbqcr"
Both GPT-3.5 and Code Llama can complete - 1:1
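For reference, a straightforward Python solution to this task (not taken from either model's output) might be:

```python
def merge_alternately(word1: str, word2: str) -> str:
    merged = []
    for a, b in zip(word1, word2):
        merged.append(a)
        merged.append(b)
    # Append whatever remains of the longer string.
    longer = word1 if len(word1) > len(word2) else word2
    merged.append(longer[min(len(word1), len(word2)):])
    return "".join(merged)

print(merge_alternately("abc", "pqr"))  # apbqcr
```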
Question 2
Use Python to accomplish this task: given a string s, reverse only the vowels in the string and return it.
The vowels are "a", "e", "i", "o" and "u", and they can appear multiple times in both lowercase and uppercase.
For example: input: s = "hello", output: "holle"
GPT-3.5 completed, Code Llama not completed - 2:1
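A minimal reference solution, again only for comparison rather than reproducing either model's answer:

```python
def reverse_vowels(s: str) -> str:
    vowels = set("aeiouAEIOU")
    chars = list(s)
    i, j = 0, len(chars) - 1
    while i < j:
        if chars[i] not in vowels:
            i += 1
        elif chars[j] not in vowels:
            j -= 1
        else:
            chars[i], chars[j] = chars[j], chars[i]  # swap the two vowels
            i, j = i + 1, j - 1
    return "".join(chars)

print(reverse_vowels("hello"))  # holle
```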
Question 3
Use Python to accomplish this task, given an integer array nums, move all 0s to the end of it while maintaining the relative order of the non-zero elements.
Note that you have to do this in-place, without making a copy of the array.
For example: Input: nums = [0,1,0,3,12] Output: [1,3,12,0,0]
GPT-3.5 completed, Code Llama not completed - 3:1
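A simple in-place reference solution for this task might look like:

```python
def move_zeroes(nums: list) -> None:
    insert = 0
    for i, v in enumerate(nums):
        if v != 0:
            nums[insert], nums[i] = nums[i], nums[insert]  # push non-zeros forward
            insert += 1

nums = [0, 1, 0, 3, 12]
move_zeroes(nums)
print(nums)  # [1, 3, 12, 0, 0]
```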
Question 4
Using Python for this task: you have a long flowerbed in which some plots are planted with flowers and some are not.
However, flowers cannot be planted in adjacent plots. Given an integer array flowerbed of 0s and 1s, where 0 means empty and 1 means planted, and an integer n, output true if n new flowers can be planted without violating the no-adjacent-flowers rule; otherwise output false.
Example 1: Input: Flowerbed = [1,0,0,0,1], n = 1 Output: true Example 2: Input: Flowerbed = [1,0,0,0,1], n = 2 Output: false
Both models are done - 4:2
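A greedy reference solution, for illustration only:

```python
def can_place_flowers(flowerbed: list, n: int) -> bool:
    count = 0
    for i, plot in enumerate(flowerbed):
        left = flowerbed[i - 1] if i > 0 else 0
        right = flowerbed[i + 1] if i < len(flowerbed) - 1 else 0
        if plot == 0 and left == 0 and right == 0:
            flowerbed[i] = 1  # plant here greedily
            count += 1
    return count >= n

print(can_place_flowers([1, 0, 0, 0, 1], 1))  # True
print(can_place_flowers([1, 0, 0, 0, 1], 2))  # False
```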
Question 5
Using Python, given an input string s, reverse the order of the words. A word is defined as a sequence of non-whitespace characters. Words in s will be separated by at least one space.
Output a string of words joined by single spaces in reverse order. Note that s may contain leading or trailing spaces or multiple spaces between two words.
The returned string should have only one space to separate words. Do not include any extra spaces.
Example: Input: s = "the sky is blue" Output: "blue is sky the"
Both models completed - 5:3
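A one-line reference solution is enough here:

```python
def reverse_words(s: str) -> str:
    # split() drops leading/trailing/multiple spaces automatically
    return " ".join(reversed(s.split()))

print(reverse_words("the sky is blue"))  # "blue is sky the"
```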
Question 6
Use Python to accomplish this task. Given a string s and an integer k, return the maximum number of vowels in any substring of length k in s.
The vowels in English are "a", "e", "i", "o" and "u". Example: Input: s = "leetcode", k = 3 Output: 2
Explanation: "lee", "eet" and "ode" contain 2 vowels.
Both models are done - 6:4
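A sliding-window reference solution, for illustration:

```python
def max_vowels(s: str, k: int) -> int:
    vowels = set("aeiou")
    count = sum(c in vowels for c in s[:k])  # vowels in the first window
    best = count
    for i in range(k, len(s)):
        count += (s[i] in vowels) - (s[i - k] in vowels)  # slide the window by one
        best = max(best, count)
    return best

print(max_vowels("leetcode", 3))  # 2
```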
Question 7
Use Python to accomplish this task, given a string s that contains asterisks *. With one operation, you can: Select an asterisk in s.
Removes the nearest non-asterisk character to its left, and removes the asterisk itself. Output the string after removing all asterisks. Example: Input: s = "leet**cod*e" Output: "lecoe"
GPT-3.5 is done, but Code Llama is not - 7:4
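A stack-based reference solution, for illustration:

```python
def remove_stars(s: str) -> str:
    stack = []
    for c in s:
        if c == "*":
            stack.pop()  # remove the closest non-star character to the left
        else:
            stack.append(c)
    return "".join(stack)

print(remove_stars("leet**cod*e"))  # "lecoe"
```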
Question 8
Use Python to accomplish this task, given an integer temperature array representing the daily temperature, return an array answer, where answer [i] is the number of days after day i you have to wait for warmer temperatures.
If there is no day in the future to do this, keep the answer [i] == 0. Example: Input: Temperature = [73,74,75,71,69,72,76,73] Output: [1,1,4,2,1,1,0,0]
Both models completed - 8:5
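A monotonic-stack reference solution, for illustration:

```python
def daily_temperatures(temps: list) -> list:
    answer = [0] * len(temps)
    stack = []  # indices of days still waiting for a warmer day
    for i, t in enumerate(temps):
        while stack and temps[stack[-1]] < t:
            j = stack.pop()
            answer[j] = i - j  # day j waits (i - j) days for a warmer temperature
        stack.append(i)
    return answer

print(daily_temperatures([73, 74, 75, 71, 69, 72, 76, 73]))
# [1, 1, 4, 2, 1, 1, 0, 0]
```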
Regarding the two models' performance, the netizen notes that this is not a rigorous study, just a quick test; regenerating the code each time would usually yield a better answer, but he did not test that.
So the outcome of this test does not represent the final performance of the two models.
Since the release of Llama and Llama 2, the machine learning community has exploded, and fine-tuned models have sprung up one after another.
OpenAI researcher Jason Wei said that he learned from Meta GenAI social activities that Llama 3 and Llama 4 will also be open source in the future.
"We have the computing power to train Llama 3 and 4. Our plan is to make Llama 3 as good as GPT-4." "Wow, if Llama 3 is as good as GPT-4, will you open source it?" "Yes, we will. Sorry, alignment staff."
Another netizen said that Meta hopes to open source a GPT-5-level model and seems determined to keep open-sourcing right up until AGI.
I want to be clear about what this means: no kill switch.
If something goes wrong, such as an agent going out of control or a bad actor weaponizing it, there's no easy way to shut it down. It can run on any small cluster. There is no security at all.
Security research becomes meaningless.
All the work people have done to make AI systems honest, consistent, ethical, etc. becomes meaningless. The world's AI systems will evolve toward whichever system yields the greatest economic benefit, regardless of their values or motivations. There are no guardrails. Anyone can change the AI's values or capabilities at will, for better or worse.
If Meta continues to be open-sourced while we get smarter AI, then it's clear to me that things will get messy. The arrival of these extraterrestrial intelligences is already messing up the world, but it will be even worse if we give up what little control humans have.
As far as I know, Meta's hope for open source mainly derives from the "open source community dogma" that open source is good. And as far as I know, they weren't that pro-open-source until the accidental leak of their first Llama model, and they've been pretending to champion open source ever since.
On this point, Musk commented that LLMs built on autoregressive Transformers have extremely poor energy efficiency, not only in training but also in inference; he thinks they are off by several orders of magnitude.
## Llama 2 coding ability soars
Llama 2 is a very strong model in all aspects.
However, it has a very obvious weakness - the ability to code.
According to the data in Meta's Llama 2 paper, Llama 2's performance on HumanEval (a benchmark for evaluating LLM coding ability) is worse even than GPT-3.5, let alone GPT-4.
Annotated figure from the original Llama 2 paper
But code ability will definitely be an important direction for the open source community to use Llama 2 in the future. Naturally, Meta cannot be poor in this direction, so there is Code Llama, which is greatly optimized for code ability.
Two days ago, Meta officially released the Code Llama family in 7B, 13B and 34B sizes and three variants: the general code model Code Llama, the instruction-following model Code Llama-Instruct, and the Python-specialized version Code Llama-Python.
These models are free for academic and commercial use, under the same license as Llama 2.
The code ability of Code Llama 34B model is almost twice that of Llama 2, greatly narrowing the gap with GPT-4.
As mentioned above, the "Unnatural Code Llama" hidden in the paper, a version of Code Llama-Python 34B fine-tuned on 15,000 unnatural-language instructions, came close to GPT-4; Meta seems to have buried it there as a hint to the open source community that Code Llama has great fine-tuning potential.
Why is there no 70B Code Llama model?
Interestingly, Code Llama only has 7B, 13B and 34B parameter versions, which is 70B less than Llama 2.
Although Meta did not explain why this is the case in the paper, technology guru Sebastian offered two possible reasons:
First, Code Llama's training data is only about a quarter of Llama 2's, so there may simply not be enough data; combined with the limits implied by LLM scaling laws, a Code Llama 70B might not perform well.
Second, Llama 2 only supports input lengths up to 4k tokens; if a 70B model were to support 100k-token inputs, its computational requirements might become prohibitive.
Source: Xinzhai Business Review, Author: Xin Yi
Image source: Generated by Unbounded AI
The launch of ChatGPT at the end of last November ignited a global melee around large models, AIGC and related fields. In the roughly 200 days since, everyone from big Internet companies to ordinary users has waded into large models, trying to catch the latest technological wave and pull off a "turnaround against the wind".
The anxiety of large-scale models from big manufacturers has also spread to ordinary users. On the one hand, they began to try to learn AI large-scale models, and on the other hand, they also used AI in their daily lives and work.
On social platforms such as Douyin, Xiaohongshu and Bilibili, the search and playback volumes of AI tutorial content about ChatGPT, Wenxin Yiyan and the like have already exceeded 1 billion.
Obviously, these social platforms have accelerated the spread of large models in third- and fourth-tier cities, allowing the use of large-scale model AI tools to penetrate from first- and second-tier cities to third- and fourth-tier cities.
Some young people have already started bringing large models, both domestic and foreign, into their own work.
Xiao Hao is a designer at an e-commerce company in Linyi who has also begun using AI tools to create distinctive artwork: cute 3D cartoon avatars; ultra-realistic, multi-perspective AI landscape photos; cyberpunk scenes full of futuristic technology; paper-cut designs with a Chinese flavor...
Some time ago, Xiao Hao also used AI drawing tools to design a set of product renderings for a cross-border e-commerce company that makes furniture in Shandong.
Under normal circumstances, producing a set of product renderings takes several steps: first draw the product plan, then model the plan in 3ds Max and render the image. A single rendering usually takes two to three hours, and a full set takes 5 to 7 days. In terms of cost, according to the price list on Zhubajie.com, a single furniture rendering runs about 200-300 yuan.
Xiao Hao said that the company started to get in touch with AI tools for the purpose of reducing costs and increasing efficiency, and found him under the introduction of a friend, and asked him to help produce pictures with AI tools.
"A total of about 60 different types of renderings have been generated, including scene diagrams with different details generated based on the product diagrams provided by the company, reference diagrams for various furniture style designs, promotional diagrams for factory introduction, etc." Xiaohao said, It took about two hours to complete this set of drawings.
"The company is very satisfied and believes that it can completely replace the previous renderings."
More and more small merchants in third- and fourth-tier cities value the development potential of AI painting. AI tools can not only reduce time and cost, but also avoid some infringement risks for merchants.
For Xiaohao, his use of AI tools is undergoing a transition from "innovation" to "practicability". At the beginning, he paid more attention to the innovation and uniqueness brought by AI tools, staying at the level of interest. Now, more and more companies are looking for him to do projects. "Now I have certain requirements for controllability." Xiao Hao said.
In addition to helping furniture companies use AI to draw pictures, Xiao Hao was also approached by an e-commerce brand that sells cat food.
Previously, for copyright reasons, all of their cat pictures had to be shot themselves: renting a cat from a pet store, professional pet photography, long shooting sessions, retouching and so on consumed a great deal of time and labor.
"By changing the keywords, I can change the cat's breed, shooting angle, and the cat's movements and expressions, and design exaggerated scenes and pictures to achieve an effect that is almost impossible to achieve in daily shooting. It took 3 hours."
**Using AI can not only reduce costs, but also avoid the infringement of cat pictures. **
Xiao Hao said that currently Midjourney stipulates that if you are a paying user, the copyright of the generated pictures belongs to the individual, and the platform has no copyright, but it will also involve some other infringements, which need to be avoided artificially.
"For example, if you want to generate a Doraemon or a Mickey Mouse image, because the big model must know these two cartoon images, if you use AI to generate a well-known image, then there will be a risk of image infringement, but the generated Cats are generally not at risk of copyright infringement, after all, there are so many cats in the world, if a cat is randomly generated, it cannot be exactly the same as a copyrighted cat.”
**In addition to the need to pay attention to infringement issues during the AI creation process, consistency in certain generation scenarios is another problem. **
AI cannot yet maintain a unified style across different scenes. It is hard to make sure that every picture you generate shows the same little boy; you can only get as close as possible through repeated adjustment, and there is no way to make them exactly the same. This is a limitation of current AI.
In addition, another limitation is that it does not recognize some special scenarios, or it is not involved in the pre-training process.
"Some time ago, during the Dragon Boat Festival, I wanted to use AI to generate a picture of a dragon boat, but it didn't know the dragon boat. The generated picture was a dragon, a boat was a boat, and it couldn't draw a gourd." Xiao Hao said , compared with Midjourney, domestic large-scale models such as Wenxin Yige are more accurate in recognizing this more Chinese-style scene. After optimizing these advantages, it is a good selling point for domestic large-scale models.
At present, AIGC still has a long way to go, but its advantages such as high efficiency, low cost and avoidance of infringement have won the favor of many merchants in third- and fourth-tier cities.
Programmer Xiaofan seems to have a keen nose for new technology tracks. After ChatGPT took off, he wanted to use it to make children's picture books and turn them into income.
"I want to do a lot of things. When I graduated from university, I wanted to do smart home, ride-hailing and so on. Didi didn't exist at that time." Xiaofan said that these were all his ideas 10 years ago.
With the explosion of ChatGPT, the powerful capabilities demonstrated by AIGC made Xiaofan smell business opportunities again.
Xiaofan said that now ChatGPT is not as intelligent as we imagined. It can be used to write event planning and story outlines, but the answers given are too naive. Writing novels is not logical enough, but writing simple children's stories is completely sufficient.
Currently, Xiaofan's children's picture books have been launched on the Amazon platform.
By February this year, more than 200 new books crediting ChatGPT as an author or co-author had been listed on Amazon, the world's largest book sales platform.
Brett Schickler, who works in sales in New York, USA, used AI tools such as ChatGPT to spend hours creating a children's book "Smart Little Squirrel: The Story of Saving and Investing". The book, which went on sale in January through Amazon's own website, has already earned him hundreds of dollars.
Xiao Hao, mentioned above, has also been using AI tools to make children's picture books alongside his commercial orders, but publishing his AI picture book has run into trouble: because of funding problems, his partners can only support a preliminary exploration, and publication is not going ahead for now.
This AI children's picture book is a story about mythical beasts.
It took Xiao Hao an hour to complete this set of picture books from writing the outline of the story to drawing and typesetting. Although AI tools can greatly reduce time costs, there are still many problems.
"Using ChatGPT, Wenxinyiyan and other large language models can only provide a story outline, and the details inside need to be improved. In addition, after the first image of the animal is generated, if you want to control the image of the animal that appears in the next scene and the It will be very difficult for the first one to be consistent, and it will require a little bit of manual debugging, but it cannot guarantee that the style and tonality are completely consistent.” Xiaohao said that the emergence of AI tools is also lowering the threshold for painting.
In the past, creating a painting required the original artist to spend a long time exploring one direction, but now with the assistance of AI tools, works of different styles will appear in a few seconds.
On platforms such as Zhihu, Bilibili and CSDN, many netizens have shared their experiences of using AI tools to create children's picture books; there are even detailed tutorials that let complete beginners finish a picture book in a few hours.
In the context of artificial intelligence, book production is no longer the exclusive right of professional institutions, and the book creation of ordinary users with the blessing of AI is reshaping the entire industry ecology.
Qianqian graduated from Henan Normal University in Xinxiang in June this year, and is now applying for schools in the UK.
When applying for a school, because she forgot her email password, Qianqian needed to write an email to tell the school her new email address, so as not to miss future messages.
"Because most of the things were written by the intermediary teacher for me, I applied for this school myself, and it was not under the scope of my intermediary management, so I was super collapsed at the time, thinking about how to solve this matter, my sister told Me, they will have a fixed format, and then fill in the rest of the content by myself, and give me her ChatGPT account, this is my first contact with ChatGPT.”
"Tell ChatGPT directly what you want. It will help you set up the framework and explain the details that need attention. You can rewrite it according to your actual situation. Personally, it is better than the email sent by the intermediary teacher to the school. "Qianqian said.
Qianqian is now using ChatGPT to write emails to contact the landlord.
"Because the intermediary is too expensive, I need to find social housing on the website by myself, so I need to write an email to contact the landlord. ChatGPT has saved me a lot of time. Now I have developed the habit of checking my email three times a day."
"Convenience" is Qianqian's evaluation of AI tools. "Before, you had to ask a lot of people, check a lot of information and organize the information you can get. Now AI tools can help you organize it and ensure the fluency of the language. In our In the process of use, the main function of ChatGPT is actually more to build a framework, propose some novel ideas, and deal with some tedious text work.”
In addition to using AI to write emails, Qianqian is also doing some part-time writing of manuscripts.
Sometimes when encountering some strange text manuscripts, she will choose to use AI to build a framework, and then continue to fill in the content.
"At that time, I received an order to write a lesson plan, and I used Bing and ChatGPT to write it at the same time, but ChatGPT will give you a writing idea, and Bing will give you a fully formed lesson plan. After that, you only need to write the lesson plan. , will use Bing."
Qianqian also said that many students around her have access to large-scale model tools such as ChatGPT and Wenxinyiyan. Some students even use ChatGPT as a search engine, which has become one of the most commonly used apps in daily life.
Chen Sizhe is a graduate student at school and also an automobile R&D engineer in a company in Hefei.
His first contact with ChatGPT was in a class on smart social governance, where the teacher demonstrated ChatGPT to the students over a video call with friends abroad. Chen Sizhe asked ChatGPT what internal and external environment the T3 company is facing now, but ChatGPT did not seem to know the company, and its answer had nothing to do with T3.
Although the first experience of ChatGPT was not good, Chen Sizhe was still "into the pit".
"I often use ChatGPT and Wenxin Yiyan. They can give accurate and quick answers to some popular science and common sense questions. I sometimes use them to help me write small papers on topics. It will help me quickly build a paper framework. There will be some novel viewpoints that can be used for reference.” Sizhe also said that although these AI tools can be used to quickly collect a large amount of academic materials, the viewpoints in it need to be carefully screened, and the source of information needs to be verified.
Not only Chen Sizhe believes that AI tools have bugs in the field of academic research, but Qianqian mentioned above also said that it is very unreliable to use ChatGPT for scientific research. When helping her teacher to prepare for scientific research, Qianqian used ChatGPT as an auxiliary background investigation, "All the references that ChatGPT gave me are fake, and I am really curious about how it can be edited like the real one. "
When asked if she would use AI tools to write a thesis, she clearly answered no.
The school's plagiarism-checking system includes an AI-similarity screening. Articles generated with AI have obvious telltale phrasing and are easy to flag; even some articles not written with AI get judged as high-risk.
Qianqian's graduation thesis got stuck in the AI-similarity screening. "My classmate was almost driven crazy at the time. He didn't use ChatGPT; he didn't even know what ChatGPT was. Some of his sentence structures may just have resembled AI-generated text, but in the end it was judged highly similar to AI."
Qianqian finally said that the teacher often told them to use AI tools correctly and not be taken away by AI.
**After all, AI tools are crutches, not legs. **It can help us solve a lot of repetitive and cumbersome operations, but AI can't answer what it hasn't seen, just like you can't dream of something you haven't seen, human creations and thoughts, AI still can't substitute.
The scope of large-model use is wider than we imagine. Globally, tech giants represented by Microsoft and Google and startups such as OpenAI and Anthropic focus on C-end users; in China, general-purpose large models from Internet giants, represented by Baidu Wenxin, Xunfei Xinghuo, Alibaba Tongyi and Huawei Pangu, focus on B-side applications.
**Large models are quickly implemented and applied in first-tier cities, and they are also beginning to take root among young people in third- and fourth-tier cities. **
It is foreseeable that the large model after the disenchantment is completed will continue to be deeply connected with people's life and work. At the same time, AI is also impacting the last imaginative territory of human beings. For us, the best way is not to wait for it to mature, but to take the initiative to understand, be familiar with and even master it, so that we can become the driver of AI and ensure that we will not be left behind by the times.
AIGC applications are profoundly changing our work and life, and interior design is also deeply affected. Through innovative AI technologies, we are able to reimagine and design our interior spaces like never before, opening up entirely new possibilities for the living experience.
In this issue of Unbounded Talk, we invited Jason, manager of the "Designer's Toolbox" and an AIGC creator, algorithm designer and architectural designer, to show us how AI is applied in the field of interior design.
The following are the highlights of what he shared.
AI-assisted interior design, the ideal state is that AI can help designers complete the design process from 0 to 1. For example, a simple wall layout drawing can directly generate a three-dimensional drawing with various information such as construction and home decoration for designers through AI tools, forming an information model.
But the actual implementation will be more complicated. Taking the input terminal in the early stage as an example, in addition to the basic wall layout and space layout, there are also things like the owner's preferences, project budget, and even specific ground paving, hard decoration materials, soft decoration options, etc., each of which is different. type of output. How to better combine these AI tools with input and output?
By technical stage, interior design work can be divided into the following parts:
1. Consulting stage: This is a stage where the commissioning and undertaking of the plan has not been finalized. Using large language models such as ChatGPT will be very good.
2. Conceptual Design: Customers hope to see as many customized solutions in different styles as possible at this stage that meet their personal preferences. Use AI drawing tools such as Stable Diffusion to assist designers and maximize work efficiency.
3. Conceptual modeling: converting an image into a 3D model. For the particular scenario of interior design, a 3D model can be generated from a simple plane sketch or layout drawing, but an AI solution for fine modeling has not yet been achieved.
4. Design deepening: The current mainstream AI tools cannot assist designers very well, and traditional CAD or BIM models can be selected.
The Jason team tried to make several small tools, mainly focusing on the early stage of design consultation, concept design and concept model, and made some attempts in the later stage of design development.
In the consulting stage, for design companies, the large language model represented by ChatGPT has strong generalization. But when implemented at the application level, companies will have their own databases and corresponding needs, which can be organized into a vector library, and at the same time cooperate with AI agents to build an internal marketing or data retrieval platform.
Designers can also use such small tools to better organize their own resources, convert them into a vector knowledge base, and then use AI, a more efficient retrieval method, to cut the time cost of the information-retrieval stage.
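A minimal sketch of the idea, assuming a generic sentence-embedding model behind `embed()` (a placeholder here, not a specific product): documents are embedded into vectors once, and queries are answered by cosine-similarity search over that vector library.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder embedding; replace with a real sentence-embedding model."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

docs = [
    "Commonly used dining table height: 750 mm",          # hypothetical internal notes
    "Minimum corridor width for residential interiors: 900 mm",
    "Typical sofa seat depth: 550-600 mm",
]
doc_vecs = embed(docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q                                  # cosine similarity
    return [docs[i] for i in np.argsort(-scores)[:top_k]]

print(retrieve("What are the commonly used sizes for restaurants?"))
```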
Jason shared several application technology routes that extend the large language model, covering four aspects: prompt engineering, enterprise knowledge base, AI agents, and large-model fine-tuning.
**(1) Prompt word project: **The construction cost is relatively low. Based on the large model, it can be constrained by inputting prompt words, so that this model can be used as an auxiliary design tool to assist the designer to complete some work. But at present, it may be more suitable for some less rigorous and divergent scenarios, such as writing novels, or doing some simple analysis.
(2) Enterprise Knowledge Base: The cost will be slightly higher, and some professional R&D teams are required. It is necessary to have a certain understanding of the internal knowledge structure of the enterprise, and be able to efficiently convert them into a vector knowledge base for easy retrieval. It is very suitable for the application scenario of sales.
(3) AI agent: put simply, an agent is a proxy that does one thing it is good at. Through pre-set rules and constraints, different agents can be combined into a complete workflow. Compared with the first two routes, this requires R&D staff who understand the business scenarios very well.
(4) Large model fine-tuning: The direction with the highest cost, and the cost of computing power is the bulk of it. For companies that are not in the direction of AI or the Internet, it is not suitable for fine-tuning in this direction, and you can consider cooperating with corresponding companies.
There is a pain point in the profession of designers - there are a lot of specifications and data that need to be memorized.
Faced with this situation, designers can use AI to build a platform that can be used on the web or even on mobile phones, and input design data accumulated within the company or personally, such as "What are the commonly used sizes for restaurants?" and so on. The AI model can combine the input knowledge base to output a very accurate answer. For some design specifications, AI can also give designers a more accurate reminder.
Because large models are trained on previously public data, their strength lies in generalization. But internal company documents, including documents never published to the Internet, cannot be accurately consulted by these models. Therefore, the unique knowledge base of an enterprise or individual has to be layered on top of these large models to make the information-retrieval scenario work.
This extends a new application scenario, AI agent, which has a very promising prospect.
For example, in an e-commerce scenario, it can realize question-and-answer dialogues with customers in the form of sales, and can also search through different tools in the background to help users retrieve information, assist in generating floor plans, images, and even rendering models.
In fact, generating floor plans, generating images, and generating rendering models are three different AI agents, and each workflow is responsible for different tasks.
Based on the large model and the professional sorting out of industry business scenarios by enterprises and practitioners, these AI agents can be integrated and combined into an efficient system.
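As a toy illustration of this agent idea, the sketch below chains three single-purpose steps into one workflow; the three step functions are hypothetical placeholders, not a real product API.

```python
from dataclasses import dataclass

@dataclass
class DesignRequest:
    room_layout: str
    style: str

def floorplan_agent(req: DesignRequest) -> str:
    return f"floorplan for {req.room_layout}"                 # placeholder output

def image_agent(floorplan: str, style: str) -> str:
    return f"concept image of {floorplan} in {style} style"   # placeholder output

def render_agent(image: str) -> str:
    return f"rendered model based on {image}"                 # placeholder output

def run_workflow(req: DesignRequest) -> str:
    """Each agent does one thing it is good at; simple rules chain them into a pipeline."""
    plan = floorplan_agent(req)
    image = image_agent(plan, req.style)
    return render_agent(image)

print(run_workflow(DesignRequest("two-bedroom apartment", "Japandi")))
```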
Users can input different information through this interface, and let AI recommend different schemes and furniture combinations, etc. Even the furniture combination given by AI can be set and selected from the input product library to truly help users or enterprises integrate their own business with AI in-depth scenarios.
Back to the interior design scene. The designer will first determine the design style, because the style itself is very diverse, and the needs of users are often changing. At the same time, the style itself will also affect the overall color composition of the interior, the choice of furniture and soft decoration, etc. If the designer does not allow users to determine the style in the early stage, the subsequent workload will be very heavy.
In current industry practice, the floor plan is therefore fixed first and the style settled afterwards, in more detail. So an AI model is trained on a range of styles in advance; when a user brings a floor plan, the designer uses AI-assisted rendering to generate and switch between styles quickly.
This tool will help customers have a quick feedback and experience in the early stage, increase the work efficiency of enterprises and practitioners, and at the same time improve the user experience and help the company get business.
In addition, the e-commerce scene also has a need for publicity of furniture or decorations, or a need for reference pictures for customers or designers.
In this scenario, the pictures quickly generated using SD still need to be optimized before they can be used. It can be combined with product tonality, or even adjusted according to product target customers, to create a customized AI model to meet different scenarios and business needs.
In addition to assisting concept rendering, AI can also assist modeling. At this point, the design requires that the information be accurate and feasible, and many supply chains will be involved.
The starting point is a multimodal model: one that, put simply, can generate a 3D model from text or voice, or from pictures, images and video, and can conversely derive those different forms of information back from a model.
AI image generation starts from nothing. Take the concept design of several V-shaped flower pots: open-source algorithms can produce different rough models at an early stage whose shapes are reasonably accurate and reflect the structure of the object in the image. Other, traditional optimization algorithms can then convert the rough model into an editable 3D mesh and even retopologize it.
In this way a designer can generate several AI concept designs in 10 seconds and turn them into something editable in another 30, saving the time of modeling from zero and greatly improving efficiency.
So what can AI do for the detailed design-development drawings?
Current open-source technology probably cannot yet combine efficiently with the later, detailed stages of interior design, because the drawing techniques AI is good at are still rooted in 2D space, while real design work places very high demands on the scale and accuracy of the whole space and on many details.
"I believe that as 3D large models and multimodality mature, this efficiency will gradually be built up and become more and more dependable," Jason said; for now, traditional techniques are still needed to help complete the detailed drawings.
The first application scenario is somewhat like the AI marketing scenario shown at the beginning, assembled from the company's internal furniture materials and products. For example, AI can generate images in which the items shown carry corresponding product links, effectively converting the information contained in an AI image into product information.
The second scene is about material migration. Interior design can be simply understood as being composed of visual layer, geometric layer and even other finer layers. The advantage of AI is that it can quickly generate an overall space with a sense of atmosphere. Although this kind of light and shadow may be inaccurate, it can still help us quickly find the color scheme and space composition.
If the information contained in the image generated by AI can be extracted and transformed into our model, it can speed up the modeling process of the designer.
**Ruiya: Which part of interior design work do you think AI will land in first? Creative design rendering, or something else?**
Jason: Definitely creative design, concentrated in the early stage. Everyone is talking about cutting costs and raising efficiency, and this genuinely lifts a firm's efficiency to some extent. Whether for online marketing or SD-based quick sketches, it upgrades the customer experience and improves the odds of attracting more customers.
But rendering may not become the mainstream, because the indoor light and shadow in AI or SD images are inaccurate, and those deviations only grow larger when the design is actually built.
**Ruiya: Have you considered building a tool that lets consumer (C-end) users produce their own design drawings in a very simple way?**
Jason: Such a product is actually quite hard to build. The industry already has companies with years of accumulation, such as Kujiale and Sanweijia, and with AI they can more easily make better products in this area. I think small teams have almost no chance here.
That said, such products will certainly keep getting simpler, and the designer's role at the concept stage will shrink considerably. The core value of designers will probably have to shift toward how best to turn the concept image the client likes into a built reality.
**Ruiya: From what you have observed so far, are there examples of AI-assisted designs that have actually been built?**
Jason: Taken entirely from concept to completion with AI, I haven't seen one yet. But AI-assisted concept renderings, where the client settles on a style from the concept image, and similar applications do exist.
**Ruiya: When AI assists design, is it easily swayed by bias in the training set, so that the results lack diversity?**
Jason: That certainly happens. Even though we train with our own dataset on top of the large model, the training images themselves may well be ones that can be found on the Internet, which other model trainers can find too.
From a design perspective, though, homogenization is also driven by fashion: under a given trend, image material across the whole Internet skews toward that style.
**Ruiya: How can designers keep the ability to control and adjust the final result during AI rendering?**
Jason: For a designer, controlling the outcome is not that hard. The difficulty is whether the scheme the designer has in mind actually meets the client's needs.
**Ruiya: Do you think AI will change our understanding of design aesthetics?**
Jason: Definitely. As AI tools spread, everyone's aesthetic sense will rise, so designers will have to raise theirs too. The bar for innovation also rises: one source of innovation is moving away from the traditional habit of designing by hunting for reference images. Future designers, or at least those who want to work at the high end, will have to change that way of working.
**Ruiya: As training sets keep iterating, do you think AI will eventually be able to perceive and predict users' future needs and trends?**
Jason: Yes. What AI is good at is processing large amounts of data to summarize and even make predictions.
**Ruiya: Many traditional interior designers still worry that AI may take their jobs. How would you argue to them that designers are irreplaceable in creativity and human touch?**
Jason: AI replacing traditional designers is really the whole economy's pursuit of lower costs and higher efficiency; what the AI revolution replaces is the most repetitive, least creative work.
Creative designers can never be replaced by tools. A more technically inclined interior designer should embrace AI and learn its underlying principles, because no outsider can train industry models or curate high-quality datasets.
Designers who are good at communication can double down on that strength, though it requires a change of mindset: how to use these tools to find more customers, or to build a personal IP or brand.
Every industry is talking about cutting costs and raising efficiency. Cutting costs with AI assistance is not hard today; using AI tools to genuinely raise efficiency is a longer-term topic, and many new opportunities will come out of it.
Source: AI dark horse
Author: Unfinished Research by Qiming Venture Partners
AI dark horse recommendation:
Previously, we shared the research reports of Tsinghua University and Tencent Research Institute (see previous articles). The report shared this time comes from Qiming Venture Partners. This artificial intelligence report has clarified ChatGPT, Gen-AI, large models, multimodal...
1 | In 2022, ChatGPT was born and diffusion-model applications made a breakthrough; the year became known as the year of generative artificial intelligence.
2 | In 2023, large models reach a peak, and generative AI enters the stage of innovative applications of general artificial intelligence.
3 | In 2024, China will have a multilingual general-purpose model comparable to GPT-4.
4 | Before 2025, modalities such as video and 3D will see milestone models that greatly improve generation quality.
5 | The generative-AI ecosystem comprises the infrastructure layer, the model layer and the application layer. Innovation is happening at every level, with competition among technology giants, industry leaders and start-ups. Faced with this revolutionary technology, enterprises are drawn in whether actively or passively; whether a company is a technology incumbent, an innovator or an adopter, its business model will change, which in turn shapes its development.
6 | Generative AI is still at an early stage of technical development, and its infrastructure and core technologies are immature; the technology giants are busy building large models and have not yet gone deep into specific application scenarios. But the moment a giant adds a similar feature is the sword of Damocles hanging over every start-up, and the expanding capability boundary of large models may also squeeze start-ups' room to grow. Even in the blue ocean there are hidden reefs along the way.
7 | Large models are not only for generating articles and pictures; they can also act as intelligent agents that manage and carry out more complex tasks. Open-source models enable low-cost, smaller-scale, specialized training, and both compete with and complement closed-source foundation models, jointly pushing generative AI into applications and accelerating deployment to edge and mobile devices.
8 | Generative-AI large models are increasingly moving toward multimodality, and embodied intelligence has become an important research direction, helping generative AI better understand and handle the complexity and diversity of the real world.
9 | Until a more capable large language model emerges, three approaches to achieving good results in vertical domains will coexist: 1) pre-train a general large model on general data without specifically adding industry data; 2) fine-tune a general large model with industry-specific data; 3) pre-train a vertical model on a dataset with a higher proportion of industry data.
10 | Embodied AI, represented by PaLM-E, shows great potential in robot perception, understanding and decision-making, but training and reliability remain major challenges. In the short term the Transformer is becoming the mainstream architecture across modalities, but a general method for compressing the entire digital world has not yet appeared, and the Transformer is not the endpoint of AI. The generative-AI market is in its early, technology-driven stage, with opportunities for platform companies worth hundreds of billions of dollars.
Image source: Generated by Unbounded AI tool
Source: Titanium Media
Text: Photon Planet
Author: Wu Kunyan; Editor: Wu Xianzhi
Unknowingly, NetEase has become one of the most popular Internet targets in the eyes of investors.
According to Wind data, NetEase-S (9999.HK) gained 48% in 2023, overtaking JD.com-SW (9618.HK) to become the fourth-largest Internet stock by market value in Hong Kong. Saying NetEase has been "winning until numb" is no exaggeration.
NetEase's latest quarter did not let the market down either. The financial report shows second-quarter revenue of 24 billion yuan, up 3.4% year on year; alongside that steady top-line growth, what likely excites the market more is that non-GAAP net profit from continuing operations attributable to shareholders reached 9 billion yuan, up 67% year on year.
Beyond the headline numbers, the brightest spot in this report is the application of NetEase's AI across multiple business lines. In addition to self-developed games that have woven AI into their workflows, NetEase Cloud Music, NetEase Media and NetEase Youdao all use AI to varying degrees: Cloud Music launched AIGC-assisted creation led by NetEase Tianyin, NetEase Media staged a digital-human dialogue with Musk to rethink the news scene, and Youdao released a large model for education.
It’s just that the emergence of AI has not changed the status quo that NetEase has been a game company “cloaked in the skin of an Internet company” in the past few years. The game business has long accounted for more than 70% of revenue.
This may also be the main reason NetEase performs so well in the market: the game business gives it strong certainty. What's more, this quarter's report has yet to show the potential of the mobile game "Naishuihan", on which NetEase has pinned high hopes, or of "Eternal Tribulation" after it dropped its paywall to go free-to-play.
Relying on the revenue and development of the game business to drive the overall business, NetEase's path is unique and difficult to replicate.
Under the AIGC and large-model wave there are inevitably followers, whether the chatbot applications that spread out from ChatGPT or the many text-to-image companies that have appeared since Stable Diffusion; most are start-ups or mid-tier firms that have not formed a consistent technological path.
By contrast, leading companies with more mature technology and stable businesses tend to have their own paradigm: they embed AI capabilities into existing businesses to help them grow, and push business growth to feed the development of their AI in turn, and the effect is not small. DingTalk's "/" assistant and Volcano Engine work this way, and so does NetEase.
NetEase's AI work dates back to 2017. After AlphaGo's victory over Lee Sedol opened the first wave of AI enthusiasm, NetEase founded the Fuxi Lab, an AI research institute focused on applying AI to games and pan-entertainment. The NetEase Interactive Entertainment AI Lab was set up the same year, gradually spreading AI from games to other businesses, and a development path with games as the "experimental field" began to take shape.
The so-called "experimental field" means that NetEase's AI technology will first try to implement in the game field, expand more possibilities of implementation while increasing the efficiency of the game business, and then export to other fields. For example, the AI painting platform "Danqingyue" jointly developed by NetEase Fuxi and Leihuo Art Center, and the early AR applications verified in games such as "Onmyoji" and "Decisive Battle in Ping'an Jing" have been expanded to marketing, cultural tourism and other scenarios.
Video games, regarded as the "ninth art", are in fact among the most information-dense artifacts humans make. They not only encompass and drive other art forms such as visual art, music, film and television, but are also inseparable from AI, pulling along the evolution of the cutting-edge technologies AI represents.
Traditional game AI, limited by its era, was essentially a bundle of developer-written rules and never showed strong intelligence or interactivity; the RPG NPC that repeats fixed dialogue is the typical example. Today the combination of AI and games has moved well beyond that and entered more links of game production, as can be seen from the fact that mainstream domestic game makers such as NetEase, Tencent, miHoYo and ByteDance have all set up their own game AI labs.
The difference between NetEase and other competitors is that the game business occupies an absolute majority of revenue. Not only does it have more scenarios and room for trial and error, but the game AI can also gain more investment in research and development funds.
Today NetEase's game AI has begun to show its value across multiple lanes. Computer-vision technologies such as voice-driven mouth animation, video motion capture and automatic frame interpolation have been put to use in "Fantasy Westward Journey 3D" and many of NetEase's anime-style games, and have even been exported to non-NetEase titles; there is also the first domestic unmanned loader, launched by NetEase Fuxi in July this year.
Nvidia CEO Jensen Huang, who dominates the computing-power dividend of the current large-model era, said in his speech "The Power of Vision" that when Nvidia was founded "the only purpose was to use technology to make video games possible." A major research result published by the Chinese Academy of Sciences likewise called game technology a hard-core technology "held back by fun." Over the past 50 years, video games and cutting-edge technology have grown together and pushed each other forward.
Now Netease's performance just further proves the importance of games in the evolution of technology. However, unlike Nvidia, Nvidia's products are the underlying "infrastructure" of computing power, which can be applied everywhere, but Netease's AI capabilities and technologies need to face the problem of crossing scenarios.
Since the game is the first scene for the application of AI technology, the landing itself is to serve the game content and match the requirements of the game content.
Take the AI features built into the mobile game "Naishuihan", released by NetEase on June 30: whether AI-assisted face pinching, AI lyric writing, or the "GPT-like experience" of intelligent NPCs, they exist to improve the player experience and game quality, so the product naturally faces the consumer side.
Taken out of the game context, face pinching, lyric writing and smart NPCs are unremarkable content-generation features of the large-model era, ones anyone can ship after investing in compute and some training time. In other words, these AI capabilities only "dazzle" when they collide with a game, because players are simply chasing a more immersive experience; as functions in their own right they have no obvious edge over the many other players in the AI space.
AR, which appeared a few years ago, is the typical precedent: outside games, most of its practical scenarios ended up concentrated in marketing. That users get a gamified experience in marketing only proves that AI capabilities validated in games are also likely to have their scenarios limited by games.
More importantly, there is a certain distance between the embedding of AI capabilities and commercialization - many AI capabilities are not the core of the game content and do not require players to pay for the experience.
The claim in NetEase's 2023 Q1 report that "self-developed AI has been applied across the whole game-industrialization pipeline and can raise the efficiency of key links by as much as 90%" is more of a B-side story. But those application scenarios are also more restricted, and the value is delivered mostly to industry peers. Moreover, the core text-to-image track within the game-content workflow has not yet reached industrial scale, and the flood of high-quality open-source models threatens any business model built on it.
In other words, NetEase's AI capabilities have taken root in game production, but in the face of serious business scenarios from the outside world, simply reusing its functions is obviously not enough. So how can the AI capabilities that have been implemented bridge the scene gap?
NetEase seems to be unable to give a clear answer to this, but on the basis of its own business, it forcibly expands new lanes for AI landing. This quarter's financial report mentioned that NetEase has built an "application-oriented" R&D system to promote the transformation of cutting-edge achievements such as AI large models into specific products and services, which also positively demonstrates NetEase's R&D investment orientation.
The unmanned loader mentioned above is the pilot product jointly developed by Fuxi and China Construction Eighth Bureau. This product attempts to gamify the engineering workflow through human-computer interaction, image generation and other capabilities. However, according to public information, it is expected that the product will take three years to actually land.
A former employee of NetEase said: "The game has become the umbrella of NetEase AI. Product orientation makes NetEase AI more likely to reproduce existing achievements in the game, but lacks the ability to innovate. Naturally, it is difficult to be competitive after leaving the game."
Beyond the "main quest" of games, NetEase has no shortage of side quests such as music and education, and AI naturally reaches from one into the others.
At the 2023 World Artificial Intelligence Conference (WAIC) on July 6, NetEase showed a range of large models covering content creation (music, painting), education and newly added industries. Apart from education, where the Youdao Dictionary Pen already ships as a product carrying the capabilities of the "Ziyue" large model, the other capabilities have yet to show a clear path to commercialization.
Take content creation: NetEase has opened to the public the application "Danqingyue", which carries the text-to-image model "Danqing" and supports multi-round, multimodal corrections, as well as the AIGC music-creation feature "NetEase Tianyin", open to NetEase musicians inside NetEase Cloud Music.
None of these products has been commercialized; instead they have taken the lead in the community space, in line with NetEase's usual product-and-user path. Danqingyue offered only an AI-painting experience at WAIC, and the fuller platform product, an intelligent art platform, has yet to launch; the AIGC music feature has set off a wave of AI creation in the Cloud Music community, but the audience for those works is tiny, and among NetEase Tianyin's works only the single most popular track has reached "999+" comments.
More notably, perhaps driven by the strong demand for "landing", the large models NetEase has released so far are all vertical models, and the capabilities and parameters of their base models remain a black box; its real progress in large models can only be judged by how the applications land.
At present, NetEase has a large-scale model Ziyue family that shows signs of commercialization, and is also facing fierce competition from players such as Xunfei Xinghuo and Xiaodu in the track.
NetEase CEO Ding Lei has said that if someone builds the aircraft carrier in the war of a hundred models, someone will naturally be the one to pilot it; NetEase's top priority right now is to explore innovative applications of large AI models as fast as possible.
Yet applications still need underlying support, otherwise they are castles in the air. As the hundred-model war has evolved, tracks such as music, education, painting and office work have grown ever more crowded. Whoever first shows the ability to turn AI from a toy into a tool and starts commercializing will seize the initiative and the mindshare at half time. For now, NetEase still looks far from that goal.
Of course, NetEase can keep relying on games to iterate its AI and feed it back into the game business, chasing the "Oasis" of "Ready Player One", the holy grail of the industry. The market is waiting for NetEase's answer: will the large model open the next era of games for NetEase, or the next era of NetEase itself?
Me from yesterday (August 25): Open source LLM will beat GPT-4 in a few months for code generation. Me now: Today, actually.
Yesterday, Meta open-sourced Code Llama, a foundation model specialized for code generation that is free for research and commercial use.
The Code Llama series comes in three sizes, with 7B, 13B and 34B parameters, and supports multiple programming languages, including Python, C++, Java, PHP, TypeScript (JavaScript), C# and Bash.
Code Llama versions provided by Meta include:
In terms of results, the various Code Llama versions achieve a generation pass rate (pass@1) on the HumanEval and MBPP datasets that surpasses GPT-3.5.
In addition, the pass@1 of Code Llama's "Unnatural" 34B version on HumanEval is close to GPT-4's (62.2% vs. 67.0%). Meta did not release that version; its significant performance gain came from training on a small amount of high-quality coding data.
Source:
Just a day later, researchers from Phind (an organization building an AI search engine for developers) challenged GPT-4 and beat it on HumanEval with a fine-tuned Code Llama-34B.
Phind co-founder Michael Royzen said: "This is just an early experiment aimed at reproducing (and surpassing) the 'Unnatural Code Llama' results from the Meta paper. In the future, we will have an expert portfolio of different Code Llama models that I think will be competitive in real-world workflows."
Both models are open-sourced:
The researchers published these two models on Huggingface, and everyone can go to check them out.
Next, let's see how this research was implemented.
Let's look at the results first. This study fine-tuned Code Llama-34B and Code Llama-34B-Python with Phind's internal dataset, and obtained two models, Phind-CodeLlama-34B-v1 and Phind-CodeLlama-34B-Python-v1, respectively.
The two resulting models achieved 67.6% and 69.5% pass@1 respectively on HumanEval.
For comparison, CodeLlama-34B pass@1 is 48.8%; CodeLlama-34B-Python pass@1 is 53.7%.
GPT-4's pass@1 on HumanEval is 67% (the figure OpenAI reported in the "GPT-4 Technical Report" released in March this year).
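For readers unfamiliar with the metric, pass@1 is the fraction of benchmark problems for which a generated solution passes all unit tests; the general pass@k is usually computed with the unbiased estimator from the HumanEval paper. The sketch below shows that standard estimator, not Phind's or OpenAI's actual evaluation harness.

```python
# Standard unbiased pass@k estimator (Chen et al., 2021):
# n = samples generated per problem, c = samples that pass the tests.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With one greedy sample per problem (n = 1, k = 1), pass@1 reduces to the
# plain fraction of problems solved:
results = [1, 0, 1, 1]              # toy pass/fail outcomes
print(sum(results) / len(results))  # 0.75
```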
Source:
Source:
When it comes to fine-tuning, data sets are naturally indispensable. The study fine-tuned Code Llama-34B and Code Llama-34B-Python on a proprietary data set containing about 80,000 high-quality programming problems and solutions.
Instead of code-completion examples, this dataset uses instruction-answer pairs, which differs from the structure of HumanEval. The Phind models were then trained for two epochs, about 160,000 examples in total. The researchers said LoRA was not used; the models were natively fine-tuned (full-parameter fine-tuning).
The work also used DeepSpeed ZeRO-3 and FlashAttention 2. Training these models on 32 A100-80GB GPUs took three hours, with a sequence length of 4,096 tokens.
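As a rough illustration of what such a run could look like with the Hugging Face stack, here is a minimal sketch of full-parameter fine-tuning under those settings. It is not Phind's actual training code: the dataset file, hyperparameters and DeepSpeed config path are placeholders, and a real 34B run would be launched across many GPUs with a ZeRO-3 configuration.

```python
# Sketch of full-parameter (no-LoRA) fine-tuning of Code Llama on instruction-answer pairs.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "codellama/CodeLlama-34b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",      # FlashAttention 2
)

# Hypothetical JSONL file of {"instruction": ..., "answer": ...} records.
data = load_dataset("json", data_files="instruction_answer_pairs.jsonl")["train"]

def to_features(example):
    text = example["instruction"] + "\n" + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=4096)  # 4,096-token sequences

train_ds = data.map(to_features, remove_columns=data.column_names)

args = TrainingArguments(
    output_dir="codellama-34b-instruct-ft",
    num_train_epochs=2,                            # two epochs, as reported
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed="ds_zero3.json",                     # DeepSpeed ZeRO-3 config (placeholder path)
    logging_steps=10,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```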
In addition, the study applied OpenAI's decontamination method to the dataset to make the model results more effective.
As we all know, even the very powerful GPT-4 will face the dilemma of data pollution. In layman's terms, the trained model may have been trained on the evaluation data.
This problem is very tricky for LLM. For example, in the process of evaluating the performance of a model, in order to make a scientifically credible evaluation, the researcher must check whether the problem used for evaluation is in the training data of the model. If so, the model can remember these questions, and when evaluating the model, it will obviously perform better on these specific questions.
It's like a person already knows the exam questions before taking the exam.
To address this, OpenAI disclosed in the public "GPT-4 Technical Report" how it evaluates data contamination for GPT-4, laying out its strategy for quantifying and assessing it.
Specifically, OpenAI uses substring matching to measure cross-contamination between the evaluation dataset and the pre-training data. Both evaluation and training data are processed by removing all spaces and symbols, leaving only characters (including numbers).
For each evaluation example, OpenAI randomly selects three 50-character substrings (if less than 50 characters, the entire example is used). A match is determined if any of the three sampled evaluation substrings is a substring of the processed training example.
This produces a list of contaminated examples, which OpenAI discards before rerunning the evaluation to obtain an uncontaminated score. The filtering has limitations: substring matching can yield false negatives (when evaluation and training data differ only slightly) as well as false positives. OpenAI therefore uses only part of each evaluation example, keeping the question, context or equivalent while ignoring the answer, response or equivalent; in some cases multiple-choice options were also excluded. These exclusions may increase false positives.
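The substring check described above can be sketched in a few lines. This is an approximation of the procedure as described here, not OpenAI's actual implementation.

```python
# Approximate sketch of the cross-contamination check: normalize both sides to
# alphanumeric characters, sample three 50-character probes from the evaluation
# example, and flag a match if any probe appears in a training example.
import random
import re

def normalize(text: str) -> str:
    return re.sub(r"[^0-9A-Za-z]", "", text)

def is_contaminated(eval_example: str, train_examples: list[str],
                    n_probes: int = 3, probe_len: int = 50) -> bool:
    ev = normalize(eval_example)
    if len(ev) <= probe_len:
        probes = [ev]  # example shorter than 50 characters: use the whole thing
    else:
        starts = [random.randrange(len(ev) - probe_len + 1) for _ in range(n_probes)]
        probes = [ev[s:s + probe_len] for s in starts]
    return any(
        any(p in normalize(tr) for p in probes)
        for tr in train_examples
    )
```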
For this part, interested readers can refer to the paper for more information.
Paper address:
The HumanEval score Phind used for GPT-4 has drawn some controversy, though. Some argue GPT-4's latest test score has reached 85%. Phind replied that the work behind that figure did no contamination analysis, so it is impossible to know whether GPT-4 had seen HumanEval's test data in that newer round of testing; given the recent discussion of "GPT-4 getting worse", it is safer to use the number from the original technical report.
However, considering the complexity of large-scale model evaluation, whether these evaluation results can reflect the true capabilities of the model is still a controversial issue. You can download the model and experience it yourself.
Reference link:
The combination of large models and robots has lately been hard to miss, from the "embodied intelligence" championed by Professor Li Feifei's team at Stanford to recent "domestic pride" brands such as Yushu (Unitree) and Zhiyuan. At the 2023 World Robot Conference a few days ago, humanoid robots showed off their skills live, bringing science fiction into reality, and companies of all kinds brought their robot products to the floor. Post-event statistics counted 160 domestic and foreign robot companies and 600 robots at the conference, with humanoid robots the biggest draw; Boston Dynamics founder Marc Raibert and the renowned Japanese roboticist Hiroshi Ishiguro both traveled long distances to attend.
There is no doubt that humanoid robots are becoming a new hot spot in the capital market. In the primary market, BV Baidu Ventures, Jingwei, Hillhouse, Gaorong, Zhenge, etc. are actively researching on the front line. The venture capitalists who talked about big models in the first half of the year are now concerned about general-purpose robots. In the secondary market, humanoid robot concept stocks have been hyped for several waves. In May, Musk’s remarks directly caused the A-share robot concept “Saimo Smart” to rise to the limit, and “Fengli Smart” also rose by more than 150% within six trading days, and even attracted the attention of the Shenzhen Stock Exchange. It is required to explain the reason and rationality of the large increase in the stock price.
At this year's Tesla shareholder meeting, Musk said that the humanoid robot Optimus has markedly better motion and force control as well as environmental perception, and that the technology is iterating quickly; he estimated that future demand for robots could reach 10 billion units or more, and that at a human-to-robot ratio of 2:1 demand for humanoid robots could far exceed that for electric vehicles. With those few sentences, Musk's confidence, enthusiasm and commitment ignited the humanoid-robot track, and the momentum has only grown since. In terms of practical application, it may also be Tesla that pushes the whole industry chain toward maturity.
Tesla's humanoid robot Optimus debuted at the official AI Day event in 2022, walking autonomously, turning, stopping and waving on stage. Most of Optimus's technology is shared with Tesla's vehicles, such as machine vision and the "brain" that processes visual data, makes action decisions and handles communication; most importantly, it carries the same FSD computer as Tesla's cars along with the neural-network technology behind Autopilot. The final price is expected to be no more than US$20,000, roughly 144,000 yuan.
**From a technology standpoint, companies like Tesla have inherent advantages in building robots, because many of the basic principles of robots and AI are the same, and a robot can be seen as a natural extension of the electric vehicle; the electric vehicle can be regarded as the first generation of four-wheeled robots.** Two years ago, when Musk proposed building a robot, he drew plenty of ridicule for "not minding his own business". Yet at the beginning of this year Li Auto framed its vision as becoming the best AI and robotics company, not the world's largest electric-vehicle maker. Dreams, evidently, spread.
Robots are not uncommon now. It is the task generalization ability that determines how far a humanoid robot can go. This is also the direction that many companies are making efforts. The "domestic lights" such as Yushu and Zhiyuan that have been mentioned before will not be repeated here. There are other start-ups that are also worthy of attention, such as Yuequan Bionics. The dexterity of its robotic products is already comparable to that of human hands. The human-like bionic intelligent dexterous hand independently developed by the company can be adjusted by active movement, flexion, flipping and other actions under external interference to keep the held object from falling. The official description of it is: "In addition to basic grasping, pressing and other actions, you can also complete 27 different complex and fine hand operations, such as using chopsticks to pick up small objects, apply skin care products, stir coffee, swipe mobile phones, unlock Buttons, etc." The dexterous hand adopts the drive technology of tension and compression body, which has a very high degree of freedom. At the same time, a flexible sensor is built in, with tactile neurofeedback.
There is also the "Wukong-4" humanoid robot from the robotics team of the School of Control at Zhejiang University. According to reports, "Wukong-4" can handle terrain such as outdoor roads, grass and mud, reach a top forward speed of 6 kilometers per hour, jump 0.5 meters high, and negotiate 25-degree slopes and 10-centimeter steps. Under unknown disturbances such as slippery ground or external pushes it can quickly recover its balance and keep walking steadily. By integrating legged-locomotion and environment-perception technologies, "Wukong-4" builds 3D maps of its environment and navigates dynamically on its own.
Behind these new robots are a series of software and hardware technologies independently developed by enterprises and universities: humanoid robot bodies based on proprioceptive drives; self-developed high torque density modular joints and integrated structural design; high-strength alloys, carbon fibers and Advanced materials such as engineering plastics retain the beautiful appearance and improve the strength and stability of the structure; coupled with the large language model and advanced force control algorithm, it has high dynamic performance and can better understand human beings.
From the pursuit of investors, the entry of leading technology companies, to the innovative research of start-up companies and universities, humanoid robots are advancing in many directions and ushering in a new stage of development. For example, integrating a large model and equipping it with a brain is another key technical variable driving the upsurge of humanoid robots.
As noted earlier, "embodied intelligence" means "an intelligent agent with an actual body that supports physical interaction", in effect giving AI a body, and it is a new development path toward general-purpose robots. Capital is about to throw more firewood on the industry; after five to ten years of large-scale investment the fire will burn hotter, and general-purpose robots may finally be commercialized. **For a long time, robots were confined to specific kinds of work, much as AI itself once was. As large models edge toward AGI, combining them with robots naturally widens the range of what robots can be used for.**
Compared with China, the progress abroad is one step faster, and the ability of the large model has been upgraded from the language to the execution layer. In July, the robots of Li Feifei's team could pull drawers, unscrew bottle caps, and weigh apples. The Robotics Transformer 2 (RT-2) launched by Google DeepMind at the end of the month continued to conduct in-depth research in the same direction. RT-2 is a brand new Vision + Language + Action (VLA) model that can learn from network and robot data and translate this knowledge into general instructions for robot control. RT-2 demonstrated better generalization, understanding beyond the semantic and visual domains of the robotic data it was exposed to, and was able to interpret new instructions and respond to commands by performing basic reasoning.
As the saying goes, ideals are plump but reality is bony. While robots and AI are both accelerating, plenty of technical and commercial challenges remain. General-purpose robotics arguably needs its own GPT-4-class technology or product to unify multimodal capabilities and truly bring embodied intelligence together, and that is no easy task. The robot-plus-large-model combinations shown in current papers and demos still focus on interaction; solving interaction does not by itself make a robot general-purpose. From the robotics side, large models contribute little to low-level control and execution. Academia mostly takes AI-driven approaches and generally hopes to use reinforcement learning for low-level control, but that has no direct relationship to large models, and reinforcement-learning-based control is not yet mature, remaining largely at the research stage.
Another difficulty lies in the co-evolution of software and hardware capabilities. Although many people believe that the combination of large models and robots will bring disruptive opportunities, Marc Raibert, founder of Boston Dynamics, said that hardware engineering and software are equally important in the development of robotics in the future. "Some people think that software can overcome all the problems and limitations of hardware. I don't agree with this view. Only the best hardware designers and software designers work together to design the best robots in the world."
Safety also needs work. A large model's hallucinations may be harmless on a screen, but once robots enter daily life they must guarantee accuracy and safety, and the technology still has ground to cover there. Technology, scenarios, cost, safety: the opportunities and challenges are arriving together, and humanoid robots are taking a crucial step toward the future.
Source: Qubit
Ali open source large model, and a new one~
Following Tongyi Qianwen-7B (Qwen-7B), Alibaba Cloud has launched the large vision-language model Qwen-VL, open-sourced the moment it went live.
Specifically, Qwen-VL is a multimodal large model built on Tongyi Qianwen-7B. It accepts images, text and bounding boxes as input, and can output bounding boxes as well as text.
For example, given a picture of Arnia and a question-and-answer exchange, Qwen-VL-Chat can not only summarize the content of the picture but also locate Arnia within it.
In benchmark tests Qwen-VL proved an all-rounder, achieving SOTA on the standard English evaluations of four categories of multimodal tasks (zero-shot captioning / VQA / DocVQA / grounding).
As soon as the open source news came out, it attracted a lot of attention.
Let's take a look at the specific performance~
Let’s take a look at the characteristics of the Qwen-VL series models as a whole:
In terms of scenarios, Qwen-VL can be used in scenarios such as knowledge question answering, image question answering, document question answering, and fine-grained visual positioning.
For example, a foreign visitor who cannot read Chinese, baffled by a hospital's floor guide and unsure how to reach the right department, can simply hand the map and the question to Qwen-VL and let it act as a translator based on the image.
Let's test the multi-image input and comparison:
Although it didn't recognize Arnia, its reading of the emotions was quite accurate (tongue firmly in cheek).
In terms of visual positioning ability, even if the picture is very complicated and there are many characters, Qwen-VL can accurately find Hulk and Spiderman according to the requirements.
In terms of technical details, Qwen-VL uses Qwen-7B as the base language model, introduces a visual encoder ViT into the model architecture, and connects the two through a position-aware visual language adapter, so that the model supports visual signal input.
The specific training process is divided into three steps:
The researchers tested Qwen-VL on standard English assessments in four categories of multimodal tasks (Zero-shot Caption/VQA/DocVQA/Grounding).
The results show that Qwen-VL achieves the best results of open source LVLM of the same size.
In addition, the researchers built a test set TouchStone based on the GPT-4 scoring mechanism.
In this comparison test, Qwen-VL-Chat achieved SOTA.
If you are interested in Qwen-VL, there are demos on the ModelScope community and Hugging Face that you can try directly; the links are at the end of the article.
Qwen-VL supports researchers and developers to carry out secondary development, and also allows commercial use, but it should be noted that for commercial use, you need to fill in the questionnaire application first.
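For developers, a minimal inference sketch looks roughly like the model's published quick-start. The helper methods (from_list_format, chat) live in the repository's custom code loaded via trust_remote_code and may change between releases; the image path and prompts below are illustrative.

```python
# Rough usage sketch for Qwen-VL-Chat: mixed image + text input, multi-turn chat.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="cuda", trust_remote_code=True
).eval()

query = tokenizer.from_list_format([
    {"image": "hospital_floor_guide.png"},   # illustrative local image path
    {"text": "Which floor is the cardiology department on?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)

# A follow-up turn can ask for grounding; the reply may include <box> coordinates.
response, history = model.chat(tokenizer, "Mark that department on the map", history=history)
print(response)
```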
Project link:
-Chat
Paper address:
According to IT House on August 26, the well-known AI painting tool Midjourney recently launched a new "Inpainting" feature, available to users on the US$10-per-month subscription. The feature reportedly lets users modify parts or details of a generated picture without creating a whole new work. PCMag reported, for example, that a user can first generate a picture of "a fish jumping out of the water" and then, via the "Vary (Region)" button, enter a new prompt to replace the previously generated fish with a shark.
Midjourney says the feature works best when the selected area occupies 20%-50% of the image size. In addition, if the local details of the changes are more closely matched to the picture, the effect will be better.
Source: Xinzhiyuan
Editors: Aeneas, Hao Kun
[Introduction] Some media broke the news that as early as last year, the Japanese government began to use AI tools to detect remarks related to the discharge of Fukushima nuclear sewage, and responded within a few hours.
In the past few days, the news that Japan has officially started to discharge nuclear-contaminated water into the Pacific Ocean has attracted widespread attention.
Just before the discharge, some media reported that the Japanese government had been using AI tools since last year to monitor any remarks related to the Fukushima nuclear power plant's plan to discharge nuclear sewage.
In June of this year, the AI discovered a South Korean media report claiming that senior officials of the Japanese Ministry of Foreign Affairs had made huge political donations to the International Atomic Energy Agency (IAEA).
Within hours, the Japanese government responded, dismissing the report as "groundless" in both English and Japanese.
According to previous reports by Nikkei Asia, the Ministry of Foreign Affairs of Japan will launch a brand new AI system in 2023 to collect and analyze information on social media and other platforms, as well as track the impact of public opinion in the medium and long term.
It is worth noting that this framework includes not only information intended for Japanese audiences, but also information intended for Japan in other countries and regions.
In March 2011, an earthquake and tsunami knocked out the cooling system at the Fukushima Daiichi nuclear power plant, causing nuclear fuel in three reactors to melt down and leak radioactive material. The ensuing massive pollution forced tens of thousands of people to evacuate.
More than 1.3 million cubic meters of seawater has since been used to cool the reactor core, which overheated after the explosion.
This contaminated water is also collected and stored in more than 1,000 stainless steel tanks on the site.
Of the 64 radionuclides in the contaminated water, those posing the main threat to human health are carbon-14, iodine-131, cesium-137, strontium-90, cobalt-60 and tritium.
To treat this water, Tokyo Electric Power Company (TEPCO) uses its self-developed Advanced Liquid Processing System (ALPS), a multi-stage process involving co-precipitation, adsorption and physical filtration.
However, such large quantities of water also make sustainable storage increasingly difficult.
In April 2021, the Japanese government officially approved the discharge of these treated nuclear sewage into the sea.
Despite concerns expressed by various countries and international organizations, this has not stopped Japan from advancing the plan.
At the same time, the Japanese Ministry of Foreign Affairs has also begun to use AI to monitor online reports about radioactive substances contained in nuclear sewage, and to dilute the concentration of such information by producing a large number of promotional materials.
On July 21, Japan's Ministry of Foreign Affairs posted an animated video on Twitter explaining, in Japanese, English, French, Spanish, Russian, Arabic, Chinese and Korean, the safety measures taken in treating the nuclear sewage.
The video explains how the plant's water is purified to regulatory standards by the Advanced Liquid Processing System (ALPS), and emphasizes that before release into the wider ocean the discharged water is diluted 100-fold with seawater.
In fact, this technology of monitoring Internet public opinion has already been deeply and extensively explored in the field of AI.
One of the most popular is the use of a combination of algorithms, machine learning models, and humans to deal with "fake news" published in social media.
A 2018 Twitter study showed that fake news stories are 70% more likely to be retweeted by humans than real news.
Meanwhile, real news takes about 6 times longer to reach a group of 1,500 people, and most of the time it rarely reaches more than 1,000 people. By contrast, popular fake news can reach as many as 100,000 people.
To this end, Meta has launched a brand new AI tool Sphere in 2022 to ensure the accuracy of information.
Sphere is the first AI model capable of scanning hundreds of thousands of citations at once to check whether they support the corresponding claims.
Sphere's dataset includes 134 million public web pages. It relies on the internet's collective knowledge to quickly scan hundreds of thousands of web citations for factual errors.
Meta said Sphere has scanned all pages on Wikipedia to see if it can identify sources of citations that do not support the claims made in the pages.
When Sphere finds suspicious sources, it can recommend stronger sources or corrections to help improve the accuracy of an entry.
Many earlier AI systems could already flag information lacking a citation source, but Meta's researchers said that picking out dubious claims and determining whether the cited sources actually support them requires "deep understanding and analysis by AI systems".
The development of Sphere marks Meta's efforts to address misinformation on the platform.
Meta has faced harsh criticism from users and regulators for years over misinformation spread on Facebook, Instagram and WhatsApp; CEO Mark Zuckerberg was even called before Congress to discuss the issue.
In Europe, there is also the Fandango project, which is building software tools to help journalists and fact-checkers detect fake news.
Whether a change was made in Photoshop or with deepfake tools, Fandango's system can reverse-engineer it, using algorithms to help journalists spot doctored content.
In addition, the system looks for web pages or social media posts with similar words and opinions based on fake news that has been flagged by fact-checkers.
Behind this system is the support of various AI algorithms, especially natural language processing.
Bronstein, a professor at the University of Lugano in Switzerland and Imperial College London in the United Kingdom, took an atypical AI approach to detecting fake news.
The project, called GoodNews, upends traditional fake news AI detection tools.
In the past, these tools have analyzed the unique semantic characteristics of fake news, but they have often encountered obstacles, such as WhatsApp, which is encrypted and does not allow access.
In addition, many times fake news may be images, which are difficult to analyze using natural language processing techniques.
So Professor Bronstein's team turned the traditional model on its head to study how fake news spreads.
The results suggest that fake news can get far more shares than likes on Facebook, while regular posts tend to get more likes than shares. By spotting such patterns, GoodNews attaches credibility scores to news items.
The team's first model, using graph-based machine learning, was trained on data from Twitter, some of which were proven false by journalists.
From this, they trained the AI algorithm, teaching the model which stories were false and which were not.
### Multimodal DeepFake detection leaves AIGC nowhere to hide
In addition to pure text, the rapid development of visual generation models such as Stable Diffusion has also made the DeepFake problem more and more serious.
In multimodal media tampering, the face of an important person in a news photo (the French president's face in the example below) is replaced, and key phrases or words in the text are altered (the positive phrase "is welcome to..." changed to the negative "is forced to resign").
In order to meet the new challenges, the researchers proposed a multi-modal hierarchical tampering inference model, which can detect the cross-modal semantic inconsistency of tampered samples by fusing and inferring semantic features between modalities.
Currently, this work has been accepted by CVPR 2023.
Specifically, the author proposes a multimodal hierarchical tampering reasoning model HierArchical Multi-modal Manipulation rEasoning tRansformer (HAMMER).
The model builds on a two-tower architecture for multimodal semantic fusion and reasoning, and detects and localizes multimodal tampering in a fine-grained, hierarchical way through shallow and deep manipulation reasoning.
The HAMMER model has the following two characteristics:
In shallow manipulation reasoning, manipulation-aware contrastive learning aligns the unimodal semantic features extracted by the image encoder and the text encoder; the unimodal embeddings then exchange information through a cross-attention mechanism, and a Local Patch Attentional Aggregation mechanism is designed to localize tampered image regions.
In deep manipulation reasoning, multimodal semantic features are further fused by the modality-aware cross-attention in the multimodal aggregator; on this basis, dedicated multimodal sequence tagging and multimodal multi-label classification locate the tampered words in the text and detect finer-grained manipulation types.
Experimental results show that the HAMMER proposed by the research team can detect and locate multimodal media tampering more accurately than multimodal and single-modal detection methods.
Judging from the visualization results of multi-modal tamper detection and localization, HAMMER can accurately perform tamper detection and localization tasks simultaneously.
In addition, the model attention visualization results on tampered words further demonstrate that HAMMER performs multimodal tampering detection and localization by focusing on image regions that are semantically inconsistent with the tampered text.
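HAMMER itself is not reproduced in this article, but the "shallow" idea of checking whether image and text still agree semantically can be illustrated with an off-the-shelf CLIP model as a crude proxy. This is only a toy consistency check under that assumption, not the HAMMER method; the image path and captions are illustrative.

```python
# Toy cross-modal consistency check: score how well each caption matches the image.
# A markedly lower score for the published caption can hint at text or image tampering.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("news_photo.jpg")                      # illustrative path
captions = [
    "The president is welcomed to the summit",            # plausible original text
    "The president is forced to resign",                  # tampered text
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image             # image-to-text similarity scores
print(logits.softmax(dim=-1))  # the less-probable caption is the more suspect pairing
```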
References:
Me yesterday: Open source LLM will beat GPT-4 in a few months for code generation. Me now: Today, actually.
Yesterday, Meta open source code Llama, a basic model specializing in code generation, is free for research and commercial purposes.
There are three parameter versions of the Code Llama series of models, the number of parameters is 7B, 13B and 34B. And it supports multiple programming languages, including Python, C++, Java, PHP, Type (Java), C#, and Bash.
Code Llama versions provided by Meta include:
In terms of its effect, different versions of Code Llama have a generation pass rate (pass@1) on Human and MBPP datasets that surpasses GPT-3.5.
In addition, the pass@1 of the "Unnatural Code Llama" 34B version on HumanEval is close to GPT-4's (62.2% vs. 67.0%). Meta did not release this version, which achieved its significant performance gains by training on a small amount of high-quality coding data.
Source:
Just a day later, researchers from Phind (a team building an AI search engine for developers) challenged GPT-4, beating it on HumanEval with a fine-tuned Code Llama-34B.
Phind co-founder Michael Royzen said: "This is just an early experiment aimed at reproducing (and surpassing) the 'Unnatural Code Llama' results from the Meta paper. In the future, we will have an expert portfolio of different Code Llama models that I think will be competitive in real-world workflows."
Both models have been open-sourced:
The researchers published both models on Hugging Face, where anyone can try them out.
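As a quick-start illustration, the snippet below sketches how one of the checkpoints could be loaded with the Hugging Face transformers library. The repo name "Phind/Phind-CodeLlama-34B-v1" is taken from the announcement and should be double-checked on Hugging Face; running a 34B model in practice requires multiple GPUs or quantization, so treat this as a sketch rather than a turnkey recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Phind/Phind-CodeLlama-34B-v1"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs (requires accelerate)
    torch_dtype="auto",
)

prompt = "Write a Python function that checks whether a string is a palindrome.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```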
Next, let's see how this research was implemented.
Let's look at the results first. This study fine-tuned Code Llama-34B and Code Llama-34B-Python with Phind's internal dataset, and obtained two models, Phind-CodeLlama-34B-v1 and Phind-CodeLlama-34B-Python-v1, respectively.
The two newly obtained models achieved 67.6% and 69.5% pass@1, respectively, on HumanEval.
For comparison, CodeLlama-34B pass@1 is 48.8%; CodeLlama-34B-Python pass@1 is 53.7%.
GPT-4's pass@1 on HumanEval is 67% (the figure OpenAI reported in the "GPT-4 Technical Report" published in March this year).
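For reference, the pass@1 figures quoted here come from the standard unbiased pass@k estimator introduced alongside HumanEval: generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples passes. The small reference implementation below is our illustration, not Phind's or Meta's code.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes),
    given n total samples per problem of which c passed the tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 10 samples per problem, 3 of them correct -> pass@1 = 0.30
print(round(pass_at_k(10, 3, 1), 2))
```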
Source:
Fine-tuning requires data: this study fine-tuned Code Llama-34B and Code Llama-34B-Python on a proprietary dataset of roughly 80,000 high-quality programming problems and solutions.
Instead of code-completion examples, this dataset uses instruction-answer pairs, a structure different from HumanEval's. The study then trained the Phind models for two epochs, covering roughly 160,000 examples in total. The researchers said that LoRA was not used; the models were fine-tuned natively (full-parameter fine-tuning).
The training also used DeepSpeed ZeRO 3 and Flash Attention 2; it took three hours on 32 A100-80GB GPUs, with a sequence length of 4096 tokens.
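To make that setup more concrete, here is a hedged sketch of how DeepSpeed ZeRO-3 can be wired into a transformers training run. The hyperparameters and paths are illustrative, not Phind's actual configuration; Flash Attention 2 would be enabled separately when loading the model (attn_implementation="flash_attention_2" in from_pretrained, with the flash-attn package installed).

```python
from transformers import TrainingArguments

# Minimal DeepSpeed ZeRO stage-3 config: shard parameters, gradients,
# and optimizer state across all GPUs, training in bfloat16.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="codellama-34b-finetune",  # placeholder output path
    num_train_epochs=2,                   # two epochs, as described above
    per_device_train_batch_size=1,
    bf16=True,
    deepspeed=ds_config,                  # hand the ZeRO-3 config to the Trainer
)
```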
In addition, the study applied OpenAI's decontamination methodology to the dataset to make the reported results more credible.
Even a model as powerful as GPT-4 faces the problem of data contamination; in plain terms, the model may have been trained on the very data used to evaluate it.
This problem is tricky for LLMs. To make an evaluation scientifically credible, researchers must check whether the evaluation problems appear in the model's training data; if they do, the model may simply remember those questions and will obviously perform better on them.
It's like a person already knows the exam questions before taking the exam.
To address this problem, OpenAI disclosed in the public "GPT-4 Technical Report" how it evaluates data contamination for GPT-4, making public its strategy for quantifying and assessing that contamination.
Specifically, OpenAI uses substring matching to measure cross-contamination between the evaluation dataset and the pre-training data. Both evaluation and training data are processed by removing all spaces and symbols, leaving only characters (including numbers).
For each evaluation example, OpenAI randomly selects three 50-character substrings (or uses the entire example if there are fewer than 50 characters). A match is determined if any of the three sampled evaluation substrings is a substring of the processed training example.
This produces a list of contaminated examples, which OpenAI discards before rerunning the evaluation to obtain an uncontaminated score. The filtering method has limitations: substring matching can produce false negatives (when there are small differences between evaluation and training data) as well as false positives. OpenAI therefore uses only part of each evaluation example, namely the question, context, or equivalent data, and ignores the answers, responses, or equivalent data; in some cases multiple-choice options were also excluded. These exclusions may lead to an increase in false positives.
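As an illustration of the procedure just described, the sketch below implements the substring-matching check in Python. The normalization and the three 50-character probes follow the report's description, but the code itself is our assumption, not OpenAI's implementation.

```python
import random
import re

def normalize(text: str) -> str:
    # Keep only letters and digits, dropping all spaces and symbols.
    return re.sub(r"[^A-Za-z0-9]", "", text)

def is_contaminated(eval_example: str, train_examples: list[str],
                    n_probes: int = 3, probe_len: int = 50) -> bool:
    src = normalize(eval_example)
    if len(src) <= probe_len:
        # Short examples: use the whole (normalized) example as the single probe.
        probes = [src]
    else:
        starts = random.sample(range(len(src) - probe_len + 1),
                               min(n_probes, len(src) - probe_len + 1))
        probes = [src[s:s + probe_len] for s in starts]
    normalized_train = [normalize(t) for t in train_examples]
    # A match on ANY sampled probe marks the evaluation example as contaminated.
    return any(p in t for p in probes for t in normalized_train)
```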
For this part, interested readers can refer to the paper for more information.
Paper address:
However, there is some controversy over the HumanEval score Phind used when benchmarking against GPT-4. Some say GPT-4's latest test score has reached 85%. Phind replied that the research behind that score did not include a contamination study, so it is impossible to determine whether GPT-4 had seen HumanEval's test data in the newer round of testing; given recent research on GPT-4 "getting worse," it is safer to use the figure from the original technical report.
However, considering the complexity of large-scale model evaluation, whether these evaluation results can reflect the true capabilities of the model is still a controversial issue. You can download the model and experience it yourself.
Reference link:
According to a report by Golden Ten Data on August 26, Robin Li, founder, chairman, and CEO of Baidu, revealed that Baidu is developing Wenxin Model 4.0 and plans to launch it by the end of this year. During the transition from Wenxin Model 3.5 to 4.0, Baidu is focusing on building Wenxin-powered applications and solutions for different industries and scenarios. Robin Li also noted that foundation models and industry vertical models are not competing products, and that vertical models should be built on the most powerful foundation model: "The foundation model iterates rapidly, and industry vertical models struggle to keep up with that pace of innovation."